The document discusses Apache Flink, an open source stream processing framework. It provides an overview of Flink, including that its core is a distributed streaming dataflow engine written in Java and Scala. Code snippets are provided to demonstrate how to set up a Flink streaming job that reads from Kafka, connects to Zookeeper, performs transformations, and writes outputs. Use cases for Flink in domains like telecommunications, automotive, finance, and healthcare are also briefly mentioned.
We present FLOW, a web service that lets users do FLink On Web. Similar in spirit to Hortonworks Stream Analytics Manager, StreamAnalytix, and Nussknacker, FLOW aims to minimize the effort of handwriting streaming applications by letting users drag and drop graphical icons representing streaming operators on a GUI.
FLOW builds on the Flink Table API and lets users assemble graphical icons associated not only with basic SQL operations but also with advanced SQL operations such as window aggregation, temporal join, and pattern recognition (the MATCH_RECOGNIZE clause). Its data preview function lets users observe on screen how sample data changes before and after each operation is applied. In addition, FLOW shows the sample data as time-series charts and geographical maps by interacting with Elasticsearch and Kibana. Domain experts with basic knowledge of SQL can therefore design their streaming applications easily on a GUI without understanding the Flink DataStream API or the Flink CEP library.
In this talk, we first present what motivates the development of FLOW, then show how FLOW can be used to solve the "Popular Places" exercise in its own style, and lastly explain how FLOW leverages the Flink Table API.
Streaming Analytics & CEP - Two sides of the same coin? (Till Rohrmann)
Talk I gave together with Fabian Hueske at the Berlin Buzzwords 2016 conference.
The talk demonstrates how we can combine streaming analytics and complex event processing (CEP) on the same execution engine, namely Apache Flink. This combination opens up a new field of applications in which aggregations can easily be combined with temporal pattern detection.
This document provides an overview of Apache Flink, an open-source framework for distributed stream and batch data processing. It discusses key aspects of Flink including that it executes everything as data streams, supports iterative and cyclic data flows, allows mutable state in operators, and provides high availability and checkpointing of operator state. It also provides examples of using Flink's DataStream API to perform operations like hourly and daily tweet impression counts on a continuous stream of tweet data from Kafka.
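The hourly impression count mentioned above is, conceptually, a tumbling-window aggregation keyed by tweet. The following is a minimal, framework-free Python sketch of that idea; it is not Flink's DataStream API, and the event layout `(tweet_id, event_time_ms)` and the function name `hourly_impressions` are assumptions made for illustration.

```python
from collections import defaultdict

HOUR_MS = 60 * 60 * 1000  # window size in milliseconds


def hourly_impressions(events, window_ms=HOUR_MS):
    """Count impressions per (tweet_id, window_start) in tumbling windows.

    `events` is an iterable of (tweet_id, event_time_ms) pairs (assumed layout).
    """
    counts = defaultdict(int)
    for tweet_id, ts in events:
        # Align the timestamp to the start of its non-overlapping window.
        window_start = ts - (ts % window_ms)
        counts[(tweet_id, window_start)] += 1
    return dict(counts)


events = [("t1", 1_000), ("t1", 3_599_999), ("t1", 3_600_000), ("t2", 10)]
result = hourly_impressions(events)
```

A daily count follows from the same function with `window_ms=24 * HOUR_MS`. A real streaming engine would additionally emit each window's result once the window closes rather than after consuming all the input.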
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages (Akihiro Hayashi)
With the shift to exascale computer systems, the importance of productive programming models for distributed systems is increasing. Partitioned Global Address Space (PGAS) programming models aim to reduce the complexity of writing distributed-memory parallel programs by introducing global operations on distributed arrays, distributed task parallelism, directed synchronization, and mutual exclusion. However, a key challenge in the application of PGAS programming models is the improvement of compilers and runtime systems. In particular, one open question is how runtime systems meet the requirement of exascale systems, where a large number of asynchronous tasks are executed.
While there are various tasking runtimes such as Qthreads, OCR, and HClib, there is no existing comparative study on PGAS tasking/threading runtime systems. To explore runtime systems for PGAS programming languages, we have implemented OCR-based and HClib-based Chapel runtimes and evaluated them with an initial focus on tasking and synchronization implementations. The results show that our OCR-based and HClib-based implementations can improve the performance of PGAS programs compared to the existing Qthreads backend of Chapel.
Flink Connector Development Tips & Tricks (Eron Wright)
A look at some of the challenges and techniques for developing a connector for Apache Flink, covering the different types of connectors, lifecycle, metrics, event-time support, and fault tolerance.
Presentation video: https://www.youtube.com/watch?v=ZkbYO5S4z18
Beginning with MapReduce and its first popular open-source implementation in Apache Hadoop, the data processing landscape has evolved quite a bit. Since then we have seen several paradigm shifts, and open-source systems have evolved to support new types of applications and to attract new audiences. We will follow these developments using the example of the open-source stream processing system Apache Flink, and in the end we will see how expressive APIs, support for event-driven applications, Flink SQL for seamless batch and stream processing, and a powerful runtime enable a wide range of applications.
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink (Flink Forward)
Flink provides fault tolerance guarantees through checkpointing and recovery mechanisms. Checkpoints take consistent snapshots of distributed state and data, while barriers mark checkpoints in the data flow. This allows Flink to recover jobs from failures and resume processing from the last completed checkpoint. Flink also implements high availability by persisting metadata like the execution graph and checkpoints to Apache Zookeeper, enabling a standby JobManager to take over if the active one fails.
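The recovery idea described above (snapshot the state together with the input position, then resume from the last completed checkpoint) can be illustrated with a single-operator simulation. This is a simplified sketch, not Flink's implementation: there are no barriers or distributed operators here, and `run_job`, `checkpoint_every`, and `fail_at` are hypothetical names introduced for the example.

```python
import copy


def run_job(records, checkpoint_every, fail_at=None):
    """Sum values per key, checkpointing (offset, state) periodically.

    If `fail_at` is given, the job "crashes" once just before processing that
    offset, restores the last completed checkpoint, and replays from there.
    """
    state = {}
    checkpoint = (0, {})  # (next offset to read, snapshot of operator state)
    offset = 0
    failed = False
    while offset < len(records):
        if fail_at is not None and offset == fail_at and not failed:
            failed = True
            # Roll back the source position and the operator state together.
            offset, snapshot = checkpoint
            state = copy.deepcopy(snapshot)
            continue
        key, value = records[offset]
        state[key] = state.get(key, 0) + value
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = (offset, copy.deepcopy(state))
    return state


records = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5), ("b", 6)]
clean = run_job(records, checkpoint_every=2)
recovered = run_job(records, checkpoint_every=2, fail_at=5)
```

Because the snapshot pairs the state with the read position, the record at offset 4 is replayed after the failure but counted only once in the final state, so both runs end with the same result.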
Jingwei Lu and Jason Zhang (Airbnb)
AirStream is a realtime stream computation framework built on top of Spark Streaming and HBase that allows our engineers and data scientists to easily leverage HBase to get real-time insights and build real-time feedback loops. In this talk, we will introduce AirStream, and then go over a few production use cases.
Modern software development is increasingly taking a “microservice” approach that has resulted in an explosion of complexity at the network level. We have more applications running distributed across different datacenters. Distributed tracing, events, and metrics are essential for observing and understanding modern microservice architectures.
This talk is a deep dive on how to monitor your distributed system. You will get tools, methodologies, and experiences that will help you understand what your applications expose and how to get value out of all this information.
Gianluca Arbezzano, SRE at InfluxData, will share how to monitor a distributed system and how to switch from a more traditional monitoring approach to observability. Stay focused on the server's role rather than its hostname, which is not really important anymore: our servers and containers are fast-moving parts, and it is easier to detach one in case of trouble than to call a server by name like a cute pet. The talk also covers how to design an SLO for your core services and how to iterate on it, and how to instrument your services with tracing tools like Zipkin or Jaeger to measure latency in your network.
Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin... (Flink Forward)
The document discusses how Pravega, an open source stream storage system, enables features like watermarking, scaling, and exactly-once processing in stream processing systems. It explains that Pravega stores streams as sequences of events across distributed segments, which allows for watermarking of event timestamps, dynamic scaling of streams, and tracking of event ingestion to enable exactly-once processing. Checkpointing and replay of events from checkpoints also allows stream processors using Pravega to recover from failures while maintaining exactly-once semantics.
Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re... (Flink Forward)
Failures are inevitable. How can we recover a Flink job from an outage? How do we reprocess data from the outage period? What are the implications for downstream consumers? These are important questions that we need to answer when running Flink for critical data processing applications. We implemented two solutions for our stream processing platform: (1) use a data warehouse, such as Hive, as a backfill source; (2) rewind the Flink job using an external checkpoint. We will describe both solutions in detail and discuss the pros and cons of each approach. We will also take a look at some of the caveats to watch out for.
At Yelp we run hundreds of Flink jobs to power a wide range of applications: push notifications, data replication, ETL, sessionizing and more. Routine operations like deploys, restart, and savepointing for so many jobs would take quite a bit of developers’ time without the right degree of automation. The latest addition to our toolshed is a Kubernetes operator managing the deployment and the lifetime of Flink clusters on PaaSTA, Yelp’s Platform As A Service.
We replaced our deployment framework launching Flink clusters on top of AWS EMR with a Kubernetes operator managing fully Docker-ized Flink clusters. Compared to EMR, this architecture allowed us to both drastically reduce the deployment time of our Flink clusters and to share our hardware resources more efficiently. In addition, we now offer to our developers the same interface they are used to for running REST services, batch jobs and many other workloads on PaaSTA.
This talk will give a brief overview of Yelp’s PaaSTA before diving into the details of how the Kubernetes operator has been implemented and how it has been integrated with Yelp developers’ workflow (deploy, logs, savepoints, upgrades, etc), to end with a glimpse of the future features we are planning for the operator (Flink as a library, autoscaling, etc.).
Apache Flink(tm) - A Next-Generation Stream Processor (Aljoscha Krettek)
This talk begins with a brief overview of the current state of streaming data analytics. It continues with a short introduction to the Apache Flink system for real-time data analysis before diving deeper into some of the interesting properties that set Flink apart from the other players in this space. To that end, we look at example use cases that either come directly from users or are based on our experience with users. The specific features we examine include support for splitting events into individual sessions based on the time at which an event occurred (event time), determining points in time at which to save the state of a streaming program for later restarts, efficient handling of very large stateful streaming computations, and access to that state from outside.
Virtual Flink Forward 2020: Data driven matchmaking streaming at Hyperconnect... (Flink Forward)
HyperConnect's one-to-one video matchmaking system consists of various machine learning techniques that maximize user satisfaction. Our matchmaking system manages a large user context containing actions from a few seconds ago, and reacts in milliseconds to produce meaningful new results in each user session. This is difficult to do in a traditional way, so distributed streaming is essential for handling these cases. Topics include:
- Why our team chose Apache Flink over the alternatives
- Matchmaking streaming architecture with detailed abstraction levels based on Flink operators
- Pairwise scoring microservice management with Flink
- Stateful matchmaking computation with low latency, fault tolerance, and scalability
- How to manage large-scale events: classifying feature types, collecting with a multi-window stream
- Applications: personalization, multi-armed bandit on streams
The document discusses new features in Apache Flink 1.2, including queryable state and dynamic scaling. It provides an overview of Flink 1.2 features like security enhancements, metrics, and improvements to table API and SQL. It then examines queryable state and dynamic scaling in more detail, covering motivations and implementations for making state queryable and allowing jobs to scale resources dynamically in response to changing workloads. The document concludes by looking briefly beyond Flink 1.2 to future work on automatic scaling without restarts.
This document provides an overview of reactive programming in Java and Spring 5. It discusses reactive programming concepts like reactive streams specification, Reactor library, and operators. It also covers how to build reactive applications with Spring WebFlux, including creating reactive controllers, routing with functional endpoints, using WebClient for HTTP requests, and testing with WebTestClient.
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch... (Flink Forward)
Dynamic Time Warping (DTW) is a well-known method for finding patterns within a time series. It can find a pattern even if the data are distorted. It can be used to detect trends in sales, defects in industrial machine signals, patterns in electrocardiograms in medicine, DNA…
Most implementations are very slow, but a very efficient open source implementation (best paper, SIGKDD 2012) is written in C. It can easily be ported to other languages such as Java, so that it can then be used in Flink.
We present the slight modifications we made so that it can be used with Flink at even greater scale to return the top-K best matches on past data or streaming data.
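For readers unfamiliar with DTW, the distance it computes can be sketched with the textbook dynamic program below. This is the plain O(n·m) formulation using an absolute-difference cost, not the optimized C implementation or the Flink port mentioned above; the function name `dtw_distance` is an assumption made for illustration.

```python
import math


def dtw_distance(a, b):
    """Return the DTW distance between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a, step in b, or step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])  # stretched copy of the pattern
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
```

The first call returns 0 because the second series is only a time-stretched version of the first, which is the sense in which DTW tolerates distortion. A top-K search would keep the K candidate windows with the smallest distance to the query pattern.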
Servlet vs Reactive Stacks in 5 Use Cases (VMware Tanzu)
Rossen Stoyanchev, Spring Framework developer
In the past year Netflix shared a story about upgrading their main gateway serving 83 million users from Servlet-stack Zuul 1 to an async and non-blocking Netty-based Zuul 2. The results were interesting and nuanced with some major benefits as well as some trade-offs. Can mere mortal web applications make this journey and moreover should they? The best way to explore the answer is through a specific use case. In this talk we'll take 5 common use cases in web application development and explore the impact of building on Servlet and Reactive web application stacks. For reactive programming we'll use RxJava and Reactor. For the web stack we'll pit Spring MVC vs Spring WebFlux (new in Spring Framework 5.0) allowing us to move easily between the Servlet and Reactive worlds and drawing a meaningful, apples-to-apples comparison. Spring knowledge is not required and not assumed for this session.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1VhSzmy.
Robert Metzger provides an overview of the Apache Flink internals and its streaming-first philosophy, as well as the programming APIs. Filmed at qconlondon.com.
Robert Metzger is a PMC member at the Apache Flink project and a cofounder and software engineer at data Artisans. He is the author of many Flink components including the Kafka and YARN connectors.
On September 21, we had the pleasure of hosting at our offices a Meetup given by our colleague Paco Guerrero on the Apache Flink platform.
"Apache Flink is an open source real-time processing platform that is on the rise because it offers features that the technologies it competes with lack, with no impact on its performance. In this training we will introduce the philosophy and the processing engine that make Flink so special and powerful. We will also go through the basic pillars that establish Flink as the most promising streaming platform today."
Stephan Ewen - Experiences running Flink at Very Large Scale (Ververica)
This talk shares experiences from deploying and tuning Flink stream processing applications at very large scale. We share lessons learned from users, contributors, and our own experiments about running demanding streaming jobs at scale. The talk will explain what aspects currently render a job particularly demanding, show how to configure and tune a large-scale Flink job, and outline what the Flink community is working on to make the out-of-the-box experience as smooth as possible. We will, for example, dive into:
- analyzing and tuning checkpointing
- selecting and configuring state backends
- understanding common bottlenecks
- understanding and configuring network parameters
Building data products requires a lambda architecture to bridge batch and stream processing. AirStream is a framework built on top of HBase that lets users easily build data products at Airbnb. It has shown that HBase is impactful and useful in production for mission-critical data products.
In the talk, we will present applications that leverage HBase to compute moving averages, distinct counts, window-based joins, etc. in streaming computation.
We will also talk about how to leverage HBase to bridge the gap between batch and streaming queries, including building a presto-hbase connector to serve near-real-time ad hoc queries.
by Liyin Tang of Airbnb
Flink currently features different APIs for bounded/batch (DataSet) and streaming (DataStream) programs. While the DataStream API can handle batch use cases, it does so much less efficiently than the DataSet API. The Table API was built as a unified API on top of both, to cover batch and streaming with the same API and, under the hood, delegate to either DataSet or DataStream.
In this talk, we present the latest on the Flink community's efforts to rework the APIs and the stack for better unified batch & streaming experience. We will discuss:
- The future roles and interplay of DataSet, DataStream, and Table API
- The new Flink stack and the abstractions on which these APIs will build
- The new unified batch/streaming sources
- How batch and streaming optimizations differ in the runtime, and what the future interplay of batch and streaming execution could look like.
Virtual Flink Forward 2020: Keynote: The Evolution of Data Infrastructure at ... (Flink Forward)
Over the past few years almost all data processing has moved from batch to stream processing. This isn’t simply driven by a desire for lower latency, but by a fundamental understanding that streams are a more effective primitive for data processing, providing a better impedance match to varied downstream systems and services. Splunk, like many others, has been evolving its core data infrastructure to better provide a simpler and more consistent programming model, address correctness and latency of data, and allow for a more open integration model with our data platform. Throughout this process, we’ve come to view Apache Flink as a critical backbone in our core data infrastructure. Join us to learn more about how our data infrastructure - and how we think about it - has fundamentally changed.
Apache Incubator Samza: Stream Processing at LinkedIn (Chris Riccomini)
This is the slide deck that was presented at the Hadoop Users Group at LinkedIn on November 5, 2013.
The presentation covers what Samza is, why we built it, and how it works.
Fabian Hueske - Stream Analytics with SQL on Apache Flink (Ververica)
Fabian Hueske presented on stream analytics using SQL on Apache Flink. Flink provides a scalable platform for stream processing that is fast, accurate, and reliable. Its relational APIs allow querying both batch and streaming data using standard SQL or a LINQ-style Table API. Queries on streaming data produce continuously updating results. Windows can be used to compute aggregates over tumbling time intervals. The dynamic tables representing streaming data can be converted to output streams encoding updates as insertions and deletions. While not all queries can be supported, techniques like limiting state size allow bounding computational resources. Use cases like continuous ETL, dashboards, and event-driven architectures were discussed.
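The conversion of a continuously updating result into a stream of insertions and deletions, as described above, can be sketched for a simple per-key count. This is a framework-free illustration, not Flink's Table API; the function name `count_per_key_retract` and the `(is_insert, row)` change format are assumptions chosen for the example.

```python
def count_per_key_retract(keys):
    """Yield (is_insert, (key, count)) changes for a continuously updated count.

    Each update to an existing row is encoded as a deletion of the old row
    followed by an insertion of the new row.
    """
    counts = {}
    for key in keys:
        old = counts.get(key)
        if old is not None:
            yield (False, (key, old))  # retract the previously emitted row
        counts[key] = (old or 0) + 1
        yield (True, (key, counts[key]))  # insert the updated row


changes = list(count_per_key_retract(["a", "b", "a"]))
```

A downstream consumer that applies these changes in order always holds the current result table, here `{("a", 2), ("b", 1)}` after the third input. The `counts` dictionary grows with the number of distinct keys, which is the kind of state that the talk's point about bounding state size refers to.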
Aljoscha Krettek offers a very short introduction to stream processing before diving into writing code and demonstrating the features in Apache Flink that make truly robust stream processing possible, with a focus on correctness and robustness in stream processing.
All of this will be done in the context of a real-time analytics application that we’ll be modifying on the fly based on the topics we’re working through, as Aljoscha exercises Flink’s unique features, demonstrates fault recovery, clearly explains why event time is such an important concept in robust, stateful stream processing, and covers the features you need in a stream processor to do robust, stateful stream processing in production.
We’ll also use a real-time analytics dashboard to visualize the results we’re computing in real time, allowing us to easily see the effects of the code we’re developing as we go along.
Topics include:
* Apache Flink
* Stateful stream processing
* Event time versus processing time
* Fault tolerance
* State management in the face of faults
* Savepoints
* Data reprocessing
Apache Flink 1.0: A New Era for Real-World Streaming Analytics (Slim Baltagi)
These are the slides of my talk at the Chicago Apache Flink Meetup on April 19, 2016. This talk explains how Apache Flink 1.0, announced on March 8th, 2016 by the Apache Software Foundation, marks a new era of Real-Time and Real-World streaming analytics. The talk will map Flink's capabilities to streaming analytics use cases.
Do Flink on Web with FLOW - Dongwon Kim & Haemee Park, SK Telecom (Flink Forward)
The document describes FLOW, an abstraction layer that allows domain experts to develop Apache Flink streaming applications without needing expertise in Flink's APIs. FLOW provides a graphical user interface where users can build streaming data pipelines visually using common SQL operations and connectors. When users save their pipelines in FLOW, it generates the underlying Flink code. This allows domain experts across various fields to directly develop real-time stream processing solutions with Flink without involving data engineers to bridge the gap in knowledge.
K. Tzoumas & S. Ewen – Flink Forward Keynote (Flink Forward)
This document provides information about the first conference on Apache Flink. It summarizes key aspects of the Apache Flink streaming engine, including its improved DataStream API, support for event time processing, high availability, and integration of batch and streaming capabilities. The document also outlines Flink's progress towards version 1.0, which will focus on defining public APIs and backwards compatibility, and outlines future plans such as enhancing usability features on top of the DataStream API.
Modern software development is increasingly taking a “microservice” approach that has resulted in an explosion of complexity at the network level. We have more applications running distributed across different datacenters. Distributed tracing, events, and metrics are essential for observing and understanding modern microservice architectures.
This talk is a deep dive on how to monitor your distributed system. You will get tools, methodologies, and experiences that will help you to realize what your applications expose and how to get value out from all these information.
Gianluca Arbezzano, SRE at InfluxData will share how to monitor a distributed system, how to switch from a more traditional monitoring approach to observability. Stay focused on the server’s role and not on the hostname because it’s not really important anymore, our servers or containers are fast moving part and it’s easy to detach it from the right in case of trouble than call the server by name as a cute puppet. How to design a SLO for your core services and now to iterate on them. Instrument your services with tracing using tools like Zipkin or Jaeger to measure latency between in your network.
Virtual Flink Forward 2020: Everything is connected: How watermarking, scalin...Flink Forward
The document discusses how Pravega, an open source stream storage system, enables features like watermarking, scaling, and exactly-once processing in stream processing systems. It explains that Pravega stores streams as sequences of events across distributed segments, which allows for watermarking of event timestamps, dynamic scaling of streams, and tracking of event ingestion to enable exactly-once processing. Checkpointing and replay of events from checkpoints also allows stream processors using Pravega to recover from failures while maintaining exactly-once semantics.
Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...Flink Forward
Failures are inevitable. How can we recover a Flink job from outage? How do we reprocess data from outage period? What are the implications to downstream consumers? These are important questions that we need to answer when running Flink for critical data processing applications. We implemented two solutions for our stream processing platform: (1) use data warehouse, like Hive, as backfill source (2) rewind Flink job using external checkpoint. We will describe both solutions in details, and discuss the pros and cons of each approach. We will also take a look at some of the caveats to watch out for.
At Yelp we run hundreds of Flink jobs to power a wide range of applications: push notifications, data replication, ETL, sessionizing and more. Routine operations like deploys, restart, and savepointing for so many jobs would take quite a bit of developers’ time without the right degree of automation. The latest addition to our toolshed is a Kubernetes operator managing the deployment and the lifetime of Flink clusters on PaaSTA, Yelp’s Platform As A Service.
We replaced our deployment framework launching Flink clusters on top of AWS EMR with a Kubernetes operator managing fully Docker-ized Flink clusters. Compared to EMR, this architecture allowed us to both drastically reduce the deployment time of our Flink clusters and to share our hardware resources more efficiently. In addition, we now offer to our developers the same interface they are used to for running REST services, batch jobs and many other workloads on PaaSTA.
This talk will give a brief overview of Yelp’s PaaSTA before diving into the details of how the Kubernetes operator has been implemented and how it has been integrated with Yelp developers’ workflow (deploy, logs, savepoints, upgrades, etc), to end with a glimpse of the future features we are planning for the operator (Flink as a library, autoscaling, etc.).
Apache Flink(tm) - A Next-Generation Stream ProcessorAljoscha Krettek
In diesem Vortrag wird es zunächst einen kurzen Überblick über den aktuellen Stand im Bereich der Streaming-Datenanalyse geben. Danach wird es mit einer kleinen Einführung in das Apache-Flink-System zur Echtzeit-Datenanalyse weitergehen, bevor wir tiefer in einige der interessanten Eigenschaften eintauchen werden, die Flink von den anderen Spielern in diesem Bereich unterscheidet. Dazu werden wir beispielhafte Anwendungsfälle betrachten, die entweder direkt von Nutzern stammen oder auf unserer Erfahrung mit Nutzern basieren. Spezielle Eigenschaften, die wir betrachten werden, sind beispielsweise die Unterstützung für die Zerlegung von Events in einzelnen Sessions basierend auf der Zeit, zu der ein Ereignis passierte (event-time), Bestimmung von Zeitpunkten zum jeweiligen Speichern des Zustands eines Streaming-Programms für spätere Neustarts, die effiziente Abwicklung bei sehr großen zustandsorientierten Streaming-Berechnungen und die Zugänglichkeit des Zustandes von außerhalb.
Virtual Flink Forward 2020: Data driven matchmaking streaming at Hyperconnect...Flink Forward
HyperConnect's 1to1 video matchmaking system consists of various machine learning techniques to maximize user satisfaction. Our matchmaking system manages a large user context that includes actions from a few seconds ago, and reacts within milliseconds to produce meaningful new results in each user session. This is difficult to do in a traditional way, so distributed streaming is essential in these cases. Topics include:
- Why our team chose Apache Flink over the alternatives
- Matchmaking streaming architecture with detailed abstraction levels based on Flink operators
- Pairwise scoring microservice management with Flink
- Stateful matchmaking computation with low latency, fault tolerance, and scalability
- How to manage large-scale events: classifying feature types, collecting with a multi-window stream
- Applications: personalization, multi-armed bandits on streams
The document discusses new features in Apache Flink 1.2, including queryable state and dynamic scaling. It provides an overview of Flink 1.2 features like security enhancements, metrics, and improvements to table API and SQL. It then examines queryable state and dynamic scaling in more detail, covering motivations and implementations for making state queryable and allowing jobs to scale resources dynamically in response to changing workloads. The document concludes by looking briefly beyond Flink 1.2 to future work on automatic scaling without restarts.
This document provides an overview of reactive programming in Java and Spring 5. It discusses reactive programming concepts like reactive streams specification, Reactor library, and operators. It also covers how to build reactive applications with Spring WebFlux, including creating reactive controllers, routing with functional endpoints, using WebClient for HTTP requests, and testing with WebTestClient.
FlinkDTW: Time-series Pattern Search at Scale Using Dynamic Time Warping - Ch...Flink Forward
Dynamic Time Warping (DTW) is a well-known method for finding patterns within a time series. It can find a pattern even when the data are distorted. It can be used to detect trends in sales, defects in industrial machine signals, patterns in electrocardiograms in medicine, DNA…
Most implementations are very slow, but a very efficient open source implementation (best paper, SIGKDD 2012) is written in C. It can easily be ported to other languages, such as Java, so that it can then be used in Flink.
We present the slight modifications we made so that it can be used with Flink at even greater scale to return the top-K best matches on past data or streaming data.
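To make the idea concrete, here is a minimal sketch of DTW-based top-K pattern search. It is not the SIGKDD 2012 implementation mentioned above and not the speakers' Flink code: it uses the plain quadratic dynamic-programming recurrence with an absolute-difference cost, with no normalization or lower-bound pruning. The function names `dtw_distance` and `top_k_matches` are hypothetical, chosen only for illustration.

```python
import heapq
from math import inf


def dtw_distance(a, b):
    """Plain O(len(a) * len(b)) DTW distance with absolute-difference cost."""
    m = len(b)
    # prev[j] holds the best cumulative cost of aligning a[:i-1] with b[:j].
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, len(a) + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def top_k_matches(series, pattern, k):
    """Return the k (distance, start_index) pairs with the smallest DTW distance.

    Assumption for this sketch: candidates are fixed-length sliding windows
    of the same length as the pattern.
    """
    w = len(pattern)
    scored = (
        (dtw_distance(series[s:s + w], pattern), s)
        for s in range(len(series) - w + 1)
    )
    return heapq.nsmallest(k, scored)
```

For example, `top_k_matches([0, 0, 1, 2, 3, 0, 0], [1, 2, 3], 1)` picks the window starting at index 2, where the pattern occurs exactly. In a streaming setting the same scoring could presumably be applied per window as data arrives, but how the talk distributes this work across Flink is not described here.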
Servlet vs Reactive Stacks in 5 Use CasesVMware Tanzu
ROSSEN STOYANCHEV SPRING FRAMEWORK DEVELOPER
In the past year Netflix shared a story about upgrading their main gateway serving 83 million users from Servlet-stack Zuul 1 to an async and non-blocking Netty-based Zuul 2. The results were interesting and nuanced with some major benefits as well as some trade-offs. Can mere mortal web applications make this journey and moreover should they? The best way to explore the answer is through a specific use case. In this talk we'll take 5 common use cases in web application development and explore the impact of building on Servlet and Reactive web application stacks. For reactive programming we'll use RxJava and Reactor. For the web stack we'll pit Spring MVC vs Spring WebFlux (new in Spring Framework 5.0) allowing us to move easily between the Servlet and Reactive worlds and drawing a meaningful, apples-to-apples comparison. Spring knowledge is not required and not assumed for this session.
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1VhSzmy.
Robert Metzger provides an overview of the Apache Flink internals and its streaming-first philosophy, as well as the programming APIs. Filmed at qconlondon.com.
Robert Metzger is a PMC member at the Apache Flink project and a cofounder and software engineer at data Artisans. He is the author of many Flink components including the Kafka and YARN connectors.
On September 21, we had the pleasure of hosting in our offices a Meetup given by our colleague Paco Guerrero on the Apache Flink platform.
"Apache Flink is an open source real-time processing platform that is on the rise because it offers features that competing technologies lack, without any impact on performance. In this training we will introduce the philosophy and processing engine that make Flink so special and powerful. We will also walk through the basic pillars that confirm Flink as the most promising streaming platform today."
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
This talk shares experiences from deploying and tuning Flink stream processing applications at very large scale. We share lessons learned from users, contributors, and our own experiments about running demanding streaming jobs at scale. The talk will explain what aspects currently make a job particularly demanding, show how to configure and tune a large-scale Flink job, and outline what the Flink community is working on to make the out-of-the-box experience as smooth as possible. We will, for example, dive into:
- analyzing and tuning checkpointing
- selecting and configuring state backends
- understanding common bottlenecks
- understanding and configuring network parameters
Building data products requires a lambda architecture to bridge batch and stream processing. AirStream is a framework built on top of HBase that allows users to easily build data products at Airbnb. It has proved that HBase is impactful and useful in production for mission-critical data products.
In this talk, we will present applications that leverage HBase to compute moving averages, distinct counts, window-based joins, and so on in streaming computation.
We will also talk about how to leverage HBase to bridge the gap between batch and streaming queries, including building a presto-hbase connector to serve near-real-time ad hoc queries.
by Liyin Tang of AirBnB
Flink currently features different APIs for bounded/batch (DataSet) and streaming (DataStream) programs. While the DataStream API can handle batch use cases, it is much less efficient at them than the DataSet API. The Table API was built as a unified API on top of both, to cover batch and streaming with the same API, and under the hood it delegates to either DataSet or DataStream.
In this talk, we present the latest on the Flink community's efforts to rework the APIs and the stack for a better unified batch & streaming experience. We will discuss:
- The future roles and interplay of DataSet, DataStream, and Table API
- The new Flink stack and the abstractions on which these APIs will build
- The new unified batch/streaming sources
- How batch and streaming optimizations differ in the runtime, and what the future interplay of batch and streaming execution could look like.
Virtual Flink Forward 2020: Keynote: The Evolution of Data Infrastructure at ...Flink Forward
Over the past few years almost all data processing has moved from batch to stream processing. This isn’t simply driven by a desire for lower latency, but by a fundamental understanding that streams are a more effective primitive for data processing, providing a better impedance match to varied downstream systems and services. Splunk, like many others, has been evolving its core data infrastructure to better provide a simpler and more consistent programming model, address correctness and latency of data, and allow for a more open integration model with our data platform. Throughout this process, we’ve come to view Apache Flink as a critical backbone in our core data infrastructure. Join us to learn more about how our data infrastructure - and how we think about it - has fundamentally changed.
Apache Incubator Samza: Stream Processing at LinkedInChris Riccomini
This is the slide deck that was presented at the Hadoop Users Group at LinkedIn on November 5, 2013.
The presentation covers what Samza is, why we built it, and how it works.
Fabian Hueske - Stream Analytics with SQL on Apache FlinkVerverica
Fabian Hueske presented on stream analytics using SQL on Apache Flink. Flink provides a scalable platform for stream processing that is fast, accurate, and reliable. Its relational APIs allow querying both batch and streaming data using standard SQL or a LINQ-style Table API. Queries on streaming data produce continuously updating results. Windows can be used to compute aggregates over tumbling time intervals. The dynamic tables representing streaming data can be converted to output streams encoding updates as insertions and deletions. While not all queries can be supported, techniques like limiting state size allow bounding computational resources. Use cases like continuous ETL, dashboards, and event-driven architectures were discussed.
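As a rough illustration of the tumbling-window aggregation mentioned above, the sketch below counts events per key in fixed, non-overlapping time windows. It is plain Python over an in-memory list, not Flink SQL or the Table API, and it ignores watermarks, late data, and continuously updating results; the function name `tumbling_window_counts` and the `(timestamp_ms, key)` event shape are assumptions made for this example.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_ms):
    """Count events per (window_start, key) for fixed-size tumbling windows.

    events: iterable of (timestamp_ms, key) pairs (hypothetical event shape).
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Each timestamp belongs to exactly one window: [start, start + window_ms).
        window_start = ts - (ts % window_ms)
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(1000, "a"), (1500, "a"), (2100, "a"), (2500, "b")]
result = tumbling_window_counts(events, 1000)
```

In Flink SQL the equivalent grouping is expressed declaratively with a tumbling window over a time attribute rather than with explicit modulo arithmetic, and the engine decides when each window's result is complete.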
Aljoscha Krettek offers a very short introduction to stream processing before diving into writing code and demonstrating the features in Apache Flink that make truly robust stream processing possible, with a focus on correctness and robustness in stream processing.
All of this will be done in the context of a real-time analytics application that we’ll be modifying on the fly based on the topics we’re working through, as Aljoscha exercises Flink’s unique features, demonstrates fault recovery, clearly explains why event time is such an important concept in robust, stateful stream processing, and covers the features you need in a stream processor to do robust, stateful stream processing in production.
We’ll also use a real-time analytics dashboard to visualize the results we’re computing in real time, allowing us to easily see the effects of the code we’re developing as we go along.
Topics include:
* Apache Flink
* Stateful stream processing
* Event time versus processing time
* Fault tolerance
* State management in the face of faults
* Savepoints
* Data reprocessing
Apache Flink 1.0: A New Era for Real-World Streaming AnalyticsSlim Baltagi
These are the slides of my talk at the Chicago Apache Flink Meetup on April 19, 2016. This talk explains how Apache Flink 1.0 announced on March 8th, 2016 by the Apache Software Foundation, marks a new era of Real-Time and Real-World streaming analytics. The talk will map Flink's capabilities to streaming analytics use cases.
Do flink on web with flow - Dongwon Kim & Haemee park, SK Telecom)Flink Forward
The document describes FLOW, an abstraction layer that allows domain experts to develop Apache Flink streaming applications without needing expertise in Flink's APIs. FLOW provides a graphical user interface where users can build streaming data pipelines visually using common SQL operations and connectors. When users save their pipelines in FLOW, it generates the underlying Flink code. This allows domain experts across various fields to directly develop real-time stream processing solutions with Flink without involving data engineers to bridge the gap in knowledge.
K. Tzoumas & S. Ewen – Flink Forward KeynoteFlink Forward
This document provides information about the first conference on Apache Flink. It summarizes key aspects of the Apache Flink streaming engine, including its improved DataStream API, support for event time processing, high availability, and integration of batch and streaming capabilities. The document also outlines Flink's progress towards version 1.0, which will focus on defining public APIs and backwards compatibility, and outlines future plans such as enhancing usability features on top of the DataStream API.
Apache Flink is an open source platform for distributed stream and batch data processing. It provides two APIs - a DataStream API for real-time streaming and a DataSet API for batch processing. The document introduces Flink's core concepts like sources, sinks, transformations, and windows. It also provides instructions on setting up a Flink project and describes some use cases like processing Twitter feeds. Additional resources like tutorials, documentation and mailing lists are referenced to help users get started with Flink.
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...Zhenzhong Xu
Netflix is obsessed with customer joy; we relentlessly focus on product experience and high-quality content. In recent years, we have been making heavy investments in the tech-driven studio and content production. As a result, many unique challenges have arisen in the real-time data infrastructure space. For example, in a microservices architecture, domain entities are spread across different applications and persistent stores, which makes low-latency, consistent operational reporting and entity searching especially challenging.
In this talk, we’ll cover some interesting use cases, the various challenges that lie in the fundamentals of distributed systems, and how we solved them. We will also discuss our learnings, things we could have done differently, and the new vision of an open, self-serve Data Mesh platform that empowers our partners and users to build flexible real-time data pipelines.
Kubernetes is exploding in popularity right now and has all the buzz and cargo-culting that Docker enjoyed just a few years ago. But what even is Kubernetes? How do I run my PHP apps in it? Should I run my PHP apps in it?
Why apache Flink is the 4G of Big Data Analytics FrameworksSlim Baltagi
This document provides an overview and agenda for a presentation on Apache Flink. It begins with an introduction to Apache Flink and how it fits into the big data ecosystem. It then explains why Flink is considered the "4th generation" of big data analytics frameworks. Finally, it outlines next steps for those interested in Flink, such as learning more or contributing to the project. The presentation covers topics such as Flink's APIs, libraries, architecture, programming model and integration with other tools.
This document provides an overview of Apache Flink, an open-source stream processing framework. It discusses Flink's capabilities in supporting streaming, batch, and iterative processing natively through a streaming dataflow model. It also describes Flink's architecture including the client, job manager, task managers, and various execution setups like local, remote, YARN, and embedded. Finally, it compares Flink to other stream and batch processing systems in terms of their APIs, fault tolerance guarantees, and strengths.
BigDataFest_ Building Modern Data Streaming Appsssuser73434e
https://sessionize.com/big-data-fest-by-softserve/
The Big Data Fest 2023 is a two-day online event that brings together experts, enthusiasts, and members of the community to discuss the latest developments, trending technologies, and tools, and make an impact on the future of Big Data and Data Engineering.
Attendees will have the opportunity to hear from keynote speakers, attend panel discussions and live Q&As, and participate in hands-on workshops.
The event will also feature a charity component aimed at raising money for Open Eyes Fund to buy ambulances for the hottest spots in Ukraine. We invite everyone to support this event and help make a difference in saving lives.
Participation in the event is free, but we encourage attendees to make donations to support this important initiative.
The conference will include a variety of activities divided into cloud streams, such as:
Keynote speeches from leading experts in the field of Big Data
Live Q&As
Panel discussions on the future of Data Engineering
Hands-on workshops on data management and analytics
Networking opportunities with top professionals and leading experts in the field.
Our main goal is to influence the future shape of Data Engineering and promote the use of Big Data for the greater good.
In my session, I will show you some best practices I have discovered over the last 7 years in building data streaming applications including IoT, CDC, Logs, and more.
In my modern approach, we utilize several open-source frameworks to maximize the best features of all. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Pulsar and/or Apache Kafka. From there we build streaming ETL with Apache Spark and enhance events with serverless functions for ML and enrichment. We build continuous queries against our topics with Flink SQL. We will stream data into Iceberg and other data stores.
We use the best streaming tools for the current applications with FLiPN and FLaNK. https://www.datainmotion.dev/
https://www.youtube.com/watch?v=qW9CP8Xngk4&ab_channel=SoftServeCareer
Apache NiFi
Apache Flink
Apache Kafka
Apache iceberg
Streams Messaging Manager
SQL Stream Builder
Cloudera DataFlow Designer
NiFi Registry
Cloudera Schema Registry
big data fest building modern data streaming appsTimothy Spann
big data fest building modern data streaming apps
25 May 2023
softtserver
flank stack
apache nifi
apache flink
apache kafka
minifi
java
apache iceberg
cloudera
tim spann
Software is changing the world. CGC stands for Common Gateway Coding; as the name says, it is a "common" language approach for almost everything. I want to show how a multi-language approach to infrastructure as code, using general-purpose programming languages, lets cloud engineers and code producers unlock the same software engineering techniques commonly used for applications.
The document describes an automated tool called ipsnapshoter that detects misconfigured HTTP services. It scans IP addresses and ports, uses Nmap to find available hosts, takes screenshots of server responses using EyeWitness, and publishes results in an HTML report. The tool is designed to help security testers identify vulnerabilities by visually exploring misconfigurations before malicious actors. It is written in Python and uses libraries like Nmap, EyeWitness, and a simple HTTP server to efficiently scan thousands of addresses and generate consolidated reports.
BigDataFest Building Modern Data Streaming Appsssuser73434e
BigDataFest: Building Modern Data Streaming Apps
2023
https://app.softserveinc.com/apply/big_data_fest/
CONFERENCE FOR
•DATA ENGINEERS•DATA SCIENTISTS•DATA ARCHITECTS
•DATA AND BUSINESS ANALYSTS•SOFTWARE DEVELOPERS
•ANYONE INTERESTED IN LEARNING MORE ABOUT DATA
Tim Spann is a Principal Developer Advocate at Cloudera where he works with Apache Pulsar, Apache Flink, Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
https://www.datainmotion.dev/p/about-me.html
https://dzone.com/users/297029/bunkertor.html
https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/speaker/185963
eBPF Powered Distributed Kubernetes Performance Analysis - Lorenzo Fontana, I...InfluxData
Since the Linux kernel 4.x series, many enhancements to the eBPF ecosystem have reached mainline, giving us the capability to do a lot more than just networking.
The purpose of this talk is to give an initial understanding of what eBPF programs are and how to hook them to programs running inside Kubernetes clusters in order to answer targeted questions at the cluster level about very specific, fine-grained situations happening in our programs and systems, such as:
- Has that function in my program been called?
- For a given function, which arguments were passed to it, and what did it return?
- Which TCP packets are being retransmitted?
- Which queries are running slowly?
- Insights on programming language events/gc
- Has that file been opened?
Imagine a programmable Kubernetes performance analysis tool that runs at the cluster level without performance implications: what would you like it to be?
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...DataWorks Summit
This document discusses using Apache Flume to stream data from various sources to Hadoop for telecommunications operators. It introduces Flume, describing its key components like agents, sources, channels, and sinks. It provides an end-to-end architecture example showing data flowing from external sources through Flume into Hadoop and then into an EDW for analysis and user reports. Finally, it discusses next generation architectures using technologies like Spark, machine learning, and real-time analytics.
OSDC 2018 | Distributed Monitoring by Gianluca ArbezzanoNETWAYS
Modern software development is increasingly taking a “microservice” approach that has resulted in an explosion of complexity at the network level. We have more applications running distributed across different datacenters. Distributed tracing, events, and metrics are essential for observing and understanding modern microservice architectures.
This talk is a deep dive into how to monitor your distributed system. You will get tools, methodologies, and experiences that will help you understand what your applications expose and how to get value out of all this information.
Gianluca Arbezzano, SRE at InfluxData, will share how to monitor a distributed system and how to switch from a more traditional monitoring approach to observability. Stay focused on the server’s role and not on its hostname, which no longer really matters: our servers and containers are fast-moving parts, and in case of trouble it is easier to detach one than to call the server by name like a cute pet. He will also cover how to design an SLO for your core services and how to iterate on it, and how to instrument your services with tracing, using tools like Zipkin or Jaeger, to measure latency within your network.
Stream processing IoT time series data with Kafka & InfluxDB | Al Sargent, In...HostedbyConfluent
Time series data is everywhere -- connected IoT devices, application monitoring & observability platforms, and more. What makes time series datastreams challenging is that they often have orders of magnitude more data than other workloads, with millions of time series datapoints being quite common. Given its ability to ingest high volumes of data, Kafka is a natural part of any data architecture handling large volumes of time series telemetry, specifically as an intermediate buffer before that data is persisted in InfluxDB for processing, analysis, and use in other applications. In this session, we will show you how you can stream time series data to your IoT application using Kafka queues and InfluxDB, drawing upon deployments done at Hulu and Wayfair that allow both to ingest 1 million metrics per second. Once this session is complete, you’ll be able to connect a Kafka queue to an InfluxDB instance as the beginning of your own time series data pipeline.
Maximilian Michels – Google Cloud Dataflow on Top of Apache FlinkFlink Forward
This document discusses Google Cloud Dataflow and how it can be executed using Apache Flink. It provides an overview of Dataflow and its API, which is similar to batch and streaming concepts in Flink. It then describes how a Dataflow program is translated to an Abstract Syntax Tree (AST) and how the AST is converted to a Flink execution graph by implementing translators for specific Dataflow transforms like ParDo and Combine. Finally, it mentions the FlinkPipelineRunner that is available on GitHub to execute Dataflow pipelines using Flink.
Similar to Stateful stream processing made easy with Apache Flink (20)
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Full-RAG: A modern architecture for hyper-personalizationZilliz
5. Apache Flink is an open source stream
processing framework developed by the
Apache Software Foundation.
The core of Apache Flink is a distributed
streaming dataflow engine written in
Java and Scala.
TL;DR
6. Flink's pipelined runtime system enables
the execution of bulk/batch and stream
processing programs.
TL;DR
- Native Stream
- Low Latency
- High Throughput
- Stateful
- Exactly-once guarantees
- Distributed
- Expressive APIs
- …
Main Features
12. A bit of History
Flink was actually born as a sort of spin-off
of the Stratosphere project.
“Stratosphere is a research project whose
goal is to develop the next generation Big
Data Analytics platform.
The project includes universities from the
area of Berlin, namely, TU Berlin, Humboldt
University and the Hasso Plattner Institute.”
TL;DR
13.
TL;DR (Flink use cases)
uses Flink for real-time
process monitoring and ETL.
Telefónica NEXT's TÜV-certified
Data Anonymization Platform
is powered by Flink.
uses a fork of Flink called Blink
to optimize search rankings in
real time.
14.
TL;DR (Flink use cases)
Ericsson used Flink to build a
real-time anomaly detector
over large infrastructures.
MediaMath uses Flink to power
its real-time reporting
infrastructure.
uses Flink to surface near
real-time intelligence from SaaS
application activity.
https://flink.apache.org/poweredby.html
21. That's a lot of fun but
let’s start
from the very beginning
…
22. BigData is
…
well, big!
IBM (2014): every day about 2.5 quintillion (2.5 × 10^18) bytes
of data are created, and 90% of the data has been
created in the last two years alone.
Yearly volumes are measured in exabytes (1 EB = 10^18 bytes ≈ 2^60 bytes).
29. Government
- smarter surveillance: analyze data from vehicles and cameras to alert
law enforcement of potential issues
HealthCare
- proactive treatment: continuously improve care based on
personalized data streams
Finance
- manage risk: continuously monitor trades and calculate derivative
values in real-time
Automotive
- improved quality and functionalities: detect problems sooner and
predict breakdowns
Telco
- processing call data: predictive spam and fraud detection
areas of application
cf. use-cases-streaming-analytics
34. - state management
- fault tolerance and recovery
- performance and scalability
- programming model
- ecosystem
Challenges
35. only a few problems can be solved without keeping some sort of
application state
state is needed for any kind of aggregation or counting
State Management
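To make the point about counting concrete, here is a minimal plain-Java sketch (no Flink involved; the class name RunningCount and the method onEvent are made up for illustration). Even the simplest per-key count has to remember something between one event and the next, and that "something" is the application state.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class, not part of Flink.
class RunningCount {
    // The application state: one counter per key, kept between events.
    private final Map<String, Long> counts = new HashMap<>();

    // Process one event and return the updated count for its key.
    long onEvent(String key) {
        return counts.merge(key, 1L, Long::sum);
    }

    public static void main(String[] args) {
        RunningCount operator = new RunningCount();
        for (String user : List.of("alice", "bob", "alice", "alice")) {
            System.out.println(user + " -> " + operator.onEvent(user));
        }
    }
}
```

If the process restarts, the map is gone and every count starts again from zero, which is exactly the problem the following slides are about.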
36. only a few problems can be solved without keeping some sort of
application state
● on a single node (and neglecting
threading) keeping state seems easy:
just keep it in local memory. But
with threads in the picture, even
on a single node, keeping state
consistent requires careful
synchronization;
● in a multi-node, multi-threaded, long-running application it
may well end up in a mess.
State Management
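A small sketch of the single-node, multi-threaded case, in plain Java (the class SharedCounter and its methods are hypothetical names, not Flink API). It assumes the state is a per-key counter and relies on ConcurrentHashMap.merge, which performs the read-modify-write atomically per key. With a plain HashMap and `put(get() + 1)`, concurrent increments could be lost.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class, not part of Flink.
class SharedCounter {
    private final ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();

    // merge() is atomic per key, so concurrent increments are not lost.
    void increment(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    long get(String key) {
        return counts.getOrDefault(key, 0L);
    }

    // Runs `threads` workers that each increment the same key `perThread` times.
    static long countConcurrently(int threads, int perThread) {
        SharedCounter counter = new SharedCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.increment("clicks");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get("clicks");
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 10_000)); // prints 40000
    }
}
```

This only covers one JVM. Once the state has to survive failures or be spread over several machines, local synchronization is no longer enough.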
37. only a few problems can be solved without keeping some sort of
application state
Classical solution: keep state in an external database (KV-stores are
frequently the best fit).
● yet another system to manage
● yet another bottleneck to avoid
● yet another synchronization point to
care about
State Management
38. Only a few problems can be solved without keeping some sort of
application state
Flink keeps state for your application: it synchronizes, distributes and even
rescales it.
State Management
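One way to picture "distribute and rescale" is the key-group idea. To my understanding, Flink maps every key to one of a fixed number of key groups and gives each parallel operator instance a contiguous range of groups. What follows is only a plain-Java sketch of that idea under stated assumptions: the class and method names are made up, and the key is hashed with a plain hashCode() modulo, whereas Flink applies an additional hash function.

```java
import java.util.List;

// Hypothetical illustration class, not part of Flink.
class KeyGroups {
    // Map a key to one of `maxParallelism` key groups.
    // Simplification: plain hashCode() modulo; Flink hashes once more.
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Assign a key group to one of `parallelism` operator instances,
    // so that each instance owns a contiguous range of key groups.
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128;
        for (String key : List.of("alice", "bob", "carol")) {
            int group = keyGroupFor(key, maxParallelism);
            System.out.println(key + ": key group " + group
                    + ", instance at parallelism 2 = " + operatorIndexFor(group, maxParallelism, 2)
                    + ", instance at parallelism 4 = " + operatorIndexFor(group, maxParallelism, 4));
        }
    }
}
```

The point of the design is that a key never changes its key group. Rescaling only changes which instance owns which range of groups, so state can be moved around in whole groups.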
39. Flink state comes in two flavors.
https://www.slideshare.net/dataArtisans/apache-flink-training-working-with-state
State Management
40. Flink state backends are threefold:
- MemoryStateBackend
holds data internally as objects on the Java heap; checkpoints are
then collected in the JobManager (master)
- FsStateBackend
holds in-flight data in the TaskManager’s memory; checkpoints are
then written to a filesystem (HDFS or S3, for instance)
- RocksDBStateBackend
holds in-flight data in a RocksDB database per task; on checkpoint the
whole RocksDB database is written to the configured filesystem
State Management
44. Savepoints
“Savepoints are externally stored self-contained checkpoints that you
can use to stop-and-resume or update your Flink programs. They use
Flink’s checkpointing mechanism to create a (non-incremental)
snapshot of the state of your streaming program and write the
checkpoint data and meta data out to an external file system.”
caveat: you must plan for serializer evolution for the objects stored in the
state
fault tolerance and recovery
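A toy version of the stop-and-resume idea, in plain Java. This is not how Flink writes savepoints: the class SnapshotSketch and its methods are hypothetical, and Java's built-in serialization stands in for Flink's serializers. It only shows the shape of the mechanism: take a self-contained snapshot of the state, stop, then restore and carry on.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;

// Hypothetical illustration class, not part of Flink.
class SnapshotSketch {
    // Serialize the whole state into a self-contained byte array (the toy "savepoint").
    static byte[] snapshot(HashMap<String, Long> state) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuild the state from a snapshot; fails if the stored classes can no longer be read.
    @SuppressWarnings("unchecked")
    static HashMap<String, Long> restore(byte[] savepoint) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(savepoint))) {
            return (HashMap<String, Long>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("state class is no longer available", e);
        }
    }

    public static void main(String[] args) {
        HashMap<String, Long> state = new HashMap<>();
        state.put("alice", 3L);

        byte[] savepoint = snapshot(state);   // taken before stopping the program

        HashMap<String, Long> resumed = restore(savepoint);   // the updated program starts here
        resumed.merge("alice", 1L, Long::sum);
        System.out.println(resumed.get("alice")); // prints 4
    }
}
```

The restore step is where the caveat bites: if the classes stored in the state have changed in an incompatible way since the snapshot was taken, reading it back fails.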
48. Q. Isn’t it easier to just use Kafka?
A. Well, no.
Kafka, or more precisely the Kafka Streams API,
is a library that any standard Java application can embed and
hence does not attempt to dictate a deployment method;
whereas
Flink is a cluster framework, which means that the framework
takes care of deploying the application
https://www.confluent.io/blog/apache-flink-apache-kafka-streams-comparison-guideline-users/
49. Q. Did you write “Real Time”?
A. Well, yes … actually, I meant …
Strictly speaking,
“Real-time programs must guarantee response
within specified time constraints”
Here real-time means (as in the Cambridge
Dictionary)
“communicated, shown, presented, etc. at the
same time as events actually happen”
50. Q. What about Apache Storm?
A. There is actually a compatibility suite that lets
you
● Run unmodified Storm topologies
● Embed Storm code (spouts and bolts) as
operators inside Flink DataStream programs.
51. Q. Any kind of comparison chart?
https://www.gmv.com/blog_gmv/future-streaming-technologies-apache-flink/