This document summarizes windowing in Apache Flink stream processing. It covers the main window types, such as count-based and time-based windows, key concepts like event time versus processing time, and the use of watermarks to handle out-of-order data under event-time semantics. It also compares Flink with other stream processing systems and discusses the tradeoffs between event time and processing time.
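As a concrete illustration of the windowing and watermark concepts summarized above, here is a minimal, framework-free Python sketch (this is not Flink code; the function name and the `(timestamp, value)` event format are illustrative assumptions) showing how tumbling event-time windows can buffer out-of-order events and fire once a watermark passes a window's end:

```python
from collections import defaultdict

def tumbling_event_time_windows(events, window_size, max_out_of_orderness):
    """Assign events to tumbling event-time windows and emit a window's
    count once the watermark (max seen timestamp minus the allowed
    out-of-orderness) passes the window's end.
    `events` is an iterable of (timestamp, value) pairs in arrival order."""
    windows = defaultdict(list)   # window start -> buffered values
    emitted = {}                  # window start -> final count
    max_ts = float("-inf")
    for ts, value in events:
        start = (ts // window_size) * window_size
        if start in emitted:
            continue              # too late: this window already fired
        windows[start].append(value)
        max_ts = max(max_ts, ts)
        watermark = max_ts - max_out_of_orderness
        # Fire every window whose end the watermark has passed.
        for s in [s for s in windows if s + window_size <= watermark]:
            emitted[s] = len(windows.pop(s))
    # End of stream: fire all remaining windows.
    for s, vals in windows.items():
        emitted[s] = len(vals)
    return emitted
```

Here the watermark is simply the maximum event time seen so far minus an allowed out-of-orderness, so a slightly out-of-order event (say timestamp 7 arriving after 11) still lands in its correct window, while an event arriving after its window has fired is dropped — mirroring the default late-data behavior of event-time windowing.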
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo..." (Ververica)
Learn how the combination of Apache Kafka and Apache Flink is making stateful stream processing even more expressive and flexible to support applications in streaming that were previously not considered streamable.
The new world of applications and fast data architectures has broken up the database: Raw data persistence comes in the form of event logs, and the state of the world is computed by a stream processor. Apache Kafka provides a strong solution for the event log, while Apache Flink forms a powerful foundation for the computation over the event streams.
In this talk we discuss how Flink’s abstraction and management of application state have evolved over time and how Flink’s snapshot persistence model and Kafka’s log work together to form a base to build ‘versioned applications’. We will also show how end-to-end exactly-once processing works through a smart integration of Kafka’s transactions and Flink’s checkpointing mechanism.
Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink'... (Ververica)
Stream Processing is emerging as a popular paradigm for data processing architectures, because it handles the continuous nature of most data and computation and gets rid of artificial boundaries and delays.
The fact that stream processing is gaining rapid adoption is also due to more powerful and maturing technology (much of it open source at the ASF) that has solved many of the hard technical challenges.
We discuss Apache Flink's approach to high performance stream processing with state, strong consistency, low latency, and sophisticated handling of time. With such building blocks, Apache Flink can handle classes of problems previously considered out of reach for stream processing. We also take a sneak preview at the next steps for Flink.
Single-Pass Graph Stream Analytics with Apache Flink (Paris Carbone)
A presentation motivating graph stream processing as a paradigm for large-scale complex analytics and gelly-streaming, our new framework based on Apache Flink.
Data Stream Analytics - Why they are important (Paris Carbone)
Streaming is cool and it can help us do quick analytics and make profit but what about tsunamis? This is a motivation talk presented at the SeRC Big Data Workshop in Sweden during spring 2016. It motivates the streaming paradigm and provides examples on Apache Flink.
Flink Forward Berlin 2017: Kostas Kloudas - Complex Event Processing with Fli... (Flink Forward)
Pattern matching over event streams is increasingly being employed in many areas, including financial services and clickstream analysis. Flink, as a true stream processing engine, emerges as a natural candidate for these use cases. In this talk, we will present FlinkCEP, a library for Complex Event Processing (CEP) based on Flink. At the conceptual level, we will see the different patterns the library can support, present the main building blocks we implemented to support them, and discuss possible future additions that will further enhance the coverage of the library. At the practical level, we will show how the integration of FlinkCEP with Flink allows the former to take advantage of Flink's rich ecosystem (e.g. connectors) and its stream processing capabilities, such as support for event-time processing, exactly-once state semantics, fault tolerance, savepoints, and high throughput.
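The "followed-by" pattern at the heart of CEP libraries like FlinkCEP can be sketched in a few lines of framework-free Python (this is not the FlinkCEP API; the function name and predicates are hypothetical): match a first event followed by a second one within a time window.

```python
def match_followed_by(events, first_pred, next_pred, within):
    """Find pairs (a, b) where first_pred(a) holds, next_pred(b) holds,
    and b occurs after a within `within` time units (a 'followed-by'
    pattern). Events are (timestamp, payload) tuples in arrival order."""
    pending = []   # candidate 'first' events still inside their window
    matches = []
    for ts, payload in events:
        # Drop candidates whose matching window has expired.
        pending = [(t, p) for t, p in pending if ts - t <= within]
        if next_pred(payload):
            matches.extend(((t, p), (ts, payload)) for t, p in pending)
        if first_pred(payload):
            pending.append((ts, payload))
    return matches
```

For instance, with events `[(0, 'login_fail'), (2, 'login_fail'), (3, 'alert')]`, a first predicate matching `'login_fail'`, a next predicate matching `'alert'`, and a window of 5, both failures pair with the alert; an alert at time 20 would match nothing, since both candidates have expired by then.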
As more and more organizations and individual users turn to Apache Flink for their streaming workloads, there is growing demand for additional functionality out of the box. On one hand, there is demand for more low-level APIs that allow for more control; on the other, users ask for more high-level additions that make the common cases easier to express. This talk will present the new concepts added to the DataStream API in Flink 1.2 and the upcoming Flink 1.3 release that try to reconcile these goals. We will talk, among other things, about the ProcessFunction, a new low-level stream processing primitive that gives the user full control over how each event is processed and can register and react to timers; changes in the windowing logic that allow for more flexible windowing strategies; side outputs; and new features concerning the Flink connectors.
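To give a feel for the per-event-plus-timers programming model that a ProcessFunction enables, here is a simplified, framework-free Python sketch (an illustration of the idea, not Flink's actual API; the function name, event format, and inactivity-detection use case are assumptions): each event refreshes a per-key timer, and a key whose timer fires without a newer event is reported as inactive.

```python
def process_with_timers(events, timeout):
    """Per-key processing with event-time timers: every (timestamp, key)
    event (re)registers an inactivity timer for its key; when event time
    passes a key's timer without a newer event, the key is reported as
    inactive. Returns a list of (fire_time, key) reports."""
    timers = {}    # key -> registered timer timestamp
    reports = []

    def advance(now):
        # Fire all timers due at or before `now`, in timestamp order.
        for key, t in sorted(timers.items(), key=lambda kv: kv[1]):
            if t <= now:
                reports.append((t, key))
                del timers[key]

    for ts, key in events:
        advance(ts)                  # fire timers due before this event
        timers[key] = ts + timeout   # (re)register the key's timer
    advance(float("inf"))            # end of stream: fire remaining timers
    return reports
```

For example, with events `[(0, 'a'), (1, 'b'), (3, 'a'), (10, 'c')]` and a timeout of 5, key `'a'` is refreshed at time 3 so its timer fires at 8, while `'b'` fires at 6 and `'c'` at 15 once the stream ends.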
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing (DoiT International)
Dataflow is a unified programming model and a managed service for developing and executing a wide range of data processing patterns including ETL, batch computation, and continuous computation. Cloud Dataflow frees you from operational tasks like resource management and performance optimization.
Dataflow - A Unified Model for Batch and Streaming Data Processing (DoiT International)
Batch and Streaming Data Processing and Visualize 300TB in 5 Seconds meetup on April 18th, 2016 (http://www.meetup.com/Big-things-are-happening-here/events/229532500)
Stream processing with Apache Flink - Maximilian Michels, Data Artisans (Evention)
Apache Flink is an open source platform for distributed stream and batch data processing. At its core, Flink is a streaming dataflow engine which provides data distribution, communication, and fault tolerance for distributed computations over data streams. On top of this core, APIs make it easy to develop distributed data analysis programs. Libraries for graph processing or machine learning provide convenient abstractions for solving large-scale problems. Apache Flink integrates with a multitude of other open source systems like Hadoop, databases, or message queues. Its streaming capabilities make it a perfect fit for traditional batch processing as well as state of the art stream processing.
The AMIDST toolbox is software for analyzing large-scale data sets using probabilistic machine learning models. AMIDST runs algorithms in a distributed fashion for learning and inference in a wide spectrum of latent variable models such as Gaussian mixtures, (probabilistic) principal component analysis, Hidden Markov Models, Kalman Filters, Latent Dirichlet Allocation, etc. The toolbox is able to perform Bayesian parameter learning on any user-defined probabilistic (graphical) model with billions of nodes using novel distributed message passing algorithms.
We give an overview of the AMIDST toolbox (Java open source), some details about the API and the integration with Flink, and an analysis of the scalability of our learning algorithms. All this in the context of a real use case scenario in the financial domain (BCC group), where the profile of millions of customers is analyzed using Flink and the Amazon Web Services.
Ana M Martinez - AMIDST Toolbox: Scalable probabilistic machine learning with... (Flink Forward)
http://flink-forward.org/kb_sessions/amidst-toolbox-scalable-probabilistic-machine-learning-with-flink/
In this session we would like to present our AMIDST toolbox for analysis of large-scale data sets using probabilistic machine learning models. AMIDST runs algorithms in a distributed fashion for learning and inference in a wide spectrum of latent variable models such as Gaussian mixtures, (probabilistic) principal component analysis, Hidden Markov Models, Kalman Filters, Latent Dirichlet Allocation, etc. This toolbox is able to perform Bayesian parameter learning on any user-defined probabilistic (graphical) model with billions of nodes using novel distributed message passing algorithms.
We plan to give an overview of the AMIDST toolbox (Java open source), some details about the API and the integration with Flink, and an analysis of the scalability of our learning algorithms. All this in the context of a real use case scenario in the financial domain (BCC group), where the profile of millions of customers is analyzed using Flink and the Amazon Web Services.
Apache Spark 2.0: A Deep Dive Into Structured Streaming - by Tathagata Das (Databricks)
“In Spark 2.0, we have extended DataFrames and Datasets to handle real-time streaming data. This not only provides a single programming abstraction for batch and streaming data, it also brings support for event-time based processing, out-of-order/delayed data, sessionization and tight integration with non-streaming data sources and sinks. In this talk, I will take a deep dive into the concepts and the API and show how this simplifies building complex “Continuous Applications”.” - T.D.
Databricks Blog: "Structured Streaming In Apache Spark 2.0: A new high-level API for streaming"
https://databricks.com/blog/2016/07/28/structured-streaming-in-apache-spark.html
// About the Presenter //
Tathagata Das is an Apache Spark Committer and a member of the PMC. He’s the lead developer behind Spark Streaming, and is currently employed at Databricks. Before Databricks, you could find him at the AMPLab of UC Berkeley, researching datacenter frameworks and networks with professors Scott Shenker and Ion Stoica.
Follow T.D. on -
Twitter: https://twitter.com/tathadas
LinkedIn: https://www.linkedin.com/in/tathadas
Flink Forward SF 2017: Kenneth Knowles (Flink Forward)
Apache Beam lets you write data pipelines over unbounded, out-of-order, global-scale data that are portable across diverse backends including Apache Flink, Apache Apex, Apache Spark, and Google Cloud Dataflow. But not all use cases are pipelines of simple "map" and "combine" operations. Beam's new State API adds scalability and consistency to fine-grained stateful processing, all with Beam's usual portability. Examples of new use cases unlocked include: * Microservice-like streaming applications * Aggregations that aren't natural/efficient as an associative combiner * Fine control over retrieval and storage of intermediate values during aggregation * Output based on customized conditions, such as limiting to only "significant" changes in a learned model (resulting in potentially large cost savings in subsequent processing) This talk will introduce the new state and timer features in Beam and show how to use them to express common real-world use cases in a backend-agnostic manner.
On September 21st, we had the pleasure of hosting at our offices a Meetup given by our colleague Paco Guerrero on the Apache Flink platform.
"Apache Flink is an open source real-time processing platform that is on the rise because it offers features that competing technologies lack, without an impact on performance. In this session we introduce the philosophy and processing engine that make Flink so special and powerful. We also walk through the basic pillars that establish Flink as the most promising streaming platform today."
Describes some differences and similarities between Apache Flink and Apache Storm, and gives an introduction to Flink's compatibility layer, which allows running Storm topologies in Flink and embedding spouts and bolts in Flink streaming programs.
Time Series Analysis… using an Event Streaming Platform (Confluent)
Time Series Analysis… using an Event Streaming Platform, Mirko Kämpf, Solutions Architect, Confluent
Meetup Link: https://www.meetup.com/Apache-Kafka-Germany-Munich/events/272827528/
Time Series Analysis Using an Event Streaming Platform (Dr. Mirko Kämpf)
Advanced time series analysis (TSA) requires specialized data preparation procedures to convert raw data into useful, compatible formats.
In this presentation you will see some typical processing patterns for time series based research, from simple statistics to reconstruction of correlation networks.
The first case is relevant for anomaly detection and for protecting safety.
Reconstruction of graphs from time series data is a very useful technique to better understand complex systems like supply chains, material flows in factories, information flows within organizations, and especially in medical research.
With this motivation we will look at typical data aggregation patterns. We investigate how to apply analysis algorithms in the cloud. Finally we discuss a simple reference architecture for TSA on top of the Confluent Platform or Confluent cloud.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart... (Globus)
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data and applying computations on a different system. As part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined, on-demand data workflows capable of applying many data reduction and data analysis operations to the large ESGF data archives, transferring only the resultant analysis (e.g., visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
More Related Content
Similar to Feeding a Squirrel in Time---Windows in Flink
Large Language Models and the End of Programming (Matt Welsh)
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code (Aftab Hussain)
Understanding variable roles in code has been found to be helpful by students in learning programming -- could variable roles help deep neural models in performing coding tasks? We do an exploratory study.
- These are slides of the talk given at InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, co-located with the 45th International Conference on Software Engineering, ICSE 2023, Melbourne Australia
Need for Speed: Removing speed bumps from your Symfony projects ⚡️ (Łukasz Chruściel)
No one wants their application to drag like a car stuck in the slow lane! Yet it’s all too common to encounter bumpy, pothole-filled solutions that slow the speed of any application. Symfony apps are not an exception.
In this talk, I will take you for a spin around the performance racetrack. We’ll explore common pitfalls - those hidden potholes on your application that can cause unexpected slowdowns. Learn how to spot these performance bumps early, and more importantly, how to navigate around them to keep your application running at top speed.
We will focus in particular on tuning your engine at the application level, making the right adjustments to ensure that your system responds like a well-oiled, high-performance race car.
In the ever-evolving landscape of technology, enterprise software development is undergoing a significant transformation. Traditional coding methods are being challenged by innovative no-code solutions, which promise to streamline and democratize the software development process.
This shift is particularly impactful for enterprises, which require robust, scalable, and efficient software to manage their operations. In this article, we will explore the various facets of enterprise software development with no-code solutions, examining their benefits, challenges, and the future potential they hold.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ... (Juraj Vysvader)
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc. I didn't get rich from it, but my work reached 63K downloads (and possibly powered tens of thousands of websites).
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
Workshop - Innovating with Generative AI and Knowledge Graphs (Neo4j)
Go beyond the hype around AI and discover practical techniques for using AI responsibly across your organization's data. Explore how knowledge graphs can be used to increase accuracy, transparency, and explainability in generative AI systems. You will leave with hands-on experience combining data relationships and LLMs to bring domain-specific context and improve reasoning.
Bring your laptop and we will guide you through setting up your own generative AI stack, providing practical, coded examples to get you started within minutes.
Mobile App Development Company In Noida | Drona InfotechDrona Infotech
Looking for a reliable mobile app development company in Noida? Look no further than Drona Infotech. We specialize in creating customized apps for your business needs.
Visit Us For : https://www.dronainfotech.com/mobile-application-development/
Graspan: A Big Data System for Big Code AnalysisAftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
May Marketo Masterclass, London MUG May 22 2024.pdfAdele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
Zoom is a comprehensive platform designed to connect individuals and teams efficiently. With its user-friendly interface and powerful features, Zoom has become a go-to solution for virtual communication and collaboration. It offers a range of tools, including virtual meetings, team chat, VoIP phone systems, online whiteboards, and AI companions, to streamline workflows and enhance productivity.
GraphSummit Paris - The art of the possible with Graph TechnologyNeo4j
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Enterprise Resource Planning System includes various modules that reduce any business's workload. Additionally, it organizes the workflows, which drives towards enhancing productivity. Here are a detailed explanation of the ERP modules. Going through the points will help you understand how the software is changing the work dynamics.
To know more details here: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
1. dbis INSTITUT FÜR INFORMATIK
HUMBOLDT-UNIVERSITÄT ZU BERLIN
Feeding a Squirrel in Time—Windows in Flink
Apache Flink Meetup Munich
Matthias J. Sax
mjsax@{informatik.hu-berlin.de|apache.org}
@MatthiasJSax
Humboldt-Universität zu Berlin
Department of Computer Science
November 11, 2015
2. – Matthias J. Sax – Windows in Apache Flink
About Me
Ph.D. student in CS, DBIS Group, HU Berlin
involved in the Stratosphere research project
working on data stream processing and optimization
Aeolus: built on top of Apache Storm
(https://github.com/mjsax/aeolus)
Committer at Apache Flink
3. Stream Processing
Processing data in motion:
external sources create data constantly
data is pushed to the system
need to keep up with incoming data rate
use of ingestion buffers (e.g., Apache Kafka)
handle data peaks
back pressure, dynamic scaling (or even load-shedding)
low processing latency (milliseconds)
no micro-batching
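The role of a bounded ingestion buffer in back pressure can be sketched in plain Java (this is an illustration of the concept, not Flink or Kafka API; the class and capacity are made up for the example). When the buffer is full, the producer is rejected or blocks instead of data being dropped:

```java
import java.util.concurrent.ArrayBlockingQueue;

public class IngestionBuffer {
    // Bounded queue: holds at most 'capacity' pending events.
    private final ArrayBlockingQueue<String> queue;

    public IngestionBuffer(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    // Non-blocking offer: returns false when the buffer is full,
    // signaling back pressure to the producer.
    public boolean tryIngest(String event) {
        return queue.offer(event);
    }

    // Blocking variant: the producer waits until space frees up.
    public void ingest(String event) throws InterruptedException {
        queue.put(event);
    }

    public String poll() {
        return queue.poll();
    }

    public int pending() {
        return queue.size();
    }
}
```

A full buffer thus propagates the consumer's slowness upstream, which is the essence of back pressure.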
4. Other Systems
Apache Storm
widely used in industry
different processing guarantees
no guarantee
at-least-once
exactly-once (not for external writes)
no ordering guarantees
no type system
dynamic scaling (to some extent)
some high-level abstractions via Trident
windows, state, exactly-once processing
52. Streaming Tradeoffs
Processing Time
no late data / no skew
windows are simple to build
low latency
inherently non-deterministic
Event Time (external)
late data / skew
out-of-order data (windowing more difficult)
simpler to reason about semantics (deterministic)
increased latency
Event Time (ingestion)
no late data / no skew
no out-of-order data
simplified watermarking
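The determinism of event time can be seen in a plain-Java sketch (illustrative only, not Flink API): because each element's window is derived from the timestamp it carries, the per-window result is the same regardless of arrival order, which is exactly what processing time cannot guarantee.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EventTimeWindows {
    // Assign each event timestamp (ms) to the start of its tumbling
    // window and count events per window. The result depends only on
    // the timestamps, not on the order in which elements arrive.
    public static Map<Long, Integer> countPerWindow(List<Long> timestamps,
                                                    long windowSizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestamps) {
            long windowStart = ts - (ts % windowSizeMs);
            counts.merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }
}
```

Feeding the same events in two different orders yields identical window counts, which is why event-time semantics are simpler to reason about.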
56. Time Based Windows
Timestamp Example
StreamExecutionEnvironment env = ...
// alternatives: ProcessingTime / IngestionTime
env.setStreamTimeCharacteristic(
    TimeCharacteristic.EventTime);

DataStream<Tuple> input = ...
input.assignTimestamps(new TimestampExtractor<Tuple>() {
    public long extractTimestamp(Tuple element,
            long currentTimestamp) {
        return /* extract from element */;
    }
    public long extractWatermark(Tuple element,
            long currentTimestamp) {
        return /* extract from element */;
    }
    public long getCurrentWatermark() {
        return Long.MIN_VALUE;
    }
});
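The getCurrentWatermark() stub above returns Long.MIN_VALUE, i.e., it never advances event time on its own. A common strategy is to track the largest timestamp seen and emit a watermark that lags it by a fixed bound on out-of-orderness. Sketched here in plain Java (not the Flink API; class and method names are illustrative):

```java
public class BoundedLatenessWatermark {
    private final long maxOutOfOrdernessMs;
    private long maxTimestampSeen = Long.MIN_VALUE;

    public BoundedLatenessWatermark(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    // Called for every element: remember the highest event timestamp.
    public void onEvent(long eventTimestamp) {
        if (eventTimestamp > maxTimestampSeen) {
            maxTimestampSeen = eventTimestamp;
        }
    }

    // Watermark = "no element with a smaller timestamp is expected".
    public long currentWatermark() {
        return maxTimestampSeen == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : maxTimestampSeen - maxOutOfOrdernessMs;
    }

    // An element is late if its timestamp is behind the watermark.
    public boolean isLate(long eventTimestamp) {
        return eventTimestamp < currentWatermark();
    }
}
```

Note that an out-of-order element never moves the watermark backwards; it can only be classified as late once the watermark has passed its timestamp.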
61. Time Based Windows (cont.)
Sliding Time Window Example
DataStream<...> input = ...
input.keyBy(...)
    // size = 5s; slide = 1s
    .timeWindow(Time.of(5, TimeUnit.SECONDS),
                Time.of(1, TimeUnit.SECONDS))
    .reduce(...);

General Window Example
DataStream<...> input = ...
input.keyBy(...)
    .window(...)
    .apply(new WindowFunction<...>() {
        // ...
    });
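With size 5 s and slide 1 s, each element falls into size/slide = 5 overlapping windows. The assignment arithmetic can be sketched independently of the Flink API (illustrative plain Java, all values in milliseconds):

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindowAssigner {
    // Return the start timestamps of all sliding windows that contain
    // an element with the given event timestamp.
    public static List<Long> assignWindows(long timestamp,
                                           long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        // Latest window start at or before the timestamp.
        long lastStart = timestamp - (timestamp % slideMs);
        // Walk backwards in slide-sized steps while the window
        // [start, start + size) still covers the timestamp.
        for (long start = lastStart;
             start > timestamp - sizeMs;
             start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }
}
```

For example, an element at t = 7500 ms with size 5000 ms and slide 1000 ms belongs to the five windows starting at 7000, 6000, 5000, 4000, and 3000 ms.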
65. Advanced Windowing Concepts
global windows (non-parallelized)
Triggers:
close a window (i.e., fire)
processing time
watermark
count
delta
... (with different discarding strategies)
Evictors:
remove tuples from the window before the function is applied
time, count, delta
mix different windows/triggers/evictors
67. Stateful Stream Processing
Flink can handle arbitrary user state:
state is stored reliably
distributed snapshots algorithm
Example
public class CounterSum
        extends RichReduceFunction<Long> {
    private OperatorState<Long> counter;

    public void open(Configuration config) {
        counter = getRuntimeContext()
            .getOperatorState("myCnt", Long.class, 0L);
    }

    public Long reduce(Long v1, Long v2) throws Exception {
        counter.update(counter.value() + 1);
        return v1 + v2;
    }
}
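The reliability of the counter above rests on Flink's distributed snapshots: operator state is periodically copied to durable storage, and after a failure the operator is reset to the last consistent snapshot. The snapshot/restore idea can be sketched in plain Java (a conceptual illustration, not Flink's actual checkpointing mechanism):

```java
public class SnapshottingCounter {
    private long counter = 0;

    public long add(long value) {
        counter += value;
        return counter;
    }

    // Snapshot: a point-in-time copy of the operator state.
    public long snapshot() {
        return counter;
    }

    // Restore: reset the state to a previous snapshot; the inputs
    // processed since that snapshot are then replayed (e.g., from
    // an ingestion buffer such as Kafka).
    public void restore(long snapshot) {
        counter = snapshot;
    }
}
```

Restoring a snapshot and replaying the subsequent inputs reproduces the exact pre-failure state, which is the basis of the exactly-once guarantee.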
71. Summary (cont.)
Flink provides a rich API (Java/Scala) to express different semantics
state handling for arbitrary UDF code
fault-tolerance with exactly-once guarantees
exactly-once sinks available
What else?
Python API is coming (right now DataSet only)
Google Dataflow on Flink
Storm on Flink
Apache SAMOA on Flink