The benefits of fine-grained synchronization in deterministic and efficient ... | Vincenzo Gulisano
This talk, given by Vincenzo Gulisano and Yiannis Nikolakopoulos at Yahoo!, discusses some of their latest research results in the field of deterministic and efficient parallelization of data streaming operators. It also presents ScaleGate, the abstract data type at the core of their research, whose lock-free Java implementation is available at https://github.com/dcs-chalmers/ScaleGate_Java
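To make the idea of deterministic merging concrete, here is a minimal single-threaded Python sketch of ScaleGate-style "ready tuple" semantics, assuming each source adds tuples in non-decreasing timestamp order. The class and method names (`DeterministicMerger`, `add_tuple`, `get_next_ready`) are hypothetical, not the ScaleGate_Java API, and the sketch is not lock-free: it only illustrates the ordering rule. A tuple is released only when no source can still add one that sorts before it, with ties broken by source index, so the output order does not depend on how sources interleave.

```python
import heapq
from typing import Any, List, Optional, Tuple


class DeterministicMerger:
    """Hypothetical single-threaded model of ScaleGate-style ready tuples."""

    def __init__(self, num_sources: int) -> None:
        # Latest timestamp added by each source (None until it adds something).
        self.latest: List[Optional[int]] = [None] * num_sources
        # Min-heap ordered by (timestamp, source, arrival sequence).
        self.heap: List[Tuple[int, int, int, Any]] = []
        self.seq = 0

    def add_tuple(self, source: int, ts: int, payload: Any) -> None:
        last = self.latest[source]
        if last is not None and ts < last:
            raise ValueError("sources must add tuples in timestamp order")
        self.latest[source] = ts
        heapq.heappush(self.heap, (ts, source, self.seq, payload))
        self.seq += 1

    def _is_ready(self, ts: int, source: int) -> bool:
        # Ready if no source can still add a tuple that sorts before (ts, source).
        for s, last in enumerate(self.latest):
            if last is None or last < ts:
                return False
            if last == ts and s < source:
                return False  # s could still add an equal timestamp that wins the tie
        return True

    def get_next_ready(self) -> Optional[Tuple[int, int, Any]]:
        if not self.heap:
            return None
        ts, source, _, payload = self.heap[0]
        if not self._is_ready(ts, source):
            return None
        heapq.heappop(self.heap)
        return ts, source, payload
```

`get_next_ready` returns `None` until the head tuple is safe to release, so a reader simply retries later.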
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join | Vincenzo Gulisano
This is the presentation of the paper "ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join", presented by Vincenzo Gulisano, Yiannis Nikolakopoulos, Marina Papatriantafilou and Philippas Tsigas at the IEEE Big Data conference held in Santa Clara in 2015.
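The disjoint-parallel idea can be sketched sequentially in Python: every worker compares each incoming tuple against the tuples it stores, but each tuple is stored by exactly one worker (round-robin here), so no window state is shared or replicated. This is a simplified sketch under stated assumptions (input already sorted by timestamp, a time-based window of `window` units, a hypothetical function `scalejoin_sketch`), not the paper's implementation; the workers run in a loop rather than as threads.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Event = Tuple[str, int, int]  # (stream "L" or "R", timestamp, join key)


def scalejoin_sketch(
    events: List[Event],
    num_workers: int,
    window: int,
    predicate: Callable[[Event, Event], bool],
) -> List[Tuple[Event, Event]]:
    """Sliding-window join; events must be sorted by timestamp."""
    # Each worker holds its own disjoint share of the left and right windows.
    stores: List[Dict[str, Deque[Event]]] = [
        {"L": deque(), "R": deque()} for _ in range(num_workers)
    ]
    results: List[Tuple[Event, Event]] = []
    for i, event in enumerate(events):
        stream, ts, _ = event
        other = "R" if stream == "L" else "L"
        for store in stores:  # every worker sees every incoming event ...
            win = store[other]
            while win and win[0][1] < ts - window:  # purge expired tuples
                win.popleft()
            for stored in win:
                if predicate(event, stored):
                    results.append((event, stored) if stream == "L" else (stored, event))
        # ... but exactly one worker stores it (round-robin), so state is disjoint.
        stores[i % num_workers][stream].append(event)
    return results
```

Because every stored tuple lives in exactly one worker, each pair within the window is compared exactly once, and the set of results does not depend on the number of workers.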
The data streaming processing paradigm and its use in modern fog architectures | Vincenzo Gulisano
Invited lecture at the University of Trieste.
The lecture briefly covers the data streaming processing paradigm, research challenges related to distributed, parallel and deterministic streaming analysis, and the research of the DCS (Distributed Computing and Systems) groups at Chalmers University of Technology.
Tutorial: The Role of Event-Time Analysis Order in Data Streaming | Vincenzo Gulisano
Slides for our tutorial, titled “The Role of Event-Time Analysis Order in Data Streaming”, presented at the 14th ACM International Conference on Distributed and Event-Based Systems (DEBS). We have recorded the tutorial; you can find the videos at the following links:
Part 1: https://youtu.be/SW_WS6ULsdY
Part 2: https://youtu.be/bq3ECNvPwOU
You can find these slides, as well as the code examples, at https://github.com/vincenzo-gulisano/debs2020_tutorial_event_time and at SlideS
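As a companion to the tutorial topic, the following Python sketch shows event-time tumbling windows driven by a watermark, assuming bounded out-of-orderness (the watermark trails the highest timestamp seen so far by `max_delay`). The function name and the late-event policy (drop and count) are illustrative assumptions, not the tutorial's code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def event_time_windows(
    events: List[Tuple[int, int]], size: int, max_delay: int
) -> Tuple[List[Tuple[int, int]], int]:
    """Sum values per tumbling window [start, start + size) in event time.

    Returns (fired windows as (start, sum) in firing order, number of late events).
    """
    open_windows: Dict[int, int] = defaultdict(int)
    fired: List[Tuple[int, int]] = []
    watermark = float("-inf")
    max_ts = None
    late = 0
    for ts, value in events:
        start = ts - ts % size
        if start + size <= watermark:
            late += 1  # its window has already fired: drop the event
            continue
        open_windows[start] += value
        max_ts = ts if max_ts is None else max(max_ts, ts)
        watermark = max_ts - max_delay  # bounded out-of-orderness
        # Fire every window whose end the watermark has reached.
        for s in sorted(w for w in open_windows if w + size <= watermark):
            fired.append((s, open_windows.pop(s)))
    for s in sorted(open_windows):  # end of input: flush what is left
        fired.append((s, open_windows[s]))
    return fired, late
```

With `size=10` and `max_delay=5`, the input `[(1, 1), (3, 1), (2, 1), (11, 1), (4, 1), (25, 1), (12, 1)]` yields windows `[(0, 4), (10, 1), (20, 1)]` and one late event: `(4, 1)` still counts because the watermark (6) has not reached 10, while `(12, 1)` arrives after window [10, 20) has fired.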
Crash course on data streaming (with examples using Apache Flink) | Vincenzo Gulisano
These are the slides I used for a 4-hour crash course on data streaming. It covers both theory and research aspects, as well as examples based on Apache Flink's DataStream API.
Slides for my Associate Professor (oavlönad docent) lecture.
The lecture is about Data Streaming (its evolution and basic concepts) and also contains an overview of my research.
The data streaming paradigm and its use in Fog architectures | Vincenzo Gulisano
These are the slides for the lecture I gave at the EBSIS Summer School about data streaming and its challenges and trade-offs for data analysis in Fog architectures.
Presentation for the Softskills Seminar course at Telecom ParisTech. The topic is the paper "Mining High-Speed Data Streams" by Domingos and Hulten. Presented by me on 30/11/2017.
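The core of the paper's Hoeffding tree (VFDT) is the Hoeffding bound: after n independent observations of a variable with range R, the true mean is, with probability 1 - δ, within ε = sqrt(R² ln(1/δ) / (2n)) of the observed mean. A leaf is split once the observed gain difference between the best and second-best attribute exceeds ε (for information gain with c classes, R = log2 c). A minimal Python sketch of just this test, with hypothetical function names:

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon such that, with probability 1 - delta, the true mean lies within
    epsilon of the mean of n independent observations with range value_range."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(gain_best: float, gain_second: float,
                 value_range: float, delta: float, n: int) -> bool:
    # Split once the best attribute's observed advantage exceeds epsilon.
    return gain_best - gain_second > hoeffding_bound(value_range, delta, n)
```

With R = 1 and δ = 10^-7, a gain difference of 0.1 is not enough after 100 examples (ε ≈ 0.28) but is after 1,000 (ε ≈ 0.09): the bound lets the learner commit to a split without storing the examples.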
This talk, given by Badrish Chandramouli at Portland State University on May 30, 2017, gives an overview of his recent and ongoing research directions in the space of stream processing and big data analytics.
Time flies: representing and manipulating event sequences and time series co... | Codemotion
Representing the passage of time is no simple task, especially with "traditional" tools. Unfortunately, however, the temporal dimension is fundamental in countless contexts, from statistical analysis to the representation of cause-and-effect relationships, from forecasting to automatic control. In this talk we will see how to make the best use of OrientDB, a Document-Graph Database, for storing, processing and querying this kind of information.
Streaming SQL Foundations: Why I ❤ Streams+Tables | C4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2rtxaMm.
Tyler Akidau explores the relationship between the Beam Model and stream & table theory. He explains what is required to provide robust stream processing support in SQL and discusses concrete efforts that have been made in this area by the Apache Beam, Calcite, and Flink communities, compared to other offerings such as Apache Kafka’s KSQL and Apache Spark’s Structured Streaming. Filmed at qconlondon.com.
Tyler Akidau is a senior staff software engineer at Google, where he is the technical lead for the Data Processing Languages & Systems group, responsible for Google's Apache Beam efforts, Google Cloud Dataflow, and internal data processing tools like Google Flume, MapReduce, and MillWheel. He is also a founding member of the Apache Beam PMC.
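A common way to illustrate stream/table duality: aggregating a stream yields a table, and every table update can be emitted as a changelog stream that, when replayed, materializes the same table. This plain-Python sketch (hypothetical function names, not the Beam, Calcite or Flink API) shows both directions for a count per key.

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[str, Optional[int], int]  # (key, old value or None, new value)


def stream_to_table(keys: List[str]) -> Tuple[Dict[str, int], List[Change]]:
    """Count occurrences per key (stream -> table), emitting a changelog
    entry for every table update (table -> stream)."""
    table: Dict[str, int] = {}
    changelog: List[Change] = []
    for key in keys:
        old = table.get(key)
        new = (old or 0) + 1
        table[key] = new
        changelog.append((key, old, new))
    return table, changelog


def table_from_changelog(changelog: List[Change]) -> Dict[str, int]:
    """Replaying the changelog materializes the same table."""
    table: Dict[str, int] = {}
    for key, _old, new in changelog:
        table[key] = new
    return table
```

Keeping the old value in each change is what allows downstream consumers to retract a previous result before applying the new one.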
Streaming data analysis in real time is becoming the fastest and most efficient way to obtain useful knowledge from what is happening now, allowing organizations to react quickly when problems appear or to detect new trends that help improve their performance. Evolving data streams are contributing to the growth of data created over the last few years. We are creating the same quantity of data every two days as we created from the dawn of time up until 2003. Methods for evolving data streams are becoming a low-cost, green methodology for real-time online prediction and analysis. We discuss the current and future trends of mining evolving data streams, and the challenges that the field will have to overcome during the next years.
The presentation deals with the importance of resilience in transportation systems: the factors that influence its relevance, the trade-off between robustness and efficiency, and the relation between resilience and evacuation management.
High-Performance Analysis of Streaming Graphs | Jason Riedy
Graph-structured data in social networks, finance, network security, and other domains is not only massive but also under continual change. These changes often are scattered across the graph. Stopping the world to run a single, static query is infeasible. Repeating complex global analyses on massive snapshots to capture only what has changed is inefficient. We discuss requirements for single-shot queries on changing graphs as well as recent high-performance algorithms that update rather than recompute results. These algorithms are incorporated into our software framework for streaming graph analysis, STING (Spatio-Temporal Interaction Networks and Graphs).
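The "update rather than recompute" idea can be illustrated with connected components under edge insertions, maintained incrementally with a union-find structure. This is a swapped-in textbook technique, not STING's algorithm, and it handles insertions only; edge deletions would need more machinery. Class and method names are hypothetical.

```python
from typing import Dict, Hashable


class IncrementalComponents:
    """Connected components maintained under edge insertions (union-find)."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.size: Dict[Hashable, int] = {}
        self.count = 0  # number of components among vertices seen so far

    def _find(self, v: Hashable) -> Hashable:
        if v not in self.parent:  # first time we see v: it is its own component
            self.parent[v] = v
            self.size[v] = 1
            self.count += 1
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[v] != root:  # path compression
            nxt = self.parent[v]
            self.parent[v] = root
            v = nxt
        return root

    def insert_edge(self, u: Hashable, v: Hashable) -> bool:
        """Apply one edge insertion; return True if two components merged."""
        ru, rv = self._find(u), self._find(v)
        if ru == rv:
            return False  # no change to the component structure
        if self.size[ru] < self.size[rv]:  # union by size
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]
        self.count -= 1
        return True

    def same_component(self, u: Hashable, v: Hashable) -> bool:
        return self._find(u) == self._find(v)
```

Each insertion touches only the two affected trees, instead of rerunning a global traversal over a new snapshot.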
Building Conclave: a decentralized, real-time collaborative text editor | Sun-Li Beatteay
Conclave is an open-source, real-time collaborative text editor for the browser.
I worked in a remote, three-person team to:
- Design and build a custom CRDT (conflict-free replicated data type) to increase the throughput of operations by over 1000% and guarantee consistency across all users.
- Reduce network latency by up to 3000% by using WebRTC to create a distributed, peer-to-peer architecture.
- Implement a load-balancing algorithm to scale the application to dozens of concurrent users.
- Build a Version Vector to guarantee causality and merge non-commutative operations.
- Give users complete control over their content by removing the need for a central data store and allowing users to download their content directly to their computer.
- Write an extensive case study (http://bit.ly/conclave-site) and Medium article (http://bit.ly/conclave-post) that have garnered more than 20K views.
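Version vectors are what let a replica tell whether one operation causally precedes another or whether the two are concurrent. A minimal Python sketch, assuming vectors are dicts from site ID to operation counter; the function names are hypothetical and Conclave's own implementation may differ.

```python
from typing import Dict

VersionVector = Dict[str, int]  # site ID -> number of operations seen from it


def vv_increment(vv: VersionVector, site: str) -> VersionVector:
    """Record one more local operation at `site` (returns a new vector)."""
    out = dict(vv)
    out[site] = out.get(site, 0) + 1
    return out


def vv_compare(a: VersionVector, b: VersionVector) -> str:
    """'equal', 'before' (a happened-before b), 'after' or 'concurrent'."""
    sites = set(a) | set(b)
    a_le_b = all(a.get(s, 0) <= b.get(s, 0) for s in sites)
    b_le_a = all(b.get(s, 0) <= a.get(s, 0) for s in sites)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def vv_merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Pointwise maximum: the knowledge of a replica that has seen both."""
    return {s: max(a.get(s, 0), b.get(s, 0)) for s in set(a) | set(b)}
```

A replica can use such comparisons to buffer a remote operation (for example, a deletion) until its own vector shows that the operation's causal predecessors (for example, the matching insertion) have been applied.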
Traffic Modeling for Aggregated Periodic IoT Data | Tobias Hoßfeld
The prevalence of IoT is driven by industrial requirements and scales, but also by community curiosity and tinkering in participatory crowdsensing endeavours. This tutorial first explores the practical requirements and options of modern IoT appliances and projects, including all aspects of the diverse stack, from PHY to application. With that as a basis, traffic models can be derived and evaluated for these IoT topologies that might provide a better fit than traditional approaches.
These slides cover the second part, dedicated to traffic modeling for aggregated periodic IoT data.
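As a rough illustration of what aggregated periodic traffic looks like, here is a small Python sketch that superimposes devices sending with fixed periods and phases and counts messages per time slot. The device parameters and the function name are hypothetical, not taken from the slides; with fixed phases, the aggregate is deterministic and repeats with the least common multiple of the device periods.

```python
from collections import Counter
from typing import List, Tuple


def aggregate_arrivals(devices: List[Tuple[int, int]], horizon: int, slot: int) -> List[int]:
    """Messages per slot for devices given as (period, phase) pairs; each device
    sends at phase, phase + period, phase + 2 * period, ... up to `horizon`."""
    counts: Counter = Counter()
    for period, phase in devices:
        t = phase
        while t < horizon:
            counts[t // slot] += 1
            t += period
    num_slots = -(-horizon // slot)  # ceiling division
    return [counts[i] for i in range(num_slots)]
```

For devices `[(10, 0), (10, 5), (20, 2)]` over 40 time units with 5-unit slots, the result is `[2, 1, 1, 1, 2, 1, 1, 1]`: the pattern repeats every 20 units, the least common multiple of the periods 10 and 20.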
Your data won’t stay smart forever: exploring the temporal dimension of (big ... | Paolo Missier
Much of the knowledge produced through data-intensive computations is liable to decay over time, as the underlying data drifts, and the algorithms, tools, and external data sources used for processing change and evolve. Your genome, for example, does not change over time, but our understanding of it does. How often should we look back at it, in the hope of gaining new insight, e.g. into genetic diseases, and how much does that cost when you scale re-analysis to an entire population?
The “total cost of ownership” of knowledge derived from data (TCO-DK) includes the cost of refreshing the knowledge over time in addition to the initial analysis, but is often not a primary consideration.
The ReComp project aims to provide models, algorithms, and tools to help humans understand TCO-DK, i.e., the nature and impact of changes in data, and assess the cost and benefits of knowledge refresh.
In this talk we try to map the scope of ReComp by describing a number of patterns that cover typical analytics scenarios where re-computation is appropriate. We specifically describe two such scenarios, where we are conducting small-scale, proof-of-concept ReComp experiments to help us sketch the general ReComp architecture. This initial exercise reveals a multiplicity of problems and research challenges, which will inform the rest of the project.
Processing data from social media streams and sensors in real time is becoming increasingly prevalent, and there are plenty of open-source solutions to choose from. To help practitioners decide what to use, we compare three popular Apache projects for stream processing: Apache Storm, Apache Spark and Apache Samza.
Independent of the source of data, the integration of event streams into an Enterprise Architecture is becoming increasingly important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams in HDFS or a NoSQL datastore is feasible and no longer much of a challenge. But if you want to be able to react fast, with minimal latency, you cannot afford to store the data first and do the analysis/analytics later. You have to be able to include part of your analytics right after you consume the event streams. Products for event processing, such as Oracle Event Processing or Esper, have been available for quite a long time and used to be called Complex Event Processing (CEP). In the last 3 years, another family of products has appeared, mostly out of the Big Data technology space, called Stream Processing or Streaming Analytics. These are mostly open-source products/frameworks such as Apache Storm, Spark Streaming and Apache Samza, as well as supporting infrastructure such as Apache Kafka. In this talk I will present the theoretical foundations of Event and Stream Processing, present the differences you might find between the more traditional CEP and the more modern Stream Processing solutions, and show that a combination of both will bring the most value.
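A typical CEP-style rule is a temporal pattern such as "an event of type A followed by an event of type B for the same key within W time units". The Python sketch below evaluates such a rule in a single pass as events arrive, assuming events are ordered by timestamp. The function name, the event types used in the example, and the consumption policy (the earliest pending A is matched and consumed) are illustrative assumptions, not the semantics of Oracle Event Processing or Esper.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str, str]  # (timestamp, key, event type)


def followed_by(events: List[Event], first: str, second: str,
                within: int) -> List[Tuple[str, int, int]]:
    """Match `first` followed by `second` for the same key within `within`
    time units; returns (key, start timestamp, end timestamp)."""
    pending: Dict[str, List[int]] = {}  # key -> timestamps of unmatched `first`
    matches: List[Tuple[str, int, int]] = []
    for ts, key, etype in events:
        # Drop partial matches that can no longer complete in time.
        starts = [t for t in pending.get(key, []) if ts - t <= within]
        if etype == second and starts:
            matches.append((key, starts[0], ts))
            starts = starts[1:]  # the earliest pending `first` is consumed
        if etype == first:
            starts.append(ts)
        pending[key] = starts
    return matches
```

Only the partial matches are kept as state, so the rule is evaluated right as events are consumed rather than after they have been stored.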
These slides were designed for an Apache Hadoop + Apache Apex workshop (university program).
The audience was mainly third-year engineering students from Computer, IT, Electronics and Telecom disciplines.
I tried to keep it simple enough for beginners to understand. Some of the examples use context from India, but in general this should be a good starting point for beginners.
Advanced users/experts may not find this relevant.
Real-Time Event & Stream Processing on MS Azure | Khalid Salama
These slides discuss the main concepts of event & stream processing, as well as the related technologies on Microsoft Azure. We start by giving an overview of what Event & Stream Processing is. Then we describe the canonical architecture of a Stream Processing solution and delve into its message queuing part. After that, we introduce Apache Storm on HDInsight, as well as Azure Stream Analytics. We compare Apache Storm to Azure Stream Analytics, and finally conclude with useful resources.
Introduction To Streaming Data and Stream Processing with Apache Kafka | confluent
Modern businesses have data at their core, and this data is changing continuously. How can we harness this torrent of continuously changing data in real time? The answer is stream processing, and one system that has become a core hub for streaming data is Apache Kafka.
This presentation will give a brief introduction to Apache Kafka and describe its usage as a platform for streaming data. It will explain how Kafka serves as a foundation for both streaming data pipelines and applications that consume and process real-time data streams. It will introduce some of the newer components of Kafka that help make this possible, including Kafka Connect, a framework for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library.
This is talk 1 out of 6 from the Kafka Talk Series.
http://www.confluent.io/apache-kafka-talk-series/introduction-to-stream-processing-with-apache-kafka
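Kafka stores each topic as a set of partitioned, append-only logs: with key-based partitioning, records with the same key land in the same partition, which preserves per-key order, and each consumer group tracks its own read offset per partition. The toy Python model below mimics only that structure. It uses `zlib.crc32` as a stand-in hash (not Kafka's partitioner), commits offsets immediately after each poll, and its class and method names are hypothetical rather than the Kafka client API.

```python
import zlib
from typing import Dict, List, Tuple


class PartitionedLog:
    """Toy topic: append-only partitions plus per-consumer-group offsets."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Tuple[str, str]]] = [[] for _ in range(num_partitions)]
        self.offsets: Dict[Tuple[str, int], int] = {}  # (group, partition) -> next offset

    def produce(self, key: str, value: str) -> Tuple[int, int]:
        # Same key -> same partition, so records for one key stay in order.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def poll(self, group: str, partition: int, max_records: int = 10) -> List[Tuple[str, str]]:
        start = self.offsets.get((group, partition), 0)
        batch = self.partitions[partition][start:start + max_records]
        self.offsets[(group, partition)] = start + len(batch)  # commit after reading
        return batch
```

Because the log is retained independently of consumption, a second consumer group can read the same partition from the beginning without affecting the first.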
Real Time Analytics with Apache Cassandra - Cassandra Day Munich | Guido Schmutz
Time series data is everywhere: IoT, sensor data or financial transactions. The industry has moved to databases like Cassandra to handle the high velocity and high volume of data that are now commonplace. In this talk I will present how we have used Cassandra to store time series data. I will highlight both the Cassandra data model and the architecture we put in place for collecting and ingesting data into Cassandra, using Apache Kafka and Apache Storm.
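The talk's actual data model is not reproduced here. A common Cassandra pattern for time series is to partition by sensor and time bucket and cluster rows by timestamp, i.e. in CQL `PRIMARY KEY ((sensor_id, day), ts)`, so partitions stay bounded and range scans within a partition read time-ordered rows. The Python sketch below models that layout in memory under those assumptions; all names are hypothetical.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Tuple

Row = Tuple[int, float]  # (epoch seconds, measured value)


def partition_key(sensor_id: str, ts: int) -> Tuple[str, str]:
    """(sensor_id, UTC day): one bounded partition per sensor per day."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    return sensor_id, day


class TimeSeriesStore:
    """In-memory stand-in for a table with PRIMARY KEY ((sensor_id, day), ts)."""

    def __init__(self) -> None:
        self.partitions: Dict[Tuple[str, str], List[Row]] = defaultdict(list)

    def insert(self, sensor_id: str, ts: int, value: float) -> None:
        # Keep rows sorted by timestamp, like a clustering column.
        bisect.insort(self.partitions[partition_key(sensor_id, ts)], (ts, value))

    def query(self, sensor_id: str, day: str, start: int, end: int) -> List[Row]:
        """Rows with start <= ts < end from a single partition, in time order."""
        rows = self.partitions.get((sensor_id, day), [])
        return [row for row in rows if start <= row[0] < end]
```

The bucket size (a day here) is a design choice: it trades partition size against the number of partitions a long range query has to touch.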
This tutorial was presented at the KDD 2016 conference in San Francisco, CA. You can find the main presentation at http://www.slideshare.net/NeeraAgarwal2/streaming-analytics
Real-time Stream Processing with Apache Flink @ Hadoop Summit | Gyula Fóra
Apache Flink is an open source project that offers both batch and stream processing on top of a common runtime and exposes a common API. This talk focuses on the stream processing capabilities of Flink.
RBea: Scalable Real-Time Analytics at King | Gyula Fóra
This talk introduces RBEA (Rule-Based Event Aggregator), the scalable real-time analytics platform developed by King’s Streaming Platform team. We have built RBEA to make real-time analytics easily accessible to game teams across King without having to worry about operational details. RBEA is built on top of Apache Flink and uses the framework’s capabilities to their full potential in order to provide highly scalable stateful and windowed processing logic for the analytics applications. We will talk about how we have built a high-level DSL on the abstractions provided by Flink and how we tackled different technical challenges that have come up while developing the system.
Real Time Analytics with Apache Cassandra - Cassandra Day Berlin | Guido Schmutz
Time series data is everywhere: IoT, sensor data or financial transactions. The industry has moved to databases like Cassandra to handle the high velocity and high volume of data that are now commonplace. In this talk I will present how we have used Cassandra to store time series data. I will highlight both the Cassandra data model and the architecture we put in place for collecting and ingesting data into Cassandra, using Apache Kafka and Apache Storm.
Large-Scale Stream Processing in the Hadoop Ecosystem | Gyula Fóra
Distributed stream processing is one of the hot topics in big data analytics today. An increasing number of applications are shifting from traditional static data sources to processing the incoming data in real time. Performing large-scale stream processing or analysis requires specialized tools and techniques, which have become publicly available in the last couple of years.
This talk will give a deep, technical overview of the top-level Apache stream processing landscape. We compare several frameworks including Spark, Storm, Samza and Flink. Our goal is to highlight the strengths and weaknesses of the individual systems in a project-neutral manner to help select the best tools for specific applications. We will touch on the topics of API expressivity, runtime architecture, performance, fault tolerance and strong use cases for the individual frameworks.
Real-time analytics as a service at King | Gyula Fóra
This talk introduces RBea, our scalable real-time analytics platform at King built on top of Apache Flink. The design goal of RBea is to make stream analytics easily accessible to game teams across King. RBea is powered by Apache Flink and uses the framework’s capabilities to their full potential in order to provide highly scalable stateful and windowed processing logic for the analytics applications. RBea provides a high-level scripting DSL that is more approachable for developers without stream-processing experience and uses code generation to execute user scripts efficiently at scale.
In this talk I will cover the technical details of the RBea architecture and will also look at what real-time analytics brings to the table from the business perspective. If time permits I will also give some outlook on our future plans to generalise and further grow the platform.
Let Spark Fly: Advantages and Use Cases for Spark on Hadoop | MapR Technologies
http://bit.ly/1BTaXZP – Apache Spark is currently one of the most active projects in the Hadoop ecosystem, and as such, there’s been plenty of hype about it in recent months, but how much of the discussion is marketing spin? And what are the facts? MapR and Databricks, the company that created and led the development of the Spark stack, will cut through the noise to uncover the practical advantages of having the full set of Spark technologies at your disposal and reveal the benefits of running Spark on Hadoop.
This presentation was given at a webinar hosted by Data Science Central and co-presented by MapR + Databricks.
To see the webinar, please go to: http://www.datasciencecentral.com/video/let-spark-fly-advantages-and-use-cases-for-spark-on-hadoop
This presentation examines some of the top stream analytics platforms in the enterprise. The slide deck explores the characteristics of enterprise stream analytics solutions and discusses the capabilities of some of the top stream analytics platforms in the current market.
Reliable Data Ingestion in Big Data / IoT | Guido Schmutz
Many Big Data and IoT use cases are based on combining data from multiple data sources and making it available on a Big Data platform for analysis. The data sources are often very heterogeneous, ranging from simple files and databases to high-volume event streams from sensors (IoT devices). It’s important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real time (stream processing) as well as in batch (typical big data processing). In recent years, some new tools have emerged that are especially capable of handling the process of integrating data from outside, often called Data Ingestion. From an outside perspective, they are very similar to traditional Enterprise Service Bus infrastructures, which larger organizations often use to handle message-driven and service-oriented systems. But there are also important differences: they are typically easier to scale horizontally, offer a more distributed setup, are capable of handling high volumes of data/messages, provide very detailed monitoring at the message level and integrate very well with the Hadoop ecosystem. This session will present and compare Apache Flume, Apache NiFi, StreamSets and the Kafka Ecosystem and show how they handle data ingestion in a Big Data solution architecture.
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015 | Till Rohrmann
Talk which I gave at the first Apache Flink Meetup in Paris on the 29th of October.
It gives an introduction to Apache Flink's streaming and batch APIs and explains how Flink jobs are deployed. It also presents Flink's checkpointing mechanism, which provides exactly-once processing guarantees.
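Flink's checkpoints are distributed snapshots coordinated by barriers flowing through the dataflow; the single-operator Python sketch below shows only the underlying principle. If operator state is snapshotted together with the source position and the source is replayable, then rolling back to the snapshot after a failure and replaying yields the same state as a failure-free run, i.e. each event affects the state exactly once. The function name and the simulated failure are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple


def run_with_checkpoints(events: List[str], checkpoint_every: int,
                         fail_at: Optional[int] = None) -> Dict[str, int]:
    """Word count over a replayable source with periodic (offset, state) snapshots.

    If `fail_at` is given, the job crashes once just before processing that
    offset, restores the last snapshot and replays the source from there.
    """
    state: Dict[str, int] = {}
    snapshot: Tuple[int, Dict[str, int]] = (0, {})
    offset = 0
    failed = False
    while offset < len(events):
        if offset == fail_at and not failed:
            failed = True
            offset, saved = snapshot  # rewind the source ...
            state = dict(saved)       # ... and roll back the operator state
            continue
        word = events[offset]
        state[word] = state.get(word, 0) + 1
        offset += 1
        if offset % checkpoint_every == 0:
            snapshot = (offset, dict(state))  # position and state taken together
    return state
```

Replaying without rolling back the state would count the events between the snapshot and the failure twice; restoring both together is what avoids that.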
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben... | confluent
Tinder’s Quickfire Pipeline powers all things data at Tinder. It was originally built using AWS Kinesis Firehoses and has since been extended to use both Kafka and other event buses. It is the core of Tinder’s data infrastructure. This rich data flow of both client and backend data has been extended to service a variety of needs at Tinder, including Experimentation, ML, CRM, and Observability, allowing backend developers easier access to shared client side data. We perform this using many systems, including Kafka, Spark, Flink, Kubernetes, and Prometheus. Many of Tinder’s systems were natively designed in an RPC first architecture.
Things we’ll discuss decoupling your system at scale via event-driven architectures include:
– Powering ML, backend, observability, and analytical applications at scale, including an end to end walk through of our processes that allow non-programmers to write and deploy event-driven data flows.
– Show end to end the usage of dynamic event processing that creates other stream processes, via a dynamic control plane topology pattern and broadcasted state pattern
– How to manage the unavailability of cached data that would normally come from repeated API calls for data that’s being backfilled into Kafka, all online! (and why this is not necessarily a “good” idea)
– Integrating common OSS frameworks and libraries like Kafka Streams, Flink, Spark and friends to encourage the best design patterns for developers coming from traditional service oriented architectures, including pitfalls and lessons learned along the way.
– Why and how to avoid overloading microservices with excessive RPC calls from event-driven streaming systems
– Best practices in common data flow patterns, such as shared state via RocksDB + Kafka Streams as well as the complementary tools in the Apache Ecosystem.
– The simplicity and power of streaming SQL with microservices
Introduction of streaming data, difference between batch processing and stream processing, Research issues in streaming data processing, Performance evaluation metrics , tools for stream processing.
This presentation describes a intelligent IT monitoring solution that uses Nagios as source of information, Esper as the CEP engine and a PCA algorithm.
Presentation by Steffen Zeuch, Researcher at German Research Center for Artificial Intelligence (DFKI) and Post-Doc at TU Berlin (Germany), at the FogGuru Boot Camp training in September 2018.
Keynote talk at the International Conference on Supercoming 2009, at IBM Yorktown in New York. This is a major update of a talk first given in New Zealand last January. The abstract follows.
The past decade has seen increasingly ambitious and successful methods for outsourcing computing. Approaches such as utility computing, on-demand computing, grid computing, software as a service, and cloud computing all seek to free computer applications from the limiting confines of a single computer. Software that thus runs "outside the box" can be more powerful (think Google, TeraGrid), dynamic (think Animoto, caBIG), and collaborative (think FaceBook, myExperiment). It can also be cheaper, due to economies of scale in hardware and software. The combination of new functionality and new economics inspires new applications, reduces barriers to entry for application providers, and in general disrupts the computing ecosystem. I discuss the new applications that outside-the-box computing enables, in both business and science, and the hardware and software architectures that make these new applications possible.
From Simulation to Online Gaming: the need for adaptive solutions Gabriele D'Angelo
In many fields such as distributed simulation and online gaming the missing piece is adaptivity. There is a strong need for dynamic and adaptive solutions that can improve performances and react to problems.
Imagine that self-driving cars now exist and are becoming widespread around the world. To facilitate the transition, it's necessary to set up central service to monitor traffic conditions nationwide, deploy sensors throughout the interstate system that monitor traffic conditions including car speeds, pavement and weather conditions, as well as accidents, construction, and other sources of traffic tie ups.
MongoDB has been selected as the database for this application. In this webinar, we will walk through designing the application’s schema that will both support the high update and read volumes as well as the data aggregation and analytics queries.
Similar to Data Streaming (in a Nutshell) ... and Spark's window operations (20)
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich on features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. 
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest
imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters
spanning 0.4−0.9µm) and novel JWST images with 14 filters spanning 0.8−5µm, including 7 mediumband filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data
at > 2.3µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and
30.3-31.0 AB mag (5σ, r = 0.1” circular aperture) in individual filters. We measure photometric
redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts
z = 11.5 − 15. These objects show compact half-light radii of R1/2 ∼ 50 − 200pc, stellar masses of
M⋆ ∼ 107−108M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr−1
. Our search finds no candidates
at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to
infer the properties of the evolving luminosity function without binning in redshift or luminosity that
marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the
impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results,
and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5
from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical
models for evolution of the dark matter halo mass function.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes
on Io’s surface have been monitored from both spacecraft and ground-based telescopes.
Here, we present the highest spatial resolution images of Io ever obtained from a groundbased telescope. These images, acquired by the SHARK-VIS instrument on the Large
Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images
show that a plume deposit from a powerful eruption at Pillan Patera has covered part
of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive
optics at visible wavelengths.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Exposé invité Journées Nationales du GDR GPL 2024
Nutraceutical market, scope and growth: Herbal drug technologyLokesh Patil
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and people want natural and preventative health solutions more and more, this industry is increasing quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and provide significant chances for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Studia Poinsotiana
I Introduction
II Subalternation and Theology
III Theology and Dogmatic Declarations
IV The Mixed Principles of Theology
V Virtual Revelation: The Unity of Theology
VI Theology as a Natural Science
VII Theology’s Certitude
VIII Conclusion
Notes
Bibliography
All the contents are fully attributable to the author, Doctor Victor Salas. Should you wish to get this text republished, get in touch with the author or the editorial committee of the Studia Poinsotiana. Insofar as possible, we will be happy to broker your contact.
Phenomics assisted breeding in crop improvementIshaGoswami9
As the population is increasing and will reach about 9 billion upto 2050. Also due to climate change, it is difficult to meet the food requirement of such a large population. Facing the challenges presented by resource shortages, climate
change, and increasing global population, crop yield and quality need to be improved in a sustainable way over the coming decades. Genetic improvement by breeding is the best way to increase crop productivity. With the rapid progression of functional
genomics, an increasing number of crop genomes have been sequenced and dozens of genes influencing key agronomic traits have been identified. However, current genome sequence information has not been adequately exploited for understanding
the complex characteristics of multiple gene, owing to a lack of crop phenotypic data. Efficient, automatic, and accurate technologies and platforms that can capture phenotypic data that can
be linked to genomics information for crop improvement at all growth stages have become as important as genotyping. Thus,
high-throughput phenotyping has become the major bottleneck restricting crop breeding. Plant phenomics has been defined as the high-throughput, accurate acquisition and analysis of multi-dimensional phenotypes
during crop growing stages at the organism level, including the cell, tissue, organ, individual plant, plot, and field levels. With the rapid development of novel sensors, imaging technology,
and analysis methods, numerous infrastructure platforms have been developed for phenotyping.
5. At our research team:
Research expertise & projects
• Cyber Security
• Efficient parallel & stream computing
• Distributed systems
• IoT & Sensor Networks
6. Agenda
• Who am I?
• Introduction
– Motivation
– System Model
• Spark’s window operations
• References
7. Motivation
• Since the year 2000, applications such as:
– Sensor networks
– Network Traffic Analysis
– Financial tickers
– Transaction Log Analysis
– Fraud Detection
• Require:
– Continuous processing of data streams
– In a real-time fashion
8. Motivation
• Relying 100% on store and process (i.e., DBs) is not feasible
– high-speed networks, nanoseconds to handle a packet
– ISP router: gigabytes of headers every hour,…
• Data Streaming:
– In memory
– Bounded resources
– Efficient one-pass analysis
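9. Motivation
• DBMS vs. DSMS
– DBMS: (1) data is stored on disk, (2) a query is issued, and (3) query processing in main memory returns the query results
– DSMS: a continuous query runs in main memory; incoming data is processed as it arrives and query results are produced continuously, without going through disk
What about …?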
10. Stonebraker, Michael, Uǧur Çetintemel and Stan Zdonik. The 8 requirements of real-time stream processing (2005):
1. Keep the data moving
2. Query interface, e.g., extended SQL
3. Handle imperfections
4. Generate predictable outcomes
5. Integrate stored and streaming data
6. Guarantee data safety and availability
7. Partition and scale applications automatically
8. Process and respond instantaneously
11. System Model
• Data Stream: unbounded sequence of tuples
– Example: Call Detail Record (CDR)
Field          Type
Caller         text
Callee         text
Time (secs)    int
Price (€)      double
– Example stream over time: (A, B, 8:00, 3), (C, D, 8:20, 7), (A, E, 8:35, 6)
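As a minimal sketch of the data model above (plain Python, not the API of any particular engine), a CDR can be represented as an immutable record and a stream as a generator that yields one tuple at a time. Encoding the clock times as seconds since midnight is an assumption made here to fit the int `Time (secs)` field; the names `CDR`, `hhmm_to_secs`, and `cdr_stream` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class CDR:
    """One Call Detail Record tuple with the schema shown above."""
    caller: str   # text
    callee: str   # text
    time: int     # seconds since midnight (assumed encoding of "Time (secs)")
    price: float  # price in euros


def hhmm_to_secs(hhmm: str) -> int:
    """Convert a clock time such as '8:20' into seconds since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 3600 + int(minutes) * 60


def cdr_stream(rows: Iterable[Tuple[str, str, str, float]]) -> Iterator[CDR]:
    """Yield CDR tuples one at a time, the way a stream source delivers them.

    The consumer pulls one tuple at a time and never needs the whole
    sequence, which is what lets an operator run over an unbounded input.
    """
    for caller, callee, hhmm, price in rows:
        yield CDR(caller, callee, hhmm_to_secs(hhmm), float(price))


example_rows = [("A", "B", "8:00", 3), ("C", "D", "8:20", 7), ("A", "E", "8:35", 6)]
for record in cdr_stream(example_rows):
    print(record)
```

Running it prints one record per line, starting with CDR(caller='A', callee='B', time=28800, price=3.0).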
13. System Model: Stateless Operators
• Map: transform the tuple schema
– Example: convert price € → $
• Filter: discard / route tuples
– Example: route depending on price
• Union: merge multiple streams (sharing the same schema)
– Example: merge CDRs from different sources
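The three stateless operators can be sketched as plain Python generators: each one handles a tuple without remembering anything about earlier tuples. This is an illustration under assumptions, not the slides' implementation: `map_op`, `filter_op`, `union_op`, and the 1.10 exchange rate are hypothetical, and the union assumes each input is already sorted by time.

```python
import heapq
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator

EUR_TO_USD = 1.10  # hypothetical fixed exchange rate, for illustration only


@dataclass(frozen=True)
class CDR:
    caller: str
    callee: str
    time: int     # seconds since midnight
    price: float  # euros at the source; dollars after the Map below


def map_op(stream: Iterable[CDR], fn: Callable[[CDR], CDR]) -> Iterator[CDR]:
    """Map: transform every tuple independently (no state kept)."""
    for t in stream:
        yield fn(t)


def filter_op(stream: Iterable[CDR], keep: Callable[[CDR], bool]) -> Iterator[CDR]:
    """Filter: forward only the tuples for which `keep` is true."""
    for t in stream:
        if keep(t):
            yield t


def union_op(*streams: Iterable[CDR]) -> Iterator[CDR]:
    """Union: merge streams with the same schema into one stream.

    Assumes each input is already sorted by time; heapq.merge then keeps
    the merged output sorted without buffering whole streams.
    """
    return heapq.merge(*streams, key=lambda t: t.time)


source_1 = [CDR("A", "B", 28800, 3.0), CDR("A", "E", 30900, 6.0)]
source_2 = [CDR("C", "D", 30000, 7.0)]

merged = union_op(source_1, source_2)
in_usd = map_op(merged, lambda t: replace(t, price=round(t.price * EUR_TO_USD, 2)))
expensive = filter_op(in_usd, lambda t: t.price > 5)
print([(t.caller, t.callee, t.price) for t in expensive])
```

Because Map and Filter keep no state, each tuple can be processed on its own; Union is the only one of the three that looks at several inputs at once, and only to keep the merged stream in time order.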
14. System Model: Stateful Operators
• Aggregate: compute aggregate functions (group-by)
– Example: compute avg. call duration
• Join: match tuples from 2 streams (equality predicate)
– Example: match CDRs with prices in the same range
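Unlike the stateless operators, Aggregate and Join must keep state across tuples. The sketch below is a hypothetical plain-Python illustration: since the CDR schema has no duration field, it averages price per caller instead of call duration, and it models "prices in the same range" as equality on an assumed 5-euro price bucket. The join is a symmetric hash join, one common way to evaluate an equality join over two streams; it is not necessarily the technique the slides have in mind.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple


class GroupByAverage:
    """Aggregate: running average of a value per group-by key.

    The operator's state is one [sum, count] pair per key; every incoming
    tuple updates its key's pair and emits the new average.
    """

    def __init__(self) -> None:
        self.state: Dict[Hashable, List[float]] = defaultdict(lambda: [0.0, 0])

    def process(self, key: Hashable, value: float) -> Tuple[Hashable, float]:
        acc = self.state[key]
        acc[0] += value
        acc[1] += 1
        return key, acc[0] / acc[1]


class SymmetricHashJoin:
    """Join: match tuples from two streams on an equality predicate.

    Each side keeps a hash table of the tuples it has seen so far. A new
    tuple is stored on its own side, then probed against the other side.
    Without windows, both tables grow without bound.
    """

    def __init__(self, key_left: Callable, key_right: Callable) -> None:
        self.keys = (key_left, key_right)
        self.tables = (defaultdict(list), defaultdict(list))

    def process(self, side: int, t) -> list:
        """side is 0 for the left stream, 1 for the right; returns matches."""
        k = self.keys[side](t)
        self.tables[side][k].append(t)
        matches = self.tables[1 - side][k]
        return [(t, m) if side == 0 else (m, t) for m in matches]


def bucket(t: Tuple[str, float]) -> int:
    """Hypothetical price range: 0 for [0, 5), 1 for [5, 10), and so on."""
    return int(t[1] // 5)


# Average price per caller (stand-in for "avg. call duration").
avg = GroupByAverage()
for caller, price in [("A", 3.0), ("C", 7.0), ("A", 6.0)]:
    print(avg.process(caller, price))

join = SymmetricHashJoin(bucket, bucket)
print(join.process(0, ("A", 6.0)))   # nothing on the right side yet
print(join.process(1, ("X", 9.0)))   # same bucket as ("A", 6.0)
```

Both operators' state only grows as tuples arrive; bounding it is what the windows introduced in the following slides are for.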
15. System Model
• Continuous Query: graph of operators and streams
– Example: Map (convert price € → $) → Filter (only calls > 10$) → Agg (count calls made by each Caller number)
– Schema of each stream along the query:
   Map input: Caller, Callee, Time (secs), Price (€)
   Map output / Filter input: Caller, Callee, Time (secs), Price ($)
   Filter output / Agg input: Caller, Callee, Time (secs), Price ($)
   Agg output: Caller, Calls, Time (secs)
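The continuous query above (Map → Filter → Agg) can be sketched as a single Python generator that pushes each tuple through the three steps as it arrives. This is a hypothetical illustration: tuples are plain (caller, callee, time, price) tuples, and the 1.10 exchange rate and the function names are assumptions.

```python
from typing import Dict, Iterable, Iterator, Tuple

EUR_TO_USD = 1.10  # hypothetical fixed exchange rate, for illustration only

CDR = Tuple[str, str, int, float]   # (caller, callee, time_secs, price)
CallCount = Tuple[str, int, int]    # (caller, calls, time_secs)


def convert(t: CDR) -> CDR:
    """Map: Price (€) -> Price ($)."""
    caller, callee, time, price_eur = t
    return caller, callee, time, price_eur * EUR_TO_USD


def above_10_usd(t: CDR) -> bool:
    """Filter: keep only calls costing more than 10 $."""
    return t[3] > 10


def continuous_query(stream: Iterable[CDR]) -> Iterator[CallCount]:
    """Map -> Filter -> Agg, applied tuple by tuple as data arrives.

    Agg keeps one counter per caller and emits the caller's updated count,
    stamped with the time of the tuple that caused the update.
    """
    calls: Dict[str, int] = {}
    for t in stream:
        t = convert(t)
        if not above_10_usd(t):
            continue
        caller, _, time, _ = t
        calls[caller] = calls.get(caller, 0) + 1
        yield caller, calls[caller], time


stream = [("A", "B", 100, 12.0), ("C", "D", 200, 5.0), ("A", "E", 300, 9.5), ("C", "F", 400, 20.0)]
for out in continuous_query(stream):
    print(out)
```

For these four tuples it yields ('A', 1, 100), ('A', 2, 300) and ('C', 1, 400); the second call (5 € = 5.5 $) is dropped by the filter. The per-caller counters never shrink, which is why an unbounded input calls for windows.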
16. System Model
• Infinite sequence of tuples / bounded memory: use windows
• Example: 1 hour windows, sliding every 20 minutes
– [8:00,9:00)
– [8:20,9:20)
– [8:40,9:40)
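Which windows a tuple belongs to can be computed directly from its timestamp. This sketch assumes windows of `size` minutes whose starts are aligned to multiples of `advance` (20 minutes in the example, matching the 8:00, 8:20, 8:40 starts above); `windows_for` and the minute-based time encoding are hypothetical.

```python
from typing import List, Tuple


def to_min(hhmm: str) -> int:
    """'8:20' -> minutes since midnight."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)


def fmt(minutes: int) -> str:
    """Minutes since midnight -> 'H:MM'."""
    return f"{minutes // 60}:{minutes % 60:02d}"


def windows_for(ts: int, size: int, advance: int) -> List[Tuple[int, int]]:
    """Return every sliding window [start, start + size) that contains ts.

    Window starts are assumed to be aligned to multiples of `advance`.
    The earliest start still covering ts is the smallest multiple of
    `advance` greater than ts - size; the latest is the largest multiple
    not exceeding ts.
    """
    first = ((ts - size) // advance + 1) * advance
    return [(s, s + size) for s in range(first, ts + 1, advance)]


# 1 hour windows sliding every 20 minutes, as in the example above.
for start, end in windows_for(to_min("8:22"), size=60, advance=20):
    print(f"[{fmt(start)},{fmt(end)})")
```

For a tuple at 8:22 it prints [7:40,8:40), [8:00,9:00) and [8:20,9:20): with a 60-minute size and a 20-minute advance, every tuple falls into three windows. Setting `advance` equal to `size` gives tumbling windows, where each tuple belongs to exactly one window.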
17. System Model
• Infinite sequence of tuples / bounded memory: use windows
• Example: count tuples, 1 hour windows
– Tuples at: 8:05, 8:15, 8:22, 8:45, 9:05
– [8:00,9:00): Output: 4
– [8:20,9:20)
What about out-of-order tuples?
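The count on the slide follows directly from the window definition. The hypothetical helper below evaluates it over a stored list of timestamps (minutes since midnight); that makes it a handy reference result, but it is exactly the store-then-process approach a streaming operator avoids.

```python
from typing import Iterable


def to_min(hhmm: str) -> int:
    """'8:20' -> minutes since midnight."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)


def count_in_window(timestamps: Iterable[int], start: int, end: int) -> int:
    """Reference semantics: number of tuples with start <= ts < end."""
    return sum(1 for ts in timestamps if start <= ts < end)


tuples = [to_min(t) for t in ["8:05", "8:15", "8:22", "8:45", "9:05"]]
print(count_in_window(tuples, to_min("8:00"), to_min("9:00")))  # 4
print(count_in_window(tuples, to_min("8:20"), to_min("9:20")))  # 3
```

Because it scans a complete list, this check does not depend on arrival order. A streaming operator instead has to decide when [8:00,9:00) is complete, for instance when the 9:05 tuple arrives; a tuple stamped 8:50 that arrived after that point would be missed, which is the out-of-order question raised above.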
18. Agenda
• Who am I?
• Introduction
– Motivation
– System Model
• Spark’s window operations
• References
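20. Spark's window operations
(source: http://spark.apache.org/docs/latest/streaming-programming-guide.html)
Java:
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;

// Reduce function adding two integers, defined separately for clarity
Function2<Integer, Integer, Integer> reduceFunc = new Function2<Integer, Integer, Integer>() {
  @Override public Integer call(Integer i1, Integer i2) {
    return i1 + i2;
  }
};
// Reduce last 30 seconds of data, every 10 seconds
// (pairs is the JavaPairDStream<String, Integer> of (word, 1) pairs from the guide's word-count example)
JavaPairDStream<String, Integer> windowedWordCounts = pairs.reduceByKeyAndWindow(reduceFunc, Durations.seconds(30), Durations.seconds(10));
Python:
# Reduce last 30 seconds of data, every 10 seconds
# (lambda x, y: x - y is the inverse reduce function; this variant requires checkpointing to be enabled)
windowedWordCounts = pairs.reduceByKeyAndWindow(lambda x, y: x + y, lambda x, y: x - y, 30, 10)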
21. Spark's window operations
(source: http://spark.apache.org/docs/latest/streaming-programming-guide.html)
• countByWindow(windowLength, slideInterval): Return a sliding window count of elements in the stream.
• reduceByWindow(func, windowLength, slideInterval): Return a new single-element stream, created by aggregating elements in the stream over a sliding interval using func. The function should be associative so that it can be computed correctly in parallel.
• reduceByKeyAndWindow(func, windowLength, slideInterval, [numTasks]): When called on a DStream of (K, V) pairs, returns a new DStream of (K, V) pairs where the values for each key are aggregated using the given reduce function func over batches in a sliding window [...]
• reduceByKeyAndWindow(func, invFunc, windowLength, slideInterval, [numTasks]): A more efficient version of the above reduceByKeyAndWindow() where the reduce value of each window is calculated incrementally using the reduce values of the previous window. This is done by reducing the new data that enters the sliding window, and "inverse reducing" the old data that leaves the window. An example would be that of "adding" and "subtracting" counts of keys as the window slides. However, it is applicable only to "invertible reduce functions" [...]
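The difference between the two reduceByKeyAndWindow variants can be sketched in plain Python: the incremental version updates the previous window's result by reducing the batch that enters and inverse-reducing the batch that leaves, and a full recomputation serves as a reference. The function names and batches are hypothetical; only the idea (func plus an invertible invFunc, such as + and -) comes from the description above.

```python
from operator import add, sub
from typing import Callable, Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, int]


def incremental_windowed_reduce(batches: List[List[Pair]], func: Callable[[int, int], int],
                                inv_func: Callable[[int, int], int], w: int) -> List[Dict[Hashable, int]]:
    """Slide a w-batch window one batch at a time, reusing the previous result.

    For each slide: reduce the batch entering the window into the running
    result with func, then "inverse reduce" the batch leaving the window
    with inv_func. Correct only if inv_func undoes func (e.g. + and -).
    """
    current: Dict[Hashable, int] = {}
    results = []
    for i, batch in enumerate(batches):
        for k, v in batch:                      # new data entering the window
            current[k] = func(current[k], v) if k in current else v
        if i >= w:                              # old data leaving the window
            for k, v in batches[i - w]:
                current[k] = inv_func(current[k], v)
        results.append(dict(current))           # keys may linger with value 0
    return results


def recompute(batches: List[List[Pair]], func, w: int) -> List[Dict[Hashable, int]]:
    """Non-incremental reference: re-reduce the last w batches each time."""
    results = []
    for i in range(len(batches)):
        out: Dict[Hashable, int] = {}
        for batch in batches[max(0, i - w + 1):i + 1]:
            for k, v in batch:
                out[k] = func(out[k], v) if k in out else v
        results.append(out)
    return results


batches = [[("a", 1), ("b", 1)], [("a", 1)], [("b", 1), ("b", 1)], [("a", 1)], [("c", 1)], []]
inc = incremental_windowed_reduce(batches, add, sub, w=3)
ref = recompute(batches, add, w=3)
# Drop zero counts left behind by inverse reduction before comparing.
print(all({k: v for k, v in a.items() if v != 0} == b for a, b in zip(inc, ref)))
```

The comparison prints True. Note that key 'b' is left behind with value 0 once its batches slide out: an incremental result keeps such keys unless they are filtered out, whereas a recomputation never creates them.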
22. Maintaining tuples or windows?
Example (1 hour windows): tuples at 8:05, 8:15, 8:22, 8:45, 9:05; windows [8:00,9:00) and [8:20,9:20)
Maintain tuples
When the window shifts:
1. Remove the contribution of stale tuples
2. Keep adding new incoming tuples
• Need to maintain a single window instance
• Need to maintain all the tuples (how many?)
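The "maintain tuples" strategy can be sketched as a count over a single window instance backed by a buffer of timestamps. This is a hypothetical illustration (`TupleBufferCount` is not from the slides), assuming in-order arrival, a given first window start, and timestamps in minutes since midnight.

```python
from collections import deque
from typing import Deque, List, Tuple

Window = Tuple[int, int]


class TupleBufferCount:
    """Sliding-window count that keeps the tuples of one window instance.

    Assumes timestamps (minutes since midnight) arrive in order and are not
    earlier than the first window start.
    """

    def __init__(self, size: int, advance: int, start: int) -> None:
        self.size, self.advance, self.start = size, advance, start
        self.buffer: Deque[int] = deque()  # tuples of the current window
        self.count = 0                     # aggregate of the current window

    def add(self, ts: int) -> List[Tuple[Window, int]]:
        """Add one tuple; return (window, count) for every window it closes."""
        closed = []
        while ts >= self.start + self.size:          # the window must shift
            closed.append(((self.start, self.start + self.size), self.count))
            self.start += self.advance
            while self.buffer and self.buffer[0] < self.start:
                self.buffer.popleft()                # 1. remove stale tuples...
                self.count -= 1                      #    ...and their contribution
        self.buffer.append(ts)                       # 2. add the new tuple
        self.count += 1
        return closed


op = TupleBufferCount(size=60, advance=20, start=8 * 60)  # first window starts at 8:00
for ts in [485, 495, 502, 525, 545]:                     # 8:05, 8:15, 8:22, 8:45, 9:05
    for window, count in op.add(ts):
        print(window, count)
print(len(op.buffer), "tuples buffered for", (op.start, op.start + op.size))
```

With the tuples from the slide, the 9:05 tuple closes [8:00,9:00) with count 4, and three tuples (8:22, 8:45, 9:05) stay buffered for [8:20,9:20). The buffer length is the answer to "how many?": it grows with the input rate times the window size.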
23. Maintaining tuples or windows?
Example (1 hour windows): tuples at 8:05, 8:15, 8:22, 8:45, 9:05; after 8:22, [8:00,9:00): 3 (so far...) and [8:20,9:20): 1 (so far...)
Maintain windows
When a tuple arrives:
1. Add its contribution to all the windows it falls in
• No need to maintain tuples
• Need to maintain all windows to which each tuple contributes
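The "maintain windows" strategy can be sketched with one counter per open window and no stored tuples. Again a hypothetical illustration (`PerWindowCount` is not from the slides), assuming in-order arrival and window starts aligned to multiples of `advance`; because of that alignment it also tracks windows that start before 8:00, such as [7:20,8:20), which the slide does not show.

```python
from typing import Dict, List, Tuple

Window = Tuple[int, int]


class PerWindowCount:
    """Sliding-window count that keeps one counter per open window, no tuples.

    Window starts are aligned to multiples of `advance`; timestamps are
    minutes since midnight and are assumed to arrive in order.
    """

    def __init__(self, size: int, advance: int) -> None:
        self.size, self.advance = size, advance
        self.windows: Dict[int, int] = {}  # window start -> count so far

    def add(self, ts: int) -> List[Tuple[Window, int]]:
        """Add one tuple; return (window, count) for every window it closes."""
        closed = []
        for start in sorted(s for s in self.windows if s + self.size <= ts):
            closed.append(((start, start + self.size), self.windows.pop(start)))
        # 1. add the tuple's contribution to every window it falls in
        first = ((ts - self.size) // self.advance + 1) * self.advance
        for start in range(first, ts + 1, self.advance):
            self.windows[start] = self.windows.get(start, 0) + 1
        return closed


op = PerWindowCount(size=60, advance=20)
for ts in [485, 495, 502]:                 # 8:05, 8:15, 8:22
    op.add(ts)
print(op.windows[480], op.windows[500])    # [8:00,9:00) and [8:20,9:20) so far
for ts in [525, 545]:                      # 8:45, 9:05
    print(op.add(ts))
```

After 8:05, 8:15 and 8:22 it prints 3 1, matching the slide's partial counts for [8:00,9:00) and [8:20,9:20); the 9:05 tuple later closes [8:00,9:00) with count 4. With these settings each tuple updates three counters instead of being stored.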
24. Agenda
• Who am I?
• Introduction
– Motivation
– System Model
• Spark’s window operations
• References (non-exhaustive list)
25. References (non-exhaustive list)
Bedtime reading about Data Streaming
1. Vincenzo Gulisano. StreamCloud: An Elastic Parallel-Distributed Stream Processing Engine. Ph.D. thesis, Polytechnic University of Madrid, 2012.
Shared-nothing parallelism / Elasticity
1. Vincenzo Gulisano, Ricardo Jiménez-Peris, Marta Patiño-Martínez, and Patrick Valduriez. StreamCloud: A Large Scale Data Streaming System. 30th International Conference on Distributed Computing Systems (ICDCS), 2010.
2. Vincenzo Gulisano, Ricardo Jiménez-Peris, Marta Patiño-Martínez, Claudio Soriente, and Patrick Valduriez. StreamCloud: An Elastic and Scalable Data Streaming System. IEEE Transactions on Parallel and Distributed Systems (TPDS), 2012.
26. References (non-exhaustive list)
Shared-memory parallelism / fine-grained synchronization
1. Vincenzo Gulisano, Yiannis Nikolakopoulos, Marina Papatriantafilou, and Philippas Tsigas. ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join. IEEE International Conference on Big Data (IEEE Big Data), 2015.
2. Vincenzo Gulisano, Yiannis Nikolakopoulos, Ivan Walulya, Marina Papatriantafilou, and Philippas Tsigas. DEBS Grand Challenge: Deterministic Real-Time Analytics of Geospatial Data Streams through ScaleGate Objects. 9th ACM International Conference on Distributed Event-Based Systems (DEBS), 2015.
3. Daniel Cederman, Vincenzo Gulisano, Yiannis Nikolakopoulos, Marina Papatriantafilou, and Philippas Tsigas. Concurrent Data Structures for Efficient Streaming Aggregation (brief announcement). 26th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2014.
Streaming + Security / Privacy / Cyber-physical systems
1. Stefania Costache, Vincenzo Gulisano, and Marina Papatriantafilou. Understanding the Data-Processing Challenges in Intelligent Vehicular Systems. 2016 IEEE Intelligent Vehicles Symposium (IV16), 2016.
2. Vincenzo Gulisano, Valentin Tudor, Magnus Almgren, and Marina Papatriantafilou. BES: Differentially Private and Distributed Event Aggregation in Advanced Metering Infrastructures. 2nd ACM Cyber-Physical System Security Workshop (CPSS 2016), held in conjunction with ACM AsiaCCS'16, 2016.
3. Vincenzo Gulisano, Magnus Almgren, and Marina Papatriantafilou. METIS: a Two-Tier Intrusion Detection System for Advanced Metering Infrastructures. 10th International Conference on Security and Privacy in Communication Networks (SecureComm), 2014.
27. References (non-exhaustive list)
• Motivation / System Model
1. Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom. Models and issues in data stream systems. In Proceedings of the Twenty-first ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '02, New York, NY, USA, 2002. ACM.
2. Michael Stonebraker, Uǧur Çetintemel, and Stan Zdonik. The 8 requirements of real-time stream processing. SIGMOD Rec., 34(4), December 2005.
3. Nesime Tatbul. QoS-driven load shedding on data streams. In Proceedings of the Workshops XMLDM, MDDE, and YRWS on XML-Based Data Management and Multimedia Engineering-Revised Papers, EDBT '02, London, UK, 2002. Springer-Verlag.
28. References (non-exhaustive list)
• Centralized Stream Processing Engines
1. Arvind Arasu, Brian Babcock, Shivnath Babu, John Cieslewicz, Keith Ito, Rajeev Motwani, Utkarsh Srivastava, and Jennifer Widom. STREAM: The Stanford data stream management system. Springer, 2004.
2. Arvind Arasu, Shivnath Babu, and Jennifer Widom. The CQL continuous query language: semantic foundations and query execution. The VLDB Journal, 15(2), June 2006.
3. Daniel J. Abadi, Don Carney, Uǧur Çetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, and Stan Zdonik. Aurora: a new model and architecture for data stream management. The VLDB Journal, 12(2), August 2003.
4. Nesime Tatbul and Stan Zdonik. Window-aware load shedding for aggregation queries over data streams. In Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB '06. VLDB Endowment, 2006.
29. References (non-exhaustive list)
• Distributed Stream Processing Engines
1. Daniel J. Abadi, Yanif Ahmad, Magdalena Balazinska, Uǧur Çetintemel, Mitch Cherniack, Jeong-Hyon Hwang, Wolfgang Lindner, Anurag Maskey, Alex Rasin, Esther Ryvkina, Nesime Tatbul, Ying Xing, and Stanley B. Zdonik. The design of the Borealis stream processing engine. In CIDR, pages 277–289, 2005.
2. Magdalena Balazinska, Hari Balakrishnan, Samuel R. Madden, and Michael Stonebraker. Fault-tolerance in the Borealis distributed stream processing system. ACM Trans. Database Syst., 33(1), March 2008.
3. Philippe Bonnet, Johannes Gehrke, and Praveen Seshadri. Towards sensor database systems. In Proceedings of the Second International Conference on Mobile Data Management, MDM '01, London, UK, 2001. Springer-Verlag.
4. Jeong-Hyon Hwang, Magdalena Balazinska, Alexander Rasin, Uǧur Çetintemel, Michael Stonebraker, and Stan Zdonik. A comparison of stream-oriented high availability algorithms. Technical report, Brown CS, 2003.
5. Jeong-Hyon Hwang, Magdalena Balazinska, Alexander Rasin, Uǧur Çetintemel, Michael Stonebraker, and Stan Zdonik. High-availability algorithms for distributed stream processing. In Proceedings of the 21st International Conference on Data Engineering (ICDE), Los Alamitos, CA, USA, 2005. IEEE Computer Society.
30. References (non-exhaustive list)
• Parallel Stream Processing Engines
1. Vincenzo Gulisano, Ricardo Jiménez-Peris, Marta Patiño-Martínez, and Patrick Valduriez. StreamCloud: A large scale data streaming system. In ICDCS 2010: International Conference on Distributed Computing Systems, pages 126–137, June 2010.
2. Mehul A. Shah, Joseph M. Hellerstein, Sirish Chandrasekaran, and Michael J. Franklin. Flux: An adaptive partitioning operator for continuous query systems. In ICDE, 2002.
31. References (non-exhaustive list)
• Elastic Stream Processing Engines
1. Vincenzo Gulisano, Ricardo Jiménez-Peris, Marta Patiño-Martínez, Claudio Soriente, and Patrick Valduriez. StreamCloud: An elastic and scalable data streaming system. IEEE Transactions on Parallel and Distributed Systems, 99(PrePrints), 2012.
2. Thomas Heinze. Elastic complex event processing. In Proceedings of the 8th Middleware Doctoral Symposium, MDS '11, New York, NY, USA, 2011. ACM.
3. Simon Loesing, Martin Hentschel, Tim Kraska, and Donald Kossmann. Stormy: an elastic and highly available streaming service in the cloud. In Proceedings of the 2012 Joint EDBT/ICDT Workshops, EDBT-ICDT '12, New York, NY, USA, 2012. ACM.
4. Scott Schneider, Henrique Andrade, Bugra Gedik, Alain Biem, and Kun-Lung Wu. Elastic scaling of data parallel operators in stream processing. In Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing, IPDPS '09, Washington, DC, USA, 2009. IEEE Computer Society.
Editor's Notes
These are the original definitions; they have evolved and been modified over time
Interesting: one or two functions?
Orthogonal issue: when to compute the final results