Development of a Distributed Stream Processing System (DSPS) in Node.js and ZeroMQ, with a demonstration of a trending-topics application on a dataset from Twitter.
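To make the trending-topics idea concrete, here is a minimal sketch of one way such an operator could work. It is written in plain Python rather than Node.js and ZeroMQ, and it assumes that "trending" means the most frequent hashtags within a sliding time window; the description does not say how the original system computes trends, so the class name, the window logic, and the hashtag pattern are all illustrative assumptions.

```python
import re
from collections import Counter, deque

# Assumption: a topic is any "#word" token; the original system may differ.
HASHTAG = re.compile(r"#(\w+)")


class TrendingTopics:
    """Counts hashtags seen within the last `window_seconds` of the stream.

    Assumes tweets arrive with non-decreasing timestamps.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()   # (timestamp, hashtag) in arrival order
        self.counts = Counter()

    def add(self, timestamp, text):
        """Register one tweet and drop hashtags that fell out of the window."""
        for tag in HASHTAG.findall(text.lower()):
            self.events.append((timestamp, tag))
            self.counts[tag] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Remove every hashtag occurrence older than the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, tag = self.events.popleft()
            self.counts[tag] -= 1
            if self.counts[tag] == 0:
                del self.counts[tag]

    def top(self, k):
        """Return the k most frequent hashtags currently in the window."""
        return self.counts.most_common(k)
```

In a distributed setting, an operator like this would typically run on several nodes, each handling a partition of the hashtags, with a final stage merging the partial rankings.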
NS2 is a network simulator that has proved useful for studying the dynamic nature of communication networks. It can simulate wired and wireless network functions and protocols (e.g. routing algorithms, TCP, UDP).
Design Of A PI Rate Controller For Mitigating SIP Overload - Yang Hong
Recent collapses of SIP servers in carrier networks indicate that the built-in SIP overload control mechanism cannot mitigate overload effectively. In this paper, by employing a control-theoretic approach that models the interaction between an overloaded downstream server and its upstream server as a feedback control system, we investigate the root cause of SIP server crashes by studying the impact of retransmissions on the queuing delay of the overloaded server. We then design a PI rate controller that mitigates the overload by regulating the retransmission rate based on the round trip delay. We derive guidelines for choosing the PI controller gains to ensure system stability. Our OPNET simulation results demonstrate that the proposed control-theoretic approach can cancel short-term SIP overload effectively, thus preventing widespread SIP network failure.
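As a rough illustration of what a PI rate controller looks like in code, the following sketch implements a generic discrete-time PI loop that maps a measured round trip delay to an allowed retransmission rate. It is not the controller from the paper: the gains, the target delay, the rate limit, and the class name are hypothetical, and the paper's stability guidelines for choosing the gains are not reproduced here.

```python
class PIRateController:
    """Generic discrete PI controller (illustrative, not the paper's design).

    Delays are in seconds and rates in messages per second.
    """

    def __init__(self, kp, ki, target_delay, max_rate):
        self.kp = kp                      # proportional gain (assumed value)
        self.ki = ki                      # integral gain (assumed value)
        self.target_delay = target_delay  # desired round trip delay
        self.max_rate = max_rate          # upper bound on the output rate
        self.integral = 0.0               # accumulated error over time

    def update(self, measured_delay, dt):
        """Return the allowed retransmission rate for the next interval."""
        # Positive error: the server is faster than the target, so the
        # rate may rise. Negative error: the server is overloaded.
        error = self.target_delay - measured_delay
        self.integral += error * dt
        rate = self.kp * error + self.ki * self.integral
        # Clamp to a physically meaningful range.
        return min(max(rate, 0.0), self.max_rate)
```

This sketch has no anti-windup protection, so the integral term keeps accumulating while the output is clamped; a production controller would usually bound it.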
Survey on SIP overload control algorithms:
Y. Hong, C. Huang, and J. Yan, “A Comparative Study of SIP Overload Control Algorithms,” Network and Traffic Engineering in Emerging Distributed Computing Applications, Edited by J. Abawajy, M. Pathan, M. Rahman, A.K. Pathan, and M.M. Deris, IGI Global, 2012, pp. 1-20.
http://www.igi-global.com/chapter/comparative-study-sip-overload-control/67496
http://www.researchgate.net/publication/231609451_A_Comparative_Study_of_SIP_Overload_Control_Algorithms
The present and future of serverless observability (QCon London) - Yan Cui
As engineers, we’re empowered by advancements in cloud platforms to build ever more complex systems that can achieve amazing feats at a scale previously only possible for the elite few. Monitoring tools have evolved over the years to accommodate our growing needs with these increasingly complex systems, but the emergence of serverless technologies like AWS Lambda has shifted the landscape and broken some of the underlying assumptions that existing tools are built upon. For example, you can no longer access the underlying host to install monitoring agents or daemons, and it’s no longer feasible to use background threads to send monitoring data outside the critical path.
Furthermore, event-driven architectures have become easily accessible and widely adopted by those using serverless technologies. This trend adds another layer of complexity to how we monitor and debug our systems, because it involves tracing executions that flow through async invocations and are often fanned out and fanned in via various event processing patterns.
Join us in this talk as Yan Cui gives us an overview of the challenges with observing a serverless architecture (ephemerality, no access to host OS, no background thread for sending monitoring data, etc.), the tradeoffs to consider, and the state of the tooling for serverless observability.
The present and future of Serverless observability - Yan Cui
As engineers, we're empowered by advancements in cloud platforms to build ever more complex systems that can achieve amazing feats at a scale previously only possible for the elite few. Monitoring tools have evolved over the years to accommodate our growing needs with these increasingly complex systems, but the emergence of serverless technologies like AWS Lambda has shifted the landscape and broken some of the underlying assumptions that existing tools are built upon. For example, you can no longer access the underlying host to install monitoring agents or daemons, and it's no longer feasible to use background threads to send monitoring data outside the critical path.
Furthermore, event-driven architectures have become easily accessible and widely adopted by those using serverless technologies. This trend adds another layer of complexity to how we monitor and debug our systems, because it involves tracing executions that flow through async invocations and are often fanned out and fanned in via various event processing patterns.
Join us in this talk as serverless expert Yan Cui gives us an overview of the challenges with observing a serverless architecture, the tradeoffs to consider, the current state of the tooling for serverless observability and a sneak peek at some of the new and coming tools that will hopefully inform us what the future of serverless observability might look like.
Tutorial: The Role of Event-Time Analysis Order in Data Streaming - Vincenzo Gulisano
Slides for our tutorial, titled “The Role of Event-Time Analysis Order in Data Streaming”, presented at the 14th ACM International Conference on Distributed and Event-Based Systems (DEBS). We have recorded the tutorial, and you can find the videos at the following links:
Part 1: https://youtu.be/SW_WS6ULsdY
Part 2: https://youtu.be/bq3ECNvPwOU
You can find these slides, as well as the code examples, at https://github.com/vincenzo-gulisano/debs2020_tutorial_event_time and at SlideS
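For readers unfamiliar with event-time order, the sketch below shows one common way to restore it: buffer tuples that arrive out of order and release them, sorted by event time, once a watermark guarantees that nothing earlier can still arrive. This is a generic illustration and is not taken from the tutorial's slides or repository; the class and method names are hypothetical.

```python
import heapq


class EventTimeSorter:
    """Buffers out-of-order tuples and emits them in event-time order."""

    def __init__(self):
        self._heap = []   # min-heap keyed by event time
        self._seq = 0     # tie-breaker that preserves arrival order

    def add(self, event_time, payload):
        """Buffer one tuple, whatever order it arrived in."""
        heapq.heappush(self._heap, (event_time, self._seq, payload))
        self._seq += 1

    def advance_watermark(self, watermark):
        """Release every buffered tuple with event time <= watermark.

        The watermark is a promise that no tuple with an earlier or
        equal event time will arrive after this call.
        """
        ready = []
        while self._heap and self._heap[0][0] <= watermark:
            event_time, _, payload = heapq.heappop(self._heap)
            ready.append((event_time, payload))
        return ready
```

The trade-off is latency: a tuple is held back until the watermark passes its event time, so a slower watermark tolerates more disorder at the cost of later results.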
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1yyaHb8.
The authors discuss Netflix's new stream processing system that supports a reactive programming model, allows auto scaling, and is capable of processing millions of messages per second. Filmed at qconsf.com.
Danny Yuan is an architect and software developer in Netflix’s Platform Engineering team. Justin Becker is Senior Software Engineer at Netflix.
Monitoring Clojure Applications with Prometheus - Joachim Draeger
How do you know your Clojure web service is doing what it is supposed to do? In this talk I give a quick introduction to Metrics and Monitoring before showing how you can easily add Prometheus Metrics to Clojure applications. Just give it a try and fork: https://github.com/joachimdraeger/clojure-prometheus-demo
In this talk about Apache Flink we will touch on three main things: an introductory look at Flink, a look under the hood, and a demo.
* In the introduction we will briefly look at the history of Flink and then go on to the API and different use cases. Here we will also see how it can be deployed in practice and what some of the pitfalls in a cluster setting can be.
* In the second section we will look at the streaming execution engine that lies at the heart of Flink. Here we will see what makes it tick and also what distinguishes it from other approaches, such as the mini-batch execution model.
* In the final section we will see a live demo of a fault-tolerant streaming job that performs analysis of the Wikipedia edit stream.
Ufuk Celebi - PMC member at Apache Flink and co-founder and software engineer at data Artisans
Intelligent Network Services through Active Flow Manipulation - Tal Lavian Ph.D.
Active Flow Manipulation Abstractions:
• Aggregate data into traffic flows
  • Flows whose characteristics can be identified in real time
  • E.g., “all UDP packets to a particular service”, “all TCP packets from a particular machine”.
• Actions to be performed on the traffic flows
  • Actions that can be performed in real time
  • E.g., “Change the priority of all traffic destined to a particular service on a particular machine”, “Stop all traffic out of a particular link of a router”.
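The two abstractions above, flows identified by packet characteristics and actions applied to those flows, can be sketched as a small rule table. This is only an illustrative model: the `Packet` fields, the `FlowRule` type, the addresses, and the port number are hypothetical and do not come from the deck.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Packet:
    protocol: str      # "UDP" or "TCP"
    src: str           # source address
    dst: str           # destination address
    dst_port: int
    priority: int = 0


@dataclass
class FlowRule:
    # Predicate that decides whether a packet belongs to the flow.
    matches: Callable[[Packet], bool]
    # Action applied to matching packets; returning None drops the packet.
    action: Callable[[Packet], Optional[Packet]]


def apply_rules(packet: Packet, rules: List[FlowRule]) -> Optional[Packet]:
    """Apply the first matching rule; unmatched packets pass unchanged."""
    for rule in rules:
        if rule.matches(packet):
            return rule.action(packet)
    return packet


rules = [
    # Flow: all UDP packets to a particular service. Action: raise priority.
    FlowRule(
        matches=lambda p: p.protocol == "UDP" and p.dst == "10.0.0.5" and p.dst_port == 5060,
        action=lambda p: replace(p, priority=7),
    ),
    # Flow: all TCP packets from a particular machine. Action: stop them.
    FlowRule(
        matches=lambda p: p.protocol == "TCP" and p.src == "10.0.0.9",
        action=lambda p: None,
    ),
]
```

Keeping the match and the action as separate functions mirrors the split in the list above: the flow definition can change without touching the action, and the same action can be reused across flows.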
Key Performance Findings Report.
The report focuses on the key indicators of the monitored case and significantly reduces the time required to analyze collected information and identify bottlenecks.
It features the areas that influence performance of the application the most, whether it is individual execution or cumulative impact on the system due to high frequency of use, and/or areas that create abnormally high load on the system as a whole.
System Alerts - The Alerts section lists events with execution time or resource utilization exceeding predefined thresholds.
Resources Utilization - The CPU graph shows total CPU utilization by the Java Virtual Machine (JVM), as well as consumption by the most demanding individual threads within the JVM, as a percentage of the server's total CPU.
Resources Utilization - The Memory graph shows memory utilization by the Java Virtual Machine. Memory consumption is expected to grow until it is released by the garbage collector (GC). Up and down fluctuations are normal and indicate that the system is healthy. However, if memory drops to an increasingly higher level after each collection and the oscillations grow shorter while the top line lingers around the maximum available memory, this may indicate a memory leak in the system, or simply higher demand than the JVM can provide. In such a case, the execution of the application is constrained by total available memory and the JVM might eventually crash.
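The memory pattern described above, where the level reached after each garbage collection keeps rising, can be checked mechanically. The sketch below is a simple heuristic under stated assumptions: it treats local minima in a series of memory samples as post-GC levels and flags the series when those minima strictly increase. The function names and the `min_troughs` threshold are hypothetical, and the report's own detection logic is not documented here.

```python
def gc_troughs(samples):
    """Return local minima, taken as the memory level right after a GC."""
    return [
        samples[i]
        for i in range(1, len(samples) - 1)
        if samples[i] < samples[i - 1] and samples[i] <= samples[i + 1]
    ]


def looks_like_leak(samples, min_troughs=3):
    """Flag a series whose post-GC memory level rises every time.

    A rising floor alone cannot tell a leak apart from genuinely higher
    demand, so a positive result is a hint to investigate, not a verdict.
    """
    troughs = gc_troughs(samples)
    if len(troughs) < min_troughs:
        return False
    return all(later > earlier for earlier, later in zip(troughs, troughs[1:]))
```

For example, the series `[10, 50, 12, 55, 11, 52, 12, 50]` returns to roughly the same floor after each collection and is not flagged, while `[10, 50, 20, 60, 35, 70, 55, 80]` has a rising floor and is.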
User Experience - Additional factors that may impact user experience are network latency and browser-side processing time. The execution time shown on the graph represents the minimum user wait time, which could be achieved when the user has an adequately equipped workstation and negligible network delays.
Heavy Methods - The section lists twenty methods with the longest execution time, along with the details pinpointing the reasons.
For each method listed in the table, the report offers additional supporting information about repeatedly executed sub-methods, methods with the highest net execution time, methods causing the highest CPU utilization, total time spent on database queries, and the heaviest SQL queries.
Heavy Methods by Total Net - The section lists the methods with the longest total net time, which are directly responsible for the duration of the execution.
Net time is calculated as full duration time minus full duration time of methods that are called directly from the method in question. The list could contain low-level methods, including I/O methods and other event waits.
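The net time definition above translates directly into a calculation over a call tree. The following sketch assumes a simple in-memory tree in which each call records its full duration and its direct callees; the `Call` type and function names are hypothetical and are not part of the reporting tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Call:
    name: str
    duration_ms: float                      # full duration, callees included
    children: List["Call"] = field(default_factory=list)


def net_time(call: Call) -> float:
    """Full duration minus the full duration of directly called methods."""
    return call.duration_ms - sum(child.duration_ms for child in call.children)


def total_net_by_method(root: Call) -> Dict[str, float]:
    """Sum net time per method name over the whole call tree."""
    totals: Dict[str, float] = {}
    stack = [root]
    while stack:
        call = stack.pop()
        totals[call.name] = totals.get(call.name, 0.0) + net_time(call)
        stack.extend(call.children)
    return totals
```

For a 100 ms `handle` call that spends 60 ms in `query` (of which 50 ms is `io`) and 30 ms in `render`, the net times are 10 ms for `handle`, 10 ms for `query`, 50 ms for `io`, and 30 ms for `render`, which is why low-level I/O methods can top the list.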
All Queries Totals - The section highlights the portion of the overall execution time that was spent on database queries.
The data could be one of the quick indicators on whether the performance issues are database related.
Heavy Queries by Total Duration - While a single execution of each query may not take long, the queries listed in this section were executed often enough to account for a significant share of the total execution time.
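Ranking queries by total duration rather than by single-execution time is a small aggregation. The sketch below assumes the input is a list of `(sql, duration_ms)` pairs, one per execution; that input shape and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def heavy_queries(executions, top_n=5):
    """Rank queries by cumulative duration across all executions.

    Returns (sql, execution_count, total_duration_ms) tuples, heaviest first.
    """
    totals = defaultdict(lambda: [0, 0.0])   # sql -> [count, total_ms]
    for sql, duration_ms in executions:
        totals[sql][0] += 1
        totals[sql][1] += duration_ms
    ranked = sorted(totals.items(), key=lambda item: item[1][1], reverse=True)
    return [(sql, count, total) for sql, (count, total) in ranked[:top_n]]
```

A query that takes 2 ms but runs 100 times (200 ms in total) ranks above a query that takes 150 ms and runs once, which is exactly the case this section is meant to expose.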
A Practical Deep Dive into Observability of Streaming Applications with Kosta... - HostedbyConfluent
"You build your streaming applications and event-driven microservices using Apache Kafka. Are your systems observable enough without depending only on the broker-side metrics and application logs? Can you track down the root cause during incidents, or do you hope everything will be fine after a restart? In this talk, Tim & Kosta will take you on their observability journey by sharing pitfalls and knowledge our team gained over the last couple of years.
We are going to answer questions like:
• Do you understand how to expose and use your client-side Kafka metrics?
• JMX, metric interceptors, Micrometer: where to start?
• Why is there a difference between the values of client-side and broker-side metrics?
• Learn how client-side consumer lag metrics can differ from the lag calculated on the cluster.
• What is the right way to use and interpret them?
• Can you measure latency through your complete stack using distributed tracing?
• OpenTelemetry, Jaeger & Zipkin: what to pick?
During a step-by-step demo, we will look into different real-life examples and scenarios to demonstrate how to bring the observability of your Kafka applications to the next level."
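As background for the consumer lag question raised above, lag for a partition is commonly defined as the log end offset minus the consumer group's committed offset. The sketch below computes that from two plain dictionaries. It does not call any Kafka client API and does not reflect the speakers' material; the function name, the input shape, and the choice to treat a missing committed offset as 0 are assumptions.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Compute per-partition lag from two offset snapshots.

    Both arguments map a partition id to an offset. A partition without
    a committed offset is treated as committed at 0 (an assumption; real
    consumers follow their offset reset policy instead).
    """
    lag = {}
    for partition, end in end_offsets.items():
        committed = committed_offsets.get(partition, 0)
        # Snapshots taken at different times can briefly disagree,
        # so clamp at zero instead of reporting negative lag.
        lag[partition] = max(end - committed, 0)
    return lag
```

Because the two snapshots are rarely taken at the same instant, and a client may track its current position rather than its last commit, values computed in different places can legitimately differ.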
Who: Karthik Ramasamy (@karthikz)
Date: September 20, 2016
Event: #TwitterRealTime
This slide deck consists of presentations from various teams about Twitter's real-time infrastructure, the components it uses, and how they function. It includes presentations from David Rusek (@davidrusek), Maosong Fu (@Louis_Fumaosong), Sandy Strong (@st5are), and Yimin Tan (@YiminTan_Kevin).
With Lakehouse as the future of data architecture, Delta becomes the de facto data storage format for all data pipelines. By using Delta to build curated data lakes, users achieve efficiency and reliability end-to-end. Curated data lakes involve multiple hops in the end-to-end data pipeline, which are executed regularly (mostly daily) depending on the need. As data travels through each hop, its quality improves and it becomes suitable for end-user consumption. Real-time capabilities, on the other hand, are key for any business and an added advantage. Fortunately, Delta integrates seamlessly with Structured Streaming, which makes it easy for users to achieve real-time capability using Delta. Overall, Delta Lake as a streaming source is a marriage made in heaven for various reasons, and we are already seeing a rise in adoption among our users.
In this talk, we will discuss the functional components of Structured Streaming with Delta as a streaming source. We will take a deep dive into Query Progress Logs (QPL) and their significance for operating streams in production, and show how to track the progress of any streaming job and map it to the source Delta table using QPL. We will also cover what exactly gets persisted in the checkpoint directory, how its contents map to the QPL metrics, and what those contents mean for Delta streams.
Strata Singapore: Gearpump, Real time DAG-Processing with Akka at Scale - Sean Zhong
Gearpump is an Akka-based real-time streaming engine that uses actors to model everything. It offers high performance and flexibility: 18,000,000 messages/second at a latency of 8 ms on a cluster of 4 machines.
Network visibility and control using industry standard sFlow telemetry - pphaal
• Find out about the sFlow instrumentation built into commodity data center network and server infrastructure.
• Understand how sFlow fits into the broader ecosystem of NetFlow, IPFIX, SNMP and DevOps monitoring technologies.
• Case studies demonstrate how sFlow telemetry combined with automation can lower costs, increase performance, and improve security of cloud infrastructure and applications.
A Serverless Stream Processing and ETL Architecture on AWS - Maycon Viana Bordin
This presentation covers the path taken in implementing a serverless stream processing and ETL architecture on AWS for ingesting, processing, and storing data in real time and in micro-batches using Kinesis, Lambda, and S3.
It shows the steps that led to the current architecture, as well as the next steps in the evolution of a serverless architecture, the trade-offs made in building it, and how this infrastructure fits within the Data Lake as a whole (batch vs real-time).
Because Android is the operating system with the largest presence among smartphones worldwide, developing applications for it has become increasingly interesting. However, to get the most out of this platform, it is important to understand how it works internally. This article covers the main components that make up the Android software stack, from the Linux kernel and the Dalvik VM to the main components of an application. It also shows the strategies adopted by the platform to deal with characteristics inherent to mobile devices, such as battery life and low memory capacity.
More Related Content
Similar to Development of a Distributed Stream Processing System
The present and future of serverless observability (QCon London)Yan Cui
As engineers, we’re empowered by advancements in cloud platforms to build ever more complex systems that can achieve amazing feats at a scale previously only possible for the elite few. The monitoring tools have evolved over the years to accommodate our growing needs with these increasingly complex systems, but the emergence of serverless technologies like AWS Lambda has shifted the landscape and broken some of the underlying assumptions that existing tools are built upon - eg. you can no longer access the underlying host to install monitoring agents/daemons, and it’s no longer feasible to use background threads to send monitoring data outside the critical path.
Furthermore, event-driven architectures has become easily accessible and widely adopted by those adopting serverless technologies, and this trend has added another layer of complexity with how we monitor and debug our systems as it involves tracing executions that flow through async invocations, and often fan’d-out and fan’d-in via various event processing patterns.
Join us in this talk as Yan Cui gives us an overview of the challenges with observing a serverless architecture (ephemerality, no access to host OS, no background thread for sending monitoring data, etc.), the tradeoffs to consider, and the state of the tooling for serverless observability.
The present and future of Serverless observabilityYan Cui
As engineers, we're empowered by advancements in cloud platforms to build ever more complex systems that can achieve amazing feats at a scale previously only possible for the elite few. The monitoring tools have evolved over the years to accommodate our growing needs with these increasingly complex systems, but the emergence of serverless technologies like AWS Lambda has shifted the landscape and broken some of the underlying assumptions that existing tools are built upon - eg. you can no longer access the underlying host to install monitoring agents/daemons, and it's no longer feasible to use background threads to send monitoring data outside the critical path.
Furthermore, event-driven architectures has become easily accessible and widely adopted by those adopting serverless technologies, and this trend has added another layer of complexity with how we monitor and debug our systems as it involves tracing executions that flow through async invocations, and often fan'd-out and fan'd-in via various event processing patterns.
Join us in this talk as serverless expert Yan Cui gives us an overview of the challenges with observing a serverless architecture, the tradeoffs to consider, the current state of the tooling for serverless observability and a sneak peek at some of the new and coming tools that will hopefully inform us what the future of serverless observability might look like.
The present and future of Serverless observabilityYan Cui
As engineers, we’re empowered by advancements in cloud platforms to build ever more complex systems that can achieve amazing feats at a scale previously only possible for the elite few. The monitoring tools have evolved over the years to accommodate our growing needs with these increasingly complex systems, but the emergence of serverless technologies like AWS Lambda has shifted the landscape and broken some of the underlying assumptions that existing tools are built upon - eg. you can no longer access the underlying host to install monitoring agents/daemons, and it’s no longer feasible to use background threads to send monitoring data outside the critical path.
Furthermore, event-driven architectures has become easily accessible and widely adopted by those adopting serverless technologies, and this trend has added another layer of complexity with how we monitor and debug our systems as it involves tracing executions that flow through async invocations, and often fan’d-out and fan’d-in via various event processing patterns.
Join us in this talk as Yan Cui gives us an overview of the challenges with observing a serverless architecture (ephemerality, no access to host OS, no background thread for sending monitoring data, etc.), the tradeoffs to consider, and the state of the tooling for serverless observability.
Tutorial: The Role of Event-Time Analysis Order in Data StreamingVincenzo Gulisano
Slides for our tutorial, titled “The Role of Event-Time Analysis Order in Data Streaming”, presented at the 14th ACM International Conference on Distributed and Event-Based Systems (DEBS) conference. We have recorded the tutorial, and you can find the videos at the following links:
Part 1: https://youtu.be/SW_WS6ULsdY
Part 2: https://youtu.be/bq3ECNvPwOU
You can find this slides, as well as the code examples, at https://github.com/vincenzo-gulisano/debs2020_tutorial_event_time and at SlideS
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1yyaHb8.
The authors discuss Netflix's new stream processing system that supports a reactive programming model, allows auto scaling, and is capable of processing millions of messages per second. Filmed at qconsf.com.
Danny Yuan is an architect and software developer in Netflix’s Platform Engineering team. Justin Becker is Senior Software Engineer at Netflix.
Monitoring Clojure Applications with PrometheusJoachim Draeger
How do you know your Clojure web service is doing what it is supposed to do? In this talk I give a quick introduction to Metrics and Monitoring before showing how you can easily add Prometheus Metrics to Clojure applications. Just give it a try and fork: https://github.com/joachimdraeger/clojure-prometheus-demo
In this talk about Apache Flink we will touch on three main things, an introductory look at Flink, a look under the hood and a demo.
* In the introduction we will briefly look at the history of Flink and then go on to the API and different use cases. Here we will also see how it can be deployed in practice and what some of the pitfalls in a cluster setting can be.
* In the second section we will look at the streaming execution engine that lies at the heart of Flink. Here we will see what makes it tick and also what distinguishes it from other approaches, such as the mini-batch execution model.
Ufuk Celebi - PMC member at Apache Flink and co-founder and software engineer at data Artisans
* In the final section we will see a live demo of a fault-tolerant streaming job that performs analysis of the wikipedia edit-stream.
Intelligent Network Services through Active Flow ManipulationTal Lavian Ph.D.
Active Flow Manipulation Abstractions:
Aggregate data into traffic flows
Flows whose characteristics can be identified in real-time
E.g., “all UDP packets to a particular service”, “all TCP packets from a particular machine”.
Actions to be performed in the traffic flows
Actions that can be performed in real-time
E.g., “Change the priority of all traffic destined to a particular service on a particular machine”, “Stop all traffic out of a particular link of a router”.
Key Performance Findings Report.
The report focuses on the key indicators of the monitored case and significantly reduces the time required to analyze collected information and identify bottlenecks.
It features the areas that influence performance of the application the most, whether it is individual execution or cumulative impact on the system due to high frequency of
use, and/or areas that create abnormally high load on the system as a whole.
System Alerts - Alerts section lists events with execution time or resource utilization exceeding predefined thresholds.
Resources Utilization - The CPU graph shows total general CPU utilization by Java Virtual Machine (JVM), as well as consumption by the most demanding individual threads within JVM, as the percentage of the total server's
CPU.
Resources Utilization - The Memory graph shows memory utilization by the Java Virtual Machine. Memory consumption is expected to grow until it is released by the garbage collector (GC). Up and down fluctuations are normal and indicate that the system is healthy. However, behavior when memory drops to increasingly higher level and oscillations grow shorter, while top line lingers around maximal available memory, may indicate memory leak in the system, or simply higher demand that JVM can
provide. In such case, the execution of the application is constrained by total available memory and JVM might eventually crash.
User Experience - Additional factors that may impact user experience are network latency and browser-side processing time. The execution time shown on the graph represents minimum user wait time which could be
achieved when user has adequately equipped workstation and negligible network delays.
Heavy Methods - The section lists twenty methods with the longest execution time, along with the details pinpointing the reasons.
For each of the listed in the table methods, report offers additional supporting information about repeatedly executed sub-methods, methods with highest net execution time, methods causing the highest CPU utilization, total time spent on database queries and the Heaviest SQL queries.
Heavy Methods by Total Net - The section lists the methods with longest total net time, which are directly responsible for the duration of the execution.
Net time is calculated as full duration time minus full duration time of methods that are called directly from the method in question. The list could contain low-level methods, including I/O methods and other
event waits.
All Queries Totals - The section highlights portion of the overall execution time that was spend on database queries.
The data could be one of the quick indicators on whether the performance issues are database related.
Heavy Queries by Total Duration - While one execution of the query doesn't take long time to execute, the queries listed in this section were executed enough times to account for the significant total execution time.
A Practical Deep Dive into Observability of Streaming Applications with Kosta...HostedbyConfluent
"You build your streaming applications and event-driven microservices using Apache Kafka. Are your systems observable enough without depending only on the broker-side metrics and application logs? Can you track down the root cause during incidents, or do you hope everything will be fine after a restart? In this talk, Tim & Kosta will take you on their observability journey by sharing pitfalls and knowledge our team gained over the last couple of years.
We are going to answer questions like:
• Do you understand how to expose and use your client-side Kafka metrics?
• JMX, Metric interceptors, Micrometer where to start?
• Why is there a difference between the values of client-side and broker-side metrics?
• Learn how client-side consumer lag metrics can differ from the lag calculated on the cluster.
• What is the right way to use and interpret them?
• Can you measure latency through your complete stack using distributed tracing?
• OpenTelemetry, Jaeger & Zipkin, what to pick?
During a step-by-step demo, we will look into different real-life examples and scenarios to demonstrate how to bring the observability of your Kafka applications to the next level."
Who: Karthik Ramasamy (@karthikz)
Date: September 20, 2016
Event: #TwitterRealTime
This slide deck consists of presentations from various teams about Twitter's real time infrastructure, the components it uses, and how they function. It includes presentations from David Rusek (@davidrusek), Maosong Fu (@Louis_Fumaosong), Sandy Strong (@st5are), and Yimin Tan (@YiminTan_Kevin).
With Lakehouse as the future of data architecture, Delta becomes the de facto data storage format for all the data pipelines. By using delta, to build the curated data lakes, users achieve efficiency and reliability end-to-end. Curated data lakes involve multiple hops in the end-to-end data pipeline, which are executed regularly (mostly daily) depending on the need. As data travels through each hop, its quality improves and becomes suitable for end-user consumption. On the other hand real-time capabilities are key for any business and an added advantage, luckily Delta has seamless integration with structured streaming which makes it easy for users to achieve real-time capability using Delta. Overall, Delta Lake as a streaming source is a marriage made in heaven for various reasons and we are already seeing the rise in adoption among our users.
In this talk, we will discuss various functional components of structured streaming with Delta as a streaming source. Deep dive into Query Progress Logs(QPL) and their significance for operating streams in production. How to track the progress of any streaming job and map it with the source Delta table using QPL. What exactly gets persisted in the checkpoint directory and its details. Mapping the contents of the checkpoint directory with the QPL metrics and understanding the significance of contents in the checkpoint directory with respect to Delta streams.
Strata Singapore: GearpumpReal time DAG-Processing with Akka at ScaleSean Zhong
Gearpump is a Akka based realtime streaming engine, it use Actor to model everything. It has super performance and flexibility. It has performance of 18000000 messages/second and latency of 8ms on a cluster of 4 machines.
Network visibility and control using industry standard sFlow telemetry (pphaal)
• Find out about the sFlow instrumentation built into commodity data center network and server infrastructure.
• Understand how sFlow fits into the broader ecosystem of NetFlow, IPFIX, SNMP and DevOps monitoring technologies.
• Case studies demonstrate how sFlow telemetry combined with automation can lower costs, increase performance, and improve security of cloud infrastructure and applications.
A Serverless Stream Processing and ETL Architecture on AWS (Maycon Viana Bordin)
This presentation covers the path taken in implementing a serverless stream processing and ETL architecture on AWS for ingesting, processing, and storing data in real time and in micro-batches using Kinesis, Lambda, and S3.
It shows the steps that led to the current architecture, the next steps in the evolution of a serverless architecture, the trade-offs made in building it, and how this infrastructure fits into the Data Lake as a whole (batch vs real-time).
As the operating system with the largest presence among smartphones worldwide, Android has become increasingly interesting to develop applications for. However, to make the most of this platform, it is important to know how it works internally. This article covers the main components of the Android software stack, from the Linux kernel and the Dalvik VM to the main components of an application. It also shows the strategies the platform adopts to deal with characteristics inherent to mobile devices, such as battery life and low memory capacity.
Development of a Geolocation-Based Social Network (Maycon Viana Bordin)
Forums were used for a long time on the Internet as the main tool for creating online communities and discussing particular subjects. With the emergence of social networks, the focus of much of the Internet shifted to the individual and their relationships with other people. Social networks also introduced new features that improved the user experience and enabled better communication with other people. This work sought to bring together some of these features in an attempt to create a forum suited to today's reality without losing the basic characteristics of a forum. The service focused primarily on mobile devices, keeping a simple user interface, gathering all interests in a single place, and allowing users to follow interests and filter conversations according to their location. These choices made it possible to create a different kind of forum that can be useful and easy to use, even when compared with social networks.
A Benchmark Suite for Distributed Stream Processing Systems (Maycon Viana Bordin)
Recently, a new application domain characterized by the continuous and low-latency processing of large volumes of data has been gaining attention. The growing number of applications of this kind has led to the creation of Stream Processing Systems (SPSs), systems that abstract the details of real-time applications from the developer. More recently, the ever-increasing volumes of data to be processed gave rise to distributed SPSs.
There are currently several distributed SPSs on the market. However, the existing benchmarks designed for evaluating this kind of system cover only a few applications and workloads, while these systems have a much wider set of applications. In this work, a benchmark for stream processing systems is proposed. Based on a survey of several papers on real-time and stream applications, the most used applications and areas were outlined, as well as the most used metrics in the performance evaluation of such applications.
With this information, the metrics of the benchmark were selected, as well as a list of candidate applications to be part of the benchmark. The candidates went through a workload characterization in order to select a diverse set of applications. To ease the evaluation of SPSs, a framework was created with an API that generalizes application development and collects metrics, with the possibility of extending it to support other platforms in the future. To evaluate the benchmark, a subset of the applications was executed on Storm and Spark on the Azure platform, and the results demonstrated the usefulness of the benchmark suite for comparing these systems.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
This talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their share of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs (Alex Pruden)
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... (Neo4j)
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
How to Get CNIC Information System with Paksim Ga.pptx (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 (Neo4j)
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Development of a Distributed Stream Processing System
1. Development of a Distributed Stream Processing System
Maycon Viana Bordin
Final Assignment
Instituto de Informática
Universidade Federal do Rio Grande do Sul
CMP157 – PDP 2013/2, Claudio Geyer
71. Test Environment
GridRS - PUCRS
3 nodes
4 x 3.52 GHz (Intel Xeon)
2 GB RAM
Linux 2.6.32-5-amd64
Gigabit Ethernet
72. Metrics
Runtime
Latency: time for a tuple to traverse the graph
Throughput: no. of tuples processed per sec.
Loss of Tuples
Methodology
5 runs per test.
Every 3s each operator sends its status with
no. of tuples processed.
The PerfMon sink collects a tuple every
100ms, and sends the average latency every
3s (and cleans up the collected tuples).
Variables
Number of nodes
Number of operator instances
Window size
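The measurement procedure on this slide can be sketched roughly as follows. This is a minimal illustration in Python, not the system's actual code (the system itself is written in node.js): the class name `LatencySampler`, the function `throughput`, and the explicit millisecond timestamps are assumptions made for the example. Only the 100 ms sampling interval and the 3 s reporting interval come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SAMPLE_INTERVAL_MS = 100   # from the slide: one tuple collected every 100 ms
REPORT_INTERVAL_MS = 3000  # from the slide: average latency sent every 3 s


@dataclass
class LatencySampler:
    """Hypothetical stand-in for the PerfMon sink (all names are assumptions)."""
    samples: List[float] = field(default_factory=list)
    last_sample_ms: Optional[float] = None
    last_report_ms: float = 0.0

    def on_tuple(self, created_ms: float, now_ms: float) -> Optional[float]:
        """Offer a tuple to the sink; return the average latency when a report is due."""
        # Keep at most one tuple per sampling interval.
        if (self.last_sample_ms is None
                or now_ms - self.last_sample_ms >= SAMPLE_INTERVAL_MS):
            # Latency is the time the tuple took to traverse the graph.
            self.samples.append(now_ms - created_ms)
            self.last_sample_ms = now_ms
        # Once per report interval, emit the average and clean up the samples.
        if now_ms - self.last_report_ms >= REPORT_INTERVAL_MS and self.samples:
            average = sum(self.samples) / len(self.samples)
            self.samples.clear()
            self.last_report_ms = now_ms
            return average
        return None


def throughput(tuples_processed: int,
               interval_ms: float = REPORT_INTERVAL_MS) -> float:
    """Tuples per second, computed from one operator status report."""
    return tuples_processed / (interval_ms / 1000.0)
```

Passing timestamps in explicitly, rather than reading a clock inside the sampler, keeps the sketch deterministic and easy to check. A real sink would use the arrival time of each tuple and would need synchronized clocks across nodes for the latency to be meaningful.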
88. References
Chakravarthy, Sharma. Stream data processing: a quality of
service perspective: modeling, scheduling, load shedding, and
complex event processing. Vol. 36. Springer, 2009.
Cormode, Graham, and S. Muthukrishnan. "An improved data
stream summary: the count-min sketch and its applications."
Journal of Algorithms 55.1 (2005): 58-75.
Gulisano, Vincenzo Massimiliano, Ricardo Jiménez Peris, and
Patrick Valduriez. StreamCloud: An Elastic Parallel-Distributed
Stream Processing Engine. Diss. Informatica, 2012.
Source code @ github.com/mayconbordin/tempest