This document describes Twitter's Days In Green (DIG) methodology for forecasting the lifespan of a healthy service before it exceeds a predefined capacity threshold. It involves collecting time series data on a service's key performance metric, detecting anomalies and breakouts, fitting an ARIMA model to capture trends and seasonality, and forecasting the number of days before the threshold is breached to determine capacity needs. The methodology has been deployed at Twitter to help plan capacity for hundreds of services and detect those nearing disaster recovery thresholds.
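The methodology summarized above lends itself to a compact sketch. The snippet below is a minimal, illustrative rendition, not Twitter's production code: the metric, the 70% threshold, and the ARIMA order are assumptions. It fits a weekly-seasonal ARIMA to a daily series and counts the days until the mean forecast first crosses the threshold.

```python
# Illustrative sketch of the DIG idea: forecast a daily capacity metric
# and count the days before it is expected to cross a threshold.
# Assumptions: `cpu_daily` is a pandas Series of a daily utilization summary;
# the 70% threshold and the ARIMA order are placeholders, not Twitter's values.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def days_in_green(series: pd.Series, threshold: float, horizon: int = 90) -> int:
    # Seasonal ARIMA with a 7-day period to capture weekly seasonality.
    model = ARIMA(series, order=(1, 1, 1), seasonal_order=(1, 0, 1, 7))
    fit = model.fit()
    forecast = fit.forecast(steps=horizon)
    breach = np.flatnonzero(forecast.values >= threshold)
    # If no breach within the horizon, report the full horizon.
    return int(breach[0]) if breach.size else horizon

# Example with synthetic data: slow upward trend plus weekly seasonality.
rng = np.random.default_rng(0)
days = pd.date_range("2014-01-01", periods=90, freq="D")
values = (50 + 0.15 * np.arange(90)
          + 3 * np.sin(2 * np.pi * np.arange(90) / 7)
          + rng.normal(0, 1, 90))
cpu_daily = pd.Series(values, index=days)
print(days_in_green(cpu_daily, threshold=70.0))
```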
Mitigating User Experience from 'Breaking Bad': The Twitter Approach [Velocit...] by Piyush Kumar
Frequent deployments, a large set of in-flight A/B tests, new product launches, etc. directly impact the profile of application metrics as well as system metrics. Specifically, the above can induce sudden breakouts, which manifest themselves as a mean shift or a ramp-up (these are different from an anomaly), in the time series of a given metric. Further, the profile of the incoming traffic may also experience a breakout for a variety of reasons, such as, but not limited to, the roll-out of a new feature or the roll-out on a new platform; this in turn results in breakouts in application and/or system metrics.
Breakouts can potentially impact performance of the corresponding service and consequently impact the end user experience. To alleviate the impact of breakouts – in other words, preventing user experience from ‘Breaking Bad’ – we developed statistically rigorous techniques to automatically detect breakouts in a timely fashion. The breakouts detected are used to guide capacity planning. In particular, there are two scenarios:
Positive breakout: Depending on the magnitude, deploy additional capacity
Negative breakout: Depending on the magnitude, scale down the current capacity
We shall walk the audience through how the techniques are being used at Twitter with real data.
Summer Internship with NDBC - A real-time automatic data quality checking system for the Tropical Atmosphere Ocean (TAO) array in MATLAB and Python. (Resulted in a publication at NOAA's 38th Climate Diagnostics and Prediction Workshop, October 2013, College Park, MD)
On Unified Stream Reasoning - The RDF Stream Processing realm by Daniele Dell'Aglio
The slides of my talk at WU Vienna on 18/2/2016. I discuss the problem of unifying existing solutions for processing semantic streams, with a particular focus on those that perform continuous query answering over RDF streams.
Finding bad apples early: Minimizing performance impact by Arun Kejariwal
The big data era is characterized by the ever-increasing velocity and volume of data. To store and analyze this ever-growing data, the operational footprint of data stores and Hadoop has also grown over time. (As per a recent report from IDC, spending on big data infrastructure is expected to reach $41.5 billion by 2018.) Such clusters comprise several thousand nodes, and their high performance is vital for delivering the best user experience and team productivity.
The performance of such clusters is often limited by slow/bad nodes. Finding slow nodes in large clusters is akin to finding a needle in a haystack; hence, manual identification of slow/bad nodes is not practical. To this end, we developed a novel statistical technique to automatically detect slow/bad nodes in clusters comprising hundreds to thousands of nodes. We modeled the problem as a classification problem and employed a simple, yet very effective, distance measure to determine slow/bad nodes. The key highlights of the proposed technique are the following:
# Robustness against anomalies (note that anomalies may occur, for example, due to an ad-hoc heavyweight job on a Hadoop cluster)
# Given the varying data characteristics of different services, no one model fits all. Consequently, we parameterized the threshold used for classification
The proposed technique works well with both hourly and daily data, and has been in use in production by multiple services. This has not only eliminated manual investigation efforts, but has also mitigated the impact of slow nodes, which used to get detected after several weeks/months of lag!
We shall walk the audience through how the techniques are being used with real data; a minimal sketch of the classification idea follows.
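The abstract does not spell out the distance measure used. As a hedged illustration only, the snippet below flags nodes whose per-node metric is far from the cluster median in robust (MAD-based) distance, with the cutoff exposed as a parameter in the spirit of the per-service threshold mentioned above; the `latencies` data and the 3.5 cutoff are invented for the example.

```python
# Hypothetical sketch of median/MAD-based slow-node detection; the actual
# distance measure used in the talk is not specified in the abstract.
import numpy as np

def find_slow_nodes(latencies: np.ndarray, threshold: float = 3.5) -> np.ndarray:
    """Return indices of nodes whose latency is anomalously high.

    `latencies` holds one aggregate metric (e.g., median task latency)
    per node; `threshold` is the tunable classification cutoff.
    """
    median = np.median(latencies)
    mad = np.median(np.abs(latencies - median))
    if mad == 0:  # degenerate case: all nodes essentially identical
        return np.array([], dtype=int)
    # Robust z-score; 0.6745 rescales MAD to be consistent with sigma.
    scores = 0.6745 * (latencies - median) / mad
    return np.flatnonzero(scores > threshold)

latencies = np.array([10.2, 9.8, 10.5, 10.1, 42.0, 10.3])
print(find_slow_nodes(latencies))  # -> [4]
```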
Data Data Everywhere: Not An Insight to Take Action Upon by Arun Kejariwal
The big data era is characterized by ever-increasing velocity and volume of data. Over the last two or three years, several talks at Velocity have explored how to analyze operations data at scale, focusing on anomaly detection, performance analysis, and capacity planning, to name a few topics. Knowledge sharing of the techniques for the aforementioned problems helps the community to build highly available, performant, and resilient systems.
A key aspect of operations data is that data may be missing—referred to as “holes”—in the time series. This may happen for a wide variety of reasons, including (but not limited to):
# Packets being dropped due to unresponsive downstream services
# A network hiccup
# Transient hardware or software failure
# An issue with the data collection service
“Holes” in a time series can skew the analysis of the data, which in turn can materially impact decision making. Arun Kejariwal presents approaches for analyzing operations data in the presence of such holes: highlighting how missing data affects common analyses such as anomaly detection and forecasting, discussing the implications of missing data for time series of different granularities (e.g., minutely and hourly), and exploring a gamut of techniques for addressing missing data (e.g., approximating it via interpolation, regression, or ensemble methods). Arun then walks you through how the techniques can be leveraged using real data.
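As a minimal illustration of the gap-filling techniques mentioned above (interpolation and the like), the sketch below fills holes in a minutely series with time-based interpolation; the series and gap positions are synthetic.

```python
# Minimal illustration of filling "holes" in an operations time series;
# the data and gap positions here are synthetic.
import numpy as np
import pandas as pd

idx = pd.date_range("2016-01-01", periods=10, freq="min")
ts = pd.Series(np.arange(10, dtype=float) ** 1.5, index=idx)
ts.iloc[[3, 4, 7]] = np.nan  # simulate dropped data points

# Time-aware linear interpolation; other options include spline
# interpolation or model-based imputation, as discussed in the talk.
filled = ts.interpolate(method="time")
print(filled)
```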
Real Time Analytics: Algorithms and Systems by Arun Kejariwal
This tutorial presents an in-depth overview of the streaming analytics landscape: applications, algorithms, and platforms. We walk through how the field has evolved over the last decade and then discuss the current challenges: the impact of the other three Vs, viz. volume, variety, and veracity, on big data streaming analytics.
Anomaly detection in real-time data streams using Heron by Arun Kejariwal
Twitter has become the de facto medium for consumption of news in real time, and billions of events are generated and analyzed on a daily basis. To analyze these events, Twitter designed its own next-generation streaming system, Heron. Arun Kejariwal and Karthik Ramasamy walk you through how Heron is used to detect anomalies in real-time data streams. Although there’s been over 75 years of prior work in anomaly detection, most of the techniques cannot be used off the shelf because they’re not suitable for high-velocity data streams. Arun and Karthik explain how to make trade-offs between accuracy and speed and discuss incremental approaches that marry sampling with robust measures such as median and MCD for anomaly detection.
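The abstract mentions robust measures such as the median and MCD (minimum covariance determinant). As a rough, non-authoritative sketch of how MCD supports anomaly detection (generic scikit-learn usage, not Heron's implementation), the snippet below scores points by robust Mahalanobis distance; the data and the 99th-percentile cutoff are invented.

```python
# Generic sketch of MCD-based anomaly scoring (not Heron's implementation).
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(1)
# Two correlated metrics for 500 events, plus a few injected outliers.
X = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=500)
X[:5] += 8

mcd = MinCovDet(random_state=0).fit(X)
d2 = mcd.mahalanobis(X)  # squared robust Mahalanobis distances
# Flag the points whose robust distance is extreme.
anomalies = np.flatnonzero(d2 > np.quantile(d2, 0.99))
print(anomalies)
```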
Shallow Survey 2018 - Applications of Sonar Detection Uncertainty for Survey ... by Giuseppe Masetti
Authors: Giuseppe Masetti1*, Jean-Marie Augustin2, Xavier Lurton2, Brian R. Calder3
1. CCOM/JHC, University of New Hampshire, Durham, NH, USA, gmasetti@ccom.unh.edu
2. Institut Français de Recherche pour l’Exploitation de la Mer (Ifremer), Brest, France
3. CCOM/JHC, University of New Hampshire, Durham, NH, USA
An objective measurement of the bathymetric uncertainty introduced by sonar bottom detection has been proposed (Lurton and Augustin, 2009) to overcome the sonar-specific heuristic solutions offered by manufacturers. This approach pairs each sounding with an estimate of sonar detection uncertainty (SDU) based on the width of the signal envelope (amplitude detection) or the noise level of the phase ramp (phase detection), thus capturing the intrinsic quality of the received signal and any applied signal-processing step.
Along with the environment characterization and the motion sensor accuracy, the SDU represents a major contributor to the total vertical uncertainty (TVU). As such, the monitoring of the SDU statistics by detection types, acquisition modes, and transmission sectors (when available) provides an effective way to alert the surveyor about ongoing issues in the data collection. It also has potential application in the evaluation of the health status of the sonar - for example, by comparing SDU-derived performance of repeated surveys on the same seafloor area and estimating the uncertainty contributions from environment and motion. Finally, the SDU may be integrated in multiple stages of the data processing workflow, from data pre-filtering to hydrographic uncertainty modeling, up to more advanced applications like hypotheses disambiguation in statistical gridding algorithms (e.g., CUBE).
Based on such considerations, we conducted a study to explore possible applications of the estimated SDU values for survey quality control and data processing. The results of the analysis applied to real data – collected using multibeam echosounders from manufacturers who are early adopters of this metric (i.e., Kongsberg Maritime and Teledyne Reson) – provide evidence that SDU is a useful tool for survey monitoring.
On March 11, 2016, ICLR held a Friday Forum workshop entitled 'Mapping extreme rainfall statistics for Canada', led by Dr. Slobodan Simonovic of Western University.
Climate change is expected to increase the frequency and intensity of extreme rainfall events, affecting the rainfall intensity-duration-frequency (IDF) curve information used in the design, maintenance, and operation of water infrastructure in Canada. Presented in this lecture are analyses of precipitation data from 567 Environment Canada hydro-meteorological stations using the IDF_CC tool. Results for the year 2100 have been generated based on the Canadian climate model and on an ensemble of 22 GCMs. A spatial interpolation method was used to produce Canadian precipitation maps for events of various return periods. Results based on the Canadian climate model indicate a reduction in extreme precipitation in central regions of Canada and increases in other regions. Relative to the ensemble approach, the Canadian climate model results (a) suggest more spatial variability in the change of IDFs and (b) are generally higher than the values generated by the ensemble approach.
Dr. Simonovic has extensive research, teaching and consulting experience in water resources systems engineering. He teaches courses in water resources and civil engineering systems. He actively works for national and international professional organizations. Dr. Simonovic’s primary research interest focuses on the application of systems approach to management of complex water and environmental systems. Most of his work is related to the integration of risk, reliability, and uncertainty in hydrology and water resources management. He has received a number of awards for excellence in teaching, research and outreach. He has published over 450 professional publications and three major textbooks. He was inducted to the Canadian Academy of Engineering in June of 2013.
Forecasting time series powerful and simple by Ivo Andreev
A time series is a sequence of data points ordered in time. Time series forecasting has two main purposes: to understand the mechanisms that lead to rises or falls, and to predict future values. It often analyzes trends, cyclical events, and seasonality, and has particular importance in economics and business. Because of the temporal dependence on previous data points, the quality of predictions can be evaluated only in the future, and there are many model types for approximation. In this session we talk about challenges, ways of improvement, and a technology stack including ML.NET, ARIMA, Python, Azure ML, regression, and FB Prophet.
Banoub, Christopher, "Lean Six Sigma: Optimizing Patient Throughput & Increasing Patient Satisfaction," Veith Symposium 2015, New York, New York, November 2015
4Developers 2015: Measure to fail - Tomasz Kowalczewski (PROIDEA)
YouTube: https://www.youtube.com/watch?v=H5F0D55nKX4&index=11&list=PLnKL6-WWWE_WNYmP_P5x2SfzJ7jeJNzfp
Tomasz Kowalczewski
Language: English
Hardware fails, applications fail, our code... well, it fails too (at least mine). To prevent software failure we test. Hardware failures are inevitable, so we write code that tolerates them, then we test. From tests we gather metrics and act upon them by improving parts that perform inadequately. Measuring the right things at the right places in an application is as much about good engineering practices and maintaining SLAs as it is about end-user experience, and may differentiate a successful product from a failure.
In order to act on performance metrics such as max latency and consistent response times we need to know their accurate value. The problem with such metrics is that when using popular tools we get results that are not only inaccurate but also too optimistic.
During my presentation I will simulate services that require monitoring and show how the gathered metrics differ from the real numbers. All this while using what currently seems to be the most popular metrics pipeline - Graphite together with the com.codahale metrics library - and getting completely false results. We will learn to tune it and get much better accuracy. We will use JMeter to measure latency and observe how falsely reassuring the results are. We will check how Graphite averages data, just to helplessly watch important latency spikes disappear. Finally I will show how HdrHistogram helps in gathering reliable metrics. We will also run tests measuring the performance of different metric classes.
Surviving your Capital Improvement Plan - Kevin McKinnon, Anchorage Water and... (marcus evans Network)
Kevin McKinnon, Anchorage Water and Wastewater Utility - Speaker at the marcus evans Water & Wastewater Management Summit 2012 held in Summerlin, NV, May 3-4, 2012, delivered his presentation entitled Surviving your Capital Improvement Plan
Similar to Days In Green (DIG): Forecasting the life of a healthy service
In the wake of IoT becoming ubiquitous, there has been large interest in the industry in developing novel techniques for anomaly detection at the edge. Example applications include, but are not limited to, smart cities/grids of sensors, industrial process control in manufacturing, smart homes, wearables, connected vehicles, and agriculture (sensing for soil moisture and nutrients). What makes anomaly detection at the edge different? The following constraints, whether due to the sensors or the applications, necessitate the development of new algorithms for AD:
* Very low power and low compute/memory resources
* High data volume making centralized AD infeasible owing to the communication overhead
* Need for low latency to drive fast action taking
* Guaranteeing privacy
In this talk we shall throw light on the above in detail. Subsequently, we shall walk through the algorithm design process for anomaly detection at the edge. Specifically, we shall dive into the need to build small models/ensembles owing to the limited memory on the sensors, and into how to train on data in an online fashion, as long-term historical data is not available due to limited storage. Given the need for data compression to contain the communication overhead, can one carry out anomaly detection on compressed data? We shall throw light on building small models, sequential and one-shot learning algorithms, compressing the data with the models, and limiting communication to only the data corresponding to the anomalies plus the model description. We shall illustrate the above with concrete examples from the wild!
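As a hedged sketch of the "small model" theme, the toy detector below keeps its state in O(1) memory with exponentially weighted statistics, so it could run on a constrained sensor and transmit only anomalous points; the decay rate, cutoff, and warm-up length are arbitrary choices, not from the talk.

```python
# Constant-memory online anomaly detector, a toy stand-in for the small
# models discussed in the talk; alpha and k are arbitrary choices.
class EwmaDetector:
    def __init__(self, alpha: float = 0.05, k: float = 4.0):
        self.alpha, self.k = alpha, k
        self.mean, self.var, self.n = 0.0, 0.0, 0

    def update(self, x: float) -> bool:
        """Return True if x is anomalous; always fold x into the state."""
        if self.n > 10:  # warm-up before flagging
            is_anomaly = abs(x - self.mean) > self.k * (self.var ** 0.5 + 1e-9)
        else:
            is_anomaly = False
        delta = x - self.mean
        self.mean += self.alpha * delta
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        self.n += 1
        return is_anomaly

det = EwmaDetector()
stream = [10.0, 10.2, 9.9, 10.1] * 5 + [25.0]
flags = [det.update(x) for x in stream]
print(flags[-1])  # True: only the spike would be transmitted upstream
```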
Serverless Streaming Architectures and Algorithms for the Enterprise by Arun Kejariwal
In recent years, serverless has gained momentum in the realm of cloud computing. Broadly speaking, it comprises function as a service (FaaS) and backend as a service (BaaS). The distinction between the two is that under FaaS, one writes and maintains the code (e.g., the functions) for serverless compute; in contrast, under BaaS, the platform provides the functionality and manages the operational complexity behind it. Serverless provides a great means to boost development velocity. With greatly reduced infrastructure costs, more agile and focused teams, and faster time to market, enterprises are increasingly adopting serverless approaches to gain a key advantage over their competitors.
Example early use cases of serverless include data transformation in batch and ETL scenarios and data processing using MapReduce patterns. As a natural extension, serverless is being used in the streaming context, such as (but not limited to) real-time bidding, fraud detection, and intrusion detection. Serverless is, arguably, naturally suited to extracting insights from fast data, that is, high-volume, high-velocity data. Example tasks in this regard include filtering and reducing noise in the data and leveraging machine learning and deep learning models to provide continuous insights about business operations.
We walk the audience through the landscape of streaming systems for each stage of an end-to-end data processing pipeline—messaging, compute, and storage. We overview the inception and growth of the serverless paradigm. Further, we deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar functions, and paint a bird’s-eye view of the application domains where Pulsar functions can be leveraged.
Baking intelligence into a serverless flow is paramount from a business perspective. To this end, we detail different serverless patterns (event processing, machine learning, and analytics) for different use cases and highlight the trade-offs. We present perspectives on how advances in hardware technology and the emergence of new applications will impact the evolution of serverless streaming architectures and algorithms. The topics covered include an introduction to streaming, an introduction to serverless, serverless and streaming requirements, Apache Pulsar, application domains, serverless event processing patterns, serverless machine learning patterns, and serverless analytics patterns.
Sequence-to-Sequence Modeling for Time Series by Arun Kejariwal
In this talk we overview Sequence-2-Sequence (S2S) and explore its early use cases. We walk the audience through how to leverage S2S modeling for several use cases, particularly with regard to real-time anomaly detection and forecasting.
Sequence-to-Sequence Modeling for Time Series by Arun Kejariwal
Sequence-to-sequence (seq2seq) modeling is now being used for applications based on time series data. We overview seq2seq and explore its early use cases, then walk the audience through how to leverage seq2seq modeling for two concrete use cases: real-time anomaly detection and forecasting.
In this talk we walk through an architecture in which models are served in real time and updated, using Apache Pulsar, without restarting the application at hand. We then describe how to apply Pulsar functions to support two example uses, sampling and filtering, and explore a concrete case study of the same.
Designing Modern Streaming Data Applications by Arun Kejariwal
Many industry segments have been grappling with fast data (high-volume, high-velocity data). The enterprises in these industry segments need to process this fast data just in time to derive insights and act upon it quickly. Such tasks include but are not limited to enriching data with additional information, filtering and reducing noisy data, enhancing machine learning models, providing continuous insights on business operations, and sharing these insights just in time with customers. In order to realize these results, an enterprise needs to build an end-to-end data processing system, from data acquisition, data ingestion, data processing, and model building to serving and sharing the results. This presents a significant challenge, due to the presence of multiple messaging frameworks and several streaming computing frameworks and storage frameworks for real-time data.
In this tutorial we lead a journey through the landscape of state-of-the-art systems for each stage of an end-to-end data processing pipeline: messaging frameworks, streaming compute frameworks, storage frameworks for real-time data, and more. We also share case studies from IoT, gaming, and healthcare, as well as our experience operating these systems at internet scale at Twitter and Yahoo. We conclude by offering our perspectives on how advances in hardware technology and the emergence of new applications will impact the evolution of messaging systems, streaming systems, storage systems for streaming data, and reinforcement learning-based systems that will power fast processing and analysis of a large (potentially on the order of hundreds of millions) set of data streams.
Topics include:
* An introduction to streaming
* Common data processing patterns
* Different types of end-to-end stream processing architectures
* How to seamlessly move data across different frameworks
* Case studies: Healthcare and the IoT
* Data sketches for mining insights from data streams
There has been a shift from big data to live streaming data to facilitate faster data-driven decision making. As the number of live data streams grow—partly a result of the expanding IoT—it is critical to develop techniques to better extract actionable insights.
One current application, anomaly detection, is a necessary but insufficient step, because anomaly detection over a set of live data streams may result in anomaly fatigue, limiting effective decision making. One way to address this is to carry out anomaly detection in a multidimensional space. However, this is typically very expensive computationally and hence not suitable for live data streams. Another approach is to carry out anomaly detection on individual data streams and then leverage correlation analysis to minimize false positives, which in turn helps surface actionable insights faster.
In this talk, we explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making.
Topics include:
* An overview of correlation analysis
* Robust correlation analysis
* Overview of alternative measures, such as co-median
* Trade-offs between speed and accuracy
* Correlation analysis in large dimensions
* Multi-modal correlation analysis
In this talk we walk the audience through how to marry correlation analysis with anomaly detection, discuss how the topics are intertwined, and detail the challenges one may encounter based on production data. We also showcase how deep learning can be leveraged to learn nonlinear correlation, which in turn can be used to further contain the false positive rate of an anomaly detection system. Further, we provide an overview of how correlation can be leveraged for common representation learning.
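As an illustrative, non-authoritative sketch of marrying the two analyses, the snippet below suppresses an anomaly alert on one stream when a historically correlated stream moves in tandem, under the (assumed) policy that a shared move reflects a global event rather than a service-specific issue; all thresholds are invented.

```python
# Toy example: use correlation across streams to suppress false positives.
import numpy as np

def robust_z(x: np.ndarray) -> np.ndarray:
    med = np.median(x)
    mad = np.median(np.abs(x - med)) + 1e-9
    return 0.6745 * (x - med) / mad

def actionable_anomalies(a: np.ndarray, b: np.ndarray, z_cut=3.5, r_cut=0.7):
    """Indices where stream `a` is anomalous but the correlated stream `b`
    is not; co-movement of strongly correlated streams is treated as a
    global, non-actionable event (an assumed policy, not the talk's)."""
    za, zb = robust_z(a), robust_z(b)
    corr = np.corrcoef(a, b)[0, 1]
    flags = np.abs(za) > z_cut
    if corr > r_cut:
        flags &= ~(np.abs(zb) > z_cut)
    return np.flatnonzero(flags)

rng = np.random.default_rng(2)
base = rng.normal(100, 2, 200)
a = base + rng.normal(0, 1, 200)
b = base + rng.normal(0, 1, 200)
a[50] += 30                   # service-specific anomaly: should be flagged
a[120] += 30; b[120] += 30    # global event on both streams: suppressed
print(actionable_anomalies(a, b))  # -> [50]
```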
Detection and filtering of anomalies in live data is of paramount importance for robust decision making. To this end, in this talk we share techniques for anomaly detection in live data.
In this tutorial we walk through state-of-the-art streaming systems, algorithms, and deployment architectures, cover the typical challenges in modern real-time big data platforms, and offer insights on how to address them. We also discuss how advances in technology might impact the streaming architectures and applications of the future. Along the way, we explore the interplay between storage and stream processing and discuss future developments.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf by Peter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Removing Uninteresting Bytes in Software Fuzzing by Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ... by James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and on application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
The Art of the Pitch: WordPress Relationships and Sales by Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Climate Impact of Software Testing at Nordic Testing Days by Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact and sustainability of software testing are discussed in the talk. ICT and testing must carry their part of global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
UiPath Test Automation using UiPath Test Suite series, part 5 by DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GridMate - End to end testing is a critical piece to ensure quality and avoid... by ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
PHP Frameworks: I want to break free (IPC Berlin 2024) by Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... by Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Days In Green (DIG): Forecasting the life of a healthy service
1. Days In Green (DIG): Forecasting the life of a healthy service
Vibhav Garg, Arun Kejariwal (@ativilambit, @arun_kejariwal)
Capacity and Performance Engineering @ Twitter
June 2014
4. Internet trends
• Mobile-first
  - 25% of total web usage [1]
  - Mobile data traffic: 81%, accelerating growth [1]
• Real-time
(Image: #Selfie)
[1] http://www.kpcb.com/file/kpcb-internet-trends-2014 (May 2014)
5. Capacity & Performance
• Organic growth
  - Over 255M monthly active users [1]
• Evolving product landscape
• Handle peak traffic
  - Mobile busy hour is 66% higher than the average hour in 2013, 83% by 2018 [2]
  - Events
[1] https://investor.twitterinc.com/releasedetail.cfm?releaseid=843245
[2] http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white_paper_c11-520862.html
6. Systematic Capacity Planning
• Objectives
  - Check under-allocation
    * Performance, availability: adversely impact user experience
  - Check over-allocation
    * Operational efficiency: adversely impacts the bottom line
  - Check poor scalability
• Approaches
  - Reactive
    * Adversely impacts user experience
  - Proactive
(Figure labels: Poor UX, Underutilization)
7. Systematic Capacity Planning (contd.)
• Non-trivial
  - Rapidly evolving product landscape
    * Changes services' performance profiles
  - Organic growth
• Scalable approach
  - Service-oriented architecture
    * 100s of services
  - Millions of metrics [1,2]
  - Automated
[1] http://strata.oreilly.com/2013/09/how-twitter-monitors-millions-of-time-series.html
[2] http://strataconf.com/strata2014/public/schedule/detail/32431
8. DIG: Days in Green
• Objective
  - Statistically determine the number of days for which a service is expected to stay healthy
• Methodology
  - Determine the driving resource
  - Determine the capacity threshold T
  - Generate a time series and forecast
  - DIG = the number of days before the service is expected to exceed T
(Figure: driving resource plotted over time, with threshold T and the DIG interval marked)
9. DIG (contd.)
• Determining capacity thresholds
  - Service specific
    * Driving resource differs
  - Load test
    * Canaries
    * Replay production traffic
  - Examples
    * CPU at 70%
    * Disk utilization at 80%
    * RPS at X requests/sec
(Figure: latency vs. CPU, with the SLA and the derived threshold T marked)
10. DIG (contd.)
• Time series analysis
  - Data collection
    * Granularity: daily (for a long-term forecast)
    * Which value? One close to the daily peak but with low standard deviation (σ); assume 7-day seasonality
    * Duration: 30-90 days
  - Model fitting
  - Forecast

Percentile | Duration  | Mean | σ
100 (Max)  |           | 57.7 | 3.29
99         | 14.4 mins | 54.7 | 2.49
95         | 72 mins   | 53.1 | 2.4
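The table compares candidate daily summary values: the max tracks the peak but is noisy, while p99/p95 stay close to the peak with lower σ. A sketch of computing such daily aggregates (synthetic minutely data, illustrative only):

```python
# Compute candidate daily summaries of a minutely utilization series and
# compare their spread, mirroring the percentile table above (synthetic data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
idx = pd.date_range("2014-04-01", periods=60 * 24 * 30, freq="min")
cpu = pd.Series(55 + 5 * np.sin(2 * np.pi * np.arange(idx.size) / (60 * 24))
                + rng.normal(0, 2, idx.size), index=idx)

daily = cpu.resample("D")
summary = pd.DataFrame({
    "max": daily.max(),           # 100th percentile: the noisy peak
    "p99": daily.quantile(0.99),  # 1% of a day = 14.4 minutes above this value
    "p95": daily.quantile(0.95),  # 5% of a day = 72 minutes above this value
})
print(summary.mean().round(1))
print(summary.std().round(2))  # p99/p95 track the peak with lower sigma
```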
11. DIG (contd.)
• Model fitting
  - Linear
    * Captures trend well
    * Does not fit seasonal time series well
    * Gives no weight to recent data
(Figure: linear fit to the series; R² = 0.56)
12. DIG (contd.)
• Model fitting
  - Polynomial
    * Fits better than linear, but not good for forecasting
    * Seasonality unaware
(Figure: polynomial fit to the series; R² = 0.62)
13. DIG (contd.)
• Model fitting
  - Splines
    * Widely used for curve fitting
    * Tend to overfit data
    * Not suitable for forecasting
  - Triple exponential smoothing (Holt-Winters)
    * Good for fitting and forecasting
    * Trend and seasonality modeled implicitly
• ARIMA
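The slide singles out triple exponential smoothing (Holt-Winters) as friendly to both fitting and forecasting. A minimal sketch with statsmodels, on synthetic weekly-seasonal data (the parameters are illustrative, not from the talk):

```python
# Minimal Holt-Winters (triple exponential smoothing) fit and forecast;
# data and parameters are illustrative, not from the talk.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

rng = np.random.default_rng(4)
t = np.arange(90)
y = pd.Series(50 + 0.2 * t + 4 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 1, 90),
              index=pd.date_range("2014-01-01", periods=90, freq="D"))

# Additive trend and additive 7-day seasonality, matching the DIG assumptions.
hw = ExponentialSmoothing(y, trend="add", seasonal="add",
                          seasonal_periods=7).fit()
print(hw.forecast(14).round(1))  # two-week-ahead forecast
```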
14. ARIMA
• Auto-Regressive Integrated Moving Average
  - ARIMA(p, d, q): p is the autoregressive order, d the integrated (differencing) order, and q the moving average order; the model combines an autoregressive component and a moving average component
  - Explicitly models seasonality and trend
  - Applicable to non-stationary time series
  - Worst case degenerates to a linear fit
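For reference, a compact textbook form of the ARIMA(p, d, q) model (standard notation, not taken from the slides):

```latex
% ARIMA(p,d,q) in backshift-operator form, with B the backshift operator:
% the AR polynomial of order p on the left, d-th order differencing via
% (1-B)^d, and the MA polynomial of order q on the right.
\left(1 - \sum_{i=1}^{p} \phi_i B^i\right) (1 - B)^d \, y_t
  = \left(1 + \sum_{j=1}^{q} \theta_j B^j\right) \varepsilon_t
```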
15. DIG (contd.)
• Model fitting
  - ARIMA in action
    * Captures the underlying trend
    * Captures seasonality
  - Are we good? Not quite!
(Figure: ARIMA fit with its forecast region marked)
16. DIG (contd.)
• Time series characteristics
  - Anomalies
    * Positive
    * Negative
(Figure: time series with anomalies highlighted)
17. DIG (contd.)
• Time series characteristics
  - Breakout
    * Flavors: mean shift, ramp-up
    * Direction: positive, negative
(Figure: time series with a breakout marked)
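Twitter later open-sourced an E-Divisive-with-Medians approach to this problem (the BreakoutDetection R package). The snippet below is a deliberately much simpler stand-in that scans for a mean shift by comparing medians of two adjacent windows, just to make the idea concrete; the window size and cutoff are arbitrary.

```python
# Simplistic mean-shift (breakout) scan: compare medians of two adjacent
# windows at every split point. A toy stand-in for rigorous methods such
# as E-Divisive with Medians; window size and cutoff are arbitrary.
import numpy as np

def mean_shift_points(x: np.ndarray, w: int = 14, min_shift: float = 5.0):
    hits = []
    for i in range(w, len(x) - w):
        left = np.median(x[i - w:i])
        right = np.median(x[i:i + w])
        if abs(right - left) >= min_shift:
            hits.append(i)
    return hits

rng = np.random.default_rng(5)
series = np.concatenate([rng.normal(50, 1, 60), rng.normal(62, 1, 60)])
print(mean_shift_points(series))  # indices clustered around the shift at 60
```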
18. DIG (contd.)
• Time series characteristics
  - Seasonality breaks
  - Various reasons, including (but not limited to):
    * Daily deployments
    * Changes in traffic
    * Collection issues
(Figure: time series with seasonality breaks marked)
19. DIG (contd.)
• Curve fitting with ARIMA
  - Trend and seasonality aware
  - What does the DIG forecast look like?
(Figure: time series showing three successive trends, an anomaly, and a breakout, with threshold T)
20. DIG (contd.)
• ARIMA forecast
  - Not a good forecast, because of multiple trends and anomalies
  - Wide confidence band
  - 40 Days In Green, with a confidence band of 10-40
(Figure: forecast with its 95% confidence band, threshold T, and the DIG estimate)
21. DIG (contd.)
• ARIMA forecast with breakout(s) eliminated
  - 35 Days In Green, with a confidence band of 2-40
  - Limitations
    * Wide confidence band
    * Susceptible to anomalies
(Figure: forecast after breakout removal, with threshold T and the DIG estimate)
22. DIG (contd.)
• ARIMA forecast with breakout and anomalies eliminated
  - 25 Days In Green, with a confidence band of 2-40
  - Narrow confidence band
  - Improved accuracy
(Figure: forecast after breakout and anomaly removal, with threshold T and the DIG estimate)
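Slides 20-22 quantify the payoff of cleansing before forecasting. A hedged, self-contained sketch of that flow: keep only the post-breakout segment (the breakpoint is assumed known here; in practice it would come from a breakout detector like the one sketched after slide 17), mask anomalous points, then refit and forecast. Thresholds and the model order are assumptions.

```python
# Sketch of the cleansing-then-forecast flow of slides 20-22: keep the
# post-breakout segment, mask anomalies, refit, and forecast.
# Entirely illustrative; thresholds and model order are assumptions.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def clean_series(y, breakpoint=None, z_cut=3.5):
    seg = y.iloc[breakpoint:] if breakpoint is not None else y
    med = seg.median()
    mad = (seg - med).abs().median() + 1e-9
    keep = (0.6745 * (seg - med) / mad).abs() <= z_cut
    # Interpolate over the removed anomalies so the index stays regular.
    return seg.where(keep).interpolate(method="time")

rng = np.random.default_rng(6)
idx = pd.date_range("2014-01-01", periods=120, freq="D")
y = pd.Series(np.r_[rng.normal(45, 1, 40), rng.normal(55, 1, 80)], index=idx)
y.iloc[100] += 15  # an anomaly within the post-breakout segment

cleaned = clean_series(y, breakpoint=40)
fit = ARIMA(cleaned, order=(1, 1, 1)).fit()
print(fit.get_forecast(30).conf_int().head())  # narrower band after cleansing
```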
23. DIG (contd.)
• DIG comparison
  - With breakout and anomaly detection
(Figure: DIG estimates compared across the raw series, the series with breakouts (BO) removed, and the series with breakouts and anomalies removed, against threshold T)
24. DIG (contd.)
• Discussion
  - Boundary conditions
    * False seasonality
(Figure: series exhibiting false seasonality, with threshold T)
25. DIG (contd.)
• Limitations
  - "Quality" of data: poor forecasts
(Figure: noisy series yielding a poor forecast, with threshold T)
27. DIG (contd.)
• Current status: deployed in production
  - Hundreds of services
  - Fully automated for CPU, extending to other metrics
  - DR compliance
    * Combine data from multiple datacenters
    * Detect services that are close to the DR threshold
• Future work
  - Utilization-based allocation
28. DIG (contd.)
• Anomaly detection
  - Algorithm developed in-house
  - Presented at USENIX HotCloud'14 [1]
[1] https://www.usenix.org/conference/hotcloud14/workshop-program/presentation/vallis
30. Wrapping up & lessons learned
• DIG: Days In Green
  - Proactively assess the future health of a service
  - Modeling and forecasting: ARIMA
  - Anomaly and breakout removal
• Modeling
  - Hard to get a stable time series
    * Organic growth, new products, behavioral aspects
  - Exploring advanced data cleansing techniques
  - Improving breakout and anomaly detection
31. Acknowledgements
• Piyush Kumar, Capacity Engineer
• Winston Lee, Capacity Engineer
• Owen Vallis Jr & Jordan Hochenbaum, ex-interns
• Nicholas James, Intern
• Management team
32. Join the Flock
• We are hiring!!
  - https://twitter.com/JoinTheFlock
  - https://twitter.com/jobs
  - Contact us: @ativilambit, @arun_kejariwal
Like problem solving? Like challenges? Be at the cutting edge. Make an impact.