In this talk we give an overview of sequence-to-sequence (seq2seq) modeling and explore its early use cases. We then walk the audience through how to leverage seq2seq modeling for several applications, particularly real-time anomaly detection and forecasting.
Sequence-to-Sequence Modeling for Time Series (Arun Kejariwal)
Sequence-to-sequence (seq2seq) modeling is now being used for applications based on time series data. We give an overview of seq2seq modeling and explore its early use cases. We then walk the audience through how to leverage seq2seq modeling for two concrete use cases: real-time anomaly detection and forecasting.
In this talk we walk the audience through how to marry correlation analysis with anomaly detection, discuss how the topics are intertwined, and detail the challenges one may encounter in production data. We also showcase how deep learning can be leveraged to learn nonlinear correlation, which in turn can be used to further contain the false positive rate of an anomaly detection system. Further, we provide an overview of how correlation can be leveraged for common representation learning.
An Efficient Unsupervised Adaptive Antihub Technique for Outlier Detection in ... (theijes)
The International Journal of Engineering & Science aims to provide a platform for researchers, engineers, scientists, and educators to publish their original research results, exchange new ideas, and disseminate information on innovative designs, engineering experiences, and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
The papers for publication in The International Journal of Engineering & Science are selected through rigorous peer reviews to ensure originality, timeliness, relevance, and readability.
Master Thesis Presentation (Subselection of Topics) (Alina Leidinger)
This presentation covers some of the work carried out as part of my master's thesis on "Mathematical Analysis of Neural Networks" at the TUM Chair of Applied Numerical Analysis under Prof. Dr. Massimo Fornasier. The thesis is a literature review with the aim of analysing and contrasting some of the approaches in the mathematical analysis of neural networks. It focuses on three key aspects: modern and classical approximation theory, robustness and stability of neural networks, and unique identification of network weights. While the three themes carry approximately equal weight in the thesis, this presentation gives only a very short overview of the first and third chapters and focuses on the robustness chapter. See also the full-text version available on SlideShare/LinkedIn.
The process of converting a data set with many dimensions into one with fewer dimensions, while ensuring it conveys similar information concisely.
Concept
R code
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING" (IJDKP)
Clustering is a data mining technique used to discover business intelligence by grouping objects into clusters using a similarity measure. Clustering is an unsupervised learning process with many real-time applications in fields such as marketing, biology, libraries, insurance, city planning, earthquake studies, and document clustering. Latent trends and relationships among data objects can be unearthed using clustering algorithms. Many clustering algorithms exist; however, the quality of the resulting clusters is of paramount importance. The quality objective is to achieve the highest similarity between objects of the same cluster and the lowest similarity between objects of different clusters. In this context, we studied two widely used clustering algorithms: K-Means and Fuzzy K-Means. K-Means is an exclusive clustering algorithm, while Fuzzy K-Means is an overlapping clustering algorithm. In this paper we evaluate the hypothesis “Fuzzy K-Means is better than K-Means for Clustering” through both a literature review and an empirical study. We built a prototype application to demonstrate the differences between the two clustering algorithms. The experiments were run on a diabetes dataset obtained from the UCI repository. The empirical results reveal that Fuzzy K-Means performs better than K-Means in terms of the quality, or accuracy, of the clusters; thus, our empirical study supports the hypothesis “Fuzzy K-Means is better than K-Means for Clustering”.
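As a rough illustration of what the paper compares, the sketch below implements a minimal fuzzy k-means (fuzzy c-means) in NumPy. The two-blob toy data, the fuzziness exponent m = 2, and the iteration count are illustrative assumptions; this is not the paper's prototype, nor the UCI diabetes data.

```python
import numpy as np

def fuzzy_kmeans(X, k, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means: every point gets a degree of membership
    in every cluster, rather than a single hard label."""
    rng = np.random.default_rng(seed)
    # random initial membership matrix; each row sums to 1
    U = rng.random((len(X), k))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # weighted centroids
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # squared distances from every point to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + 1e-12
        # standard fuzzy c-means membership update
        inv = d2 ** (-1.0 / (m - 1))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

# two well-separated blobs of points
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(5.0, 0.3, (20, 2))])
centers, U = fuzzy_kmeans(X, k=2)
labels = U.argmax(axis=1)   # hard labels recovered from soft memberships
```

Unlike hard k-means, the membership matrix `U` also tells you how ambiguous each point's assignment is, which is the property the paper's quality comparison rests on.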
Geo-spatial Information for Managing Ambiguity (IAES IJEECS)
An innate challenge arising in any dataset containing spatial as well as temporal data is uncertainty due to different sources of imprecision. Incorporating the effect of this uncertainty is fundamental when assessing the reliability (confidence) of any query result derived from the underlying data. To deal with uncertainty, solutions have been proposed independently in the geo-science and data science research communities. This interdisciplinary tutorial bridges the gap between the two communities by giving a comprehensive overview of the different challenges involved in managing uncertain geo-spatial data, by surveying solutions from both research communities, and by identifying similarities, synergies, and open research problems.
K-means Clustering Method for the Analysis of Log Data (idescitation)
Cluster analysis is one of the main analytical methods in data mining, and the choice of clustering algorithm directly influences the clustering results. This paper discusses the standard k-means clustering algorithm and analyzes its shortcomings. The paper also focuses on web usage mining, analyzing the data for pattern recognition; patterns are identified with the help of the k-means algorithm.
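For reference, here is a minimal sketch of the standard (Lloyd's) k-means algorithm the paper discusses. The per-session web-log features and the farthest-point initialization are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Standard (Lloyd's) k-means with deterministic farthest-point init."""
    # farthest-point initialization spreads the seeds apart
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(d2.argmax())])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # assignment step: each point goes to its nearest centroid
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: recompute centroids from their members
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# hypothetical per-session web-log features: (pages viewed, minutes on site)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal([3.0, 2.0], 0.5, (30, 2)),    # quick visitors
               rng.normal([40.0, 25.0], 3.0, (30, 2))])  # heavy users
centers, labels = kmeans(X, k=2)
```

The shortcomings the paper analyzes show up directly in this sketch: results depend on initialization, and k must be chosen up front.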
Machine Learning for Forecasting: From Data to Deployment (Anant Agarwal)
Forecasting is everywhere. This talk covers:
• Fundamental concepts of time series
• Data preprocessing (imputation and outlier analysis)
• Feature engineering and EDA for time series
• Statistical and machine learning algorithms
• Model evaluation through backtesting
• Model explanation using SHAP
• Model monitoring and deployment considerations
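The backtesting bullet above can be sketched as a rolling-origin evaluation: walk forward through the series, forecasting each held-out point from only its past. The naive last-value forecaster and the toy series below are illustrative assumptions, not the talk's models.

```python
def backtest(series, min_train=24, horizon=1):
    """Rolling-origin backtest: at each origin t, 'fit' on series[:t] and
    score the forecast against the actual value `horizon` steps ahead.
    Returns the mean absolute error across all origins."""
    errors = []
    for t in range(min_train, len(series) - horizon + 1):
        train = series[:t]
        forecast = train[-1]                 # naive model: repeat last value
        actual = series[t + horizon - 1]
        errors.append(abs(actual - forecast))
    return sum(errors) / len(errors)

series = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16]
mae = backtest(series, min_train=5, horizon=1)   # mean absolute error = 1.6
```

Any real model would replace the one-line naive forecaster; the walk-forward loop, which never lets the model see the future, is the part that generalizes.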
Episode 12: Research Methodology (Part 2)
* Approach to de-synthesizing data, informational, and/or factual elements to answer research questions
* Method of putting together facts and figures to solve a research problem
* Systematic process of utilizing data to address research questions
* Breaking down research issues through utilizing controlled data and factual information
SAJJAD KHUDHUR ABBAS
Chemical Engineering, Al-Muthanna University, Iraq
Oil & Gas Safety and Health Professional – OSHACADEMY
Trainer of Trainers (TOT) - Canadian Center of Human Development
FIRE ADMIN UNIT 1 (AKHIL969626): [garbled org-chart export; the recoverable content is a fire department hierarchy running from Mayor/City Council and City Manager through the Fire Chief, assistant chiefs, prevention, training, investigation, and inspection staff, down to maintenance technicians]
Business Decision Making Project Part 2
Jared Linscombe
QNT/275
Dr. Davisson
September 12, 2016
Descriptive Statistics
Descriptive statistics describe or summarize features of collected data, presenting quantitative information in a manner that can be easily managed. A large amount of data is reduced to a simple summary, making the whole process of describing the data less laborious.
For example, finding the mean helps summarize a lot of individual information in a way that is quickly understood. The samples are likely to produce different independent variables that affect the sales of Elite Technologies Limited. For this reason, we opt to use bivariate analysis in describing the statistics. Bivariate analysis of the descriptive statistics derived from the data will help in drawing relationships between different variables.
For a more accurate representa ...
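As a rough sketch of the bivariate analysis described above, one can summarize a sample with its mean and relate two variables with a Pearson correlation using only the standard library. The figures below are hypothetical, not Elite Technologies' data.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

ad_spend = [10, 12, 15, 18, 20, 25]        # monthly ad spend, $1,000s (hypothetical)
sales    = [100, 115, 140, 170, 185, 230]  # units sold (hypothetical)
avg_sales = mean(sales)                    # descriptive summary of one variable
r = pearson(ad_spend, sales)               # bivariate relationship, close to +1
```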
Episode 18: Research Methodology (Part 8)
* Approach to de-synthesizing data, informational, and/or factual elements to answer research questions
* Method of putting together facts and figures to solve a research problem
* Systematic process of utilizing data to address research questions
* Breaking down research issues through utilizing controlled data and factual information
SAJJAD KHUDHUR ABBAS
Chemical Engineering, Al-Muthanna University, Iraq
Oil & Gas Safety and Health Professional – OSHACADEMY
Trainer of Trainers (TOT) - Canadian Center of Human Development
Prospective anomaly detection methods such as the modified EARS C2 are commonly adapted and used in public health syndromic surveillance systems. These methods, however, can produce an excessive false alert rate. We present a combined use of retrospective (e.g., change point analysis (CPA)) and prospective (e.g., C2) anomaly detection methods. This combined approach will help detect sudden aberrations in addition to subtle changes in local trends, help rule out alarm investigations, and assist with retrospective follow-ups. Examples of the utility of this combined approach, developed in collaboration with the scientific community, are applied to BioSense emergency department visits due to ILI. Methods, limitations, future work, and an invitation to the scientific community to collaborate with us will be discussed in this talk.
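For illustration, one common formulation of the EARS C2 statistic compares today's count to the mean of a 7-day baseline offset by a 2-day guard band, flagging an alert beyond three standard deviations. Exact parameters vary across implementations, and this sketch is not the BioSense code.

```python
import statistics

def ears_c2(counts, baseline=7, lag=2, threshold=3.0):
    """Flag day t when its count exceeds the mean of a `baseline`-day
    window, offset by a `lag`-day guard band, by more than `threshold`
    standard deviations (one common C2-style formulation)."""
    alerts = []
    for t in range(baseline + lag, len(counts)):
        window = counts[t - lag - baseline : t - lag]
        mu = statistics.mean(window)
        sd = statistics.stdev(window) or 1e-9   # guard against a flat window
        alerts.append((counts[t] - mu) / sd > threshold)
    return alerts

# flat syndromic counts followed by a sudden spike on the last day
counts = [10, 11, 9, 10, 12, 10, 11, 10, 9, 11, 10, 40]
alerts = ears_c2(counts)   # only the spike is flagged
```

The guard band (`lag`) keeps the most recent days, which may already contain the onset of an outbreak, out of the baseline; a retrospective method such as CPA would then confirm or rule out a sustained shift in level.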
In the wake of IoT becoming ubiquitous, there has been large interest in the industry in developing novel techniques for anomaly detection at the Edge. Example applications include, but are not limited to, smart cities/grids of sensors, industrial process control in manufacturing, smart homes, wearables, connected vehicles, and agriculture (sensing for soil moisture and nutrients). What makes anomaly detection at the Edge different? The following constraints, whether due to the sensors or the applications, necessitate the development of new algorithms for AD:
* Very low power and low compute/memory resources
* High data volume making centralized AD infeasible owing to the communication overhead
* Need for low latency to drive fast action taking
* Guaranteeing privacy

In this talk we shall throw light on the above in detail. Subsequently, we shall walk through the algorithm design process for anomaly detection at the Edge. Specifically, we shall dive into the need to build small models/ensembles owing to the limited memory on the sensors. Further, we discuss how to train on data in an online fashion, as long-term historical data is not available due to limited storage. Given the need for data compression to contain the communication overhead, can one carry out anomaly detection on compressed data? We shall throw light on building small models, sequential and one-shot learning algorithms, compressing the data with the models, and limiting the communication to only the data corresponding to the anomalies and the model description. We shall illustrate the above with concrete examples from the wild!
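As a sketch of the small, constant-memory online models described above, here is an exponentially weighted mean/variance detector: it keeps just three numbers per stream, learns online, and never stores history. The smoothing factor, threshold, and warmup length are illustrative assumptions.

```python
class StreamingAnomalyDetector:
    """Constant-memory online detector: exponentially weighted mean and
    variance, small enough to run on a resource-constrained sensor."""

    def __init__(self, alpha=0.1, threshold=4.0, warmup=10):
        self.alpha, self.threshold, self.warmup = alpha, threshold, warmup
        self.mean, self.var, self.n = 0.0, 1.0, 0

    def update(self, x):
        """Return True if x looks anomalous; otherwise absorb it."""
        self.n += 1
        if self.n <= self.warmup:            # learn an initial baseline
            self.mean += (x - self.mean) / self.n
            return False
        z = abs(x - self.mean) / (self.var ** 0.5 + 1e-9)
        anomalous = z > self.threshold
        if not anomalous:                    # keep anomalies out of the stats
            diff = x - self.mean
            self.mean += self.alpha * diff
            self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return anomalous

det = StreamingAnomalyDetector()
stream = [20.0 + 0.5 * (i % 3) for i in range(50)] + [80.0]
flags = [det.update(x) for x in stream]      # only the final spike is flagged
```

Since only anomalous readings (plus an occasional model snapshot) need to leave the device, this pattern also addresses the communication-overhead constraint above.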
Serverless Streaming Architectures and Algorithms for the Enterprise (Arun Kejariwal)
In recent years, serverless has gained momentum in the realm of cloud computing. Broadly speaking, it comprises function as a service (FaaS) and backend as a service (BaaS). The distinction between the two is that under FaaS, one writes and maintains the code (e.g., the functions) for serverless compute; in contrast, under BaaS, the platform provides the functionality and manages the operational complexity behind it. Serverless provides a great means to boost development velocity. With greatly reduced infrastructure costs, more agile and focused teams, and faster time to market, enterprises are increasingly adopting serverless approaches to gain a key advantage over their competitors.
Early use cases of serverless include data transformation in batch and ETL scenarios and data processing using MapReduce patterns. As a natural extension, serverless is being used in streaming contexts such as, but not limited to, real-time bidding, fraud detection, and intrusion detection. Serverless is, arguably, naturally suited to extracting insights from fast data, that is, high-volume, high-velocity data. Example tasks in this regard include filtering and reducing noise in the data and leveraging machine learning and deep learning models to provide continuous insights about business operations.
We walk the audience through the landscape of streaming systems for each stage of an end-to-end data processing pipeline—messaging, compute, and storage. We overview the inception and growth of the serverless paradigm. Further, we deep dive into Apache Pulsar, which provides native serverless support in the form of Pulsar functions, and paint a bird’s-eye view of the application domains where Pulsar functions can be leveraged.
Baking intelligence into a serverless flow is paramount from a business perspective. To this end, we detail different serverless patterns (event processing, machine learning, and analytics) for different use cases and highlight the trade-offs. We present perspectives on how advances in hardware technology and the emergence of new applications will impact the evolution of serverless streaming architectures and algorithms. The topics covered include an introduction to streaming, an introduction to serverless, serverless and streaming requirements, Apache Pulsar, application domains, serverless event processing patterns, serverless machine learning patterns, and serverless analytics patterns.
In this talk we walk through an architecture in which models are served in real time and updated, using Apache Pulsar, without restarting the application at hand. We then describe how to apply Pulsar functions to support two example uses, sampling and filtering, and explore a concrete case study of the same.
Designing Modern Streaming Data Applications (Arun Kejariwal)
Many industry segments have been grappling with fast data (high-volume, high-velocity data). The enterprises in these industry segments need to process this fast data just in time to derive insights and act upon it quickly. Such tasks include but are not limited to enriching data with additional information, filtering and reducing noisy data, enhancing machine learning models, providing continuous insights on business operations, and sharing these insights just in time with customers. In order to realize these results, an enterprise needs to build an end-to-end data processing system, from data acquisition, data ingestion, data processing, and model building to serving and sharing the results. This presents a significant challenge, due to the presence of multiple messaging frameworks and several streaming computing frameworks and storage frameworks for real-time data.
In this tutorial we lead a journey through the landscape of state-of-the-art systems for each stage of an end-to-end data processing pipeline: messaging frameworks, streaming computing frameworks, storage frameworks for real-time data, and more. We also share case studies from the IoT, gaming, and healthcare, as well as our experience operating these systems at internet scale at Twitter and Yahoo. We conclude by offering our perspectives on how advances in hardware technology and the emergence of new applications will impact the evolution of messaging systems, streaming systems, storage systems for streaming data, and reinforcement learning-based systems that will power fast processing and analysis of a large (potentially on the order of hundreds of millions) set of data streams.
Topics include:
* An introduction to streaming
* Common data processing patterns
* Different types of end-to-end stream processing architectures
* How to seamlessly move data across different frameworks
* Case studies: Healthcare and the IoT
* Data sketches for mining insights from data streams
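As one example of the data-sketches bullet above, a count-min sketch gives fixed-memory approximate frequency counts over a stream; estimates never undercount and overcount only by a bounded amount. The width/depth parameters and the blake2b-based hashing below are illustrative choices.

```python
import hashlib

class CountMinSketch:
    """Fixed-memory approximate frequency counts over a stream. Estimates
    never undercount; they may overcount due to hash collisions, with
    error bounded by the table width."""

    def __init__(self, width=2048, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # one independent hash per row, derived by varying the salt
        for row in range(self.depth):
            h = hashlib.blake2b(item.encode(), salt=bytes([row])).digest()
            yield row, int.from_bytes(h[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # the minimum across rows carries the least collision noise
        return min(self.table[row][col] for row, col in self._buckets(item))

cms = CountMinSketch()
for _ in range(100):
    cms.add("checkout")
cms.add("login")
```

The memory footprint is `width * depth` counters regardless of how many distinct events flow through, which is what makes sketches attractive for mining insights from unbounded streams.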
There has been a shift from big data to live streaming data to facilitate faster data-driven decision making. As the number of live data streams grow—partly a result of the expanding IoT—it is critical to develop techniques to better extract actionable insights.
One current application, anomaly detection, is a necessary but insufficient step, due to the fact that anomaly detection over a set of live data streams may result in an anomaly fatigue, limiting effective decision making. One way to address the above is to carry out anomaly detection in a multidimensional space. However, this is typically very expensive computationally and hence not suitable for live data streams. Another approach is to carry out anomaly detection on individual data streams and then leverage correlation analysis to minimize false positives, which in turn helps in surfacing actionable insights faster.
In this talk, we explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making.
Topics include:
* An overview of correlation analysis
* Robust correlation analysis
* Overview of alternative measures, such as co-median
* Trade-offs between speed and accuracy
* Correlation analysis in large dimensions
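The idea of marrying per-stream anomaly detection with robust correlation analysis can be sketched as follows: flag each stream independently, then surface only alerts corroborated by a highly correlated peer stream. The z-score detector, the rank-based (Spearman) correlation, and the confirmation rule are illustrative choices, not the talk's exact method.

```python
import numpy as np

def zscore_alerts(x, threshold=3.0):
    """Per-stream anomaly flags via a simple z-score on each point."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / (x.std() + 1e-9)
    return np.abs(z) > threshold

def spearman(x, y):
    """Robust, rank-based correlation: Pearson computed on the ranks."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / (np.linalg.norm(rx) * np.linalg.norm(ry) + 1e-9))

def confirmed_alerts(a, b, min_corr=0.8):
    """Surface an alert only when both highly correlated streams flag the
    same timestamp; single-stream glitches are suppressed."""
    if spearman(a, b) < min_corr:
        return zscore_alerts(a)          # streams unrelated: keep raw alerts
    return zscore_alerts(a) & zscore_alerts(b)

rng = np.random.default_rng(0)
base = np.sin(np.linspace(0.0, 6.0, 200))
a = base + rng.normal(0.0, 0.05, 200)
b = base + rng.normal(0.0, 0.05, 200)
a[120] += 5.0; b[120] += 5.0   # real event, visible on both streams
a[50] += 5.0                   # glitch on stream a only
```

Here the raw detector on stream `a` fires twice, but the confirmed output keeps only the joint event, which is exactly the false-positive containment described above.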
There has been a shift from big data to live streaming data to facilitate faster data-driven decision making. As the number of live data streams grow—partly a result of the expanding IoT—it is critical to develop techniques to better extract actionable insights.
One current application, anomaly detection, is a necessary but insufficient step, due to the fact that anomaly detection over a set of live data streams may result in an anomaly fatigue, limiting effective decision making. One way to address the above is to carry out anomaly detection in a multidimensional space. However, this is typically very expensive computationally and hence not suitable for live data streams. Another approach is to carry out anomaly detection on individual data streams and then leverage correlation analysis to minimize false positives, which in turn helps in surfacing actionable insights faster.
In this talk we explain how marrying correlation analysis with anomaly detection can help and share techniques to guide effective decision making.
Topics include:
* An overview of correlation analysis
* Robust correlation analysis
* Trade-offs between speed and accuracy
* Multi-modal correlation analysis
… compute tier. Detection and filtering of anomalies in live data is of paramount importance for robust decision making. To this end, in this talk we share techniques for anomaly detection in live data.
In this tutorial we walk through state-of-the-art streaming systems, algorithms, and deployment architectures, covering the typical challenges in modern real-time big data platforms and offering insights on how to address them. We also discuss how advances in technology might impact the streaming architectures and applications of the future. Along the way, we explore the interplay between storage and stream processing and discuss future developments.
Anomaly detection in real-time data streams using Heron (Arun Kejariwal)
Twitter has become the de facto medium for consumption of news in real time, and billions of events are generated and analyzed on a daily basis. To analyze these events, Twitter designed its own next-generation streaming system, Heron. Arun Kejariwal and Karthik Ramasamy walk you through how Heron is used to detect anomalies in real-time data streams. Although there’s been over 75 years of prior work in anomaly detection, most of the techniques cannot be used off the shelf because they’re not suitable for high-velocity data streams. Arun and Karthik explain how to make trade-offs between accuracy and speed and discuss incremental approaches that marry sampling with robust measures such as median and MCD for anomaly detection.
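A rough sketch of the robust-measures idea mentioned above: score each point against the median and MAD (median absolute deviation) of a trailing window, which, unlike mean and standard deviation, are not dragged around by past outliers. The window size and threshold are illustrative, and this is plain Python rather than a Heron topology.

```python
from collections import deque
import statistics

def mad_alerts(stream, window=10, threshold=3.5):
    """Flag points whose distance from the trailing window's median exceeds
    `threshold` robust standard deviations (MAD scaled by 0.6745)."""
    buf = deque(maxlen=window)
    flags = []
    for x in stream:
        if len(buf) == window:
            med = statistics.median(buf)
            mad = statistics.median(abs(v - med) for v in buf) or 1e-9
            # 0.6745 * |x - med| / MAD approximates a z-score for normal data
            flags.append(0.6745 * abs(x - med) / mad > threshold)
        else:
            flags.append(False)              # still filling the window
        if not flags[-1]:
            buf.append(x)                    # keep anomalies out of the window
    return flags

stream = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11] * 3 + [100]
flags = mad_alerts(stream)                   # only the final spike is flagged
```

The trailing-window structure is what makes the approach incremental: each new point costs one median computation over a small fixed buffer, which is the speed/accuracy trade-off the talk discusses.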
Data Data Everywhere: Not An Insight to Take Action Upon (Arun Kejariwal)
The big data era is characterized by ever-increasing velocity and volume of data. Over the last two or three years, several talks at Velocity have explored how to analyze operations data at scale, focusing on anomaly detection, performance analysis, and capacity planning, to name a few topics. Knowledge sharing of the techniques for the aforementioned problems helps the community to build highly available, performant, and resilient systems.
A key aspect of operations data is that data may be missing—referred to as “holes”—in the time series. This may happen for a wide variety of reasons, including (but not limited to):
* Packets being dropped due to unresponsive downstream services
* A network hiccup
* Transient hardware or software failure
* An issue with the data collection service
“Holes” in the time series can potentially skew the analysis of data, which in turn can materially impact decision making. Arun Kejariwal presents approaches for analyzing operations data in the presence of “holes” in the time series: he highlights how missing data impacts common analyses such as anomaly detection and forecasting, discusses the implications of missing data for time series of different granularities, such as minutely and hourly, and explores a gamut of techniques that can be used to address the missing data issue (e.g., approximating the data using interpolation, regression, or ensemble methods). Arun then walks you through how the techniques can be leveraged using real data.
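One of the techniques listed, interpolation, can be sketched in a few lines (illustrative code, not from the talk):

```python
import numpy as np

# Sketch: fill "holes" (NaNs) in a series by linear interpolation
# between the nearest observed points on either side of each hole.

def fill_holes(y):
    y = np.asarray(y, dtype=float)
    idx = np.arange(y.size)
    holes = np.isnan(y)
    filled = y.copy()
    filled[holes] = np.interp(idx[holes], idx[~holes], y[~holes])
    return filled

series = [10.0, 12.0, np.nan, np.nan, 18.0, 17.0]
print(fill_holes(series))  # [10. 12. 14. 16. 18. 17.]
```

For longer holes or strongly seasonal series, regression or ensemble methods (also mentioned above) tend to be more appropriate than a straight line.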
Real Time Analytics: Algorithms and Systems (Arun Kejariwal)
In this tutorial, we present an in-depth overview of the streaming analytics landscape: applications, algorithms, and platforms. We walk through how the field has evolved over the last decade and then discuss the current challenges, namely the impact of the other three Vs (volume, variety, and veracity) on big data streaming analytics.
Finding bad apples early: Minimizing performance impact (Arun Kejariwal)
The big data era is characterized by the ever-increasing velocity and volume of data. In order to store and analyze this ever-growing data, the operational footprint of data stores and Hadoop has also grown over time. (As per a recent report from IDC, spending on big data infrastructure is expected to reach $41.5 billion by 2018.) The clusters comprise several thousand nodes, and their high performance is vital for delivering the best user experience and for team productivity.
The performance of such clusters is often limited by slow/bad nodes. Finding slow nodes in large clusters is akin to finding a needle in a haystack; hence, manual identification of slow/bad nodes is not practical. To this end, we developed a novel statistical technique to automatically detect slow/bad nodes in clusters comprising hundreds to thousands of nodes. We modeled the problem as a classification problem and employed a simple, yet very effective, distance measure to determine slow/bad nodes. The key highlights of the proposed technique are the following:
* Robustness against anomalies (note that anomalies may occur, for example, due to an ad hoc heavyweight job on a Hadoop cluster)
* Given the varying data characteristics of different services, no one model fits all; consequently, we parameterized the threshold used for classification
The proposed technique works well with both hourly and daily data and has been in use in production by multiple services. This has not only eliminated manual investigation effort but also mitigated the impact of slow nodes, which previously went undetected for weeks or months! We walk the audience through how the techniques are being used with real data.
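The flavor of the technique can be sketched as follows (an illustrative sketch of the idea, not the authors' exact method): compare each node against a robust center of the cluster and flag nodes beyond a parameterized threshold:

```python
import numpy as np

# Illustrative sketch: flag slow nodes whose latency sits far above the
# cluster's robust center. Median/MAD keep the estimate robust against
# anomalies (e.g., a one-off heavyweight job); the threshold is the
# per-service parameter mentioned above.

def flag_slow_nodes(latencies, threshold=3.0):
    """latencies: dict node -> mean latency; returns flagged node names."""
    values = np.array(list(latencies.values()))
    med = np.median(values)
    mad = np.median(np.abs(values - med)) or 1e-9
    # 0.6745 rescales MAD to be comparable to a standard deviation.
    return [n for n, v in latencies.items()
            if 0.6745 * (v - med) / mad > threshold]

cluster = {f"node{i}": 100.0 + i % 3 for i in range(20)}
cluster["node20"] = 400.0                 # one slow node
print(flag_slow_nodes(cluster))           # ['node20']
```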
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Elevating Tactical DDD Patterns Through Object Calisthenics (Dorra BARTAGUIZ)
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and making things work, along with a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, with an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been easier to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability while sacrificing security. This best practices guide outlines steps users can take to better protect personal devices and information.
17. REAL-TIME RECURRENT LEARNING
# A Learning Algorithm for Continually Running Fully Recurrent Neural Networks [Williams and Zipser, 1989]
* A Method for Improving the Real-Time Recurrent Learning Algorithm [Catfolis, 1993]
18. APPROXIMATE RTRL
UORO [Unbiased Online Recurrent Optimization]
* Works in a streaming fashion: online, memoryless
* Avoids backtracking through past activations and inputs
* Low-rank approximation to forward-mode automatic differentiation
* Reduced computation and storage
KF-RTRL [Kronecker Factored RTRL]
* Kronecker product decomposition to approximate the gradients
* Reduces noise in the approximation
* Asymptotically, smaller by a factor of n
* Memory requirement equivalent to UORO
* Higher computation than UORO
* Not applicable to arbitrary architectures
# Unbiased Online Recurrent Optimization [Tallec and Ollivier, 2017]
* Approximating Real-Time Recurrent Learning with Random Kronecker Factors [Mujika et al. 2018]
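For context on what these methods approximate: exact RTRL carries the full sensitivity tensor dh/dW forward at every step, which makes it online and memoryless in time but expensive in state. A minimal illustrative sketch (not from the slides) for a vanilla RNN:

```python
import numpy as np

# Minimal exact-RTRL sketch for a vanilla RNN h_t = tanh(W h_{t-1} + U x_t).
# The sensitivity tensor dh/dW is updated forward in time, so the gradient
# at any step needs no backtracking through past activations and inputs.

rng = np.random.default_rng(0)
n, m = 4, 3                       # hidden size, input size
W = rng.normal(scale=0.1, size=(n, n))
U = rng.normal(scale=0.1, size=(n, m))

h = np.zeros(n)
dh_dW = np.zeros((n, n, n))       # dh[i] / dW[j, k]

for t in range(10):
    x = rng.normal(size=m)
    h_new = np.tanh(W @ h + U @ x)
    D = 1.0 - h_new ** 2          # tanh'(pre-activation)

    # Recurrence: dh_new[i]/dW[j,k] = D[i] * (delta_ij * h[k]
    #                                 + sum_l W[i,l] * dh[l]/dW[j,k])
    new_sens = np.einsum('il,ljk->ijk', W, dh_dW)
    for j in range(n):
        new_sens[j, j, :] += h    # direct dependence of unit j on row j of W
    dh_dW = D[:, None, None] * new_sens
    h = h_new

# Instantaneous loss L = 0.5 * ||h||^2; its gradient w.r.t. W:
grad_W = np.einsum('i,ijk->jk', h, dh_dW)
print(grad_W.shape)  # (4, 4)
```

The n-by-n-by-n tensor is exactly the cost UORO's rank-one approximation and KF-RTRL's Kronecker factorization are designed to avoid.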
20. MEMORY-BASED RNN ARCHITECTURES
* BRNN: Bi-directional RNN [Schuster and Paliwal, 1997]
* GLU: Gated Linear Unit [Dauphin et al. 2016]
* LSTM: Long Short-Term Memory [Hochreiter and Schmidhuber, 1997]
* GRU: Gated Recurrent Unit [Cho et al. 2014]
* GHN: Gated Highway Network [Zilly et al. 2017]
21. LSTM [Neural Computation, 1997]
(a) Forget gate (b) Input gate (c) Output gate
St: hidden state
“The LSTM’s main idea is that, instead of computing St from St-1 directly with a matrix-vector product followed by a nonlinearity, the LSTM directly computes ΔSt, which is then added to St-1 to obtain St.” [Jozefowicz et al. 2015]
* Resistant to the vanishing gradient problem
* Achieves better results when dropout is used
* Adding a bias of 1 to the LSTM’s forget gate helps
* Figure borrowed from http://colah.github.io/posts/2015-08-Understanding-LSTMs/
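The additive cell-state update quoted above can be made concrete with a minimal single-step LSTM sketch (illustrative code, not from the slides; weight shapes and names are assumptions):

```python
import numpy as np

# Minimal LSTM cell sketch (single step) with the standard
# forget / input / output gates referenced on the slide.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    Wf, Wi, Wo, Wg, bf, bi, bo, bg = params
    z = np.concatenate([h_prev, x])
    f = sigmoid(Wf @ z + bf)      # forget gate (a bias of 1 here often helps)
    i = sigmoid(Wi @ z + bi)      # input gate
    o = sigmoid(Wo @ z + bo)      # output gate
    g = np.tanh(Wg @ z + bg)      # candidate update (the "delta" term)
    c = f * c_prev + i * g        # additive cell-state update
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
nh, nx = 5, 3
params = [rng.normal(scale=0.1, size=(nh, nh + nx)) for _ in range(4)] + \
         [np.ones(nh)] + [np.zeros(nh) for _ in range(3)]   # forget bias = 1
h, c = np.zeros(nh), np.zeros(nh)
h, c = lstm_step(rng.normal(size=nx), h, c, params)
print(h.shape, c.shape)
```

The additive update `c = f * c_prev + i * g` is what makes the cell resistant to vanishing gradients: gradients flow through the sum largely unattenuated.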
22. LONG CREDIT ASSIGNMENT PATHS
* Stacking d RNNs: recurrence depth d
* Incorporates highway layers inside the recurrent transition
* Highway layers in RHNs perform adaptive computation: transform and carry
* H, T, C: non-linear transforms
* Regularization: variational-inference-based dropout
* Figure borrowed from Zilly et al. 2017
31. ATTENTION FAMILY
Self
* Relates different positions of a single sequence in order to compute a representation of the same sequence
* Also referred to as intra-attention
Global vs. Local
* Global: alignment weights a_t are inferred from the current target state and all the source states
* Local: alignment weights a_t are inferred from the current target state and those source states in the window
Soft vs. Hard
* Soft: alignment weights are learned and placed “softly” over all patches in the source image
* Hard: only selects one patch of the image to attend to at a time
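A soft, global self-attention step as described above can be sketched as follows (illustrative numpy code, not from the slides):

```python
import numpy as np

# Sketch of soft, global scaled dot-product self-attention: every
# position attends over all positions of the same sequence, and the
# alignment weights are a softmax ("softly" spread) over sources.

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])            # alignment logits
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over source positions
    return weights @ V                                # weighted sum of values

rng = np.random.default_rng(0)
T, d = 6, 4
X = rng.normal(size=(T, d))
out = self_attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)  # (6, 4)
```

A hard variant would replace the softmax with an argmax over source positions, selecting a single position to attend to at a time.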
34. SPARSE ATTENTIVE BACKTRACKING: TEMPORAL CREDIT ASSIGNMENT THROUGH REMINDING
* Inspired by the cognitive analogy of reminding
* Designed to retrieve one or very few past states
* Incorporates a differentiable, sparse (hard) attention mechanism to select from past states
# Figure borrowed from Ke et al. 2018.
35. HEALTH CARE
MULTI-VARIATE
* Sensor measurements, test results
* Irregular sampling, missing values and measurement errors
* Heterogeneous, presence of long-range dependencies
* Multi-head attention, with additional masking to enable causality
* Temporal ordering: positional encoding & dense interpolation embedding
INFERENCE
* Diagnoses, length of stay
* Future illness, mortality
# Figure borrowed from Song et al. 2018.
37. ANODOT MISSION: MAKING BI AUTONOMOUS
* AutoML: trend, anomaly, root cause, forecast, what-if, optimization
* Real-time, no code, no data scientist
* Business monitoring, business forecast
39. FORECAST USE CASES
* Fintech / Treasury Department: “How many funds do I need to allocate per currency?”
* Transportation / Data Science Department: “How many drivers will I need tomorrow?”
* Demand forecast (Transportation / Business Operations): anticipate demand for inventory, products, service calls and much more.
* Growth forecast (All Industries / Finance Department): “Will we hit our targets next quarter?” Anticipate revenue growth, expenses, cash flow and other KPIs.
42. CONSIDERATIONS FOR AN ACCURATE FORECAST
1. Discovering influencing metrics and events
2. Ensemble of models
3. Identify and account for data anomalies
4. Identify and account for different time series behaviors
43. HOW TO DISCOVER INFLUENCING METRICS/EVENTS?
INPUT:
* Target time series + forecast horizon
* Millions of measures/events that can be used as features
PROCEDURE:
* Step 1: Compute the correlation between the target and each measure/event (shifted by the horizon)
* Step 2: Choose the X most correlated measures
* Step 3: Train the forecast model
CHALLENGES:
* Step 1 is computationally expensive for long sequences: use LSH for speed
* Which correlation function to use?
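The three-step procedure can be sketched as follows (illustrative code; the function and variable names are assumptions, and plain Pearson correlation stands in for whichever correlation function is chosen):

```python
import numpy as np

# Sketch of the feature-discovery procedure: correlate the target
# (shifted by the forecast horizon) with each candidate measure and
# keep the X most correlated ones.

def top_correlated(target, candidates, horizon, x):
    """candidates: dict name -> series aligned with `target`."""
    shifted = target[horizon:]                 # target, `horizon` steps ahead
    scores = {}
    for name, series in candidates.items():
        past = series[:len(series) - horizon]  # candidate values `horizon` steps earlier
        scores[name] = abs(np.corrcoef(shifted, past)[0, 1])
    return sorted(scores, key=scores.get, reverse=True)[:x]

rng = np.random.default_rng(0)
t = np.arange(200, dtype=float)
driver = np.sin(t / 7.0)
target = np.roll(driver, 3) + 0.05 * rng.normal(size=t.size)  # lags `driver` by 3
candidates = {"driver": driver,
              "noise_a": rng.normal(size=t.size),
              "noise_b": rng.normal(size=t.size)}
print(top_correlated(target, candidates, horizon=3, x=1))  # ['driver']
```

With millions of candidates, the pairwise correlations in Step 1 are exactly where the LSH speed-up mentioned above comes in.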
45. IDENTIFYING AND ACCOUNTING FOR DATA ANOMALIES
Anomalies degrade forecasting accuracy. How to remedy the situation?
SOLUTION: Discover anomalies and use the information to create new features:
* Case 1: Anomalies can be explained by external factors: enhance the anomalies
* Case 2: Anomalies can’t be explained by external factors: weight down the anomalies
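Case 2 can be sketched as follows (illustrative code, not Anodot's implementation): flag anomalies with a robust median/MAD score and down-weight them for training:

```python
import numpy as np

# Sketch of Case 2: flag anomalies with a robust (median/MAD) z-score
# and down-weight them so they contribute less when training the
# forecast model. Threshold and weights are illustrative parameters.

def anomaly_weights(y, threshold=3.5, low_weight=0.1):
    med = np.median(y)
    mad = np.median(np.abs(y - med)) or 1e-9
    robust_z = 0.6745 * (y - med) / mad   # 0.6745 makes MAD ~ sigma for normal data
    return np.where(np.abs(robust_z) > threshold, low_weight, 1.0)

y = np.array([10.0, 11.0, 10.5, 9.8, 55.0, 10.2, 10.1])  # one spike
print(anomaly_weights(y))  # only the spike gets weight 0.1
```

For Case 1 the same detector would instead feed an indicator feature (e.g., a known promotion or holiday) so the model can learn the anomaly rather than ignore it.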
49. A SINGLE MODEL FOR THEM ALL?
POTENTIAL ADVANTAGES
* Train one model for many time series
* Less data required per time series
OPEN QUESTIONS
* Will a single model be more accurate than individual ones?
* Which types of differing behaviors impact the ability to train a single model adversely, and which do not?
50. A SINGLE MODEL FOR THEM ALL? TESTING THE IMPACT OF EACH BEHAVIOR TYPE
[Diagram: for each dataset, compute the strength of the behavior for each time series and split into high-strength and low-strength groups. Train one LSTM per time series and one LSTM for all time series, forecast, and benchmark against a horizontal-line baseline; each configuration’s score (Scores 1-5) is the model loss divided by the benchmark loss (absolute error).]
51. A SINGLE MODEL FOR THEM ALL? TESTING THE IMPACT OF EACH BEHAVIOR TYPE
[Table, by feature: Scores 1 and 2 are low, Score 3 is low/high, Scores 4 and 5 are high; comparing the scores reveals each behavior’s impact on the ability to forecast and its impact on mixed training.]
52. [Scatter plot placing time series features by impact on accuracy for joint training vs. impact on accuracy for variability of the behavior. Features include seasonal_strength, curvature, x_pacf5, linearity, hurst, x_acf1, entropy, max_level_shift, time_level_shift, max_var_shift, time_kl_shift, unitroot_kpss, unitroot_pp, seasonal_frequency, arch_acf, garch_acf, seas_pacf, trough, peak, stability, lumpiness, diff2_acf10, e_acf10, diff1_acf10, x_acf10, max_kl_shift; the seasonality and homoscedasticity clusters stand out.]
53. MAIN CONCLUSIONS / SOLUTIONS
* Two main factors prevent simple training of single models:
* Seasonality: the frequency is the important factor, not the shape
* Homoscedasticity (same variance): prevents mixing, but its strength impacts overall accuracy
* Other behaviors have a lower mixing impact
SOLUTIONS
* Separate time series for training based on behavior
* Embed behavior-related features for single-model training.
54. KEY TAKEAWAYS
1. Discovering influencing metrics and events requires efficient feature selection
2. Identify and account for data anomalies: preprocessing before training boosts forecast accuracy
3. Identify and account for different time series behaviors: seasonality and homoscedasticity are the key behaviors impacting the ability to train joint models
56. READINGS [BOOKS]
* Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms [Rosenblatt]
* Neurocomputing: Foundations of Research [Eds. Anderson and Rosenfeld]
* Parallel and Distributed Processing [Eds. Rumelhart and McClelland]
* The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting [Werbos]
* Backpropagation: Theory, Architectures and Applications [Eds. Chauvin and Rumelhart]
* Neural Networks: A Systematic Introduction [Rojas]
57. READINGS [EARLY WORKS]
* Perceptrons [Minsky and Papert, 1969]
* Une procedure d'apprentissage pour reseau a seuil assymetrique [Le Cun, 1985]
* The problem of serial order in behavior [Lashley, 1951]
* Beyond regression: New tools for prediction and analysis in the behavioral sciences [Werbos, 1974]
* Connectionist models and their properties [Feldman and Ballard, 1982]
* Learning-logic [Parker, 1985]
58. READINGS [BACKPROPAGATION]
* Learning internal representations by error propagation [Rumelhart, Hinton, and Williams; Chapter 8 in D. Rumelhart and J. McClelland, Eds., Parallel Distributed Processing, Vol. 1, 1986] (Generalized Delta Rule)
* Generalization of backpropagation with application to a recurrent gas market model [Werbos, 1988]
* Generalization of backpropagation to recurrent and higher order networks [Pineda, 1987]
* Backpropagation in perceptrons with feedback [Almeida, 1987]
* Second-order backpropagation: Implementing an optimal O(n) approximation to Newton's method in an artificial neural network [Parker, 1987]
* Learning phonetic features using connectionist networks: an experiment in speech recognition [Watrous and Shastri, 1987] (Time-delay NN)
59. READINGS [BACKPROPAGATION]
* Backpropagation: Past and future [Werbos, 1988]
* Adaptive state representation and estimation using recurrent connectionist networks [Williams, 1990]
* Generalization of back propagation to recurrent and higher order neural networks [Pineda, 1988]
* Learning state space trajectories in recurrent neural networks [Pearlmutter, 1989]
* Parallelism, hierarchy, scaling in time-delay neural networks for spotting Japanese phonemes/CV-syllables [Sawai et al. 1989]
* The role of time in natural intelligence: implications for neural network and artificial intelligence research [Klopf and Morgan, 1990]
60. READINGS [REGULARIZATION OF RNNs]
* Recurrent Neural Network Regularization [Zaremba et al. 2014]
* Regularizing RNNs by Stabilizing Activations [Krueger and Memisevic, 2016]
* Sampling-based Gradient Regularization for Capturing Long-Term Dependencies in Recurrent Neural Networks [Chernodub and Nowicki, 2016]
* A Theoretically Grounded Application of Dropout in Recurrent Neural Networks [Gal and Ghahramani, 2016]
* Noisin: Unbiased Regularization for Recurrent Neural Networks [Dieng et al. 2018]
* State-Regularized Recurrent Neural Networks [Wang and Niepert, 2019]
61. READINGS [ATTENTION & TRANSFORMERS]
* A Decomposable Attention Model for Natural Language Inference [Parikh et al. 2016]
* Hybrid Computing Using a Neural Network with Dynamic External Memory [Graves et al. 2017]
* Image Transformer [Parmar et al. 2018]
* Universal Transformers [Dehghani et al. 2019]
* The Evolved Transformer [So et al. 2019]
* Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context [Dai et al. 2019]
62. READINGS [TIME SERIES PREDICTION]
* Financial Time Series Prediction using hybrids of Chaos Theory, Multi-layer Perceptron and Multi-objective Evolutionary Algorithms [Ravi et al. 2017]
* Model-free Prediction of Noisy Chaotic Time Series by Deep Learning [Yeo, 2017]
* DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks [Salinas et al. 2017]
* Real-Valued (Medical) Time Series Generation with Recurrent Conditional GANs [Hyland et al. 2017]
* R2N2: Residual Recurrent Neural Networks for Multivariate Time Series Forecasting [Goel et al. 2017]
* Temporal Pattern Attention for Multivariate Time Series Forecasting [Shih et al. 2018]
63. READINGS [POTPOURRI]
* Unbiased Online Recurrent Optimization [Tallec and Ollivier, 2017]
* Approximating real-time recurrent learning with random Kronecker factors [Mujika et al. 2018]
* Theory and Algorithms for Forecasting Time Series [Kuznetsov and Mohri, 2018]
* Foundations of Sequence-to-Sequence Modeling for Time Series [Kuznetsov and Mariet, 2018]
* On the Variance of Unbiased Online Recurrent Optimization [Cooijmans and Martens, 2019]
* Backpropagation through time and the brain [Lillicrap and Santoro, 2019]
64. RESOURCES
* http://colah.github.io/posts/2015-08-Understanding-LSTMs/
* http://karpathy.github.io/2015/05/21/rnn-effectiveness/
* A review of Dropout as applied to RNNs: https://medium.com/@bingobee01/a-review-of-dropout-as-applied-to-rnns-72e79ecd5b7b
* https://distill.pub/2016/augmented-rnns/
* https://distill.pub/2019/memorization-in-rnns/
* https://lilianweng.github.io/lil-log/2018/06/24/attention-attention.html
* Using the latest advancements in deep learning to predict stock price movements: https://towardsdatascience.com/aifortrading-2edd6fac689d
* How to Use Weight Regularization with LSTM Networks for Time Series Forecasting: https://machinelearningmastery.com/use-weight-regularization-lstm-networks-time-series-forecasting/