In this talk, we present an autoregressive CNN architecture for predicting asynchronous time series. We illustrate its application to predicting traders’ quotes of credit default swaps (a proprietary dataset from Hellebore Capital) and to artificial time series. The paper is available here: http://proceedings.mlr.press/v80/binkowski18a/binkowski18a.pdf

A review of two decades of correlations, hierarchies, networks and clustering...

Opinionated review of two decades of correlations, hierarchies, networks and clustering in financial markets, presented at Ton Duc Thang University in Ho Chi Minh City, Vietnam.

Network and risk spillovers: a multivariate GARCH perspective

M. Billio, M. Caporin, L. Frattarolo, L. Pelizzon: “Network and risk spillovers: a multivariate GARCH perspective”.
Final SYRTO Conference - Université Paris1 Panthéon-Sorbonne
February 19, 2016

Entropy and systemic risk measures

Entropy and systemic risk measures
M. Billio, R. Casarin, M. Costola, A. Pasqualini
Ca’ Foscari Venice University
Final SYRTO Conference - Université Paris1 Panthéon-Sorbonne
February 19, 2016

Clustering CDS: algorithms, distances, stability and convergence rates

Talk given at CMStatistics 2016 (http://cmstatistics.org/CMStatistics2016/).
The standard methodology for clustering financial time series is brittle to outliers and heavy tails for several reasons: Single Linkage / MST suffers from the chaining phenomenon, and the Pearson correlation coefficient is suited to Gaussian distributions, which financial returns usually are not (especially for credit derivatives). At Hellebore Capital Ltd, we strive to improve the methodology and to put it on firmer ground. We think that stability is a paramount property to verify, and that it is closely linked to the statistical convergence rates of the methodologies (combinations of clustering algorithms and dependence estimators). This gives us a model selection criterion: the best clustering methodology is the one that reaches a given 'accuracy' with the minimum sample size.

A short introduction to statistical learning

This document provides an introduction to statistical learning methods. It begins with background information on statistical learning problems and discusses concepts like underfitting, overfitting, and consistency. It then summarizes decision trees and random forests, describing how they are learned from data and make predictions. Support vector machines and neural networks are also briefly mentioned. Key goals of statistical learning methods include accuracy on training data as well as generalization to new data.

A new-quantile-based-fuzzy-time-series-forecasting-model

The document presents a new quantile-based fuzzy time series forecasting model. It begins by reviewing existing fuzzy time series forecasting methods and their applications. It then proposes a new method that bases forecasts on predicting future trends in the data using third-order fuzzy relationships. The method converts statistical quantiles into fuzzy quantiles using membership functions. It uses a fuzzy metric and a trend forecast to calculate future values. The method is applied to TAIFEX index forecasting. Results show the proposed method performs comparatively better than other fuzzy time series methods in terms of complexity and forecasting accuracy.

This document summarizes a seminar on econometrics and machine learning given by Arthur Charpentier at Università degli studi dell’Insubria in May 2018. It discusses the history and development of econometrics, including its probabilistic foundations. It also covers key econometric techniques like regression, maximum likelihood estimation, and nonparametric methods. Model selection criteria like AIC and BIC are also briefly discussed. The document provides a high-level overview of major topics in econometrics through the lens of its use in large datasets and connection to machine learning.

This document contains an agenda and introduction for a summer school lecture series on big data for economics given by Arthur Charpentier. The introduction discusses how big data brings new questions to economics due to issues like the curse of dimensionality and new types of data like text, network, and tensor data. It provides examples of how these new types of data differ from traditional individual data and require new techniques. The document also introduces two toy datasets that will be used in the lectures: a medical dataset on myocardial infarction and a simulated binary choice dataset.

Probabilistic Modelling with Information Filtering Networks

Information filtering networks can be used to construct sparse probabilistic models for predictions and risk quantification

Multimodal Deep Learning

Deep neural networks have boosted the convergence of multimedia data analytics in a unified framework shared by practitioners in natural language, vision and speech. Image captioning, lip reading or video sonorization are some of the first applications of a new and exciting field of research exploiting the generalization properties of deep neural representation. This tutorial will first review the basic neural architectures to encode and decode vision, text and audio, and then review the models that have successfully translated information across modalities. The contents of this tutorial are available at: https://telecombcn-dl.github.io/2019-mmm-tutorial/.

1.IntroDescriptiveDisplay-20222023WS.pdf

This example shows a correlation, but does not prove causation. While the Redskins' performance and election outcomes changed together frequently, the football game results did not cause the election results.
18/10/2022
Prof. Dr. Constantinos Antoniou | Applied Statistics in Transport 56
Correlation vs. Causation
Correlation:
- Two variables change together
- Does not imply one causes the other
Causation:
- One variable causes changes in the other
- Requires evidence the relationship is not due to chance or other factors
Just because two things correlate does NOT mean one causes the other. Additional analysis is needed to establish causation.

QMC: Transition Workshop - Selected Highlights from the Probabilistic Numeric...

The Statistical and Applied Mathematical Sciences Institute

This document summarizes the work of Working Group II on Probabilistic Numerics from the SAMSI QMC Transition Workshop. The working group aims to develop probabilistic numerical methods that provide a richer probabilistic quantification of numerical error in outputs, allowing for better statistical inference. Members of the working group have published several papers on topics like Bayesian probabilistic numerical methods for solving differential equations and performing integral approximations, and applying these methods to problems in mathematical epidemiology and industrial process monitoring. The group has also organized workshops and reading groups to discuss the development of probabilistic numerical methods.

Selective and incremental re-computation in reaction to changes: an exercise ...

Invited research seminar given at Durham University, Computer Science about findings from the Recomp project http://recomp.org.uk/

“Un modelo basado en agentes para el estudio de la actividad en redes sociale...

Complejidad y Economía

Talk given on 13 August 2013 at the Seminario de Complejidad y Economía (Complexity and Economics Seminar):
“Un modelo basado en agentes para el estudio de la actividad en redes sociales online” (An agent-based model for studying activity in online social networks)

Slides ub-2

1) The document discusses simulation-based techniques and the bootstrap method for economics.
2) It provides historical references for permutation methods dating back to Fisher (1935) and jackknife and bootstrapping which started with Monte Carlo algorithms in the 1940s.
3) Bootstrapping is introduced as an asymptotic refinement based on computer simulations that generates additional samples from the original data to reduce uncertainty, rather than collecting more observations.

QMC: Undergraduate Workshop, Monte Carlo Techniques in Earth Science - Amit A...

The Statistical and Applied Mathematical Sciences Institute

A crucial ingredient of a successful weather prediction system is its ability to combine observational data with the output of numerical weather prediction models to estimate the state of the atmosphere and the oceans. This problem of estimation of the state of a high dimensional chaotic system such as the atmosphere, given noisy and partial observations of it, is known as data assimilation in the context of earth sciences. The main object of interest in these problems is the conditional distribution, called the posterior, of the state conditioned on the observations. Monte Carlo methods are the most commonly used techniques to study this posterior and also to use it efficiently for prediction. I will give a general introduction to the data assimilation problems and also to Monte Carlo techniques, followed by a discussion of some commonly used Monte Carlo algorithms for data assimilation.

Kernel methods and variable selection for exploratory analysis and multi-omic...

Nathalie Vialaneix
4th course on Computational Systems Biology of Cancer: Multi-omics and Machine Learning Approaches
International course, Curie training
https://training.institut-curie.org/courses/sysbiocancer2021
(remote)
September 29th, 2021

2018 Modern Math Workshop - Foundations of Statistical Learning Theory: Quint...

The Statistical and Applied Mathematical Sciences Institute

This document provides an introduction to statistical machine learning and statistical learning theory. It begins by acknowledging the invitation to present and then outlines topics to be covered, including input/output spaces, loss functions, risk functionals, generalization error, and regularization. Examples of applications like handwritten digit recognition and accent recognition are presented. It discusses challenges in classification problems like imbalanced data and complex decision boundaries. The goal of statistical learning theory is to minimize theoretical risk by finding the best predictive function, while accounting for limitations like an unknown data distribution.

Inference via Bayesian Synthetic Likelihoods for a Mixed-Effects SDE Model of...

- The document describes a talk given by Umberto Picchini on Bayesian inference for a mixed-effects stochastic differential equation (SDE) model of tumor growth.
- The model introduces a state-space model for tumor growth in mice with dynamics driven by an SDE and formulates a mixed-effects SDE model to estimate population parameters.
- The talk aims to show how to perform approximate Bayesian inference for the mixed-effects SDE model using synthetic likelihoods.

Inventory theory presentation

These slides were used in the "Mathematics of Logistics" seminar at Nishinari Laboratory, Faculty of Engineering, the University of Tokyo.
References:
1. Mikio Kubo (久保幹雄) (2007). 『ロジスティクスの数理』 (Mathematics of Logistics). Kyoritsu Shuppan (共立出版).
2. Dimitri P. Bertsekas (2005). Dynamic Programming and Optimal Control. Athena Scientific. Vol 1,2. 4th edition.

Kernel methods for data integration in systems biology

This document provides an overview of a seminar presentation on kernel methods for data integration in systems biology. It begins with short biographies of the presenter, who is trained as a mathematician and statistician and applies their skills to research in human health and animal genomics using various omics data types. Examples are given of the presenter's past work inferring networks and integrating gene expression and lipid data, as well as expression and 3D DNA location data. The talk will discuss how to integrate multiple omics data from different sources and types using kernels. Kernels allow reducing high-dimensional data to similarity matrices and are not restricted to numeric data. They also allow embedding expert knowledge and provide a framework for statistical learning.

Learning Intrusion Prevention Policies Through Optimal Stopping

The document discusses formulating intrusion prevention as an optimal stopping problem. It describes a use case where a defender monitors an infrastructure for signs of intrusion by an attacker. The defender can take defensive actions, called stops, at different time steps, with the goal of stopping an intrusion. This problem is modeled as a partially observable Markov decision process (POMDP) in which the defender's strategy must determine, based on observations over time, the optimal times to stop and take defensive actions.

MediaEval 2018: Fine grained sport action recognition: Application to table t...

This document discusses fine-grained sport action recognition applied to table tennis. It introduces two proposed tasks for the MediaEval benchmark: 1) recognizing temporally segmented strokes from a table tennis video and assigning them to one of 21 classes, and 2) performing the same recognition but without temporal segmentation, allowing a 10% error range on boundaries. It also describes the TTStroke-21 dataset created for these tasks, which contains over 1,000 annotated strokes from 129 table tennis videos.

MediaEval 2018: Ensembled Convolutional Neural Network Models for Retrieving ...

Paper: http://ceur-ws.org/Vol-2283/MediaEval_18_paper_27.pdf
Youtube: https://youtu.be/iDwuoVfpDKQ
Yu Feng, Sergiy Shebotnov, Claus Brenner, Monika Sester, Ensembled Convolutional Neural Network Models for Retrieving Flood Relevant Tweets. Proc. of MediaEval 2018, 29-31 October 2018, Sophia Antipolis, France.
Abstract: Social media, which provides instant textual and visual information exchange, plays a more important role in emergency response than ever before. Many researchers nowadays are focusing on disaster monitoring using crowd sourcing. Interpretation and retrieval of such information significantly influences the efficiency of these applications. This paper presents a method proposed by team EVUS-ikg for the MediaEval 2018 challenge on Multimedia Satellite Task. We only focused on the subtask “flood classification for social multimedia”. A supervised learning method with an ensemble of 10 Convolutional Neural Networks (CNN) was applied to classify the tweets in the benchmark.
Presented by Yu Feng

Link-wise Artificial Compressibility Method: a simple way to deal with comple...

The document summarizes the Link-wise Artificial Compressibility Method (LW-ACM), which is a simplified version of the Lattice Boltzmann Method (LBM) for solving fluid flow problems with complex geometries on structured grids. It discusses how LW-ACM modifies the standard Artificial Compressibility Method (ACM) to use a link-wise formulation that borrows ideas from LBM to handle boundaries without needing complex mesh generation. The document provides examples showing LW-ACM can accurately simulate flows like Couette flow and Couette flow with wall injection on simple structured grids.

A FRIENDLY APPROACH TO PARTICLE FILTERS IN COMPUTER VISION

This is a friendly approach to particle filters. Some hints, examples, and good practices to be able to successfully apply particle filters to solve your computer vision pro

Using Large Language Models in 10 Lines of Code

Modern NLP models can be daunting: No more bag-of-words but complex neural network architectures, with billions of parameters. Engineers, financial analysts, entrepreneurs, and mere tinkerers, fear not! You can get started with as little as 10 lines of code.
Presentation prepared for the Abu Dhabi Machine Learning Meetup Season 3 Episode 3 hosted at ADGM in Abu Dhabi.

What deep learning can bring to...

... two decades of correlation, hierarchies, networks and clustering in financial markets
Summary of some of my past research work at Complex Networks 2022.
The study of correlations, hierarchies, networks and communities (or clustering) has more than 20 years of history in econophysics.
However, for the practitioner, it seems that these tools are not fully ready yet:
Many questions around their proper use for trading or risk monitoring are left unanswered.
Deep Learning might help solve some hard problems, such as finding communities (or clusters) and their number more reliably.
Running large simulations (based on GANs, VAEs or realistic market simulators) could also help understand when complex networks methods can give wrong insights (e.g. not enough data, or not stationary enough; too low correlations).
Conference: Complex Networks 2022 in Palermo, Sicily, Italy.

A quick demo of Top2Vec With application on 2020 10-K business descriptions

A short presentation I gave at the Hong Kong Machine Learning Meetup Season 4 Episode 4. Top2Vec is a novel method to find topics in a corpus of documents. It can automatically find a relevant number of topics in the corpus. In addition, you also get relevant word and document vectors for further processing.

How deep generative models can help quants reduce the risk of overfitting?

How deep generative models can help quants reduce the risk of overfitting? Applications of GANs for Quants.
Presentation at the "QuantUniversity Autumn School 2020".

Generating Realistic Synthetic Data in Finance

Talk at IHS Markit Webinar (15 October 2020) on the potential applications of GANs in finance. These models could be useful for quants and their managers to avoid overfitting, for portfolio and risk managers to allocate capital and risk properly, for cloud computing services willing to work with banks and other organizations rich in sensitive data, for auditors and regulators to detect fraud, and for data vendors (such as IHS Markit) to bring new products to market and iterate quickly with clients.

Applications of GANs in Finance

This presentation highlights potential use cases of deep generative models, and Generative Adversarial Networks (GANs) in particular, in Finance. Essentially, these models are useful to generate realistic synthetic datasets. Quantitative Strategists, Traders, Asset and Risk Managers can find these novel techniques useful. Auditors and Regulators should also become aware of their existence, as they may be a source of new accounting fraud and misleading financial statements (deepfakes).

My recent attempts at using GANs for simulating realistic stocks returns

A presentation for the Hong Kong Machine Learning meetup summarizing my hobby research over the past year. My goal is to be able to simulate realistic multivariate financial time series. If I succeed, I will be able to compare different statistical methods for portfolio construction, study complex networks, test algorithmic trading, do some reinforcement learning, etc. Still far from being achieved...

Takeaways from ICML 2019, Long Beach, California

The document summarizes takeaways from various talks and presentations at the ICML 2019 conference. It discusses topics like safe machine learning and biases in algorithms, active learning techniques, attention mechanisms in deep learning, differential privacy in census data, time series forecasting methods, Hawkes processes, Shapley values for explainability and data valuation, topological data analysis, optimal transport, applications of machine learning in robotics, Gaussian processes, learning from noisy labels, interpretability methods in NLP, and the GluonTS library for probabilistic time series modeling.

Some contributions to the clustering of financial time series - Applications ...

This document discusses contributions to clustering financial time series, specifically credit default swap data. It introduces credit default swaps and the raw data set. It then discusses challenges in clustering financial time series due to non-stationarity and noisy correlations. It presents initial work on analyzing the consistency of clustering as the sample size increases, through simulations in a simplified setting. Finally, it proposes a two-step approach to proving consistency, by first identifying geometrical configurations that lead to the true clustering structure.

Clustering Financial Time Series using their Correlations and their Distribut...

This document discusses methods for clustering random walks. It introduces the GNPR (Generic Non-Parametric Representation) method for defining a distance between two random walks that separates dependence and distribution information. The GNPR method is shown to outperform standard approaches on synthetic datasets containing different clusters based on distribution and dependence. The GNPR method is also used to cluster credit default swaps, identifying a cluster of "Western sovereigns". The document concludes that GNPR is an effective way to deal with dependence and distribution information separately without losing information.

Optimal Transport vs. Fisher-Rao distance between Copulas

How can we compare two dependence structures (represented by copulas)? It depends on the task. For clustering variables with similar dependence, prefer Optimal Transport. For detecting change points in a dynamical dependence structure, prefer Fisher-Rao and its associated f-divergences (for example, an approach a la Frédéric Barbaresco in radar signal processing). This study illustrates these properties with bivariate Gaussian copulas.

On Clustering Financial Time Series - Beyond Correlation

This document discusses clustering financial time series data using correlation matrices. For 560 credit default swaps observed over 2500 days, the eigenvalues of the empirical correlation matrix closely match the theoretical Marchenko-Pastur distribution, indicating that most of the spectrum is noise. Only 26 eigenvalues exceed the theoretical maximum; these may correspond to market and industry factors. Hierarchical clustering can reorder assets to reveal correlation patterns, and filtering the correlation matrix with this hierarchy reveals the underlying network structure. Beyond correlations, copulas represent the dependence structure, and a distance measure is proposed combining L1 and L0 distances of cumulative distribution functions to cluster on full distributions rather than just correlations. Stability tests show the proposed approach yields more robust clusters than standard correlation-based methods.

Optimal Transport between Copulas for Clustering Time Series

Presentation slides of our ICASSP 2016 conference paper in Shanghai. They describe the motivation and design of the Target Dependence Coefficient, a coefficient which can target or forget specific dependence relationships between the variables. This coefficient can be useful for clustering financial time series. Several such use cases are described on our Tech Blog: https://www.datagrapple.com/Tech/optimal-copula-transport.html

On the stability of clustering financial time series

Talk at IEEE ICMLA 2015 Miami
In this presentation, we suggest some data perturbations that can help to validate or reject a clustering methodology besides yielding insights on the time series at hand. We show in this study that Pearson correlation is not that relevant for clustering these time series since it yields unstable clusters; prefer a more robust measure such as Spearman correlation based on rank statistics.

Clustering Random Walk Time Series

This document discusses clustering random walk time series. It introduces the concept of clustering and discusses challenges in clustering time series, such as how to define a distance between dependent random variables. It proposes using the copula transform and empirical copula transform to map time series to uniform distributions before calculating distances. The hierarchical block model for nested data partitions is presented. Experimental results show the proposed distances perform well in recovering the nested partitions on both synthetic hierarchical block model data and real credit default swap time series data. Consistency of clustering algorithms is defined in relation to recovering the hierarchical block model partitions.

On clustering financial time series - A need for distances between dependent ...

This document discusses clustering financial time series data using distances between dependent random variables. It notes that traditional clustering based only on correlation can lead to spurious clusters, as correlation does not fully capture dependence. The paper proposes a distance measure that combines information about both the correlation and distribution of random variables. It tests this distance measure on synthetic data from a hierarchical block model and real credit default swap market data, finding it performs better than distances based only on correlation or distribution individually. Some open questions are also discussed, such as how to select the optimal weighting of correlation vs distribution information.

- 1. Autoregressive Convolutional Neural Networks for Asynchronous Time Series. Hong Kong Machine Learning Meetup - Season 1 Episode 1. Mikolaj Bińkowski, Gautier Marti, Philippe Donnat (Imperial College London, Ecole Polytechnique, Hellebore Capital). 18 July 2018.
- 4. Introduction. Problem: many real-world time series are asynchronous, i.e. the durations between consecutive observations are irregular/random, or the separate dimensions are not observed simultaneously. At the same time, time series models usually require both regularity of observations and simultaneous sampling of all dimensions, and continuous-time models often require simultaneous sampling. Numerous interpolation methods have been developed for preprocessing of asynchronous series. However,...
- 7. Drawbacks of synchronous sampling. ... every interpolation method leads to either an increase in the number of data points or a loss of data. [Figure: original series on an axis from 0 to 100, with two resampled versions: frequency = 10s (information loss) and frequency = 1s (12x more points).] But the situation can be much worse...
- 10. Drawbacks of synchronous sampling. [Figure: evolution of quoted prices throughout one day; axes: time, price; series: source A bid, source A ask, source B bid, source B ask, source C bid, source C ask, source D bid, source D ask.] Objectives: propose an alternative representation of asynchronous data; find a neural network architecture appropriate for such a representation.
- 15. How to deal with asynchronous data? [Figure: two asynchronously observed series, X and Y, plotted as value against time, with durations marked. The observations are merged into a single sequence in which each observation has four features: X indicator, value, Y indicator, duration. For example, the first observation is encoded as X indicator = 1, value = 4.0, Y indicator = 0, duration = 0.3.]
- 16. Not satisfactory performance of Neural Nets. Architectures such as Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN) do not perform as well as expected, compared to a simple autoregressive (AR) model: $X_n = \sum_{m=1}^{M} X_{n-m} \times a_m + \varepsilon_n$ (1)
- 17. Not satisfactory performance of Neural Nets. Architectures such as Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN) do not perform as well as expected, compared to a simple autoregressive (AR) model. Idea: equip the AR model with data-dependent weights: $X_n = \sum_{m=1}^{M} X_{n-m} \times a_m(X_{n-m}) + \varepsilon_n$ (1)
- 24. Proposed Architecture The model predicts yn = E[xI n|x−M n ], where x−M n = (xn−1, . . . , xn−M) - regressors I = (i1, i2, . . . , idI ) - target dimensions with ˆyn = M m=1 W·,m ⊗ σ(S(x−M n ))·,m data dependent weights ⊗ oﬀ(xn−m) + xI n−m adjusted regressors × (𝑵 𝑺 − 𝟏) layers Convolution kx1 kernel c channels × (𝑵 𝒐𝒇𝒇 − 𝟏) layers Convolution 1x1 kernel c channels Offset networkSignificance network Input series 𝒙 𝒕−𝟔 𝒙 𝒕−𝟓 𝒙 𝒕−𝟒 𝒙 𝒕−𝟑 𝒙 𝒕−𝟐 𝒙 𝒕−𝟏 d - dimensional timesteps Mikolaj Bi´nkowski, Gautier Marti, Philippe Donnat (Imperial College)CNNs for Asynchronous Time Series 18 July 2018 7 / 10
- 25. Proposed Architecture The model predicts yn = E[xI n|x−M n ], where x−M n = (xn−1, . . . , xn−M) - regressors I = (i1, i2, . . . , idI ) - target dimensions with ˆyn = M m=1 W·,m ⊗ σ(S(x−M n ))·,m data dependent weights ⊗ oﬀ(xn−m) + xI n−m adjusted regressors × (𝑵 𝑺 − 𝟏) layers Convolution kx1 kernel c channels Convolution 1x1 kernel dI channels Convolution kx1 kernel dI channels × (𝑵 𝒐𝒇𝒇 − 𝟏) layers Convolution 1x1 kernel c channels Offset networkSignificance network Input series 𝒙 𝒕−𝟔 𝒙 𝒕−𝟓 𝒙 𝒕−𝟒 𝒙 𝒕−𝟑 𝒙 𝒕−𝟐 𝒙 𝒕−𝟏 d - dimensional timesteps Mikolaj Bi´nkowski, Gautier Marti, Philippe Donnat (Imperial College)CNNs for Asynchronous Time Series 18 July 2018 7 / 10
- 26. Proposed Architecture The model predicts yn = E[xI n|x−M n ], where x−M n = (xn−1, . . . , xn−M) - regressors I = (i1, i2, . . . , idI ) - target dimensions with ˆyn = M m=1 W·,m ⊗ σ(S(x−M n ))·,m data dependent weights ⊗ oﬀ(xn−m) + xI n−m adjusted regressors × (𝑵 𝑺 − 𝟏) layers Convolution kx1 kernel c channels Convolution 1x1 kernel dI channels Convolution kx1 kernel dI channels × (𝑵 𝒐𝒇𝒇 − 𝟏) layers Convolution 1x1 kernel c channels Offset network 𝒙𝑰 Significance network Input series 𝒙 𝒕−𝟔 𝒙 𝒕−𝟓 𝒙 𝒕−𝟒 𝒙 𝒕−𝟑 𝒙 𝒕−𝟐 𝒙 𝒕−𝟏 d - dimensional timesteps 𝐨𝐟𝐟 Mikolaj Bi´nkowski, Gautier Marti, Philippe Donnat (Imperial College)CNNs for Asynchronous Time Series 18 July 2018 7 / 10
- 27. Proposed Architecture The model predicts yn = E[xI n|x−M n ], where x−M n = (xn−1, . . . , xn−M) - regressors I = (i1, i2, . . . , idI ) - target dimensions with ˆyn = M m=1 W·,m ⊗ σ(S(x−M n ))·,m data dependent weights ⊗ oﬀ(xn−m) + xI n−m adjusted regressors Weighting 𝑯 𝒏−𝟏 = 𝝈 𝑺 ⨂ (𝐨𝐟𝐟 + 𝒙 𝑰 ) × (𝑵 𝑺 − 𝟏) layers Convolution kx1 kernel c channels 𝑺 𝛔 Convolution 1x1 kernel dI channels Convolution kx1 kernel dI channels × (𝑵 𝒐𝒇𝒇 − 𝟏) layers Convolution 1x1 kernel c channels Offset network 𝒙𝑰 Significance network Input series 𝒙 𝒕−𝟔 𝒙 𝒕−𝟓 𝒙 𝒕−𝟒 𝒙 𝒕−𝟑 𝒙 𝒕−𝟐 𝒙 𝒕−𝟏 d - dimensional timesteps 𝐨𝐟𝐟 Mikolaj Bi´nkowski, Gautier Marti, Philippe Donnat (Imperial College)CNNs for Asynchronous Time Series 18 July 2018 7 / 10
- 28. Proposed Architecture The model predicts yn = E[xI n|x−M n ], where x−M n = (xn−1, . . . , xn−M) - regressors I = (i1, i2, . . . , idI ) - target dimensions with ˆyn = M m=1 W·,m ⊗ σ(S(x−M n ))·,m data dependent weights ⊗ oﬀ(xn−m) + xI n−m adjusted regressors Weighting 𝑯 𝒏−𝟏 = 𝝈 𝑺 ⨂ (𝐨𝐟𝐟 + 𝒙 𝑰 ) × (𝑵 𝑺 − 𝟏) layers Convolution kx1 kernel c channels 𝑺 𝛔 Convolution 1x1 kernel dI channels Convolution kx1 kernel dI channels × (𝑵 𝒐𝒇𝒇 − 𝟏) layers Convolution 1x1 kernel c channels Offset network 𝒙𝑰 Significance network Input series 𝒙 𝒕−𝟔 𝒙 𝒕−𝟓 𝒙 𝒕−𝟒 𝒙 𝒕−𝟑 𝒙 𝒕−𝟐 𝒙 𝒕−𝟏 d - dimensional timesteps Locally connected layer fully connected for each of 𝒅𝑰 dimensions 𝑯 𝒏 = 𝑾𝑯 𝒏−𝟏 + 𝒃 𝐨𝐟𝐟 ෝ𝒙 𝒕 𝑰 Mikolaj Bi´nkowski, Gautier Marti, Philippe Donnat (Imperial College)CNNs for Asynchronous Time Series 18 July 2018 7 / 10
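The weighting step and the final locally connected layer can be sketched with plain NumPy, taking the outputs of the two convolutional networks as given arrays. Several points are assumptions made for illustration: $\otimes$ is read as element-wise multiplication, $\sigma$ is taken to be a softmax over the $M$ time steps, and the names `softmax` and `weighted_prediction` are hypothetical. The convolutional layers themselves are not implemented here.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_prediction(S, off, x_I, W, b=None):
    """Combine significance and offset outputs into a prediction.

    All arrays have shape (d_I, M): one row per target dimension,
    one column per past time step.
    """
    weights = softmax(S, axis=1)   # assumed sigma: normalise over time steps
    adjusted = off + x_I           # adjusted regressors
    H = weights * adjusted         # H_{n-1} = sigma(S) (x) (off + x^I)
    y_hat = (W * H).sum(axis=1)    # one weighted sum per target dimension
    return y_hat if b is None else y_hat + b

d_I, M = 2, 3
x_I = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
S = np.zeros((d_I, M))    # equal significance for every time step
off = np.zeros((d_I, M))  # no offset correction
W = np.ones((d_I, M))
print(weighted_prediction(S, off, x_I, W))
```

With zero significance scores, zero offsets and unit weights, the prediction collapses to the mean of the past values in each target dimension, which is a convenient sanity check for the shapes.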
- 29-30. Experiments. Datasets: artificially generated (synchronous and asynchronous); Electricity consumption [UCI repository]; Quotes [16 tasks]. Benchmarks: (linear) VAR model; vanilla LSTM, 1d-CNN; 25-layer conv. ResNet; Phased LSTM [Neil et al. 2016]. [Figure: bar chart of MSE for VAR, CNN, ResNet, LSTM, Phased LSTM and SOCNN (ours) on Sync 16, Sync 64, Async 16, Async 64, Electricity and Quotes.]
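For readers unfamiliar with the simplest benchmark, a linear VAR model can be fitted by ordinary least squares and scored by MSE as sketched below. This is a generic textbook construction, not the benchmark code used in the experiments; the helper names `lagged_design`, `fit_var` and `predict_var` and the toy series are assumptions.

```python
import numpy as np

def lagged_design(series, p):
    """Stack x_{n-1}, ..., x_{n-p} and an intercept column for n = p..T-1."""
    T = series.shape[0]
    lags = [series[p - m: T - m] for m in range(1, p + 1)]
    return np.hstack(lags + [np.ones((T - p, 1))])

def fit_var(series, p):
    """Least-squares fit of a VAR(p); series has shape (T, d)."""
    X = lagged_design(series, p)
    Y = series[p:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef

def predict_var(series, coef, p):
    """One-step-ahead predictions for time steps p..T-1."""
    return lagged_design(series, p) @ coef

# Toy noise-free VAR(1) series, so the fit should be essentially exact.
A = np.array([[0.5, 0.1], [0.0, 0.8]])
series = np.empty((30, 2))
series[0] = [1.0, 2.0]
for n in range(1, 30):
    series[n] = A @ series[n - 1]

coef = fit_var(series, p=1)
mse = float(np.mean((predict_var(series, coef, p=1) - series[1:]) ** 2))
print(mse)
```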
- 31-34. Experiments #2. Ablation study: the Significance Network needs more depth than the Offset network. Past observations are pretty good predictors; we just need to weight them. Robustness: what happens to the error if we add noise to the input? [Figure: MSE on the train set and on the test set against added noise (in standard deviations), with curves for CNN, LSTM, SOCNN, significance and |offset|.] The proposed model seems to be more robust.
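A robustness check of this kind can be sketched as follows: perturb the inputs with Gaussian noise whose scale is a multiple of each feature's standard deviation, then recompute the MSE. The function `mse_under_noise`, the stand-in predictor and the toy data are hypothetical; the exact protocol used for the plots is not given on the slide.

```python
import numpy as np

def mse_under_noise(predict, inputs, targets, noise_levels, seed=0):
    """MSE of `predict` when inputs are perturbed by scaled Gaussian noise.

    noise_levels are expressed in standard deviations of each input feature.
    """
    rng = np.random.default_rng(seed)
    scale = inputs.std(axis=0, keepdims=True)
    errors = {}
    for level in noise_levels:
        noisy = inputs + level * scale * rng.standard_normal(inputs.shape)
        errors[level] = float(np.mean((predict(noisy) - targets) ** 2))
    return errors

# Stand-in model: predicts the row sum, which is exact on clean inputs.
inputs = np.arange(20, dtype=float).reshape(10, 2)
targets = inputs.sum(axis=1)
errors = mse_under_noise(lambda x: x.sum(axis=1), inputs, targets, [0.0, 0.5, 1.0])
print(errors)
```

Plotting these errors against the noise level for several models gives curves comparable in spirit to the ones on the slide.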
- 35. Code: https://github.com/mbinkowski/nntimeseries Thank you for your attention!