Final Report: CFM Challenge
MVA course: Représentations Parcimonieuses
Khalil Bergaoui, Azza Ben Farhat
{khalil.bergaoui;azza.ben-farhat}@student.ecp.fr
March 23rd 2021
1 Introduction
In the context of the MVA course entitled "Représentations Parcimonieuses", we participated in the CFM challenge: Stock Trading Prediction of Auction Volume. Throughout this report, we detail the methodology that we adopted during the challenge, present the results that we obtained and compare them to CFM's benchmark. Additionally, we discuss some of the difficulties that we encountered in this project and present our final solution as well as potential future directions.
In the next section, we will begin by briefly presenting the goal of the CFM
challenge and reviewing the related work.
2 Related Work
The goal of this year’s CFM challenge is to predict the volume (total value of
stock exchanged) available for auction, for 900 stocks over about 350 days. The
problem is thus formulated as a regression problem.
2.1 Literature Overview
Although the literature on the topic of auction volume prediction is not particularly rich, financial time series analysis as well as regression tasks are widely covered topics in machine learning. Considering the diverse nature of this challenge's input data (a combination of independent values, identifiers and short noisy time series, in particular the return and volume features), the auction volume prediction problem can be tackled from different angles. In this section, we briefly present relevant methods to approach the problem that we have found in the literature.
While there is no particularly straightforward state-of-the-art method for auction volume prediction, the basic strategy is to extend techniques from supervised machine learning for regression tasks. In particular, we have found that better results are achieved with hybrid autoregressive models, as in [5], where the authors perform the prediction in two steps: first, a model is trained to fit the data; then, a second model is trained to fit the difference between the first model's predictions and the ground truth (the residual error). This was in fact the strategy adopted by CFM's benchmark model.
2.2 CFM Benchmark
A hybrid model is used as a benchmark in this challenge. First, a linear regression model is trained to predict the auction volume for a given data sample $x = (x_i)_{1 \le i \le 126}$, using all columns except "pid": $v_{pred} = \beta_0 + \sum_{i=1}^{125} x_i \beta_i$, where $(\beta_i)_i$ are the regression parameters. Then a tree-based ensemble learning algorithm, LightGBM, uses the "pid" information to fit the residual error $\epsilon = v_{pred} - v_{true}$ between the linear model's prediction and the ground truth target.
Note that LightGBM is a gradient boosting framework that learns by grow-
ing trees vertically (leaf-wise) and relies on a large set of hyperparameters that
need to be carefully fine-tuned in order to avoid overfitting.
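For concreteness, a minimal sketch of this two-stage hybrid scheme is given below. The LightGBM settings, the sign convention for the residual and the way "pid" is appended are illustrative assumptions, not the exact benchmark configuration.

```python
import lightgbm as lgb
import numpy as np
from sklearn.linear_model import LinearRegression

# X: feature matrix without "pid", y: (log) auction volume, pid: stock identifiers.
def fit_hybrid(X, pid, y):
    # Stage 1: linear regression on the numerical columns.
    lin = LinearRegression().fit(X, y)
    residual = y - lin.predict(X)

    # Stage 2: LightGBM fits the residual, with access to the stock identifier.
    X_res = np.column_stack([X, pid])
    gbm = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
    gbm.fit(X_res, residual, categorical_feature=[X.shape[1]])  # treat pid as categorical
    return lin, gbm

def predict_hybrid(lin, gbm, X, pid):
    # Final prediction = linear prediction + learned residual correction.
    X_res = np.column_stack([X, pid])
    return lin.predict(X) + gbm.predict(X_res)
```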
3 Methodology
In this section, we describe the steps that we carried out during the project and the difficulties that we encountered. The starting point in this data challenge was data exploration and dealing with noisy samples, which took the form of missing values.
3.1 Missing values
The following table summarizes information about missing data:
Rows with missing data in the training set: 37%
Rows with missing data in the test set: 33%
Missing absolute-return values in the train set: 5%
Missing absolute-return values in the test set: 4%
Table 1: Missing data in the train and test datasets.
Remark: we noticed that if the value of the n-th feature of the absolute returns (abs ret) is missing, then the value of the n-th feature of the relative volume (rel vol) is also missing.
The difficulty when dealing with missing values is that we cannot quantitatively anticipate the impact of the adopted strategy on the performance of the learning algorithm, which is why we decided to start by simply replacing missing values with zeros, as in CFM's benchmark model.
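As an illustration, the two imputation strategies we compare later (zeros, as in the benchmark, and per-stock column means) can be written as follows; the column names ("ID", "pid", "date") are assumptions about the data layout.

```python
import pandas as pd

def impute_zeros(df: pd.DataFrame) -> pd.DataFrame:
    # Benchmark-style imputation: every missing value becomes 0.
    return df.fillna(0.0)

def impute_stock_means(df: pd.DataFrame) -> pd.DataFrame:
    # Per-stock imputation: replace a missing value in a column by the mean of
    # that column over all days of the same stock (identified by "pid").
    feature_cols = [c for c in df.columns if c not in ("ID", "pid", "date")]
    out = df.copy()
    out[feature_cols] = df.groupby("pid")[feature_cols].transform(
        lambda s: s.fillna(s.mean())
    )
    return out
```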
3.2 Feature extraction
Once missing values are replaced, we can start extracting relevant features from
the data to use as inputs for the learning algorithms. A successful predictive model should use features that are informative about the output, which in our case is the auction volume for a given stock on a given day. So, in addition to the provided input columns, which represent daily information about a given stock, it might be interesting to exploit the past of the auction volume for a given stock or the interaction between different stocks. To this end, we performed a quick correlation analysis, summarized in the figure below:
Figure 1: (a) Within-stock correlation; (b) Between-stock correlation.
In the left panel, we plot the auto-correlation of the auction volume time series (over the 800 days in the train set) for different lag values; we randomly picked two stocks: stock 360 (blue curve) and stock 850 (orange curve). In the right panel, we display the correlation map of the target (auction volume)
between a random set of 50 different stocks. In both cases, we obtain low
correlation values. In addition, since in the test set we do not always have
access to the auction volume in the preceding days, we decided to focus our
study on the provided input columns only.
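For reference, a minimal version of these two checks, assuming a long-format DataFrame with "pid", "day" and "target" columns, could look like:

```python
import pandas as pd

def target_autocorrelation(df: pd.DataFrame, pid: int, max_lag: int = 30):
    # Auto-correlation of one stock's auction-volume series over the train days.
    series = df[df["pid"] == pid].sort_values("day")["target"]
    return [series.autocorr(lag=k) for k in range(1, max_lag + 1)]

def between_stock_correlation(df: pd.DataFrame, pids):
    # Day-indexed target matrix (one column per stock), then pairwise correlations.
    pivot = df[df["pid"].isin(pids)].pivot(index="day", columns="pid", values="target")
    return pivot.corr()
```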
3.2.1 Principal Component Analysis
Principal Component Analysis (PCA) is a statistical technique generally used for dimensionality reduction in machine learning [7]. Geometrically, it is a projection method: data with m columns (features) is projected onto a subspace spanned by m or fewer uncorrelated, orthogonal directions, called principal components, chosen to retain as much of the variance of the original data set as possible.
The new vectors are ordered such that the retention of variation present in the
original variables decreases as we move down in the order. So, in this way, the
first principal component retains maximum variation that was present in the
original components.
We applied a PCA transform to the normalized train set without keeping
the ”pid”, ”date” and ”ID” features, to look for a possible new set of features
that could be used for training the model. The following table summarizes the
explained variance ratio of the first 5 principal components:
Principal component   Variance ratio
1                     9.99999560e-01
2                     3.91120674e-07
3                     4.83349491e-08
4                     8.88623697e-11
5                     2.14177158e-11
Table 2: Explained variance ratio of the first 5 principal components.
We can see that the first principal component contains almost all of the variance of the original data. This means that replacing the original data set by the first principal component is a good approximation, since it explains almost all of the variance. At the same time, such a result implies that all the columns (except "pid", "date" and "ID") are essentially linearly dependent, which we found a bit surprising. We then computed the correlation between the first principal component and the output we want to predict (the log of the auction volume) and found a relatively low value of 0.18. Our guess is that in the high-dimensional space (126 dimensions), most data points are concentrated along the same direction (the first principal component), which is a biased direction; looking at the distribution of the target (auction volume) values over the training set, displayed in Figure 3, we assume that the bias corresponds to data points with values around −2.
Another possible interpretation is that the observed linear dependency could have been artificially caused by replacing the missing values with non-zero values (in this experiment we replaced each missing value in a given column, for a given stock, with the average computed across that same column over all rows corresponding to different days of the same stock). However, we repeated the same experiment replacing all missing values with zero and obtained a similar result (99% of the variance explained by the first component alone).
In this case, it seems that linear methods are not sufficient to capture the specificity of the data set. One could explore non-linear methods for the dimensionality reduction step, but due to time constraints we were unable to do so.
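The PCA experiment itself reduces to a few lines of scikit-learn; the feature matrix below is an assumption about how the raw columns are arranged (one row per (stock, day) sample, with "pid", "date" and "ID" already dropped).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pca_report(X: np.ndarray, n_components: int = 5) -> np.ndarray:
    # Normalize the columns, fit a PCA and report the explained variance ratios.
    X_norm = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components).fit(X_norm)
    print("explained variance ratios:", pca.explained_variance_ratio_)
    # Projected features (e.g. the first 5 principal components used later).
    return pca.transform(X_norm)
```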
3.2.2 Wavelet Transform
The wavelet transform is a mathematical tool used in signal processing to decompose signals over dilated and translated wavelets. In particular, a wavelet $\psi_{\delta,\sigma}$ is a function parametrized by shift and scale parameters, allowing a given signal to be analyzed at multiple resolutions. Whereas there exist broad categories of wavelet functions, we restrict our application to real wavelets since they are, in contrast with complex wavelets, often used to detect sharp signal transitions [9]. In our case, we apply wavelet transforms to detect sharp transitions in both the return signal and the volume signal (columns abs ret and rel vol, respectively, in the input data), as they are likely to be significant predictive features. In our experiments, we use a real version of the continuous Morlet wavelet, $\psi(t) = e^{-t^2/2}\cos(5t)$, and the continuous wavelet transform is applied to our discrete data (61 discrete samples for each of the return and volume signals) as a convolution with the discretized integral of the wavelet $\psi$. We use the CWT implementation of the PyWavelets package [2] and analyze the signals over 32 scales (larger scales correspond to stretching the wavelet: for example, at scale 10 the wavelet is stretched by a factor of 10, making it sensitive to lower frequencies in the signal).
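A sketch of this transform using PyWavelets is shown below; "morl" is PyWavelets' real Morlet wavelet, and the input here stands for one 61-sample abs ret or rel vol row.

```python
import numpy as np
import pywt

def wavelet_image(signal_61: np.ndarray, n_scales: int = 32) -> np.ndarray:
    # Continuous wavelet transform of one 61-sample intraday series with the
    # real Morlet wavelet, over scales 1..32; returns a (32, 61) "image".
    scales = np.arange(1, n_scales + 1)
    coeffs, _freqs = pywt.cwt(signal_61, scales, "morl")
    return coeffs
```

Stacking the transforms of the return and volume series then gives the 2-channel image fed to the convolutional part of the network described in section 3.3.4.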
Figure 2: Wavelet Transform at 32 scales.
Thus the 1D signal is transformed into 32 one-dimensional signals that can be represented as an image, as shown in the figure above, which illustrates the relevance of wavelet transforms for capturing non-stationary transitions. For instance, around the 15th period of day 23, the volume of stock 739 abruptly increases. This sudden change is reflected in the multi-resolution wavelet transform displayed below the signal.
Therefore, wavelet transforms represent a good candidate set of features that allow us to use computer vision techniques in our predictive task, as we will see in more detail in section 3.3.
3.3 Algorithms
3.3.1 Nearest Neighbors
Using the previously described dimensionality reduction technique, PCA, we can map the input features to a low-dimensional space in which the nearest neighbors algorithm is more appropriate [4]. In this case, given a test data sample (a 126-dimensional vector), we project it onto the low-dimensional space and use its K nearest neighbors among the training points (for the Euclidean distance) to make the prediction, where K is a hyperparameter that needs to be fine-tuned. However, this parameter can be tricky to optimize since it heavily depends on the density of data points in the lower-dimensional space. Additionally, in our case, the distribution of training points is biased towards specific output values, as displayed in the figure below, which could easily lead us to overfit the training data.
Figure 3: Distribution of the target (auction volume) over the training set.
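A minimal version of this KNN-on-PCA predictor is sketched below; the number of components and the value of K are illustrative, not the values we actually tuned.

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X_train: feature matrix (without "pid", "date", "ID"), y_train: log auction volume.
def knn_on_pca(X_train, y_train, n_components=5, k=10):
    # Project onto the first principal components, then regress with K nearest neighbors.
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        KNeighborsRegressor(n_neighbors=k, metric="euclidean"),
    )
    return model.fit(X_train, y_train)
```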
3.3.2 Ensemble learning : Random Forest
Random forests [3] are an ensemble learning method for classification and regres-
sion tasks that operate by constructing a multitude of decision trees at training
time and outputting the class that is the mode of the classes (classification) or
mean/average prediction (regression) of the individual trees.
It applies the general technique of bootstrap aggregating, or bagging, to tree learners: each tree is trained on a random sample of the training set drawn with replacement. The difference between bagged trees and random forests is that random forests use a modified tree learning algorithm that selects, at each candidate split in the learning process, a random subset of the features. This helps decrease the correlation between the different learners. Once the trees are trained, predictions are made by averaging the predictions of all the individual regression trees.
Random forests generalize better than single decision trees because they are more robust to noise and overfit less.
3.3.3 Ensemble methods : Stacking
Stacking, or Stacked Generalization, is an ensemble machine learning algorithm that uses a meta-learning algorithm to learn how to best combine the predictions of two or more base machine learning algorithms. It generally gives good results when it combines models with different learning rules, and it typically outperforms each of the base models taken individually.
Its architecture is divided into two parts: the base models, or level-0 models, which are fit on the training data and used to make predictions, and the meta-model, or level-1 model, which learns how to best combine the predictions of the base models. Once the base models are trained, they are fed unseen training data, and their predictions are paired with the ground truth and fed to the meta-model.
It is preferable to use base models that learn in different ways, so that the errors in their predictions are uncorrelated or only weakly correlated. As for the meta-model, it is often a simple model, providing a smooth interpretation of the predictions made by the base models.
We applied stacking using KNN and Random Forest models as base models, and Linear Regression as the meta-model. The prediction results obtained when training the models on the first 5 principal components are presented in the final table in section 3.4.
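A sketch of this stacked model with scikit-learn is given below; the exact hyperparameters (K, number of trees, cross-validation folds) are illustrative assumptions.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def build_stacking(n_components=5):
    # Base models: KNN and Random Forest; meta-model: linear regression.
    # Inputs are the first principal components of the normalized features.
    base = [
        ("knn", KNeighborsRegressor(n_neighbors=10)),
        ("rf", RandomForestRegressor(n_estimators=200, n_jobs=-1)),
    ]
    stack = StackingRegressor(estimators=base, final_estimator=LinearRegression(), cv=5)
    return make_pipeline(StandardScaler(), PCA(n_components=n_components), stack)
```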
3.3.4 Neural Networks
In contrast with the previously described learning algorithms, neural networks
lie within the framework of representation learning, in the sense that they do
not always require carefully designed features since extracting discriminative
features from the input data is part of the learning process. However, the
difficulty lies in tuning both the architectural hyperparameters (layer type, depth, hidden units, activation functions, ...) and the training parameters (batch size, optimizer, ...). In fact, as shown in [11], the optimal hyperparameters depend on the given dataset and vary significantly from one training task to another.
In our case, we adapt common deep learning techniques to CFM’s dataset:
- We apply computer vision methods: instead of RGB images, we use a 2-channel image obtained by stacking the wavelet transforms computed over the input data, as described in section 3.2.2.
- We use a stock embedding: this allows us to replace the stock identifier 'pid' with a trainable real-valued vector of fixed size d (a hyperparameter). Thus the stocks are represented by a trainable matrix $W \in \mathbb{R}^{S \times d}$, where S = 900 is the number of distinct stocks in the dataset. Note that the embedding is relevant in our case because the same stocks appear in the training and test sets.
- We use non-linear activation functions to capture non-linearity in the data. Later in this section, we discuss the impact of the choice of non-linearity.
- We use batch normalization [10]: this allows us, among other things, to normalize scalar features that are not necessarily of the same order of magnitude (e.g. the day and LS columns).
With this in mind, we experiment with different neural network configurations and use the mean squared error (MSE) over the validation set as the decision criterion. The figure below details the architecture of the network that achieved the lowest validation MSE in our trials:
Figure 4: Prediction Network architecture.
At each training iteration, the data flows through the network from left to right, the final layer acting as a linear regression model that maps the 99-dimensional feature vector φ(x) to the predicted auction volume. The prediction error is then backpropagated and the parameters of the network are updated by gradient descent. We use the Adam optimizer with a learning rate of 0.0001, a batch of 96 data samples at each iteration, and train the network for a few epochs (≈ 7) until the validation MSE stops decreasing. Note that training takes roughly 8 minutes per epoch on a standard Colab GPU; this could be reduced because, in our pipeline, we compute the wavelet transform on the CPU.
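Since Figure 4 cannot be reproduced here, the sketch below gives the general shape such a network could take in PyTorch; the layer sizes, the number of scalar columns (n_scalars) and the way the branches are merged into the 99-dimensional vector φ(x) are assumptions, not a readout of the exact architecture shown in the figure.

```python
import torch
import torch.nn as nn

class PredictionNet(nn.Module):
    # 2-channel wavelet "image" (abs ret and rel vol CWTs) + stock embedding + scalar columns.
    def __init__(self, n_stocks=900, emb_dim=16, n_scalars=4, feat_dim=99):
        super().__init__()
        self.embed = nn.Embedding(n_stocks, emb_dim)
        self.conv = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.BatchNorm2d(16), nn.LeakyReLU(0.02),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.LeakyReLU(0.02),
            nn.AdaptiveAvgPool2d(1),
        )
        self.scalars = nn.Sequential(
            nn.BatchNorm1d(n_scalars), nn.Linear(n_scalars, 16), nn.LeakyReLU(0.02)
        )
        self.head = nn.Sequential(nn.Linear(32 + emb_dim + 16, feat_dim), nn.LeakyReLU(0.02))
        self.out = nn.Linear(feat_dim, 1)  # final linear-regression layer

    def forward(self, wavelet_img, pid, scalar_feats):
        conv_feat = self.conv(wavelet_img).flatten(1)  # (B, 32)
        phi = self.head(
            torch.cat([conv_feat, self.embed(pid), self.scalars(scalar_feats)], dim=1)
        )
        return self.out(phi).squeeze(1), phi  # prediction and φ(x)
```

Training then follows the description above: Adam with a learning rate of 1e-4, batches of 96 samples, and early stopping on the validation MSE.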
After fitting the network, we adopt the hybrid model strategy and train a
second neural network to ”correct” the first network’s predictions. In this step,
the input corresponds to the concatenation of the prediction network’s learned
representation φ(x) and the predicted value. This 100-dimensional vector is
used to train the correction network displayed in the figure below:
Figure 5: Correction Network architecture.
Similarly, we train this network using the Adam optimizer with a lower learning rate of $10^{-5}$ and a batch of 32 data samples at each iteration. To avoid overfitting, we train the network to minimize a regularized MSE loss of the form $\mathrm{MSE}(y, y_{pred}) + \lambda \|w\|^2$, where $\lambda$ is the regularization coefficient, fixed at 0.01, and $w$ are the weights of the final fully connected layer of the network.
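In PyTorch terms, and assuming final_layer denotes the correction network's last fully connected layer, this objective is simply:

```python
import torch.nn.functional as F

def regularized_mse(pred, target, final_layer, lam=0.01):
    # MSE(y, y_pred) + λ‖w‖² on the weights of the final fully connected layer.
    return F.mse_loss(pred, target) + lam * final_layer.weight.pow(2).sum()
```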
The combination of the prediction network and the correction network, as
a hybrid model, allows us to achieve better performance than CFM’s proposed
benchmark model as reported in section 3.4.
Choice of the activation function:
As is common in convolutional neural networks, we initially implemented our network with the ReLU (Rectified Linear Unit) activation function, $\mathrm{ReLU}(x) = \max(0, x)$. However, we were unable to reach better performance than the benchmark with architectural changes only. As it turned out, the ReLU was the source of a saturation at the output level: our model was unable to predict values larger than a threshold of ≈ −1.62, which corresponds to the network's output bias (i.e. the value predicted when the final feature vector is the null vector). To fix this issue, we replaced the ReLU with its leaky version with parameter 0.02, $\mathrm{LeakyReLU}(x) = 0.02x$ if $x < 0$ and $x$ if $x \ge 0$, which allows negative values to flow through the network's layers. As illustrated in the figure below, this change in the activation function solves the issue.
(a) Output distribution with ReLU; (b) Output distribution with LeakyReLU (0.02).
Note that this effect might be caused by not scaling the wavelet transform arrays to values in [0, 1] before feeding them to the convolutional network, as would be done for RGB images.
In addition to this choice of activation function, we made use of residual connections [8] in some layers of our network in order to improve the flow of information.
3.4 Results
In this section, we report the performance of the algorithms presented in section 3 and compare them to CFM's benchmark. We have found that ensemble learning through stacking gives better results than the base models taken individually. However, the test performance on the challenge platform is 0.706; the gap between the validation and test sets shows that our stacking model overfitted the data. We assume that this is mainly due to the difficulty of optimally fine-tuning its hyperparameters. We have also found that a single prediction neural network, as described in section 3, achieves good results but fails to outperform CFM's benchmark; in particular, a network trained on wavelet transforms performs better and overfits less (smaller gap between validation MSE and public MSE) than the same network trained without them. However, when used as part of a hybrid model (prediction network plus correction network), the neural network approach achieves a lower mean squared error over the private set than the benchmark.
We summarize the numerical results obtained on the public and private sets in the table below:
Model                            Validation MSE   Public MSE   Private MSE
KNN (Nearest Neighbors)          0.5041           -            -
Random Forest                    0.5242           -            -
Stacking (KNN + Random Forest)   0.4605           0.7068       -
NN (ReLU, without WT)            0.4103           0.5435       -
NN (ReLU + WT)                   0.4612           0.5272       -
NN (LeakyReLU + WT)              0.4389           0.4963       -
CFM benchmark                    -                0.4742       0.4735
Hybrid NN                        0.4072           0.4677       0.4650
Table 3: Performance of the trained models.
4 Discussion
In this report, we have described the approach we adopted during the challenge and the different methods we implemented. We managed to improve the performance of our predictive model and achieve slightly better results than CFM's benchmark by adopting the same hybrid predictive model strategy while changing the preprocessing steps as well as the learning algorithm. In particular, we have found that neural networks can achieve relatively good results with a suitable architecture, e.g. the architectures proposed in Figures 4 and 5. However, we must point out that our final result does not represent the optimal solution within the pipeline we have described. In fact, we did not study the impact of every possible hyperparameter, since this is a tedious task given the large number of possible combinations. This is why we believe that our final results can be further improved by changing certain parameters such as the stock embedding dimension, the size of the convolutional and max-pooling operations, the activation function, etc.
Additionally, we believe that further improvements can be achieved by focusing on the preprocessing step and trying to extract even more relevant features from the raw data. One particularly interesting method we have found in the literature can be applied to the return and volume columns by viewing them as time series and applying Topological Data Analysis (TDA) tools, as in [6], to extract significant topological features that would help with the prediction. In fact, we have experimented with this method using the open-source Python package gudhi [1]. We adopted the approach described in [6] in order to extract the L1 norm of the first persistence landscape of the point cloud represented by the return and volume time series. We have found that this topological feature correlates relatively well with the target (auction volume): we computed it for only 50 rows in the training set (50 different stocks on the same day) and found a correlation value of 0.426 between the L1 norm and the auction volume. However, this method is computationally heavy, especially for a training set as large as ours. Mainly for this reason, we were unable to test the relevance of this method further, but we think that it is somewhat promising.
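As a rough sketch, and assuming each row's 61-sample return and volume series are paired into a 2D point cloud, the gudhi-based feature we experimented with looks like the following; the Rips parameters and landscape resolution are illustrative:

```python
import numpy as np
import gudhi
from gudhi.representations import Landscape

def landscape_l1(abs_ret: np.ndarray, rel_vol: np.ndarray) -> float:
    # Point cloud from the paired return/volume series of one (stock, day) row.
    points = np.column_stack([abs_ret, rel_vol])
    rips = gudhi.RipsComplex(points=points, max_edge_length=2.0)
    st = rips.create_simplex_tree(max_dimension=2)
    st.persistence()
    diag = st.persistence_intervals_in_dimension(1)  # H1 persistence pairs
    if len(diag) == 0:
        return 0.0
    # First persistence landscape, summarized by its L1 norm.
    landscape = Landscape(num_landscapes=1, resolution=100).fit_transform([diag])[0]
    return float(np.abs(landscape).sum())
```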
References
[1] GUDHI library for TDA: tutorials. https://github.com/GUDHI/TDA-tutorial.
[2] PyWavelets: Continuous Wavelet Transform (CWT) documentation. https://pywavelets.readthedocs.io/en/latest/ref/cwt.html.
[3] Leo Breiman. Random forests. Machine Learning, Springer, 2001.
[4] Charu C. Aggarwal, Alexander Hinneburg, and Daniel A. Keim. On the surprising behavior of distance metrics in high dimensional space. Springer, 2001.
[5] Daniel Libman, Simi Haber, and Mary Schaps. Volume prediction with neural networks. 2019.
[6] Marian Gidea and Yuri Katz. Topological data analysis for financial time series: Landscapes of crashes. arXiv:1703.04385, 2017.
[7] Ian T. Jolliffe and Jorge Cadima. Principal component analysis: a review and recent developments. https://doi.org/10.1098/rsta.2015.0202, 2016.
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv:1512.03385 [cs.CV], 2015.
[9] Stéphane Mallat. A Wavelet Tour of Signal Processing, 3rd edition. https://www.di.ens.fr/~mallat/College/WaveletTourChap4.2.pdf.
[10] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167v3 [cs.LG], 2015.
[11] Stéphane Lathuilière, Pablo Mesejo, Xavier Alameda-Pineda, and Radu Horaud. A comprehensive analysis of deep regression. arXiv:1803.08450v3 [cs.CV], 2018.
12

More Related Content

What's hot

GPUFish_technical_report
GPUFish_technical_reportGPUFish_technical_report
GPUFish_technical_reportCharles Hubbard
 
A Study of BFLOAT16 for Deep Learning Training
A Study of BFLOAT16 for Deep Learning TrainingA Study of BFLOAT16 for Deep Learning Training
A Study of BFLOAT16 for Deep Learning TrainingSubhajit Sahu
 
Graphical Models In Python | Edureka
Graphical Models In Python | EdurekaGraphical Models In Python | Edureka
Graphical Models In Python | EdurekaEdureka!
 
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLABFAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLABJournal For Research
 
Accurate Learning of Graph Representations with Graph Multiset Pooling
Accurate Learning of Graph Representations with Graph Multiset PoolingAccurate Learning of Graph Representations with Graph Multiset Pooling
Accurate Learning of Graph Representations with Graph Multiset PoolingMLAI2
 
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...MLAI2
 
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEMGRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEMIJCSEA Journal
 
SOLVING OPTIMAL COMPONENTS ASSIGNMENT PROBLEM FOR A MULTISTATE NETWORK USING ...
SOLVING OPTIMAL COMPONENTS ASSIGNMENT PROBLEM FOR A MULTISTATE NETWORK USING ...SOLVING OPTIMAL COMPONENTS ASSIGNMENT PROBLEM FOR A MULTISTATE NETWORK USING ...
SOLVING OPTIMAL COMPONENTS ASSIGNMENT PROBLEM FOR A MULTISTATE NETWORK USING ...ijmnct
 
SOLVING OPTIMAL COMPONENTS ASSIGNMENT PROBLEM FOR A MULTISTATE NETWORK USING ...
SOLVING OPTIMAL COMPONENTS ASSIGNMENT PROBLEM FOR A MULTISTATE NETWORK USING ...SOLVING OPTIMAL COMPONENTS ASSIGNMENT PROBLEM FOR A MULTISTATE NETWORK USING ...
SOLVING OPTIMAL COMPONENTS ASSIGNMENT PROBLEM FOR A MULTISTATE NETWORK USING ...ijmnct
 
A genetic algorithm to solve the
A genetic algorithm to solve theA genetic algorithm to solve the
A genetic algorithm to solve theIJCNCJournal
 
Flavours of Physics Challenge: Transfer Learning approach
Flavours of Physics Challenge: Transfer Learning approachFlavours of Physics Challenge: Transfer Learning approach
Flavours of Physics Challenge: Transfer Learning approachAlexander Rakhlin
 
Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Sangamesh Ragate
 
Image similarity using symbolic representation and its variations
Image similarity using symbolic representation and its variationsImage similarity using symbolic representation and its variations
Image similarity using symbolic representation and its variationssipij
 
Modified approximate 8-point multiplier less DCT like transform
Modified approximate 8-point multiplier less DCT like transformModified approximate 8-point multiplier less DCT like transform
Modified approximate 8-point multiplier less DCT like transformIJERA Editor
 
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...Kyong-Ha Lee
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
Scalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduceScalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduceKyong-Ha Lee
 
DSP IEEE paper
DSP IEEE paperDSP IEEE paper
DSP IEEE paperprreiya
 
Icitam2019 2020 book_chapter
Icitam2019 2020 book_chapterIcitam2019 2020 book_chapter
Icitam2019 2020 book_chapterBan Bang
 

What's hot (20)

GPUFish_technical_report
GPUFish_technical_reportGPUFish_technical_report
GPUFish_technical_report
 
A Study of BFLOAT16 for Deep Learning Training
A Study of BFLOAT16 for Deep Learning TrainingA Study of BFLOAT16 for Deep Learning Training
A Study of BFLOAT16 for Deep Learning Training
 
Graphical Models In Python | Edureka
Graphical Models In Python | EdurekaGraphical Models In Python | Edureka
Graphical Models In Python | Edureka
 
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLABFAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
 
Accurate Learning of Graph Representations with Graph Multiset Pooling
Accurate Learning of Graph Representations with Graph Multiset PoolingAccurate Learning of Graph Representations with Graph Multiset Pooling
Accurate Learning of Graph Representations with Graph Multiset Pooling
 
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
Learning to Extrapolate Knowledge: Transductive Few-shot Out-of-Graph Link Pr...
 
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEMGRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
 
SOLVING OPTIMAL COMPONENTS ASSIGNMENT PROBLEM FOR A MULTISTATE NETWORK USING ...
SOLVING OPTIMAL COMPONENTS ASSIGNMENT PROBLEM FOR A MULTISTATE NETWORK USING ...SOLVING OPTIMAL COMPONENTS ASSIGNMENT PROBLEM FOR A MULTISTATE NETWORK USING ...
SOLVING OPTIMAL COMPONENTS ASSIGNMENT PROBLEM FOR A MULTISTATE NETWORK USING ...
 
SOLVING OPTIMAL COMPONENTS ASSIGNMENT PROBLEM FOR A MULTISTATE NETWORK USING ...
SOLVING OPTIMAL COMPONENTS ASSIGNMENT PROBLEM FOR A MULTISTATE NETWORK USING ...SOLVING OPTIMAL COMPONENTS ASSIGNMENT PROBLEM FOR A MULTISTATE NETWORK USING ...
SOLVING OPTIMAL COMPONENTS ASSIGNMENT PROBLEM FOR A MULTISTATE NETWORK USING ...
 
A genetic algorithm to solve the
A genetic algorithm to solve theA genetic algorithm to solve the
A genetic algorithm to solve the
 
Flavours of Physics Challenge: Transfer Learning approach
Flavours of Physics Challenge: Transfer Learning approachFlavours of Physics Challenge: Transfer Learning approach
Flavours of Physics Challenge: Transfer Learning approach
 
Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)
 
Image similarity using symbolic representation and its variations
Image similarity using symbolic representation and its variationsImage similarity using symbolic representation and its variations
Image similarity using symbolic representation and its variations
 
Modified approximate 8-point multiplier less DCT like transform
Modified approximate 8-point multiplier less DCT like transformModified approximate 8-point multiplier less DCT like transform
Modified approximate 8-point multiplier less DCT like transform
 
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Scalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduceScalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduce
 
Ak04605259264
Ak04605259264Ak04605259264
Ak04605259264
 
DSP IEEE paper
DSP IEEE paperDSP IEEE paper
DSP IEEE paper
 
Icitam2019 2020 book_chapter
Icitam2019 2020 book_chapterIcitam2019 2020 book_chapter
Icitam2019 2020 book_chapter
 

Similar to CFM Challenge - Course Project

Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdfBeyaNasr1
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsDinusha Dilanka
 
A Systematic Approach To Probabilistic Pointer Analysis
A Systematic Approach To Probabilistic Pointer AnalysisA Systematic Approach To Probabilistic Pointer Analysis
A Systematic Approach To Probabilistic Pointer AnalysisMonica Franklin
 
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONGENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONijaia
 
RBHF_SDM_2011_Jie
RBHF_SDM_2011_JieRBHF_SDM_2011_Jie
RBHF_SDM_2011_JieMDO_Lab
 
An Empirical Investigation Of The Arbitrage Pricing Theory
An Empirical Investigation Of The Arbitrage Pricing TheoryAn Empirical Investigation Of The Arbitrage Pricing Theory
An Empirical Investigation Of The Arbitrage Pricing TheoryAkhil Goyal
 
Data Trend Analysis by Assigning Polynomial Function For Given Data Set
Data Trend Analysis by Assigning Polynomial Function For Given Data SetData Trend Analysis by Assigning Polynomial Function For Given Data Set
Data Trend Analysis by Assigning Polynomial Function For Given Data SetIJCERT
 
On the Performance of the Pareto Set Pursuing (PSP) Method for Mixed-Variable...
On the Performance of the Pareto Set Pursuing (PSP) Method for Mixed-Variable...On the Performance of the Pareto Set Pursuing (PSP) Method for Mixed-Variable...
On the Performance of the Pareto Set Pursuing (PSP) Method for Mixed-Variable...Amir Ziai
 
Static Analysis of Computer programs
Static Analysis of Computer programs Static Analysis of Computer programs
Static Analysis of Computer programs Arvind Devaraj
 
SBSI optimization tutorial
SBSI optimization tutorialSBSI optimization tutorial
SBSI optimization tutorialRichard Adams
 
Guide for building GLMS
Guide for building GLMSGuide for building GLMS
Guide for building GLMSAli T. Lotia
 
Higgs Boson Machine Learning Challenge - Kaggle
Higgs Boson Machine Learning Challenge - KaggleHiggs Boson Machine Learning Challenge - Kaggle
Higgs Boson Machine Learning Challenge - KaggleSajith Edirisinghe
 
Higgs bosob machine learning challange
Higgs bosob machine learning challangeHiggs bosob machine learning challange
Higgs bosob machine learning challangeTharindu Ranasinghe
 

Similar to CFM Challenge - Course Project (20)

Chapter 18,19
Chapter 18,19Chapter 18,19
Chapter 18,19
 
working with python
working with pythonworking with python
working with python
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
 
FSRM 582 Project
FSRM 582 ProjectFSRM 582 Project
FSRM 582 Project
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning Algorithms
 
A Systematic Approach To Probabilistic Pointer Analysis
A Systematic Approach To Probabilistic Pointer AnalysisA Systematic Approach To Probabilistic Pointer Analysis
A Systematic Approach To Probabilistic Pointer Analysis
 
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONGENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
 
icpr_2012
icpr_2012icpr_2012
icpr_2012
 
RBHF_SDM_2011_Jie
RBHF_SDM_2011_JieRBHF_SDM_2011_Jie
RBHF_SDM_2011_Jie
 
An Empirical Investigation Of The Arbitrage Pricing Theory
An Empirical Investigation Of The Arbitrage Pricing TheoryAn Empirical Investigation Of The Arbitrage Pricing Theory
An Empirical Investigation Of The Arbitrage Pricing Theory
 
Data Trend Analysis by Assigning Polynomial Function For Given Data Set
Data Trend Analysis by Assigning Polynomial Function For Given Data SetData Trend Analysis by Assigning Polynomial Function For Given Data Set
Data Trend Analysis by Assigning Polynomial Function For Given Data Set
 
report
reportreport
report
 
On the Performance of the Pareto Set Pursuing (PSP) Method for Mixed-Variable...
On the Performance of the Pareto Set Pursuing (PSP) Method for Mixed-Variable...On the Performance of the Pareto Set Pursuing (PSP) Method for Mixed-Variable...
On the Performance of the Pareto Set Pursuing (PSP) Method for Mixed-Variable...
 
Topic 1.4
Topic 1.4Topic 1.4
Topic 1.4
 
Static Analysis of Computer programs
Static Analysis of Computer programs Static Analysis of Computer programs
Static Analysis of Computer programs
 
SBSI optimization tutorial
SBSI optimization tutorialSBSI optimization tutorial
SBSI optimization tutorial
 
Telecom customer churn prediction
Telecom customer churn predictionTelecom customer churn prediction
Telecom customer churn prediction
 
Guide for building GLMS
Guide for building GLMSGuide for building GLMS
Guide for building GLMS
 
Higgs Boson Machine Learning Challenge - Kaggle
Higgs Boson Machine Learning Challenge - KaggleHiggs Boson Machine Learning Challenge - Kaggle
Higgs Boson Machine Learning Challenge - Kaggle
 
Higgs bosob machine learning challange
Higgs bosob machine learning challangeHiggs bosob machine learning challange
Higgs bosob machine learning challange
 

Recently uploaded

Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |aasikanpl
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsssuserddc89b
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett SquareIsiahStephanRadaza
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
TOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxTOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxdharshini369nike
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2John Carlo Rollon
 
Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10ROLANARIBATO3
 

Recently uploaded (20)

Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555
 
TOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physicsTOPIC 8 Temperature and Heat.pdf physics
TOPIC 8 Temperature and Heat.pdf physics
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett Square
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
TOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxTOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptx
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2
 
Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10
 

CFM Challenge - Course Project

  • 1. Final Report : CFM Challenge MVA course : Représentations Parcimonieuses Khalil Bergaoui, Azza Ben Farhat {khalil.bergaoui;azza.ben-farhat}@student.ecp.fr March 23rd 2021 1 Introduction In the context of the MVA course entitled ”Représentations Parcimonieuses”, we participated in the CFM challenge : Stock Trading Prediction of Auction Vol- ume. Throughout this report, we will detail the methodology that we adopted during the challenge, we will present the results that we obtained and compare them to CFM’s benchmark. Additionnally, we will discuss some of the difficul- ties that we encountered in this project and discuss our final solution as well as potential future directions. In the next section, we will begin by briefly presenting the goal of the CFM challenge and reviewing the related work. 2 Related Work The goal of this year’s CFM challenge is to predict the volume (total value of stock exchanged) available for auction, for 900 stocks over about 350 days. The problem is thus formulated as a regression problem. 2.1 Litterature Overview Although the litterature on the topic of auction volume prediction is not partic- ularly rich, financial time series analysis as well as regression tasks are widely covered topics in machine learning. Considering the diverse nature of this chal- lenge’s input data (combination of independent values, identifiers and short noisy time series, in particular the return and volume features), the auction volume prediction problem can be tackled from different angles. In this section, we will briefly present relevant methods to approach the problem that we have found in the litterature. While there is no particularly straightforward state-of-the-art method for auction volume prediction, the basic strategy is to extend the techniques used for supervised machine learning for regression tasks. In particular we have found 1
  • 2. that better results are achieved with hybrid autoregressive models as in [5] where the authors perform the prediction in two steps : first a model is trained to fit the data, then a second model in trained to fit the difference between the first model’s predictions and the ground truth data (the residual error). This was in fact the strategy adopted by CFM’s benchmark model. 2.2 CFM Benchmark A hybrid model is used as a benchmark in this challenge. In fact, a linear regression model is trained to predict the auction volume, for a given data sample x = (xi)1≤i≤126, using all columns except ”pid” : vpred = β0 + P125 i=1 xiβi where (βi)i are the regression parameters. Then a tree based ensemble learning algorithm, LightGBM, includes the ’pid’ information to fit the residual error = vpred −vtrue between the ground truth target and the linear model’s prediction. Note that LightGBM is a gradient boosting framework that learns by grow- ing trees vertically (leaf-wise) and relies on a large set of hyperparameters that need to be carefully fine-tuned in order to avoid overfitting. 3 Methodology In this section, we will describe the steps that we carried during the project and will describe the difficulties that we encountered. The starting point in this data challenge was data exploration and dealing with noisy samples that took the form of missing values. 3.1 Missing values The following table summarizes information about missing data: Rows with missing data in training set (%) 37% Rows with missing data in test set (%) 33% Missing values of absolute returns in train set (%) 5% Missing values of absolute returns in test set (%) 4% Table 1: Missing data in the train and test datasets. Remark : We noticed that if the value of the nth feature of the absolute returns abs retn is missing, then the value of the nth feature of relative volume rel voln is also missing. The difficulty when dealing with missing values is that we are unable to quan- titatively anticipate the impact of the adopted strategy on the performance of the learning algorithm, which is why we decided to start with simply remplacing missing values with zeros as it was the case in CFM’s benchmark model. 2
  • 3. 3.2 Feature extraction Once missing values are replaced, we can start extracting relevant features from the data to use them as inputs for learning algorithms. In fact, a successful predictive model would use informative features about the output, which in our case is the auction volume for a given stock at a given day. So in addition to the provided input columns, which represent daily information about a given stock, it might be interesting to exploit the past of the auction volume for a given stock or the interaction between different stocks. For this we performed a quick correlation analysis that can be summerized in the below figures: (a) Within-stock correlation (b) Between-stock correlation On the left figure, we plot the auto-correlation of the auction volume time series (over the 800 days in the train set) using different lag values. We randomly picked two stocks: stock 360 (blue curve) and stock 850 (orange curve). On the right figure, we display the correlation map of the target (auction volume) between a random set of 50 different stocks. In both cases, we obtain low correlation values. In addition, since in the test set we do not always have access to the auction volume in the preceding days, we decided to focus our study on the provided input columns only. 3.2.1 Principal Component Analysis Principal Component Analysis (PCA) is a statistical technique generally used for dimensionality reduction in machine learning[7]. Geometrically, it corresponds to a projection method where data with m-columns (features) is projected into a subspace with m or fewer columns, which are uncorrelated and orthogonal, called principal components, whilst retaining the essence of the original data by maximizing the variability of the data set that is contained in the new vectors. The new vectors are ordered such that the retention of variation present in the original variables decreases as we move down in the order. So, in this way, the first principal component retains maximum variation that was present in the original components. We applied a PCA transform to the normalized train set without keeping 3
  • 4. the ”pid”, ”date” and ”ID” features, to look for a possible new set of features that could be used for training the model. The following table summarizes the explained variance ratio of the first 5 principal components: Principal component Variance ratio 1 9.99999560e-01 2 3.91120674e-07 3 4.83349491e-08 4 8.88623697e-11 5 2.14177158e-11, Table 2: Explained variance ratio of the 5 first principal components We can see that the first principle component contains almost all of the variance of the original data. This means that replacing the original data set by the first principle component is a good approximation, since it explains almost all the variance of the original data. At the same time, such result implies that all the columns (”pid”, ”date” and ”ID” excepted) are linearly dependent which we found a bit surprising. Then, we computed the correlation between the first principal componant and the output we want to predict (log of the auction volume) and we have found a relatively low value of 0.18. Our guess is that in the high dimensional space (126-dimensions), most data points are concentrated along the same direction (first principal component) which is a biased direction and we assume, by looking at the distribution of the target (auction volume) values over the training set, displayed in Figure 3, that the bias corresponds to data points corresponding to values around −2. We would like to add that another possible interpretation is that the observed linear dependency could have been ”fakely” caused by replacing the missing values with non zero values (in this experiment we replaced each missing value in a given column, for a given stock, with the average computed accross that same column, for all rows corresponding to different days of the same stock). However we repeated the same experiment when replacing with all missing values with zero and obtained a similar result (99% variance explained by first component only). In this case, it seems that linear methods are not sufficient to capture the specificity of the data set. One could possibly explore non linear methods for the dimensionality reduction step, but for time constraints we were unable to do it. 3.2.2 Wavelet Transform The wavelet transform is a mathematical tool used in signal processing in or- der to decompose signals over dilated and translated wavelets. In particular, a wavelet ψδ,σ is a function parametrized with shift and scale parameters allow- ing to analyze a given signal at multiple resolutions.Whereas there exist broad 4
  • 5. categories of wavelet functions, we will restrict our application to real wavelets since they are, in contrast with complex wavelets, often used to detect sharp signal transitions[9]. In our case, we apply wavelet transforms to detect sharp transitions in both the return signal and the volume signal (columns abs ret and rel vol respectively in the input data) as they are likely to be significant predictive features. In our experiments, we use a real version of the continuous Morlet wavelet given by : ψ(t) = e− t2 2 cos(5t) such that the continuous wavelet transform is applied to our discrete data (61 discrete samples for each of the return and volume signals) as a convolution with the discretized integral of the wavelet ψ. We use the the CWT implementation of the PyWavelet package [2] and analyze the signals over a number of 32 scale ( In other words Larger scales correspond to stretching of the wavelet. For example, at scale=10 the wavelet is stretched by a factor of 10, making it sensitive to lower frequencies in the signal). Figure 2: Wavelet Transform at 32 scales. Thus the 1D signal is transformed to 32 one-dimensional signals that can be represented as an image as we can see from the above figure where we illustrate the relevance of the wavelet transforms in capturing non-stationary transitions. For instance, around the 15th period of day 23, the volume of stock 739 ubruptly increases. This sudden change is reflected in the multiple resolution wavelet transform displayed below the signal. Therefore, wavelet transforms represent a good candidate set of features that allow us to use computer vision techniques in our predictive task as we will see in more details in section 3.3 5
3.3 Algorithms

3.3.1 Nearest Neighbors

Using the previously described dimensionality reduction technique, PCA, we are able to map the input features to a low-dimensional space in which the application of the nearest neighbors algorithm is more appropriate [4]. In this case, given a test data sample (a 126-dimensional vector), we project it onto the low-dimensional space and use its K nearest neighbors among the training points (with respect to the Euclidean distance) to make the prediction, where K is a hyperparameter that needs to be fine-tuned. However, this parameter can be tricky to optimize since it heavily depends on the density distribution of data points in the lower-dimensional space. Additionally, in our case, the density distribution of training points is biased towards specific output values, as displayed in the figure below, which could easily lead us to overfit the training data.

Figure 3: Distribution of the target (auction volume) over the training set.

3.3.2 Ensemble learning: Random Forest

Random forests [3] are an ensemble learning method for classification and regression tasks that operates by constructing a multitude of decision trees at training time and outputting the mode of the classes (classification) or the mean/average prediction (regression) of the individual trees. It applies the general technique of bootstrap aggregating, or bagging, to tree learners: each tree is fit on a random sample of the training set drawn with replacement. The difference between bagging trees and random forests is that random forests use a modified tree learning algorithm that selects, at each candidate split in the learning process, a random subset of the features. This helps decrease the correlation between the different learners. Once the trees are trained, predictions are made by averaging the predictions of all the individual regression trees. Random forests tend to be more effective than single decision trees because they are more robust to noise and overfit less.
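As a minimal sketch, under the same assumptions as the PCA snippet above (Z holds the projections on the first principal components and y the log auction volume; K and the forest size are illustrative values), the two base models could be trained as follows.

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Hold out part of the training data to estimate the validation MSE.
Z_tr, Z_val, y_tr, y_val = train_test_split(Z, y, test_size=0.2, random_state=0)

knn = KNeighborsRegressor(n_neighbors=50).fit(Z_tr, y_tr)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(Z_tr, y_tr)

print(mean_squared_error(y_val, knn.predict(Z_val)))
print(mean_squared_error(y_val, rf.predict(Z_val)))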
3.3.3 Ensemble methods: Stacking

Stacking, or stacked generalization, is an ensemble machine learning algorithm that uses a meta-learning algorithm to learn how to best combine the predictions of two or more base machine learning algorithms. It generally gives good results when it combines models that have different learning rules, and it often outperforms the base models taken individually. Its architecture is divided into two parts: the base models, or level-0 models, which are fit on the training data and used to make predictions, and the meta-model, or level-1 model, which learns how to best combine the predictions of the base models. Once the base models are trained, they are fed with data unseen during their training, and their predictions, paired with the ground truth, are used to train the meta-model.

It is preferable to use base models that learn in different ways, so that the errors in their predictions are uncorrelated or have a low correlation. As for the meta-model, it is often a simple model, providing a smooth interpretation of the predictions made by the base models. We applied stacking using KNN and Random Forest models as base models and Linear Regression as the meta-model. The prediction results obtained when training the models on the first 5 principal components are presented in the final table in Section 3.4.
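A minimal sketch of this setup with scikit-learn's StackingRegressor is shown below; the hyperparameter values are illustrative, and the internal cross-validation plays the role of the "unseen data" fed to the base models.

from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Base models with different learning rules, combined by a linear meta-model.
stack = StackingRegressor(
    estimators=[
        ("knn", KNeighborsRegressor(n_neighbors=50)),
        ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ],
    final_estimator=LinearRegression(),
    cv=5,   # out-of-fold predictions of the base models train the meta-model
)
stack.fit(Z_tr, y_tr)
print(stack.score(Z_val, y_val))   # R^2 on the held-out split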
3.3.4 Neural Networks

In contrast with the previously described learning algorithms, neural networks lie within the framework of representation learning, in the sense that they do not always require carefully designed features, since extracting discriminative features from the input data is part of the learning process. However, the difficulty lies in the tuning of both the architectural hyperparameters (layer type, depth, hidden units, activation functions...) and the training parameters (batch size, optimizer...). In fact, as shown in [11], the optimal hyperparameters depend on the given dataset and vary significantly from one training task to another. In our case, we adapt common deep learning techniques to CFM's dataset:

- We apply computer vision methods: instead of RGB images, we use a 2-channel image obtained by stacking the wavelet transforms computed over the input data, as described in Section 3.2.2.
- We use a stock embedding: this allows us to replace the stock identifier ’pid’ with a trainable real-valued vector of fixed size d (a hyperparameter). The stocks are thus represented by a trainable matrix W ∈ R^(S×d), where S = 900 is the number of distinct stocks in the dataset. Note that the embedding is relevant in our case because the same stocks appear in both the training and test sets.
- We use non-linear activation functions to capture the non-linearity in the data. Further in this section, we discuss the impact of the choice of the non-linearity.
- We use batch normalization [10]: this allows us, among other things, to normalize scalar features which are not necessarily of the same order of magnitude (e.g. the day and LS columns).

With these ingredients, we experiment with different neural network configurations and use the mean squared error over the validation set as the decision criterion. In the figure below, we detail the architecture of the network that achieved the lowest validation MSE in our trials:

Figure 4: Prediction Network architecture.

At each training iteration, the data flows through the network from left to right, the final layer acting as a linear regression model that transforms the 99-dimensional feature vector φ(x) into the predicted auction volume. The prediction error is then backpropagated, and gradient descent updates the parameters of the network. We use the Adam optimizer with a learning rate lr = 0.0001, a batch of 96 data samples at each iteration, and train the network for a few epochs (≈ 7), until the validation MSE stops decreasing. Note that training takes roughly 8 minutes per epoch on a mainstream Colab GPU; this could be optimized since, in our pipeline, the wavelet transform is computed on the CPU.
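Since Figure 4 cannot be reproduced here, the sketch below only illustrates the building blocks listed above (2-channel wavelet image, stock embedding, batch normalization on the scalar features, linear output layer); the layer sizes, the number of scalar features, and the single convolutional block are illustrative assumptions, not our actual architecture.

import torch
import torch.nn as nn

class PredictionNet(nn.Module):
    # Illustrative sizes; the network we actually trained is the one in Figure 4.
    def __init__(self, n_stocks=900, emb_dim=16, n_scalars=4):
        super().__init__()
        self.embed = nn.Embedding(n_stocks, emb_dim)      # trainable stock embedding
        self.conv = nn.Sequential(                        # branch for the (2, 32, 61) wavelet image
            nn.Conv2d(2, 8, kernel_size=3, padding=1),
            nn.LeakyReLU(0.02),
            nn.MaxPool2d(2),
            nn.Flatten(),
        )
        self.scalar_bn = nn.BatchNorm1d(n_scalars)        # normalize the scalar features
        conv_out = 8 * 16 * 30                            # flattened size for a (2, 32, 61) input
        self.head = nn.Linear(conv_out + emb_dim + n_scalars, 1)

    def forward(self, wav_img, pid, scalars):
        feats = torch.cat(
            [self.conv(wav_img), self.embed(pid), self.scalar_bn(scalars)], dim=1
        )
        return self.head(feats).squeeze(1)

# Training configuration matching the text: Adam, lr = 1e-4, batches of 96, MSE loss.
model = PredictionNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()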
After fitting this network, we adopt the hybrid model strategy and train a second neural network to ”correct” the first network’s predictions. In this step, the input corresponds to the concatenation of the prediction network’s learned representation φ(x) and the predicted value. This 100-dimensional vector is used to train the correction network displayed in the figure below:

Figure 5: Correction Network architecture.

Similarly, we train this network using the Adam optimizer, with a lower learning rate lr = 10^(−5) and a batch of 32 data samples at each iteration. To avoid overfitting, we train the network to minimize a regularized MSE loss of the form MSE(y, y_pred) + λ||w||², where λ is the regularization coefficient, fixed at 0.01, and w denotes the weights of the final fully connected layer of the network. The combination of the prediction network and the correction network, as a hybrid model, allows us to achieve better performance than CFM’s proposed benchmark model, as reported in Section 3.4.

Choice of the activation function:

As is common in convolutional neural networks, we first implemented our network using the ReLU (Rectified Linear Unit) as the non-linear activation function: ReLU(x) = max(0, x). However, we were unable to reach better performance than the benchmark with architectural changes only. As it turned out, the ReLU function was the source of a saturation at the output level: our model was unable to predict values larger than a threshold of ≈ −1.62, which corresponds to the network’s output bias (i.e. the value predicted when the final feature vector is the null vector). To fix this issue, we replaced the ReLU with its leaky version with parameter 0.02, LeakyReLU(x) = 0.02x if x < 0 and x if x ≥ 0, which allows negative values to flow through the network’s layers.
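Combining the penalized objective above with the LeakyReLU activation, a minimal sketch of the correction step could look as follows; the hidden layer size is an illustrative assumption and does not reflect the architecture of Figure 5.

import torch
import torch.nn as nn

# Hypothetical correction network; the real architecture is the one in Figure 5.
correction = nn.Sequential(
    nn.Linear(100, 64),      # input: phi(x) (99 dims) concatenated with the predicted volume
    nn.LeakyReLU(0.02),      # leaky slope 0.02, as discussed above
    nn.Linear(64, 1),
)
final_fc = correction[-1]    # the layer whose weights are penalized
optimizer = torch.optim.Adam(correction.parameters(), lr=1e-5)
mse = nn.MSELoss()
lam = 0.01                   # regularization coefficient from the text

def regularized_loss(x, y):
    y_pred = correction(x).squeeze(1)
    return mse(y_pred, y) + lam * final_fc.weight.pow(2).sum()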
As illustrated in the figure below, this change in the activation function solves the issue.

(a) Output distribution with ReLU. (b) Output distribution with LeakyReLU (0.02).

Note that this effect might be caused by not scaling the wavelet transform arrays to values in [0, 1] before feeding them to the convolutional network, as would have been done with RGB images. In addition to this choice of activation function, we made use of residual connections [8] in some layers of our network in order to enhance the flow of data.

3.4 Results

In this section, we report the performance of the algorithms presented in Section 3 and compare them to CFM’s benchmark. We have found that ensemble learning through stacking gives better results than the base models taken individually. However, the test performance on the challenge platform is 0.706; the gap between the validation and test sets shows that our stacking model overfitted the data. We assume that this is mainly due to the difficulty of optimally fine-tuning its hyperparameters. We have also found that training a prediction neural network, as described in Section 3, achieves good results but fails to perform better than CFM’s benchmark; in particular, a neural network trained on wavelet transforms performs better and overfits less (smaller gap between validation MSE and public MSE) than the same network trained without wavelet transforms. Additionally, when used as part of a hybrid model (prediction network plus correction network), the neural network approach achieves a lower mean squared error over the private set than the benchmark. We summarize the numerical results obtained on the public and private sets in the table below:
Model                             Validation MSE   Public MSE   Private MSE
KNN (Nearest Neighbors)           0.5041           -            -
Random Forest                     0.5242           -            -
Stacking (KNN + Random Forest)    0.4605           0.7068       -
NN (ReLU without WT)              0.4103           0.5435       -
NN (ReLU + WT)                    0.4612           0.5272       -
NN (LeakyReLU + WT)               0.4389           0.4963       -
CFM benchmark                     -                0.4742       0.4735
Hybrid NN                         0.4072           0.4677       0.4650

Table 3: Performance of the trained models

4 Discussion

In this report, we have described the approach we adopted during the challenge and the different methods we implemented. We managed to improve the performance of our predictive model and achieve slightly better results than CFM’s benchmark by adopting the same strategy of a hybrid predictive model while changing the preprocessing steps as well as the learning algorithm. In particular, we have found that neural networks are able to achieve relatively good results with a suitable architecture, e.g. the architectures proposed in Figures 4 and 5. However, we must point out that our final result does not represent the optimal solution achievable with the pipeline we have described. In fact, we did not study the impact of every possible hyperparameter, since this is a tedious task given the large number of possible combinations. This is why we believe that our final results can be further improved by changing certain parameters such as the stock embedding dimension, the size of the convolutional and max-pooling operations, the activation function, etc.

Additionally, we believe that further improvements can be achieved by focusing on the preprocessing step and trying to extract even more relevant features from the raw data. One particularly interesting method found in the literature can be applied to the return and volume columns by viewing them as time series and applying Topological Data Analysis tools, as in [6], to extract significant topological features that would help with the prediction. In fact, we experimented with this method using the open-source Python package gudhi [1]. We adopted the approach described there to extract the L1 norm of the first persistence landscape [6] of the point cloud formed by the return and volume time series. We have found that this topological feature correlates relatively well with the target (auction volume): we computed it for only 50 rows of the training set (50 different stocks on the same day) and found a correlation value of 0.426 between the L1 norm and the auction volume. However, this method is computationally heavy, especially for a training set as large as ours. Mainly for this reason, we were unable to test the relevance of this method further, but we think that it is somewhat promising.
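For completeness, a minimal sketch of how such a feature could be computed with gudhi follows; the way the two 61-sample series are paired into a 2-D point cloud, the maximal edge length, and the landscape resolution are illustrative assumptions rather than the exact construction of [6].

import numpy as np
import gudhi
from gudhi.representations import Landscape

# abs_ret and rel_vol are assumed to be the 61-sample return and volume series
# of a single row; here they are paired into a 2-D point cloud.
points = np.stack([abs_ret, rel_vol], axis=1)            # shape (61, 2)

rips = gudhi.RipsComplex(points=points, max_edge_length=2.0)
st = rips.create_simplex_tree(max_dimension=2)
st.persistence()                                          # compute persistence pairs
diag = st.persistence_intervals_in_dimension(1)           # 1-dimensional homology

# First persistence landscape, summarized by its L1 norm and used as a scalar feature.
landscape = Landscape(num_landscapes=1, resolution=100).fit_transform([diag])[0]
l1_norm = np.sum(np.abs(landscape))
print(l1_norm)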
References

[1] GUDHI library for TDA: tutorials. https://github.com/GUDHI/TDA-tutorial.
[2] PyWavelets: continuous wavelet transform. https://pywavelets.readthedocs.io/en/latest/ref/cwt.html.
[3] Leo Breiman. Random forests. Machine Learning, Springer, 2001.
[4] Charu C. Aggarwal, Alexander Hinneburg, and Daniel A. Keim. On the surprising behavior of distance metrics in high dimensional space. Springer, 2001.
[5] Daniel Libman, Simi Haber, and Mary Schaps. Volume prediction with neural networks. 2019.
[6] Marian Gidea and Yuri Katz. Topological data analysis for financial time series: Landscapes of crashes. arXiv:1703.04385, 2017.
[7] Ian T. Jolliffe and Jorge Cadima. Principal component analysis: a review and recent developments. https://doi.org/10.1098/rsta.2015.0202, 2016.
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. arXiv:1512.03385 [cs.CV], 2015.
[9] Stéphane Mallat. A Wavelet Tour of Signal Processing, 3rd edition. https://www.di.ens.fr/~mallat/College/WaveletTourChap4.2.pdf.
[10] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167v3 [cs.LG], 2015.
[11] Stéphane Lathuilière, Pablo Mesejo, Xavier Alameda-Pineda, and Radu Horaud. A comprehensive analysis of deep regression. arXiv:1803.08450v3 [cs.CV], 2018.