ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge

Epileptic Seizure Prediction
American Epilepsy Society Kaggle Challenge
Dr. Francisco Zamora-Martínez
a.k.a. Paco
francisco.zamora@uch.ceu.es
Embedded Systems and Artificial Intelligence (ESAI) research team
Escuela superior de enseñanzas técnicas
Departamento de ciencias físicas, matemáticas y de la computación
Universidad CEU Cardenal Herrera
Cyient Insights Ltd, Hyderabad, India
March 2015

What is Kaggle?
As a company
A data science company.
Focused on data science competitions and challenges.
Consulting in the energy efficiency and power consumption field.
Competitions
Knowledge competitions (no prize).
Sponsored competitions (prizes).
Normally open source solutions are required.
Any task in any field can be proposed as competition: energy,
signal, image, video, language, …
Stay tuned for new competitions at Kaggle: http://www.kaggle.com/

The epileptic seizure prediction challenge
Problem statement
Sponsored competition: $25,000 in prizes.
7 subjects: 5 dogs, 2 humans.
Hundreds of 10-minute EEG recordings (usually 16 channels) per subject.
Training data is grouped into consecutive 60-minute sequences.
Test data is given in random order.
Two classes:
(+) pre-ictal state: from one hour and five minutes until five minutes before a recorded seizure.
(–) inter-ictal state: any random sample taken at least one week before or after any seizure.
Few positive samples and a large number of negative samples (imbalanced data set) ⇒ hand-crafted pre-processing.

Problem statement (continued)
Every file has to be classified as pre-ictal or inter-ictal.
Classification performance is measured as Area Under the ROC Curve (AUC).
A good performance measure for imbalanced data sets.
Computed directly from the output probabilities; no binarization (or thresholding) is required.
Test data is divided into public (40%) and private (60%) test sets.
Private results are shown at the end of the competition.
Some important issues:
High variability between subjects: it is difficult to train a global model.
One model per subject: AUC drops if the outputs of the different models are not calibrated.
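The calibration issue can be illustrated with a small sketch. It computes AUC as the fraction of (positive, negative) pairs that are ranked correctly, which is one standard way to define it, and uses two hypothetical subjects with made-up scores (not competition data). Each subject is ranked perfectly by its own model, yet the pooled AUC drops because the two models use different score ranges.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    # Compare every positive score against every negative score.
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

# Hypothetical subjects A and B: one pre-ictal file (label 1) each.
labels_a, scores_a = [0, 0, 0, 1], [0.1, 0.2, 0.3, 0.4]
labels_b, scores_b = [0, 0, 0, 1], [0.6, 0.7, 0.8, 0.9]

auc_a = roc_auc(labels_a, scores_a)
auc_b = roc_auc(labels_b, scores_b)
# Pooling both subjects mixes two uncalibrated score scales.
auc_pooled = roc_auc(labels_a + labels_b, scores_a + scores_b)
print(auc_a, auc_b, auc_pooled)
```

Here the positive file of subject A scores below every negative file of subject B, so the pooled ranking is worse than either per-subject ranking.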

5-second samples
[Figure: pre-ictal sample for Dog 2]
[Figure: inter-ictal sample for Dog 2]

AUC example
[Figure: ROC curves (true positive rate vs. false positive rate) for three models, with 0.9824, 0.7560 and 0.5000 AUC.]
A random model scores 0.5 AUC.

Data pre-processing and feature extraction (1)
Spectral analysis: Fast Fourier Transform (FFT) with the APRIL-ANN toolkit.
60-second windows with 50% overlap.
19 windows for every channel.
Each window is multiplied by a Hamming window before the FFT.
16,384 FFT bins for dogs and 262,144 FFT bins for humans.
Filter bank: aggregates the FFT bins with 6 filters.
• delta (0.1 Hz to 4 Hz)
• theta (4 Hz to 8 Hz)
• alpha (8 Hz to 12 Hz)
• beta (12 Hz to 30 Hz)
• low-gamma (30 Hz to 70 Hz)
• high-gamma (70 Hz to 180 Hz)
Log-compression of the filter outputs.
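The spectral pipeline above can be sketched in Python with NumPy. This is an illustration only, not the APRIL-ANN implementation: the 400 Hz sampling rate, the power-of-two FFT size, the use of summed power as the band aggregation, and the function name `spectral_features` are all assumptions.

```python
import numpy as np

# Band edges in Hz, taken from the filter bank listed above.
BANDS = [(0.1, 4), (4, 8), (8, 12), (12, 30), (30, 70), (70, 180)]

def spectral_features(signal, fs, win_sec=60, overlap=0.5):
    """Return a (windows, bands) matrix of log band energies for one channel."""
    win = int(win_sec * fs)
    step = int(win * (1 - overlap))
    # Assumption: zero-pad each window to the next power of two.
    nfft = 1 << (win - 1).bit_length()
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    taper = np.hamming(win)
    rows = []
    for start in range(0, len(signal) - win + 1, step):
        frame = signal[start:start + win] * taper
        power = np.abs(np.fft.rfft(frame, n=nfft)) ** 2
        # Assumption: each filter sums the power of the bins in its band.
        bands = [power[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in BANDS]
        # Log-compression; the small constant avoids log(0).
        rows.append(np.log(np.array(bands) + 1e-12))
    return np.array(rows)

fs = 400  # Hz, assumed sampling rate for this example
rng = np.random.default_rng(0)
channel = rng.standard_normal(600 * fs)  # 10 minutes of synthetic data
features = spectral_features(channel, fs)
print(features.shape)
```

With a 10-minute input, 60-second windows and 50% overlap, the loop produces the 19 rows per channel mentioned above, each with 6 band values.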

Spectral analysis example
[Figure: FFT filters]
[Figure: sliding window]

PCA/ICA transformations, combining R and APRIL-ANN tools.
Take as input every row (time position) of every FFT matrix (training data only).
Center (and scale) all the data.
Compute the PCA/ICA rotation matrices.
Apply these rotations to all the data (training and test).
This step removes linear dependencies between pairs of input features.
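A minimal sketch of the PCA variant of this step, assuming NumPy and an SVD-based rotation (the original used R and APRIL-ANN; the function names and the synthetic data here are hypothetical). The statistics and the rotation are estimated on training rows only and then reused for the test rows.

```python
import numpy as np

def fit_pca(train, scale=True):
    """Estimate centering, scaling and the PCA rotation from training rows only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0) if scale else np.ones(train.shape[1])
    std = np.where(std > 0, std, 1.0)  # guard against constant features
    z = (train - mean) / std
    # The right singular vectors of the standardized data are the principal axes.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return mean, std, vt.T

def apply_pca(data, mean, std, rotation):
    """Apply the training-set transformation to any data (training or test)."""
    return ((data - mean) / std) @ rotation

rng = np.random.default_rng(0)
base = rng.standard_normal((500, 3))
# Fourth feature is almost a copy of the first: a strong linear dependency.
train = np.hstack([base, base[:, :1] + 0.1 * rng.standard_normal((500, 1))])
test = rng.standard_normal((50, 4))

mean, std, rotation = fit_pca(train)
train_rot = apply_pca(train, mean, std, rotation)
test_rot = apply_pca(test, mean, std, rotation)

cov = np.cov(train_rot, rowvar=False)
off_diagonal = cov - np.diag(np.diag(cov))
print(np.abs(off_diagonal).max())
```

After the rotation, the off-diagonal covariances of the training features are numerically zero, which is the sense in which pairwise linear dependencies are removed.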

Eigenvalues of the pairwise channel correlation matrix, implemented in R (windowed).
Traverse the data using the same sliding window as for the FFT.
For every window, compute the correlation matrix between channels.
Take the eigenvalues of the correlation matrix as input features.
These features proved useful when combined with the FFT features at the input of any model.
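The windowed correlation features can be sketched as follows. This is a NumPy illustration rather than the original R code; the function names, the low sampling rate and the random data are assumptions chosen to keep the example small.

```python
import numpy as np

def correlation_eigenvalues(window):
    """window: (samples, channels) -> eigenvalues of the channel correlation matrix."""
    corr = np.corrcoef(window, rowvar=False)
    # The correlation matrix is symmetric, so eigvalsh returns real values
    # in ascending order.
    return np.linalg.eigvalsh(corr)

def windowed_correlation_features(data, win, step):
    """Slide over the rows of data and stack one eigenvalue vector per window."""
    starts = range(0, data.shape[0] - win + 1, step)
    return np.array([correlation_eigenvalues(data[s:s + win]) for s in starts])

fs = 10  # Hz, hypothetical low rate to keep the example fast
rng = np.random.default_rng(0)
recording = rng.standard_normal((600 * fs, 16))  # 10 minutes, 16 channels
feats = windowed_correlation_features(recording, win=60 * fs, step=30 * fs)
print(feats.shape)
```

Each row holds one eigenvalue per channel, and the eigenvalues of a correlation matrix sum to the number of channels, which gives a quick sanity check. Passing the whole recording as a single window yields the global variant.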

Eigenvalues of the pairwise channel correlation matrix, implemented in R (global).
Take every recorded file as a whole input window.
Compute the correlation matrix between channels over that window.
Take the eigenvalues of the correlation matrix as input features.
These features gave minor improvements when used in an ensemble with other models.

Global statistics, implemented in R.
Take every recorded file as a whole input window.
Compute Fourier basis coefficients (15 elements).
Compute the standard deviation of these coefficients for each series.
Compute the standard deviation and mean of the previous values for every channel.
These features gave minor improvements when used in an ensemble with other models.

Data pre-processing summary
Spectral analysis (FFT + filters + log).
PCA and/or ICA to remove pairwise linear dependencies.
Eigenvalues of the correlation matrix:
From windowed data.
From a whole input file (global).
A set of global statistics based on Fourier basis analysis.

Statistical models: general aspects
The system receives an input file (10 minutes of data).
One probability value is required.
Windowed models: 19 rows (60-second windows, 50% overlap):
The model computes the probability of the pre-ictal class for each input row.
The 19 probability values are combined into one output probability (via a geometric mean of the complementary probabilities):
\[ p(\text{pre-ictal} \mid x) = 1 - \sqrt[19]{\prod_{t=1}^{19} (1 - p_t)} \]
Global models receive a whole file and produce one output probability.
Cross-validation (CV) uses F folds, one for each positive sample sequence.
Large gap between CV AUC and public/private test set AUC.
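The combination of the per-window probabilities into one file-level probability can be written in a few lines. This sketch assumes NumPy; the function name is hypothetical, and working in log space with clipping is a numerical-stability choice that the slides do not mention.

```python
import numpy as np

def combine_window_probabilities(p):
    """One minus the geometric mean of (1 - p_t) over all windows of a file."""
    # Clip so that a window with p_t = 1 does not produce log(0).
    p = np.clip(np.asarray(p, dtype=float), 0.0, 1.0 - 1e-12)
    # exp(mean(log(1 - p_t))) is the geometric mean of the complements.
    return float(1.0 - np.exp(np.mean(np.log1p(-p))))

constant = combine_window_probabilities([0.3] * 19)
zeros = combine_window_probabilities([0.0] * 19)
print(constant, zeros)
```

When all windows agree on the same probability, the combined value equals that probability, so the rule behaves like an average in that case.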


Statistical models: K-nearest-neighbors (KNNs)
Implementation available in APRIL-ANN.
Squared Euclidean distance as metric.
Good results with K = 40.
Given the set \mathcal{K} of the K nearest neighbors of a sample s, the probability is computed following the softmax principle:
\[ p(c = \text{pre-ictal} \mid s) = \frac{\sum_{s' \in \text{pre-ictal}(\mathcal{K})} \exp\left(-\|s - s'\|_2^2\right)}{\sum_{s' \in \mathcal{K}} \exp\left(-\|s - s'\|_2^2\right)} \]
0.80080 CV, 0.67589 public test (FFT).
0.81044 CV, 0.72876 public test (+ PCA + COR).
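The softmax-weighted KNN probability can be sketched with a brute-force neighbor search. This is a NumPy illustration, not the APRIL-ANN code: the function name and the tiny data set are hypothetical, and subtracting the smallest distance before exponentiating is an added stability trick that leaves the ratio unchanged.

```python
import numpy as np

def knn_preictal_probability(query, train_x, train_y, k=40):
    """Softmax over negative squared Euclidean distances of the k nearest neighbors."""
    d2 = np.sum((train_x - query) ** 2, axis=1)  # squared Euclidean distances
    nearest = np.argsort(d2)[:k]
    # Shift by the smallest distance to avoid underflow in exp.
    weights = np.exp(-(d2[nearest] - d2[nearest].min()))
    # Numerator: pre-ictal neighbors only; denominator: all k neighbors.
    return float(weights[train_y[nearest] == 1].sum() / weights.sum())

train_x = np.array([[0.0], [1.0], [10.0]])
train_y = np.array([1, 0, 0])  # 1 = pre-ictal, 0 = inter-ictal
p = knn_preictal_probability(np.array([0.0]), train_x, train_y, k=2)
print(p)
```

In this toy case the two nearest neighbors lie at squared distances 0 (pre-ictal) and 1 (inter-ictal), so the result is 1 / (1 + e^{-1}).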

Statistical models: Artificial Neural Networks (ANNs)
Implementation available in APRIL-ANN.
From two to six hidden layers (two and five in the final solution).
Input context of three rows (previous, current, next).
Rectified Linear Units as the hidden layer activation function.
Logistic output.
128 or 64 neurons in each layer.
Mini-batch size of 128.
Statistical models: Artificial Neural Networks (ANNs) (continued)
Stochastic gradient descent, cross-entropy loss function, momentum term.
Dropout and L2 for regularization, additive Gaussian noise at the input.
\[ L = -t \log \hat{y} - (1 - t) \log(1 - \hat{y}) + \beta \|W\|_2^2 \]
\[ w^{(t+1)} = w^{(t)} - \frac{\eta_0}{1 + \epsilon t} \cdot \frac{\partial L}{\partial w^{(t)}} + \gamma \left( w^{(t)} - w^{(t-1)} \right) \]
0.90717 CV, 0.74890 public test (FFT + COR).
0.90816 CV, 0.78153 public test (+ PCA).
0.92827 CV, 0.79372 public test (+ 5 layers).
Here t is the target output, \hat{y} the output of the ANN, \beta the weight decay term, W all the weights, w a single weight, \eta_0 the initial learning rate, \epsilon the learning rate decay parameter, L the loss function and \gamma the momentum term. In the update rule, the superscript (t) and the t in the decay factor denote the iteration index.
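The update rule can be tried on a toy problem. The sketch below applies it to a single logistic unit rather than the multi-layer networks of the final solution, omits dropout and input noise, and uses synthetic data; the hyperparameter values and function names are assumptions for illustration.

```python
import numpy as np

def loss_and_grad(w, x, target, beta=1e-4):
    """Cross-entropy loss with L2 penalty, and its gradient, for one logistic unit."""
    y = 1.0 / (1.0 + np.exp(-(x @ w)))
    y = np.clip(y, 1e-12, 1.0 - 1e-12)  # keep the logarithms finite
    loss = np.mean(-target * np.log(y) - (1 - target) * np.log(1 - y))
    loss += beta * np.sum(w ** 2)
    grad = x.T @ (y - target) / len(target) + 2.0 * beta * w
    return loss, grad

def sgd_momentum_step(w, w_prev, grad, t, eta0=0.5, eps=1e-3, gamma=0.9):
    """One update: decayed learning rate plus a momentum term."""
    lr = eta0 / (1.0 + eps * t)
    w_next = w - lr * grad + gamma * (w - w_prev)
    return w_next, w  # new weights, and current weights as the next "previous"

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 3))
target = (x @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)

w = w_prev = np.zeros(3)
initial_loss, _ = loss_and_grad(w, x, target)
for t in range(100):
    batch = rng.choice(len(x), size=128, replace=False)  # mini-batch of 128
    _, grad = loss_and_grad(w, x[batch], target[batch])
    w, w_prev = sgd_momentum_step(w, w_prev, grad, t)
final_loss, _ = loss_and_grad(w, x, target)
print(initial_loss, final_loss)
```

The momentum term reuses the previous weight change, so the step function has to carry two weight vectors between iterations.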

Ensemble: Bayesian Model Combination (BMC)
Implemented for APRIL-ANN.
Basic algorithm available in the literature (and Wikipedia).
Adapted to work with likelihood instead of classification accuracy.
Linear combination of models \theta_i, with weights \alpha_i estimated by Bayesian inference:
\[ p(c = \text{pre-ictal} \mid s) = \sum_i \alpha_i \, p(c \mid s, \theta_i) \]
Uniform ensemble: 0.8048 public test.
BMC ensemble: 0.8249 public test.
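A simplified sketch of likelihood-based Bayesian Model Combination, assuming NumPy. It is not the authors' APRIL-ANN implementation: it samples candidate weight vectors from a uniform Dirichlet prior, scores each by the likelihood of held-out labels, and averages the candidates with those posterior weights. The function name, the number of samples and the toy models are hypothetical.

```python
import numpy as np

def bmc_weights(model_probs, labels, n_samples=2000, seed=0):
    """Estimate combination weights alpha_i from held-out predictions.

    model_probs: (models, examples) pre-ictal probabilities.
    labels: (examples,) array with 1 = pre-ictal, 0 = inter-ictal.
    """
    rng = np.random.default_rng(seed)
    n_models = model_probs.shape[0]
    # Candidate weightings drawn from a uniform Dirichlet prior.
    candidates = rng.dirichlet(np.ones(n_models), size=n_samples)
    mixed = np.clip(candidates @ model_probs, 1e-12, 1.0 - 1e-12)
    # Log-likelihood of the labels under each candidate ensemble.
    loglik = (labels * np.log(mixed) + (1 - labels) * np.log(1 - mixed)).sum(axis=1)
    posterior = np.exp(loglik - loglik.max())  # shift for numerical stability
    posterior /= posterior.sum()
    # Posterior-weighted average of the candidate weight vectors.
    return posterior @ candidates

labels = np.array([0, 1] * 50)
good = np.where(labels == 1, 0.9, 0.1)  # informative toy model
bad = np.full(labels.shape, 0.5)        # uninformative toy model
weights = bmc_weights(np.vstack([good, bad]), labels)
ensemble = weights @ np.vstack([good, bad])
print(weights)
```

The resulting weights sum to one and favor the informative model, whereas a uniform ensemble would give both models the same weight.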

Summary results
Model             Features   CV AUC   Pub. AUC
KNN               FFT        0.8008   0.6759
KNN               FFT+CORW   0.7994   0.7040
KNN               PCA+CORW   0.8104   0.7288
KNN               ICA+CORW   0.8103   0.6840
ANN2              FFT+CORW   0.9072   0.7489
ANN2              PCA+CORW   0.9082   0.7815
ANN2p             PCA+CORW   0.9175   0.7895
ANN2              ICA+CORW   0.9104   0.7772
ANN3              PCA+CORW   0.9188   0.7690
ANN4              PCA+CORW   0.9268   0.7772
ANN5              PCA+CORW   0.9283   0.7937
ANN6              PCA+CORW   0.9291   0.7722
KNN               CORG       0.7097   0.6552
KNN               COVRED     0.6900   0.6901
UNIFORM ENSEMBLE  -          -        0.8048
BMC ENSEMBLE      -          0.9271   0.8249

Final competition standings
Priv. #  Pub. #  Team            Priv. score  Pub. score  Entries
1        1       Medrr (2)       0.83993      0.90316     264
2        4       QMSDP *         0.81962      0.85951     501
3        8       Birchwood *     0.80079      0.83869     160
4        15      ESAI CEU-UCH *  0.79347      0.82488     182
5        3       Michael Hills   0.79251      0.86248     427
A total of 504 teams participated.
* Prize winners.
(2) Medrr is a company with closed development; they were not interested in open-sourcing their solution, so they declined the prize.

Conclusions
IMPORTANT: avoid overfitting to the public test score.
Removing linear dependencies with PCA/ICA achieves significant improvements.
Large gap between CV and test set scores.
Larger for ANN than for KNN models.
Stable and better results using ensembles:
Bayesian Model Combination is better than uniform ensembling.
Less overfitting to the public test score.
Better bias-variance trade-off.
Different pre-processing pipelines × different modeling techniques.
Solution code available at GitHub: https://github.com/ESAI-CEU-UCH/kaggle-epilepsy

Future work
Global model instead of one model per subject.
Don’t be evil with new subjects.
Wavelet signal processing.
Analysis of discriminant frequencies, filtering optimization, feature learning.
Convolutional Neural Networks?
Recurrent Neural Networks?

End of session 2
Thanks for your attention!
Questions?
