Epileptic Seizure Prediction
American Epilepsy Society Kaggle Challenge
Dr. Francisco Zamora-Martínez
a.k.a. Paco
francisco.zamora@uch.ceu.es
Embedded Systems and Artificial Intelligence (ESAI) research team
Escuela superior de enseñanzas técnicas
Departamento de ciencias físicas, matemáticas y de la computación
Universidad CEU Cardenal Herrera
Cyient Insights Ltd, Hyderabad, India
March 2015
What is Kaggle?
As a company
A data science company.
Focused on data science competitions and challenges.
Consulting in the energy efficiency and power consumption field.
Competitions
Knowledge competitions (no prize).
Sponsored competitions (prizes).
Normally, open-source solutions are required.
Any task in any field can be proposed as a competition: energy, signal, image, video, language, …
Stay tuned for new competitions at Kaggle: http://www.kaggle.com/
The epileptic seizure prediction challenge
Problem statement
Sponsored competition: $25,000 in prizes.
7 subjects: 5 dogs, 2 humans.
Hundreds of 10-minute EEG recordings (usually 16 channels) per subject.
Training data is grouped into sequences of 60 consecutive minutes.
Test data is given in random order.
Two classes:
(+) pre-ictal state: from one hour five minutes until five minutes before a recorded seizure.
(–) inter-ictal state: any random sample recorded at least one week before or after any seizure.
Few positive samples, large number of negative samples (imbalanced data set) ⇒ hand-crafted pre-processing.
Every file has to be classified as pre-ictal or inter-ictal.
Classification performance is measured as Area Under the ROC Curve (AUC).
A good performance measure for imbalanced data sets.
Computed over output probabilities; binarization (or thresholding) is not required.
Test data is divided into public (40 %) and private (60 %) test sets.
Private results are shown at the end of the competition.
Some important issues:
High variability between subjects: difficult to train a global model.
One model per subject: AUC performance drops when the outputs are not calibrated.
[Figure: 5-second pre-ictal sample for Dog 2]
[Figure: 5-second inter-ictal sample for Dog 2]
AUC example
[Figure: ROC curve (true positive rate vs. false positive rate) for three models with 0.9824, 0.7560 and 0.5000 AUC]
Random model performs 0.5 AUC
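As a minimal sketch of how AUC can be computed directly from output probabilities, without any threshold, the snippet below uses the rank-based identity: AUC equals the fraction of (positive, negative) pairs in which the positive sample gets the higher score, counting ties as one half. The function name `roc_auc` and the toy scores are illustrative assumptions, not part of the competition code.

```python
import numpy as np

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)

labels = [1, 1, 0, 0, 0, 0]                  # toy, imbalanced labels
perfect = [0.9, 0.8, 0.3, 0.2, 0.1, 0.05]    # every positive outranks every negative
constant = [0.5] * 6                         # uninformative scores

print(roc_auc(labels, perfect))   # 1.0
print(roc_auc(labels, constant))  # 0.5
```

Only the ordering of the scores matters here, which is why uncalibrated outputs from different per-subject models can hurt a pooled AUC even when each model ranks its own subject well.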
Data pre-processing and feature extraction (1)
Spectral analysis: Fast Fourier Transform (FFT) with the APRIL-ANN toolkit.
60-second windows with 50 % overlap.
19 windows for every channel.
Each window is tapered with a Hamming window.
16 384 FFT bins for dogs and 262 144 FFT bins for humans.
Filter bank: aggregates FFT bins with 6 filters.
• delta (0.1 Hz to 4 Hz)
• theta (4 Hz to 8 Hz)
• alpha (8 Hz to 12 Hz)
• beta (12 Hz to 30 Hz)
• low-gamma (30 Hz to 70 Hz)
• high-gamma (70 Hz to 180 Hz)
Log-compression of the filter outputs.
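The spectral pipeline above can be sketched in NumPy for a single channel. This is an illustration, not the APRIL-ANN implementation: the sampling rate, the FFT size (next power of two), the use of a plain sum to aggregate bins, and `log(1 + x)` as the compression are all assumptions made for the example.

```python
import numpy as np

# Band edges in Hz, taken from the filter bank listed above.
BANDS = [(0.1, 4), (4, 8), (8, 12), (12, 30), (30, 70), (70, 180)]

def band_log_power(signal, fs, win_sec=60.0, overlap=0.5):
    """Return a (windows, bands) matrix of log band energies for one channel."""
    win = int(round(win_sec * fs))
    step = int(round(win * (1.0 - overlap)))
    nfft = 1 << (win - 1).bit_length()        # next power of two >= win (assumption)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    taper = np.hamming(win)
    rows = []
    for start in range(0, len(signal) - win + 1, step):
        frame = signal[start:start + win] * taper
        power = np.abs(np.fft.rfft(frame, n=nfft)) ** 2
        # Aggregate the FFT bins inside each band (a plain sum is an assumption).
        energies = [power[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in BANDS]
        rows.append(np.log1p(energies))       # log-compression, log(1 + x) assumed
    return np.array(rows)

fs = 400.0                                    # hypothetical sampling rate in Hz
t = np.arange(int(600 * fs)) / fs             # 10 minutes of samples
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 10.0 * t) + 0.01 * rng.standard_normal(t.size)

features = band_log_power(x, fs)
print(features.shape)                         # (19, 6)
```

A 10-minute file with 60-second windows and 50 % overlap yields 19 rows, and the synthetic 10 Hz tone concentrates its energy in the alpha column.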
Spectral analysis example
[Figure: FFT filters]
[Figure: sliding window]
Data pre-processing and feature extraction (2)
PCA/ICA transformations, combining the R and APRIL-ANN tools.
Take as input every row (time position) of every FFT matrix (training data only).
Center (and scale) all the data.
Compute the PCA/ICA rotation matrices.
Apply this rotation to all the data (training and test).
This step removes linear dependencies between pairs of input features.
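The PCA variant of this step can be sketched as follows, assuming feature rows stacked in a NumPy array. The centering, scaling and rotation are estimated from training rows only and then reused unchanged on test rows. The function names and the synthetic data are hypothetical; the original work used R and APRIL-ANN.

```python
import numpy as np

def fit_pca(train):
    """Estimate centering, scaling and rotation from training rows only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0                        # guard against constant features
    z = (train - mean) / std
    # Rows of vt are the principal directions of the standardized data.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return mean, std, vt.T

def apply_pca(data, mean, std, rotation):
    """Apply the training-set transformation to any data (training or test)."""
    return ((data - mean) / std) @ rotation

rng = np.random.default_rng(0)
mixing = rng.standard_normal((6, 6))
train = rng.standard_normal((500, 6)) @ mixing   # correlated synthetic features
test = rng.standard_normal((100, 6)) @ mixing

params = fit_pca(train)
train_rot = apply_pca(train, *params)
test_rot = apply_pca(test, *params)

cov = np.cov(train_rot, rowvar=False)
off_diag = cov - np.diag(np.diag(cov))
print(np.abs(off_diag).max())                    # close to zero on training data
```

After the rotation the training features are pairwise uncorrelated, which is the sense in which linear dependencies between pairs of features are removed.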
Data pre-processing and feature extraction (3)
Eigenvalues of pairwise channel correlations, implemented in R (windowed).
Traverse the data using the same sliding window as for the FFT.
For every window, compute the correlation matrix between channels.
Take the eigenvalues of the correlation matrix as input features.
These features proved useful when combined with FFT features at the input of any model.
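A minimal NumPy sketch of the windowed correlation features, assuming the recording is stored as a (channels, samples) array. The original implementation was in R; the sampling rate and the synthetic data below are placeholders.

```python
import numpy as np

def correlation_eigenvalues(data, win, step):
    """data: (channels, samples). Returns (windows, channels) eigenvalues."""
    rows = []
    for start in range(0, data.shape[1] - win + 1, step):
        corr = np.corrcoef(data[:, start:start + win])   # channels x channels
        rows.append(np.linalg.eigvalsh(corr))            # real, ascending order
    return np.array(rows)

rng = np.random.default_rng(0)
fs = 100                                      # hypothetical sampling rate in Hz
data = rng.standard_normal((16, 600 * fs))    # 16 channels, 10 minutes
data[1] += 0.8 * data[0]                      # make two channels correlated

eig = correlation_eigenvalues(data, win=60 * fs, step=30 * fs)
print(eig.shape)                              # (19, 16)
```

Because a correlation matrix has ones on its diagonal, the eigenvalues of each window sum to the number of channels; what carries information is how that total is spread across them. Calling the function with `win` equal to the full file length gives a single row for the whole recording.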
Data pre-processing and feature extraction (4)
Eigenvalues of pairwise channel correlations, implemented in R (global).
Take every recorded file as a single input window.
Compute the correlation matrix between channels over that window.
Take the eigenvalues of the correlation matrix as input features.
These features yield minor improvements when used in an ensemble with other models.
Data pre-processing and feature extraction (5)
Global statistics, implemented in R.
Take every recorded file as a single input window.
Compute Fourier basis coefficients (15 elements).
Compute the standard deviation of these coefficients for each series.
Compute the standard deviation and mean of the previous value for every channel.
These features yield minor improvements when used in an ensemble with other models.
Data pre-processing summary
Spectral analysis (FFT + filters + log).
PCA and/or ICA to remove pairwise linear dependencies.
Eigenvalues of the correlation matrix:
From windowed data.
From a whole input file (global).
A set of global statistics based on Fourier basis analysis.
Statistical models: general aspects
The system receives an input file (10 minutes of data).
One probability value is required.
Windowed models: 19 rows (60 seconds, 50 % overlap):
The model computes the probability of the pre-ictal class for each input row.
The 19 probability values are combined into one output probability (i.e. through a geometric mean):
$$ p(\text{pre-ictal} \mid x) = 1 - \sqrt[19]{\prod_{t=1}^{19} (1 - p_t)} $$
Global models receive a file and produce one output probability.
Cross-validation (CV) using F folds, one for each positive sample sequence.
Large gap between CV AUC and public/private test set AUC.
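The combination of the per-window probabilities into one file-level probability can be sketched as below: one minus the geometric mean of the complements (1 − p_t). Working in log space and clipping probabilities just below 1 are implementation choices made for this example, not details taken from the original code.

```python
import numpy as np

def combine_windows(p):
    """1 - geometric mean of (1 - p_t), computed in log space."""
    p = np.clip(np.asarray(p, dtype=float), 0.0, 1.0 - 1e-12)
    return 1.0 - np.exp(np.mean(np.log1p(-p)))

flat = combine_windows([0.5] * 19)              # all windows agree on 0.5
spike = combine_windows([0.1] * 18 + [0.99])    # one confident window

print(flat, spike)
```

When all 19 windows agree, the file-level probability equals the shared value; a single very confident window raises the result noticeably above the remaining windows' value.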
Statistical models: K-nearest-neighbors (KNNs)
Implementation available in APRIL-ANN.
Squared Euclidean distance as metric.
Good results with K = 40.
Given a set $\mathcal{K}$ with K neighbors, the probability is computed following the softmax principle:
$$ p(c = \text{pre-ictal} \mid s) = \frac{\sum_{s' \in \text{pre-ictal}(\mathcal{K})} \exp\left(-\|s - s'\|_2^2\right)}{\sum_{s' \in \mathcal{K}} \exp\left(-\|s - s'\|_2^2\right)} $$
0.80080 CV, 0.67589 public test (FFT).
0.81044 CV, 0.72876 public test (+ PCA + COR).
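The softmax-weighted KNN probability can be sketched in NumPy as follows. This is a brute-force illustration on synthetic data, not the APRIL-ANN implementation; the function name and the toy clusters are assumptions.

```python
import numpy as np

def knn_preictal_probability(query, train_x, train_y, k=40):
    """Softmax-weighted share of pre-ictal samples among the k nearest neighbors."""
    d2 = ((train_x - query) ** 2).sum(axis=1)       # squared Euclidean distances
    nearest = np.argsort(d2)[:k]
    # Subtracting the smallest distance leaves the ratio unchanged
    # and avoids underflow in the exponentials.
    w = np.exp(-(d2[nearest] - d2[nearest].min()))
    return w[train_y[nearest] == 1].sum() / w.sum()

rng = np.random.default_rng(0)
neg = rng.normal(0.0, 1.0, size=(200, 4))           # synthetic inter-ictal rows
pos = rng.normal(5.0, 1.0, size=(50, 4))            # synthetic pre-ictal rows
train_x = np.vstack([neg, pos])
train_y = np.array([0] * 200 + [1] * 50)

p_near_pos = knn_preictal_probability(np.full(4, 5.0), train_x, train_y)
p_near_neg = knn_preictal_probability(np.zeros(4), train_x, train_y)
print(p_near_pos, p_near_neg)
```

Compared with a plain vote over the K neighbors, the exponential weighting lets the closest neighbors dominate, so the output is a graded probability rather than a multiple of 1/K.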
Statistical models: Artificial Neural Networks (ANNs)
Implementation available in APRIL-ANN.
From two to six hidden layers (two and five in the final solution).
Input context of three rows (previous, current, next).
Rectified Linear Units as the hidden-layer activation function.
Logistic output.
128 or 64 neurons in each layer.
Mini-batch size of 128.
Stochastic gradient descent, cross-entropy loss function, momentum term.
Dropout and L2 for regularization, additive Gaussian noise at the input.
$$ L = -t \log \hat{y} - (1 - t) \log(1 - \hat{y}) + \beta \|W\|_2^2 $$
$$ w^{(t+1)} = w^{(t)} - \frac{\eta_0}{1 + \epsilon t} \cdot \frac{\partial L}{\partial w^{(t)}} + \gamma \left( w^{(t)} - w^{(t-1)} \right) $$
Here t is the target output (in the update rule, the superscript t is the iteration index), ŷ the output of the ANN, β the weight decay term, W all the weights, w a single weight, η0 the initial learning rate, ϵ the learning rate decay parameter, L the loss function, and γ the momentum term.
0.90717 CV, 0.74890 public test (FFT + COR).
0.90816 CV, 0.78153 public test (+ PCA).
0.92827 CV, 0.79372 public test (+ 5 layers).
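The update rule with a decaying learning rate and a momentum term can be illustrated on a much smaller problem than the networks described here. The sketch below trains a single logistic unit with full-batch gradients; the hyperparameter values, the synthetic data, and the omission of dropout, input noise and hidden layers are simplifications assumed for the example.

```python
import numpy as np

def sgd_momentum_step(w, w_prev, grad, step, eta0=0.1, eps=1e-3, gamma=0.9):
    """One update: w - eta0 / (1 + eps * step) * grad + gamma * (w - w_prev)."""
    lr = eta0 / (1.0 + eps * step)
    return w - lr * grad + gamma * (w - w_prev)

def loss_and_grad(w, x, target, beta=1e-4):
    """Cross-entropy of one logistic unit plus an L2 penalty, and its gradient."""
    y = 1.0 / (1.0 + np.exp(-(x @ w)))
    y = np.clip(y, 1e-12, 1.0 - 1e-12)
    loss = np.mean(-target * np.log(y) - (1 - target) * np.log(1 - y)) + beta * (w @ w)
    grad = x.T @ (y - target) / len(target) + 2.0 * beta * w
    return loss, grad

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 5))                          # synthetic inputs
target = (x @ np.array([1.0, -2.0, 0.5, 0.0, 1.5]) > 0).astype(float)

w = np.zeros(5)
w_prev = w.copy()
first_loss, _ = loss_and_grad(w, x, target)
for step in range(200):
    _, grad = loss_and_grad(w, x, target)
    # Keep the previous weights so the momentum term can use them.
    w, w_prev = sgd_momentum_step(w, w_prev, grad, step), w
last_loss, _ = loss_and_grad(w, x, target)
print(first_loss, last_loss)
```

The momentum term reuses the previous displacement w(t) − w(t−1), so only the last two weight vectors need to be kept in memory.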
Ensemble: Bayesian Model Combination (BMC)
Implemented for APRIL-ANN.
Basic algorithm available in the literature (and on Wikipedia).
Adapted to work with likelihood instead of classification accuracy.
Linear combination of models θi, with weights αi estimated by Bayesian inference:
$$ p(c = \text{pre-ictal} \mid s) = \sum_i \alpha_i \, p(c \mid s, \theta_i) $$
Uniform ensemble: 0.8048 public test.
BMC ensemble: 0.8249 public test.
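A simplified sketch of likelihood-based model combination is given below. It is not the authors' APRIL-ANN implementation: candidate weight vectors are drawn from a uniform Dirichlet prior, each candidate is scored by the log-likelihood of held-out labels under the combined prediction, and the final weights αi are the posterior-weighted average of the candidates. The sampling scheme, the number of samples and the synthetic models are all assumptions.

```python
import numpy as np

def bmc_weights(model_probs, targets, n_samples=500, seed=0):
    """Posterior-averaged combination weights.

    model_probs: (models, samples) pre-ictal probabilities on held-out data.
    targets: (samples,) array of 0/1 labels.
    """
    rng = np.random.default_rng(seed)
    n_models = model_probs.shape[0]
    # Candidate weight vectors drawn from a uniform Dirichlet prior.
    candidates = rng.dirichlet(np.ones(n_models), size=n_samples)
    combined = np.clip(candidates @ model_probs, 1e-12, 1.0 - 1e-12)
    # Log-likelihood of the labels under each candidate combination.
    loglik = (targets * np.log(combined)
              + (1 - targets) * np.log(1 - combined)).sum(axis=1)
    posterior = np.exp(loglik - loglik.max())    # subtract the max for stability
    posterior /= posterior.sum()
    return posterior @ candidates                # expected weights alpha_i

rng = np.random.default_rng(1)
targets = np.array([1] * 30 + [0] * 70)
good = np.where(targets == 1, 0.9, 0.1)          # synthetic informative model
poor = rng.uniform(0.2, 0.8, size=targets.size)  # synthetic uninformative model
models = np.vstack([good, poor])

alpha = bmc_weights(models, targets)
ensemble = alpha @ models                        # sum_i alpha_i * p(c | s, theta_i)
print(alpha)
```

A uniform ensemble corresponds to fixing every αi to 1 / (number of models); the sketch instead shifts weight towards the models that explain the held-out labels better.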
Summary results
Model             Features   CV AUC   Pub. AUC
KNN               FFT        0.8008   0.6759
KNN               FFT+CORW   0.7994   0.7040
KNN               PCA+CORW   0.8104   0.7288
KNN               ICA+CORW   0.8103   0.6840
ANN2              FFT+CORW   0.9072   0.7489
ANN2              PCA+CORW   0.9082   0.7815
ANN2p             PCA+CORW   0.9175   0.7895
ANN2              ICA+CORW   0.9104   0.7772
ANN3              PCA+CORW   0.9188   0.7690
ANN4              PCA+CORW   0.9268   0.7772
ANN5              PCA+CORW   0.9283   0.7937
ANN6              PCA+CORW   0.9291   0.7722
KNN               CORG       0.7097   0.6552
KNN               COVRED     0.6900   0.6901
UNIFORM ENSEMBLE             n/a      0.8048
BMC ENSEMBLE                 0.9271   0.8249
Final competition standings
Priv. #  Pub. #  Team            Priv. Score  Pub. Score  Entries
1        1       Medrr †         0.83993      0.90316     264
2        4       QMSDP ⋆         0.81962      0.85951     501
3        8       Birchwood ⋆     0.80079      0.83869     160
4        15      ESAI CEU-UCH ⋆  0.79347      0.82488     182
5        3       Michael Hills   0.79251      0.86248     427
A total of 504 teams participated.
⋆ Prize winners.
† A company with closed development, not interested in open-sourcing its solution, so it declined the prize.
Conclusions
IMPORTANT: avoid overfitting to public test score.
Removing linear dependencies with PCA/ICA achieves significant improvements.
Large gap between CV and test set score.
Larger for ANN than for KNN models.
Stable and better results using ensembles:
Bayesian Model Combination better than uniform ensembling.
Less overfitting to public test score.
Better bias-variance trade-off.
Different pre-processing pipelines × different modeling techniques.
Solution code available on GitHub: https://github.com/ESAI-CEU-UCH/kaggle-epilepsy
Future work
Global model instead of one model per subject.
Don’t be evil with new subjects.
Wavelet signal processing.
Analysis of discriminant frequencies, filtering optimization, feature
learning.
Convolutional Neural Networks?
Recurrent Neural Networks?
End of session 2
Thanks for your attention!
Questions?

ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge
