Biomedical Signals Classification With Transformer Based Model
Presented by
Sandeep Kumar
M.Tech in Electrical Engineering (Instrumentation and Control)
Zakir Hussain College of Engineering and Technology
Aligarh Muslim University
Aligarh, India
sandeepietmjpru@gmail.com
TABLE OF CONTENTS
1. Introduction 2. Problem Statement 3. EEG Dataset 4. Methodology
5. Transformer Model 6. Performance Analysis 7. Results 8. Conclusion
INTRODUCTION
• A biosignal, or biomedical signal, is any signal in a living being that can be continually measured and monitored. The term biosignal is often used to refer to bioelectrical signals, but it may refer to both electrical and non-electrical signals.
• Electrical biosignals, or bioelectrical time signals, usually refer to changes in electric current produced by the sum of electrical potential differences across a specialized tissue, organ, or cell system, such as the nervous system.
• EEG, ECG, EOG, and EMG are measured with a differential amplifier, which registers the difference between two electrodes attached to the skin.
• The Transformer neural network is a novel architecture that aims to solve sequence-to-sequence tasks while handling long-range dependencies with ease.
PROBLEM STATEMENT
Perform a classification task on EEG signals from people with suspected epilepsy with
the help of a transformer neural network.
EEG DATASETS
1. Bonn EEG Dataset:
This EEG dataset[2] comes from Bonn University in Germany, where the database is publicly accessible.
a) Every class contains 100 text files, each representing one channel.
b) The database includes five distinct classes, from class A to class E.
c) Each channel contains 4097 samples.
d) Sampling frequency = 173.6 Hz
e) Time duration of the signals = 4097/173.6 = 23.6 sec
EEG DATASETS
1. Bonn EEG Dataset:
Set A         Set B          Set C             Set D             Set E
Healthy       Healthy        Epilepsy          Epilepsy          Epilepsy
Open eyes     Closed eyes    Interictal        Interictal        Ictal
Surface EEG   Surface EEG    Intracranial EEG  Intracranial EEG  Intracranial EEG
EEG DATASETS
2. CHB-MIT EEG Dataset:
This database[3], collected at the Children’s Hospital Boston, consists of EEG recordings from pediatric subjects
with intractable seizures. Subjects were monitored for up to several days following withdrawal of anti-seizure
medication in order to characterize their seizures and assess their candidacy for surgical intervention.
a) Recordings, grouped into 23 cases, were collected from 22 subjects (5 males, ages 3–22; and 17 females, ages
1.5–19).
b) All signals were sampled at 256 samples per second (Hz) with 16-bit resolution.
EEG DATASETS
EEG of a channel from CHB-MIT Dataset
METHODOLOGY
EEG DATA ACQUISITION → PRE-PROCESSING → WAVELET DECOMPOSITION → FEATURE EXTRACTION → CLASSIFICATION → RESULTS
EEG DATA ACQUISITION
Data acquisition is a crucial step in building a classification model. It involves gathering the data that will be used to train and evaluate the classification algorithm.
• The Bonn EEG dataset is available in text format (.txt), and
• the CHB-MIT dataset is available in European Data Format (.edf).
PRE-PROCESSING
The main purpose of the pre-processing step in classification is to remove artifacts and to convert the data from one domain to another, i.e., from the time domain to the frequency domain.
• In the case of the Bonn EEG dataset there is no need to remove artifacts, because the Bonn recordings are clean EEG.
• The CHB-MIT dataset, on the other hand, contains 60 Hz baseline (powerline) interference and other artifacts; a filtering sketch follows.
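The sketch below shows one way to suppress the 60 Hz interference with an IIR notch filter. It is a minimal illustration, assuming SciPy and a placeholder single-channel array `eeg`; the exact filter design used in this work is not specified here.

```python
# Minimal sketch: 60 Hz notch filtering of a CHB-MIT channel.
# `eeg` is a placeholder array; in practice it would be one channel
# read from an .edf recording.
import numpy as np
from scipy import signal

fs = 256.0            # CHB-MIT sampling frequency (Hz)
f0, q = 60.0, 30.0    # interference frequency and notch quality factor

b, a = signal.iirnotch(f0, q, fs)
eeg = np.random.randn(int(fs) * 10)       # placeholder 10 s signal
eeg_clean = signal.filtfilt(b, a, eeg)    # zero-phase notch filtering
```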
WAVELET DECOMPOSITION
• After removing the baseline interference and artifacts, both datasets are transformed into the time-frequency domain and decomposed using the wavelet transform.
• The wavelet transform provides both time and frequency localization, which means it can identify where specific frequencies occur in the signal at different time points.
• The DWT decomposition proceeds in a hierarchical manner, resulting in a tree-like structure known as a decomposition tree or wavelet tree.
• The decomposition process involves low-pass filtering (approximation) and high-pass filtering (detail) operations, followed by downsampling; a minimal sketch follows this list.
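As a concrete illustration, the hierarchical DWT can be computed with PyWavelets. The sketch below assumes a db4 wavelet and 5 decomposition levels, which are illustrative choices rather than the exact settings of this work.

```python
# Minimal DWT sketch with PyWavelets. `x` stands in for one Bonn
# channel (4097 samples); db4 and level=5 are illustrative assumptions.
import numpy as np
import pywt

x = np.random.randn(4097)                 # placeholder Bonn channel
coeffs = pywt.wavedec(x, "db4", level=5)
# coeffs = [cA5, cD5, cD4, cD3, cD2, cD1]: one approximation band and
# five detail bands, each produced by filtering followed by downsampling
for i, c in enumerate(coeffs):
    print(f"band {i}: {len(c)} coefficients")
```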
WAVELET DECOMPOSITION
Wavelet Decomposition of Bonn EEG Dataset
FEATURE EXTRACTION
Feature extraction is a crucial step in the classification process, especially when dealing with high-dimensional data.
• It involves selecting or transforming raw input data into a reduced set of relevant features that capture the essential information necessary for accurate classification.
• By extracting meaningful features, we can improve the performance and efficiency of classification algorithms.
• The statistical features used in this methodology are listed below, followed by a short computation sketch:
1. Skewness: It is a measure of the asymmetry of a distribution. Skewness can help identify the tail behaviour of
the distribution.
$$\mu_3 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^3}{(N-1)\,\sigma^3}$$
FEATURE EXTRACTION
2. Coherence: Coherence in signal processing pertains to a statistical metric quantifying the level of similarity
or correlation between two signals in the frequency domain.
$$C_{xy} = \frac{|P_{xy}|^2}{P_{xx}\,P_{yy}}$$
3. Standard Deviation: The standard deviation is a statistical measure that quantifies the amount of variation or
dispersion in a set of data.
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$
4. Power Spectral Density: The power spectral density (PSD) or power spectrum density is a term used to
describe the distribution of average power of a signal x(t) in the frequency domain.
$$S(\omega) = \lim_{\tau \to \infty} \frac{|X(\omega)|^2}{\tau}$$
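The four features above can be computed directly with NumPy and SciPy. The sketch below is a minimal illustration; `band` and `ref` are hypothetical arrays standing in for two signal sub-bands, and the aggregation into a single feature vector is an assumption for demonstration.

```python
# Sketch: computing the four statistical features for one signal band.
import numpy as np
from scipy import stats, signal

fs = 173.6                                     # Bonn sampling rate (Hz)
band = np.random.randn(4097)                   # placeholder sub-band
ref = np.random.randn(4097)                    # placeholder reference

skewness = stats.skew(band)                    # mu_3
sigma = np.std(band)                           # standard deviation
f_c, cxy = signal.coherence(band, ref, fs=fs)  # coherence C_xy(f)
f_p, psd = signal.welch(band, fs=fs)           # Welch estimate of the PSD
feature_vec = np.array([skewness, sigma, cxy.mean(), psd.mean()])
```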
CLASSIFICATION
After extracting features from the EEG data, the next step involves classifying the data. This classification process
aims to define boundaries between different classes and assign labels based on the observed characteristics of the
data.
• The transformer consists of an encoder and a decoder.
• It learns to capture the relationships between different feature elements and to model the dependencies in the data.
• The transformer model is trained on labelled EEG data, with the dataset divided into training and validation sets.
• Throughout training, the model learns to classify EEG features from the labelled data; a minimal model sketch follows this list.
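The sketch below shows a transformer-encoder classifier for EEG feature sequences in PyTorch. All sizes (16 input features, a 64-dimensional model, 4 heads, 2 encoder layers, 2 classes) are illustrative assumptions, not the exact architecture or hyperparameters of this work.

```python
# Sketch: transformer-encoder classifier for EEG feature sequences.
import torch
import torch.nn as nn

class EEGTransformer(nn.Module):
    def __init__(self, n_features=16, d_model=64, n_heads=4,
                 n_layers=2, n_classes=2):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                # x: (batch, seq_len, n_features)
        z = self.encoder(self.embed(x))  # model dependencies via attention
        return self.head(z.mean(dim=1))  # mean-pool sequence, classify

model = EEGTransformer()
logits = model(torch.randn(8, 10, 16))   # 8 segments, 10 steps, 16 features
```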
TRANSFORMER MODEL
• The Transformer is a type of neural network architecture that works on the attention mechanism.
• The Transformer model was proposed in the paper “Attention Is All You Need”.[4]
• It was originally developed for Natural Language Processing (NLP).
SEQ-TO-SEQ ATTENTION
• To understand the Transformer, it is necessary to first understand sequence-to-sequence (Seq-to-Seq) attention.
Architecture of Seq-to-Seq with Attention
HOW S2SwA WORKS ?
1. Look at the set of encoder hidden states it received; each encoder hidden state is most associated with a certain vector in the input.
2. Give each hidden state a score.
3. Multiply each hidden state by its softmaxed score, amplifying hidden states with high scores and drowning out hidden states with low scores.
4. Attention step: use the encoder hidden states and the decoder hidden state h4 to calculate the context vector C4 for this time step.
5. Concatenate h4 and C4 into one vector.
6. Pass this vector through a feedforward neural network.
7. The output of the feedforward network is the output for this time step.
TRANSFORMER ARCHITECTURE
HOW TRANSFORMER WORKS ?
1. Creating Self-Attention Vectors
The first step is to calculate the self-attention vectors (Q, K, V) from each of the encoder’s input vectors. These vectors are created by multiplying the embedding by three weight matrices ($W^Q$, $W^K$, $W^V$).
HOW TRANSFORMER WORKS ?
2. Score Calculation
The score is calculated by taking the dot product of the query Q with the key K of the respective input vector being scored. The score is then divided by the square root of the dimension of the key vector.
3. Apply Softmax
Softmax normalizes the scores so they are all positive and add up to 1.
HOW TRANSFORMER WORKS ?
4. Evaluate Self-Attention
Multiply each value vector by its softmax score and sum up the weighted value vectors. This produces the output of the self-attention layer at this position.
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
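The formula can be written out in a few lines of NumPy; the sketch below mirrors steps 2–4 for a single head, with random matrices standing in for real queries, keys, and values.

```python
# Sketch: scaled dot-product attention as defined above.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)    # step 2: scaled dot-product scores
    weights = softmax(scores)          # step 3: softmax over scores
    return weights @ V                 # step 4: weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 8)) for _ in range(3))
out = attention(Q, K, V)               # (5, 8) self-attention output
```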
HOW TRANSFORMER WORKS ?
5. Evaluate Multihead-Attention
The self-attention layer is refined by adding a mechanism called “multi-headed” attention. This improves the performance of the attention layer in two ways:
• It expands the model’s ability to focus on different positions.
• With multi-headed attention we have not just one but multiple sets of Q/K/V weight matrices; after training, each set is used to project the input into a different representation subspace.
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h)\,W^{O}$$
$$\text{where } \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q},\, KW_i^{K},\, VW_i^{V})$$
WHY TRANSFORMER ?
• Recurrent neural networks (RNNs) can look at previous inputs to predict the next output, but RNNs suffer from a short window of reference, resulting in the vanishing gradient problem.
• This is still true for Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) networks, although they have a greater capacity for longer-term memory than plain RNNs.
• The attention mechanism, in theory, has an infinite window to reference, and is therefore capable of using the entire input. In terms of training, Transformers are faster because of their parallel processing capability.
• Transformers have lower compute costs and a smaller carbon footprint.
PERFORMANCE ANALYSIS
A confusion matrix summarizes the performance of the transformer model on a set of test data. It is often used to measure the performance of classification models, which predict a categorical label for each input instance.
• True Positive (TP): the number of positive instances the model correctly predicted as positive.
• True Negative (TN): the number of negative instances the model correctly predicted as negative.
• False Positive (FP): the number of instances the model predicted as positive that are, in fact, negative.
• False Negative (FN): the number of instances the model predicted as negative that are, in fact, positive.
PERFORMANCE ANALYSIS
The performance of the trained model can be assessed by evaluating it on an independent test dataset. Metrics such as sensitivity, specificity, accuracy, and F1-score can be used to evaluate the classification performance; the formulas below are followed by a short computation sketch.
• Accuracy: the fraction of all predictions that are correct.
• Sensitivity: a measure of how well the model identifies positive instances (also called recall).
• Specificity: the number of correct negative predictions divided by the total number of negatives.
• F1-Score: used to evaluate the overall performance of a classification model. It is the harmonic mean of precision and sensitivity (recall).
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
$$\mathrm{F1\text{-}Score} = \frac{2\,TP}{2\,TP + FP + FN}$$
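For reference, a small sketch computes all four metrics from confusion-matrix counts; the TP/TN/FP/FN values in the example call are made up for illustration.

```python
# Sketch: the four evaluation metrics from confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)           # recall on the positive class
    specificity = tn / (tn + fp)           # recall on the negative class
    f1 = 2 * tp / (2 * tp + fp + fn)
    return accuracy, sensitivity, specificity, f1

print(metrics(tp=95, tn=90, fp=5, fn=10))
```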
RESULTS
1. Experiment on the Bonn Dataset
On the Bonn dataset, experiments range from binary-class to five-class classification. The dataset was broken down into five binary combinations (A vs E, B vs E, C vs E, D vs E, ABCD vs E), one tertiary combination (AB vs CD vs E), and one five-class combination (A vs B vs C vs D vs E).
• Binary Class Results:
RESULTS
Combinations Sensitivity Specificity Accuracy F1-Score
A vs E 100% 100% 100% 100%
B vs E 100% 94.74% 97.50% 97.67%
C vs E 95.24% 94.74% 95.0% 95.24%
D vs E 100% 95.45% 97.50% 97.30%
ABCD vs E 100% 76.92% 94.00% 96.10%
Overall Performance 99.04% 92.37% 96.80% 97.26%
RESULTS
• Multiclass Results:
1. Tertiary Classification (AB vs CD vs E)
Specificity Sensitivity F1-Score Support
NORMAL 91% 95% 93% 22
INTERICTAL 88% 93% 90% 15
SEIZURE 100% 85% 92% 13
ACCURACY 92% 50
MACRO AVG 93% 91% 92% 50
WEIGHTED AVG 92% 92% 92% 50
RESULTS
• Multiclass Results:
2. Five Class Classification (A vs B vs C vs D vs E)
Specificity Sensitivity F1-Score SUPPORT
SET A 93% 89% 91% 28
SET B 92% 86% 89% 14
SET C 43% 60% 50% 10
SET D 78% 75% 77% 24
SET E 96% 92% 94% 24
ACCURACY 83% 100
MACRO AVG 80% 80% 80% 100
WEIGHTED AVG 85% 83% 84% 100
RESULTS COMPARISON
Author/Year                   Methodology                        Accuracy
Guo et al.[12], 2010          Multilayer Perceptron              95%
Tawfik et al.[13], 2016       Support Vector Machine             97%
Wani et al.[14], 2019         Artificial Neural Network (ANN)    95%
Chowdhury et al.[15], 2019    Convolutional Neural Network       99%
Chua et al.[16], 2011         Gaussian-Based Classification      93%
Proposed Work                 Transformer Neural Network         97%
RESULTS
2. Experiment on CHB-MIT Dataset
Patient ID Specificity Sensitivity Accuracy F1-Score
CHB01 97% 99% 99% 98%
CHB02 90% 95% 95% 92%
CHB03 97% 99% 99% 98%
CHB04 99% 99% 99% 99%
CHB05 97% 99% 99% 98%
CHB06 99% 100% 100% 99%
CHB07 99% 100% 100% 99%
CHB08 97% 99% 99% 98%
CHB09 99% 100% 100% 99%
CHB10 99% 99% 99% 99%
CHB11 97% 99% 99% 98%
CHB12 97% 99% 99% 98%
CHB13 97% 99% 99% 98%
CHB14 97% 99% 99% 98%
CHB15 99% 99% 99% 99%
CHB16 90% 95% 95% 92%
CHB17 97% 99% 99% 98%
CHB18 95% 97% 97% 93%
CHB19 90% 99% 99% 98%
CHB20 92% 95% 95% 93%
CHB21 96% 95% 95% 94%
CHB22 90% 99% 97% 91%
CHB23 99% 100% 100% 98%
CHB24 93% 92% 92% 90%
Average 95.91% 98.12% 98.04% 96.29%
• Overall Sensitivity = 98.12%
• Overall Specificity = 95.91%
• Overall Accuracy = 98.04%
• Overall F1-Score = 96.29%
CONCLUSION
• This research introduces a methodology for epileptic seizure detection from EEG signals using a transformer-based deep learning model.
• Wavelet decomposition was employed to extract meaningful features and transform the data into the frequency domain, demonstrating its efficacy in this context.
• The proposed algorithm was rigorously evaluated on the well-known Bonn dataset and the CHB-MIT scalp EEG database. The results showcase the strong performance of the method across these datasets, affirming its effectiveness in capturing the distinguishing characteristics of brain signals.
• Looking ahead, future work will focus on further validating the proposed method by extending its evaluation to a broader range of databases. This expansion will allow a more comprehensive assessment of the algorithm's robustness and generalizability across diverse populations.
REFERENCES
1. WHO, Improving Access to Epilepsy Care. Available online: https://www.who.int/mental_health/neurology/epilepsy/en/ (accessed on 29 August 2019).
2. R. G. Andrzejak, K. Lehnertz, C. Rieke, F. Mormann, P. David, and C. E. Elger, “Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state,” Phys. Rev. E, vol. 64, 061907, 2001.
3. CHB-MIT Scalp EEG Database. Available online: https://physionet.org/content/chbmit/1.0.0/
4. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention Is All You Need,” arXiv:1706.03762, 2017, doi: 10.48550/arXiv.1706.03762.
5. S. Kumar and Y. U. Khan, “Biomedical signals classification with transformer based model,” IEEE International Conference on Power, Instrumentation, Energy and Control (PIECON 2023), Aligarh Muslim University, India, 2023, doi: 10.1109/PIECON56912.2023.10085908.
6. Z. Wei, J. Zou, J. Zhang, and J. Xu, “Automatic epileptic EEG detection using convolutional neural network with improvements in time-domain,” Biomed. Signal Process. Control, vol. 53, 2019, doi: 10.1016/j.bspc.2019.04.028.
REFERENCES
7. G. C. Jana, R. Sharma, and A. Agrawal, “A 1D-CNN-Spectrogram Based Approach for Seizure Detection from EEG Signal,” Procedia Computer Science, vol. 167, pp. 403–412, 2020, doi: 10.1016/j.procs.2020.03.248.
8. G. Dhiman, D. Oliva, A. Kaur, K. K. Singh, S. Vimal, A. Sharma, and K. Cengiz, “BEPO: A novel binary emperor penguin optimizer for automatic feature selection,” Knowledge-Based Systems, vol. 211, p. 106560, 2021, doi: 10.1016/j.knosys.2020.106560.
9. K. Singh and J. Malhotra, “Two-layer LSTM network-based prediction of epileptic seizures using EEG spectral features,” Complex & Intelligent Systems, pp. 1–14, 2022, doi: 10.1007/s40747-021-00627-z.
10. D. Thara, B. Prema Sudha, and F. Xiong, “Epileptic seizure detection and prediction using stacked bidirectional long short-term memory,” Pattern Recognition Letters, vol. 128, pp. 529–535, 2019, doi: 10.1016/j.patrec.2019.10.034.
11. S. Raghu, N. Sriraam, A. S. Hegde, and P. L. Kubben, “A novel approach for classification of epileptic seizures using matrix determinant,” Expert Systems with Applications, vol. 127, pp. 323–341, 2019, doi: 10.1016/j.eswa.2019.03.021.
12. L. Guo, D. Rivero, J. Dorado, J. R. Rabuñal, and A. Pazos, “Automatic epileptic seizure detection in EEGs based on line length feature and artificial neural networks,” J. Neurosci. Methods, vol. 191, no. 1, pp. 101–109, 2010, doi: 10.1016/j.jneumeth.2010.05.020.
REFERENCES
13. N. S. Tawfik, S. M. Youssef, and M. Kholief, “A hybrid automated detection of epileptic seizures in EEG records,” Comput. Electr. Eng., vol. 53, pp. 177–190, 2016, doi: 10.1016/j.compeleceng.2015.09.001.
14. S. M. Wani, S. Sabut, and S. L. Nalbalwar, “Detection of epileptic seizure using wavelet transform and neural network classifier,” in Computing, Communication and Signal Processing, pp. 739–747, Springer, Singapore, 2019, doi: 10.1007/978-981-13-1513-8_75.
15. T. T. Chowdhury, A. Hossain, S. A. Fattah, and C. Shahnaz, “Seizure and non-seizure EEG signals detection using 1-D convolutional neural network architecture of deep learning algorithm,” 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), pp. 1–4, IEEE, 2019, doi: 10.1109/ICASERT.2019.8934564.
16. K. C. Chua, V. Chandran, U. R. Acharya, and C. M. Lim, “Application of higher order spectra to identify epileptic EEG,” J. Med. Syst., vol. 35, no. 6, pp. 1563–1571, 2011, doi: 10.1007/s10916-010-9433-z.
THANK YOU
Presentation by Sandeep Kumar
