La enfermedad de Parkinson es el segundo trastorno neurodegenerativo más común y afecta a más de 7 millones de personas en todo el mundo. En este trabajo, clasificamos a los sujetos con la enfermedad de Parkinson utilizando datos de la pulsación de los dedos en un teclado. Utilizamos una base de datos gratuita de Physionet con más de 9 millones de registros, preprocesada para eliminar los datos atípicos. En la etapa de extracción de características, obtuvimos 48 características. Utilizamos Google Colaboratory para entrenar, validar y probar nueve algoritmos de aprendizaje supervisado que detectan la enfermedad. Como resultado, conseguimos un grado de precisión superior al 98 %.
2. Classification of Subjects with Parkinson's
Disease using Finger Tapping Dataset
Escuela Superior Politécnica del Litoral, ESPOL, Guayaquil, Ecuador
Facultad de Ingeniería en Electricidad y Computación, FIEC
Víctor Asanza , Nadia N. Sánchez-Pozo, Leandro L. Lorente-Leyva, Diego Hernan Peluffo-
Ordóñez, Fancis R. Loayza and Enrique Peláez
4. Topics
• Introduction
• Related Work
• Dataset
• Methodology
• Results and Discussion
• Conclusions
Classification of Subjects with Parkinson's
Disease using Finger Tapping Dataset
5. Introduction
• Parkinson's disease (PD) is the second most
common neurodegenerative disorder (Váradi,
2020).
• Tremor, bradykinesia, muscle stiness,
balance disorders, oral problems when
speaking.
• The World Health Organization (WHO)
estimated:
• By 2030 around 12 million people suffer
from PD.
• It is essential to detect this disease at an early
stage to improve the quality of life.
6. Related Work
• Chen et al. use Magnetic Resonance Imaging (MRI) in 2-dimensional pixels for feature
and biomarker extraction.
• Support Vector Machine (SVM) algorithm (93%).
• Gharehchopogh et al. use NNs and a voice dataset.
• Multilayer Perceptron (MLP) (93.22%).
• Wroge et al. use two data preprocessing models, Audio-Visual Emotion Recognition
Challenge (AVEC).
• Gradient-powered classifier (GBC) (86%).
• Urcuqui et al. conducted an experiment in which they asked volunteers to walk a four-
meter route three times. They video-recorded them and analyzed the data from the
movements of their legs.
• Random forest algorithm (RF) (82%).
• Wang et al. use walking dataset
• Backpropagation neural network algorithm (BPNN) (42%)
7. Dataset
• Dataset provided by Physionet (Goldberger et al., 2000).
• All subjects performed the keystroke test voluntarily after signing informed consent.
• TAPPY app records how many times and which keys were pressed or released.
8. Dataset
• BirthYear: Year of birth (e.g., 1952)
• Gender: male or female (e.g., Female)
• Parkinsons: patient or control (e.g., True)
• Tremors: yes or not (e.g., True)
• DiagnosisYear: the evolution of the disease in
years (e.g., 2000)
• Sided: hemibody affected, right, left, or none,
performed by self-evaluation (e.g., Left)
• Impact: how much the disease impacts your daily
life, little, moderately, or severely (e.g., Severe)
• Levodopa: the binary response in the use of
levodopa drug, yes or not (e.g., True)
• DA: the binary response in the use of dopamine
agonist drug, yes or not (e.g., True)
• Other: the binary response in the use of other drugs,
yes or not (e.g., False)
The folder Archived Users contains one .txt file for each one of the 227 volunteers.
Each file contains the following demographic information:
9. Methodology
Data Preprocessing Feature Extraction Classification Algorithm
Archived Users
227
Txt Files
Tappy Data
622
Txt Files
File Intersection by UserKey
(217/227 valid files)
Min-max normalization
. CSV file creation
48 Features
127 Examples
Feature Extraction Algorithm
a)
c)
b)
• 162 were patients with PD
• 55 were healthy controls.
10. Methodology
We processed the 48 columns representing 48 features as follows:
• BirthYear and DiagnosisYear columns were changed to numeric type.
• Columns with True or False values were coded as binary data (0, 1): Female, Tremors, Levadopa, DA,
MAOB and Other.
• The Side variable was coded (one-hot encoded): Side Left, Side None, Side Right.
• The categorical Impact variable was coded (one-hot encoded): Impact Medium, Impact Mild, Impact
None, Impact Severe
• The Hold time, Latency time and Flight time; were coded (one-hot encoded) and divided into
address groups: "LL", address "LR", address "LS", address "RL", address "RR", address "RS", address
"SL", address "SR" and "SS".
• Finally, the labels are the diagnosis of PD.
Data Preprocessing Feature Extraction Classification Algorithm
11. Methodology
• The resulting .CSV le with 217 rows and 48 columns is
• our Dataset, which we used for the training and
validation process of the classification algorithms.
• 70% - Training Set
• 30% - Testing Set
Classification Algorithm
Subject Parkinsonian (1) or Subject
without Parkinson (0)
Features (F) F1 … F48 Label
217
Examples
7 … 13 1
… … … …
5 … 20 0
Training Set
70%
152
Examples
Testing Set
30%
65
Examples
Dataset
Data Preprocessing Feature Extraction Classification Algorithm
18. Conclusions
• Using ML techniques, we are able to classify patients with PD with an accuracy of 98.04%.
• 48 features for each participant
• Applications that need high accuracy could use the NB or MLP algorithms.
• Applications that need real-time responses
• SVM and NB (values of about 10 ns).
• Implementation of machine learning models in the classification of patients with PD
showed positive results, especially the NB model.
• We do not perform hyperparameter optimization, all are scikit-learn defaults.
• We recommend preprocessing the data extensively because a cleaner dataset improves
prediction accuracy when working with classification algorithms.
• In terms of future work, we propose to add an automatic feature selection stage to identify
the features that contribute to the classifiers.
• We suggest using Deep Learning-based methods as classifiers.
20. For more information
Víctor Asanza
Mail: vasanza@espol.edu.ec
Facultad de Ingeniería en Electricidad y Computación, FIEC
Escuela Superior Politécnica del Litoral, ESPOL
Campus Gustavo Galindo Km 30.5 Vía Perimetral, P.O. Box 09-01-5863
090150 Guayaquil, Ecuador