Parkinson Voice Dataset with ML

This document discusses building a machine learning model to predict Parkinson's disease using voice recordings. It describes the dataset containing recordings from 80 subjects, with 40 having Parkinson's. 44 acoustic features were extracted from each recording. The model faces challenges from multicollinearity between correlated features and from the small replicated dataset. Various techniques are proposed to address these issues, including feature selection and engineering, dimensionality reduction, modeling techniques like neural networks, and constraining the model using a causal diagram. Evaluation on a separate test set aims for high sensitivity while balancing specificity and accuracy.

Prediction model from Parkinson dataset with replicated features
Sermkiat Lolak, M.D.
Datascience for Healthcare, Department of Biostatistics and Clinical Epidemiology,
Faculty of Medicine Ramathibodi Hospital, Mahidol University

Symptoms
Gait disturbance
Motor disabilities, bradykinesia
Depression, apathy, and sleep disorders
Quality of life

Early diagnosis -> early treatment
Medical
Surgical

Diagnosis
Clinical symptoms
Imaging
Laboratory
Biosensor signal

Biosensor signal
Motor related
Gait
Gesture
Voice

Voice
In PD, impaired laryngeal / diaphragm control can produce vocal tremor.
Vocal fold stiffness and bowing change vocal fold mass and tension.
Voice fatigue develops during a prolonged voice-loading task.
Phonation cannot be sustained: unstable fundamental frequency (high jitter) and amplitude (excess shimmer).
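Jitter and shimmer are usually reported as cycle-to-cycle perturbation ratios. The sketch below uses the common "local" definitions and assumes the glottal cycle periods and peak amplitudes have already been extracted from the recording (period detection is not shown); the function names and the cycle values are hypothetical.

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference between consecutive periods, divided by the mean period."""
    p = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(p))) / np.mean(p)

def local_shimmer(amplitudes):
    """Mean absolute difference between consecutive peak amplitudes, divided by the mean amplitude."""
    a = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(a))) / np.mean(a)

# Hypothetical cycle measurements: periods in seconds, amplitudes in arbitrary units
steady = [0.0080, 0.0080, 0.0080, 0.0080]
unsteady = [0.0080, 0.0084, 0.0077, 0.0083]
print(local_jitter(steady))                 # 0.0 for a perfectly periodic voice
print(local_jitter(unsteady))               # larger value: unstable fundamental frequency
print(local_shimmer([1.0, 0.8, 1.1, 0.9]))  # amplitude perturbation
```

Both measures are dimensionless ratios, so they can be compared across speakers with different mean pitch and loudness.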

PD Voice Dataset
80 subjects, 40 of them with PD
Phonation of the vowel /a/

44 features were extracted from each voice recording.
27 acoustic features
4 different characteristic families
Features within each family are highly correlated.

Replicated data
Each subject recorded 3 times -> 240 records
Multicollinearity problem
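One way to quantify this multicollinearity is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all the other features. A NumPy-only sketch on simulated data (three columns built from one shared signal stand in for a feature family; nothing here is taken from the real dataset):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (rows = records)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Regress column j on the remaining columns plus an intercept
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
base = rng.normal(size=240)  # one shared underlying signal
family = np.column_stack([base + 0.1 * rng.normal(size=240) for _ in range(3)])
independent = rng.normal(size=(240, 1))
print(vif(np.hstack([family, independent])))  # first three values large, last close to 1
```

A common rule of thumb treats VIF above roughly 5 to 10 as problematic, though the cut-off is a convention rather than a test.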

Harmonic-to-Noise Ratio (HNR)
Weak laryngeal control: incomplete glottal closure.
Unphonated air leaking through the glottis adds noise.
This leads to lower HNR values.
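HNR can be estimated from the autocorrelation of the signal: if r is the normalized autocorrelation at the pitch lag, roughly the harmonic share of the power, then HNR = 10 log10(r / (1 - r)) dB. The sketch below is a simplified version of that idea on synthetic tones; it is not necessarily the extraction method used to build this dataset, and the 75 to 500 Hz pitch range is an assumption.

```python
import numpy as np

def hnr_autocorr(x, fs, f0_min=75.0, f0_max=500.0):
    """Rough HNR in dB from the normalized autocorrelation peak within the pitch-lag range."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    ac = np.correlate(x, x, mode="full")[n - 1:]  # lags 0..n-1
    ac = ac / (n - np.arange(n))                  # unbiased estimate per lag
    ac = ac / ac[0]                               # normalize so lag 0 equals 1
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    r = ac[lo:hi + 1].max()
    r = min(max(r, 1e-6), 1 - 1e-6)               # keep the log argument finite
    return 10.0 * np.log10(r / (1.0 - r))

fs = 16000
t = np.arange(int(0.25 * fs)) / fs
clean = np.sin(2 * np.pi * 150 * t)                               # synthetic 150 Hz "voice"
noisy = clean + 0.3 * np.random.default_rng(0).normal(size=t.size)  # simulated breathy leak
print(hnr_autocorr(clean, fs), hnr_autocorr(noisy, fs))  # the noisier signal gives the lower HNR
```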

MFCC and derivatives
Thirteen Mel-frequency cepstral coefficients (MFCCs)
MFCCs are widely used in speech and speaker recognition.
PD causes problems with articulation.
Low MFCC coefficients are used for PD diagnosis and tracking.
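The "derivatives" of the MFCCs are usually computed with the regression-based delta formula d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2), applied once for the first derivative and again for the second. A NumPy sketch, assuming a window of N = 2 and edge frames repeated at the boundaries (both choices are assumptions, and the MFCC track is made up):

```python
import numpy as np

def delta(coeffs, N=2):
    """Regression-based delta of a (frames x coefficients) matrix; edge frames are repeated."""
    c = np.asarray(coeffs, dtype=float)
    T = c.shape[0]
    padded = np.pad(c, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    d = np.zeros_like(c)
    for n in range(1, N + 1):
        # padded row N + t holds frame t, so these slices give frames t + n and t - n
        d += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return d / denom

# Hypothetical MFCC track: 5 frames x 13 coefficients rising by one unit per frame
mfcc = np.tile(np.arange(5.0)[:, None], (1, 13))
d1 = delta(mfcc)  # first derivative ("Delta 1")
d2 = delta(d1)    # second derivative ("Delta 2")
print(d1[2, 0])   # 1.0: slope of one unit per frame at the central frame
```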

IID & Statistical Learning
Learning assumes the data are independent and identically distributed (IID).
This assumption underlies generalization.
If the assumption is not met: weaker model

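Learning
Machine learning / neural networks: estimate conditional probabilities (Lex Fridman / Judea Pearl)
Disentangled (causal) factorization, where $\mathrm{PA}_i$ are the parents of $x_i$ in the causal graph:
$P(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{PA}_i)$
Entangled factorization:
$P(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_{i+1}, \dots, x_n)$
Bernhard Schölkopf (2019)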

Problems with this dataset
Time series
Highly correlated features

Correlated voice-recording features
[Figure: feature families Delta 0, Delta 1, Delta 2; features x1, x2, x3 and outcome y; common cause Z]
Common Cause Principle: Reichenbach (1956)
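Reichenbach's principle says that two correlated variables with no direct causal link should become independent once their common cause is accounted for. A small simulation illustrates this: x1 and x2 both depend on a hypothetical common cause Z (for example, the subject whose recordings share one voice), and their correlation nearly disappears after the linear effect of Z is removed from each (a partial correlation). The coefficients are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 240
Z = rng.normal(size=n)  # hypothetical common cause
x1 = 2.0 * Z + 0.5 * rng.normal(size=n)
x2 = -1.5 * Z + 0.5 * rng.normal(size=n)

def residualize(v, z):
    """Remove the linear effect of z (with intercept) from v."""
    A = np.column_stack([np.ones_like(z), z])
    beta, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ beta

raw_corr = np.corrcoef(x1, x2)[0, 1]
partial_corr = np.corrcoef(residualize(x1, Z), residualize(x2, Z))[0, 1]
print(round(raw_corr, 2), round(partial_corr, 2))  # strongly negative vs. close to 0
```

This is the sense in which "conditioning" blocks the information flow between correlated features.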

What to do?
Instrumental variable: Z -> X -> Y
Mediator: X -> M -> Y
Reduce dimension: representative feature, mean, PCA, autoencoder
Modelling: e.g. neural network
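Among the dimension-reduction options listed above, PCA can compress a highly correlated feature family into a few uncorrelated components. A NumPy sketch via the singular value decomposition, on a simulated four-feature family (the data and the choice of k = 1 are assumptions):

```python
import numpy as np

def pca(X, k):
    """Project standardized X onto its first k principal components via SVD."""
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # share of total variance per component
    return Xs @ Vt[:k].T, explained[:k]

rng = np.random.default_rng(2)
base = rng.normal(size=240)
family = np.column_stack([base + 0.2 * rng.normal(size=240) for _ in range(4)])
scores, ratio = pca(family, k=1)
print(scores.shape, ratio)  # (240, 1) and a ratio near 1: one component summarizes the family
```

In practice the mean and standard deviation (and the components) should be fitted on the training split only and then applied to the test split.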

Time series
[Figure: time-series graph with nodes Voice_t, Voice_t+1, PD_t, PD_t+1 (HMM?)]
[Figure: structural causal diagram with nodes Baseline V., Voice_t, Voice_t+1]

What to do?
Model: RNN (LSTM, GRU), CNN
Block information flow: conditioning
Use only 1 session per person in training
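Keeping a single session per subject removes the within-subject replication from the training set, so each training row corresponds to a distinct person. A minimal sketch with a hypothetical array layout (subject IDs, session numbers, and random feature values are placeholders; session 3 follows the selected model described later):

```python
import numpy as np

# Hypothetical layout: 80 subjects x 3 sessions = 240 records, 44 features each
subject_id = np.repeat(np.arange(80), 3)
session = np.tile([1, 2, 3], 80)
X = np.random.default_rng(3).normal(size=(240, 44))

keep = session == 3  # one record per subject
X_train_one_session = X[keep]
print(X_train_one_session.shape)  # (80, 44)
assert len(np.unique(subject_id[keep])) == keep.sum()  # no subject appears twice
```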

Feature engineering
Normalization: makes learning easier, especially for SVM
Use the mean of each feature family
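Normalizing before averaging matters: features in one family can sit on very different scales, so a raw mean would be dominated by the largest-scale feature. A sketch that z-scores with training-set statistics only and then replaces each family by the mean of its normalized columns (the family names and column indices are hypothetical):

```python
import numpy as np

def zscore_fit(X_train):
    """Learn per-feature mean and standard deviation on the training split only."""
    return X_train.mean(axis=0), X_train.std(axis=0)

def family_means(X, families):
    """Replace each feature family by the mean of its (already normalized) columns."""
    return np.column_stack([X[:, cols].mean(axis=1) for cols in families.values()])

rng = np.random.default_rng(4)
X_train = rng.normal(5.0, 2.0, size=(60, 6))
X_test = rng.normal(5.0, 2.0, size=(20, 6))
families = {"family_A": [0, 1, 2], "family_B": [3, 4, 5]}  # hypothetical grouping

mu, sd = zscore_fit(X_train)
Z_train, Z_test = (X_train - mu) / sd, (X_test - mu) / sd  # test reuses training statistics
F_train, F_test = family_means(Z_train, families), family_means(Z_test, families)
print(F_train.shape, F_test.shape)  # (60, 2) (20, 2)
```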

Feature selection
Random
Entropy
Gini
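Entropy and Gini impurity are the two usual split criteria for decision trees, and tree-based feature rankings are built from how much each feature reduces them. A small sketch of both measures and of the impurity decrease produced by one threshold split, on hypothetical labels and a hypothetical feature:

```python
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def impurity_decrease(x, y, threshold, criterion):
    """Parent impurity minus the size-weighted impurity of the two children."""
    left, right = y[x <= threshold], y[x > threshold]
    w_left, w_right = len(left) / len(y), len(right) / len(y)
    return criterion(y) - (w_left * criterion(left) + w_right * criterion(right))

y = np.array([0, 0, 0, 0, 1, 1, 1, 1])          # hypothetical labels: 1 = PD
x = np.array([1., 2., 3., 4., 5., 6., 7., 8.])  # a feature that separates them perfectly
print(gini(y), entropy(y))  # 0.5 1.0
print(impurity_decrease(x, y, 4.5, gini), impurity_decrease(x, y, 4.5, entropy))  # 0.5 1.0
```

A feature whose splits give a large average decrease ranks high; with strongly correlated features, the importance tends to be shared among the members of a family.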
Experimental setting
Feature engineering
Feature selection
Model selection

Modelling
Linear: SVM
Tree-based / ensemble: random forest
Neural network

Experiments
Separate test set
Vary one step at a time while holding the other steps constant
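With three recordings per subject, a record-level random split can place the same person in both the training and the test set, which inflates test scores. A sketch of a subject-level split, assuming integer subject IDs and a 25% test fraction (both hypothetical; scikit-learn's `GroupShuffleSplit` implements the same idea):

```python
import numpy as np

def subject_split(subject_id, test_fraction=0.25, seed=0):
    """Assign whole subjects to train or test so no person's recordings appear in both."""
    rng = np.random.default_rng(seed)
    subjects = np.unique(subject_id)
    n_test = int(round(test_fraction * len(subjects)))
    test_subjects = rng.choice(subjects, size=n_test, replace=False)
    test_mask = np.isin(subject_id, test_subjects)
    return ~test_mask, test_mask

subject_id = np.repeat(np.arange(80), 3)  # 80 subjects x 3 recordings
train_mask, test_mask = subject_split(subject_id)
overlap = set(subject_id[train_mask]) & set(subject_id[test_mask])
print(train_mask.sum(), test_mask.sum(), len(overlap))  # 180 60 0
```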

Evaluation metrics
Goal: early or preemptive diagnosis
Focus on high sensitivity (recall)
AUC
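The metrics used here all follow from the confusion matrix, except AUC, which depends only on how the scores rank positives against negatives. A self-contained sketch on hypothetical labels and scores (the 0.45 decision threshold is arbitrary):

```python
import numpy as np

def confusion_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "sensitivity": tp / (tp + fn),  # recall: share of PD cases detected
        "specificity": tn / (tn + fp),  # share of controls correctly cleared
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / len(y_true),
    }

def auc(y_true, scores):
    """ROC AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

y = [1, 1, 1, 0, 0, 0]
s = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]  # hypothetical model scores
print(confusion_metrics(y, [int(v >= 0.45) for v in s]))
print(auc(y, s))  # 8 of 9 positive/negative pairs ranked correctly
```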

Selected model
Neural network, 7 layers
Normalization, representative-feature selection
Training uses only 1 session per patient (session 3)

Metric                  This model   Naranjo et al.*
Sensitivity (recall)    1.000        0.825
Specificity             0.867        0.900
Precision               0.882        0.891
AUC                     0.982        0.951
Accuracy                0.933        0.862
*Computer Methods and Programs in Biomedicine, 2017-04-01, Volume 142, Pages 147-156
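As an arithmetic consistency check (an inference, not something stated in the slides), the four threshold-based figures reported for this model are all reproduced by one small confusion matrix: 15 PD and 15 control test records with TP = 15, FN = 0, TN = 13, FP = 2.

```latex
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP+FN} = \frac{15}{15+0} = 1.000\\
\text{Specificity} &= \frac{TN}{TN+FP} = \frac{13}{13+2} \approx 0.867\\
\text{Precision}   &= \frac{TP}{TP+FP} = \frac{15}{15+2} \approx 0.882\\
\text{Accuracy}    &= \frac{TP+TN}{TP+TN+FP+FN} = \frac{28}{30} \approx 0.933
\end{aligned}
```

Integer multiples of these counts fit equally well, so the actual test-set size cannot be read off the table. If the test set is this small, a single misclassified PD case would move sensitivity by about 1/15, or roughly 0.07.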

Discussion
Small dataset
Multicollinearity
Explicit structural causal diagram
Advanced / mixed feature engineering and modelling

Low specificity
Constraint: Orange
Hyperparameter tuning
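Because the goal favours sensitivity, one way to recover specificity is to tune the decision threshold on validation data (never on the test set): among thresholds that keep sensitivity above a floor, take the one with the highest specificity. This is a generic technique sketched on hypothetical validation scores, not the procedure used in the slides.

```python
import numpy as np

def pick_threshold(y_true, scores, min_sensitivity=0.95):
    """Among thresholds meeting the sensitivity floor, return the one with the highest specificity."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    best_t, best_spec = None, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        sens = np.mean(pred[y_true == 1])
        spec = np.mean(~pred[y_true == 0])
        if sens >= min_sensitivity and spec > best_spec:
            best_t, best_spec = t, spec
    return best_t, best_spec

y_val = np.array([1, 1, 1, 1, 0, 0, 0, 0])
s_val = np.array([0.95, 0.85, 0.7, 0.55, 0.6, 0.4, 0.3, 0.1])  # hypothetical validation scores
print(pick_threshold(y_val, s_val, min_sensitivity=1.0))
```

Here the threshold 0.55 keeps every PD case while clearing three of the four controls.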

Application
Transfer learning
Application: training on mobile-phone voice recordings
Acknowledgement
Disease -> go see a doctor
