
Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson’s Disease


Slides from my talk at INTERSPEECH 2017, Stockholm

Published in: Engineering

  1. 1. Convolutional Neural Network to Model Articulation Impairments in Patients with Parkinson’s Disease. Juan Camilo Vásquez-Correa (1,2), Juan Rafael Orozco-Arroyave (1,2), Elmar Nöth (2). (1) GITA research group, University of Antioquia (UdeA). (2) Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg. jcamilo.vasquez@udea.edu.co. 18th INTERSPEECH, 2017.
  2. 2. Outline: Introduction, Methods, Experimental framework, Results, Conclusion.
  4. 4. Introduction: Parkinson’s Disease. Second most prevalent neurological disorder worldwide. Patients develop several motor and non-motor impairments (O. Hornykiewicz 1998). Speech impairments are one of the earliest manifestations.
  5. 5. Introduction: Speech impairments. Speech impairments in PD patients are known as hypokinetic dysarthria. They affect phonation, articulation, prosody, and intelligibility. Example utterance shown on the slide: /pa-ta-ka/.
  6. 6. Introduction: Imprecise articulation. One of the most deviant speech dimensions in PD. Reduced velocity of lip, tongue, and jaw movements. The literature strongly indicates imprecise consonants caused by the reduced range of movement of the articulators. Example utterance shown on the slide: /pa-ta-ka/.
  7. 7. Introduction: Hypothesis. PD patients have difficulty starting and stopping vocal fold vibration, and such difficulties can be observed in speech signals by modeling the transitions between voiced and unvoiced sounds. Figure: a speech signal segmented into alternating voiced and unvoiced regions, with the onset transition (unvoiced to voiced) and the offset transition (voiced to unvoiced) marked.
  8. 8. Introduction: Aims. To model the time-frequency (TF) information provided by the onset and offset transitions using the short-time Fourier transform (STFT) and the continuous wavelet transform (CWT). To “learn” features from the time-frequency representations with a convolutional neural network (CNN). Why TF and feature learning? Both have been used successfully in several paralinguistic tasks: emotion, deception, depression, and others.
  10. 10. Methods. Three-stage pipeline: transitions detection, time-frequency representations, convolutional neural network.
  11. 11. Methods: Transitions detection. Onset and offset transitions are detected according to the presence of the fundamental frequency.
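The detection rule on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an F0 contour is already available as one value per frame, with 0 marking unvoiced frames, and the names `find_transitions` and `cut_segment`, the hop size, and the segment width are all hypothetical.

```python
def find_transitions(f0):
    """Return (onsets, offsets) as frame indices where voicing starts or stops.

    Assumption: a frame is voiced when its F0 value is greater than zero.
    """
    voiced = [value > 0 for value in f0]
    onsets, offsets = [], []
    for i in range(1, len(voiced)):
        if voiced[i] and not voiced[i - 1]:
            onsets.append(i)       # unvoiced -> voiced
        elif voiced[i - 1] and not voiced[i]:
            offsets.append(i)      # voiced -> unvoiced
    return onsets, offsets


def cut_segment(samples, frame_index, hop, half_width):
    """Slice the waveform around a transition frame, clipped to the signal limits."""
    center = frame_index * hop
    start = max(0, center - half_width)
    end = min(len(samples), center + half_width)
    return samples[start:end]


# Toy F0 contour in Hz (hypothetical values); zeros are unvoiced frames.
f0_contour = [0, 0, 120, 125, 130, 0, 0, 110, 0]
onsets, offsets = find_transitions(f0_contour)
print(onsets, offsets)  # [2, 7] [5, 8]
```

Each detected frame index can then be passed to `cut_segment` to obtain the short waveform chunk that is later turned into a time-frequency representation.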
  12. 12. Methods: Time-frequency representation. Figure: STFT of an onset for a PD patient (left) and an HC subject (right). Axes: time (s), 0 to 0.14, against frequency (Hz), 500 to 4000.
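A magnitude STFT of a short segment can be sketched in plain Python as below. This is only an illustration under stated assumptions: the slides do not give the window type, frame length, hop, or sampling rate, so the Hann window, `frame_len=64`, `hop=32`, and the 8 kHz rate used here are placeholders, and `stft_magnitude` is a hypothetical helper name.

```python
import cmath
import math


def stft_magnitude(samples, frame_len, hop):
    """Magnitude STFT: one list of frame_len // 2 + 1 frequency bins per frame."""
    # Periodic Hann window (assumed; the slides do not name the window).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of the windowed frame, bin k.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


sample_rate = 8000  # assumed sampling rate in Hz
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]
tfr = stft_magnitude(tone, frame_len=64, hop=32)
peak_bin = max(range(len(tfr[0])), key=lambda k: tfr[0][k])
print(len(tfr), len(tfr[0]), peak_bin * sample_rate / 64)  # 7 33 1000.0
```

The naive DFT keeps the example dependency-free; a real pipeline would normally use an FFT routine from a numerical library.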
  13. 13. Methods: Time-frequency representation. Figure: CWT of an onset for a PD patient (left) and an HC subject (right). Axes: time (s), 0 to 0.14, against scale, 100 to 500.
  14. 14. Methods: Convolutional neural network. Architecture diagram: input layer, convolution layer I (feature maps 1), max-pool layer 1, convolution layer II (feature maps 2), max-pool layer 2, fully connected MLP, output PD vs. HC. The CNN learns high-level representations from the low-level raw data.
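The layer order in the diagram (convolution, max-pooling, convolution, max-pooling, fully connected output) can be traced with a tiny forward pass. This is a deliberately simplified sketch: it uses a single feature map per layer, a single linear output in place of the MLP, ReLU activations, and made-up kernel sizes and weights, none of which are given in the slides.

```python
def conv2d_valid(image, kernel):
    """2-D 'valid' cross-correlation followed by ReLU (activation is an assumption)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = sum(image[i + a][j + b] * kernel[a][b]
                      for a in range(kh) for b in range(kw))
            row.append(max(0.0, acc))
        out.append(row)
    return out


def max_pool(image, size):
    """Non-overlapping max-pooling; rows/columns that do not fill a window are dropped."""
    rows = len(image) // size
    cols = len(image[0]) // size
    return [[max(image[i * size + a][j * size + b]
                 for a in range(size) for b in range(size))
             for j in range(cols)]
            for i in range(rows)]


def forward(tfr, kernel1, kernel2, weights, bias):
    """conv -> pool -> conv -> pool -> flatten -> linear score (positive = PD, by convention here)."""
    x = max_pool(conv2d_valid(tfr, kernel1), 2)
    x = max_pool(conv2d_valid(x, kernel2), 2)
    flat = [v for row in x for v in row]
    return sum(w * v for w, v in zip(weights, flat)) + bias


# Toy 10x10 "time-frequency" input and hypothetical 3x3 kernels of ones.
toy_tfr = [[1.0] * 10 for _ in range(10)]
ones3 = [[1.0] * 3 for _ in range(3)]
score = forward(toy_tfr, ones3, ones3, weights=[1.0], bias=-80.0)
print(score)  # 1.0
```

With this toy input the shapes go 10x10, 8x8, 4x4, 2x2, 1x1, which shows how each convolution and pooling stage shrinks the representation before classification.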
  16. 16. Data. Three databases with recordings in three languages: Spanish, German, and Czech. Speech tasks: diadochokinetic exercises, isolated sentences, read texts, and monologues.
  17. 17. Data. Table: Databases

      | Language | Description |
      |----------|-------------|
      | Spanish  | 50 patients and 50 healthy controls. Balanced in age (60 years old) and gender. Patients in the middle stage of the disease. |
      | German   | 88 patients and 88 healthy controls. Balanced in age (64 years old). Patients in the low and middle stages of the disease. |
      | Czech    | 20 patients and 15 healthy controls. All male speakers. Patients diagnosed during the recording session. |
  18. 18. Experiments and validation. Classification of PD patients vs. HC subjects in the same language: 10-fold cross-validation, with 8 folds for training, 1 to optimize hyper-parameters, and 1 for test. Cross-language classification: one language is used for training and validation, and a different language is used for test.
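The 8/1/1 fold scheme can be sketched as below. The slides do not say how the hyper-parameter fold is chosen in each round or whether folds are stratified, so this sketch simply takes the next fold as the validation fold and splits by speaker ID without stratification; `cross_validation_splits` is a hypothetical helper.

```python
import random


def cross_validation_splits(speaker_ids, n_folds=10, seed=0):
    """Yield (train, validation, test) speaker lists for each of n_folds rounds."""
    ids = list(speaker_ids)
    random.Random(seed).shuffle(ids)
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        val_index = (k + 1) % n_folds  # assumption: next fold tunes hyper-parameters
        test = folds[k]
        validation = folds[val_index]
        train = [s for i, fold in enumerate(folds)
                 if i not in (k, val_index) for s in fold]
        yield train, validation, test


# 100 speakers, matching the size of the Spanish database (50 PD + 50 HC).
splits = list(cross_validation_splits(range(100)))
train, validation, test = splits[0]
print(len(splits), len(train), len(validation), len(test))  # 10 80 10 10
```

Splitting by speaker rather than by recording keeps all material from one person in a single partition, which is the usual precaution in this kind of experiment.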
  19. 19. Experiments and validation. Results are compared with previous studies [1], which use a support vector machine. [1] Juan Camilo Vásquez-Correa et al. “Effect of acoustic conditions on algorithms to detect Parkinson’s disease from speech”. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2017, pp. 5065–5069.
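  21. 21. Results: same language for train and test

      | Language | TFR      | Onset | Offset | Onset+Offset |
      |----------|----------|-------|--------|--------------|
      | Spanish  | CNN-STFT | 85.3  | 81.6   | 85.9         |
      | Spanish  | CNN-CWT  | 84.2  | 81.8   | 85.2         |
      | Spanish  | Baseline | 69.3  | 69.6   | 71.6         |
      | German   | CNN-STFT | 70.3  | 68.0   | 75.0         |
      | German   | CNN-CWT  | 68.0  | 66.9   | 70.5         |
      | German   | Baseline | 72.7  | 70.9   | 74.0         |
      | Czech    | CNN-STFT | 77.9  | 80.4   | 84.4         |
      | Czech    | CNN-CWT  | 89.2  | 87.7   | 89.4         |
      | Czech    | Baseline | 75.3  | 74.4   | 78.8         |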
  24. 24. Results: same language for train and test. Figure: Output of the CNN after the last max-pool layer for a PD patient (left) and an HC speaker (right). Axes: time (ms), 50 to 150, against frequency (Hz), 0 to 4000; color scale from low energy to high energy.
  25. 25. Results: same language for train and test

      | Speech task | Spanish | German | Czech |
      |-------------|---------|--------|-------|
      | Read text   | 85.0    | 70.3   | 88.5  |
      | Monologue   | 85.6    | 70.3   | 89.1  |
      | /pa-ta-ka/  | 85.4    | 70.7   | 89.2  |
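  26. 26. Results: different language for train and test

      | Train language | Test language | TFR      | Onset | Offset | Onset+Offset |
      |----------------|---------------|----------|-------|--------|--------------|
      | Spanish        | German        | CNN-STFT | 51.7  | 50.2   | 54.7         |
      | Spanish        | German        | Baseline | 53.7  | 55.0   | 54.1         |
      | Spanish        | Czech         | CNN-CWT  | 55.2  | 55.4   | 57.9         |
      | Spanish        | Czech         | Baseline | 60.3  | 57.4   | 60.4         |
      | German         | Spanish       | CNN-STFT | 58.0  | 55.7   | 55.8         |
      | German         | Spanish       | Baseline | 53.5  | 53.5   | 53.6         |
      | German         | Czech         | CNN-STFT | 53.0  | 52.4   | 53.0         |
      | German         | Czech         | Baseline | 50.9  | 51.7   | 52.6         |
      | Czech          | Spanish       | CNN-CWT  | 53.8  | 56.3   | 56.7         |
      | Czech          | Spanish       | Baseline | 53.4  | 51.6   | 52.4         |
      | Czech          | German        | CNN-STFT | 54.0  | 51.8   | 54.0         |
      | Czech          | German        | Baseline | 51.2  | 51.0   | 50.7         |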
  30. 30. Conclusion. A deep learning approach is proposed to model articulation impairments of PD patients. Voiced-unvoiced transitions are modeled with CNNs using STFT and CWT.
  31. 31. Conclusion. The proposed method is able to classify PD patients and HC subjects, and it improves on the baseline when the same language is used for training and test. Additional approaches should be proposed for the case where the training and test languages differ. Recurrent neural networks and other architectures may be considered to assess co-articulation. Deep learning approaches trained with phonation, articulation, and prosody information may be used to evaluate specific speech impairments.
