Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP Dataset for Early Detection of Alzheimer’s Disease
1. Analysis of Hierarchical Multi-Content Text Classification Model on B-SHARP Dataset for Early Detection of Alzheimer's Disease
Asia-Pacific Chapter of the Association for Computational Linguistics
Presented by Jinho D. Choi
October 28, 2020
♠Renxuan A. Li, ♦Ihab Hajjar, ♦Felicia Goldstein, ♠Jinho D. Choi
♠Department of Computer Science, ♦Department of Neurology
Emory University, Atlanta GA, USA
jinho.choi@emory.edu
3. Mild Cognitive Impairment (MCI)
Cognitive decline spectrum: Mild Cognitive Impairment → Mild Dementia → Moderate Dementia → Severe Dementia
MCI: impairment does not interfere with activities of daily living
Dementia: impairment starts interfering with activities of daily living
First work to detect MCI using NLP
4. B-SHARP Dataset
Brain, Stress, Hypertension, and Aging Research Program
Collect 1-2 minute recordings for 3 tasks from MCI patients and Control subjects.
Task 1: Daily Activity
Task 2: Room Environment
Task 3: Picture Description
5. B-SHARP Dataset
1st-visit 2nd-visit 3rd-visit Recordings MoCA BNT
Control 185 100 50 385 26.2 (±2.6) 14.2 (±1.2)
MCI 141 68 28 265 21.5 (±3.5) 13.4 (±1.5)
Total 326 168 78 650 24.2 (±3.8) 13.9 (±1.4)
Subjects make multiple visits to provide additional recordings; consecutive visits are one year apart.
Tokens Sentences Nouns Verbs Conjuncts Complex Discourse
Q1
Control 186.6 (±60.4) 10.4 (±4.5) 28.1 (±9.6) 30.4 (±11.5) 8.5 (±4.5) 2.3 (±1.7) 8.1 (±5.4)
MCI 175.6 (±54.5) 9.8 (±4.1) 23.7 (±8.3) 29.3 (±10.4) 8.5 (±4.2) 2.0 (±1.6) 9.2 (±6.0)
Q2
Control 191.5 (±11.8) 11.7 (±4.7) 41.1 (±13.3) 24.3 (±11.2) 6.6 (±4.5) 3.6 (±2.7) 7.1 (±4.8)
MCI 178.6 (±11.7) 11.6 (±4.7) 36.7 (±12.1) 23.2 (±10.6) 6.4 (±4.4) 2.9 (±2.3) 8.4 (±5.3)
Q3
Control 193.4 (±63.4) 12.6 (±5.4) 39.5 (±13.5) 28.4 (±10.1) 8.0 (±4.8) 3.3 (±2.1) 6.1 (±5.5)
MCI 187.8 (±63.4) 12.7 (±5.1) 36.2 (±13.2) 27.7 (±10.9) 7.2 (±4.2) 2.6 (±2.0) 7.3 (±5.5)
All
Control 578.1 (±149.8) 34.5 (±10.7) 110.5 (±27.9) 84.2 (±25.4) 23.5 (±10.1) 9.3 (±4.5) 21.4 (±13.0)
MCI 548.7 (±140.6) 34.0 (±10.5) 98.1 (±26.1) 81.2 (±24.1) 22.5 (±9.7) 7.7 (±4.2) 25.3 (±15.0)
p 0.0110 0.5541 < 0.0001 0.1277 0.2046 < 0.0001 0.0006
Table 1: Average counts and their standard deviations of linguistic features per transcript in the B-SHARP dataset.
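The p-values in Table 1 can be sanity-checked directly from the reported summary statistics with a two-sample (Welch's) t-test. A minimal Python sketch (stdlib only), using the All-tasks "Nouns" row (Control: mean 110.5, SD 27.9, n = 385; MCI: mean 98.1, SD 26.1, n = 265) as an example; the normal approximation to the t distribution is an assumption here, which is reasonable given the large sample sizes.

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t-statistic and an approximate two-sided p-value
    from group means, standard deviations, and sizes.
    (Normal approximation; adequate when df is large.)"""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    t = (m1 - m2) / se
    # two-sided tail probability of a standard normal
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

# "Nouns" row, all tasks combined: Control vs. MCI (Table 1)
t, p = welch_t(110.5, 27.9, 385, 98.1, 26.1, 265)
print(f"t = {t:.2f}, p = {p:.1e}")  # p < 0.0001, consistent with the table
```

The same call reproduces the non-significant rows as well, e.g. plugging in the "Verbs" means and SDs yields p well above 0.05.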
7. Experiments
5-fold Cross-Validation
Transformer Encoders:
BERT (Devlin et al., 2019)
RoBERTa (Liu et al., 2019)
ALBERT (Lan et al., 2019)
CV0 CV1 CV2 CV3 CV4 ALL
Recordings:
Control 77 77 77 77 77 385
MCI 53 53 53 53 53 265
Subjects:
Control 37 37 37 37 37 185
MCI 27 28 28 29 29 141
Subjects in each fold are mutually exclusive from the other folds.
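The subject-exclusive split amounts to assigning folds at the subject level rather than the recording level, so that all recordings from one subject land in the same fold. A minimal Python sketch of that idea (the subject and recording IDs are hypothetical, not from the dataset):

```python
import random
from collections import defaultdict

def subject_level_folds(recordings, k=5, seed=0):
    """Assign cross-validation folds by subject so that no subject's
    recordings are split across folds (subject-exclusive CV).
    `recordings` is a list of (subject_id, recording_id) pairs."""
    subjects = sorted({subj for subj, _ in recordings})
    random.Random(seed).shuffle(subjects)
    fold_of = {s: i % k for i, s in enumerate(subjects)}
    folds = defaultdict(list)
    for subj, rec in recordings:
        folds[fold_of[subj]].append(rec)
    return dict(folds)

# hypothetical data: subjects with 1-3 visits each
recs = [("s1", "r1"), ("s1", "r2"), ("s2", "r3"), ("s3", "r4"), ("s3", "r5")]
folds = subject_level_folds(recs, k=2)
# every subject's recordings share a fold; the folds partition the recordings
```

In practice a library helper such as scikit-learn's grouped splitters does the same thing; the point is only that grouping must happen on subjects, since a subject appearing in both train and test folds would leak speaker identity.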
8. Evaluation
Performance on the Individual Tasks
BERT RoBERTa ALBERT
Q1 Q2 Q3 Q1 Q2 Q3 Q1 Q2 Q3
ACC 67.6 (±0.4) 69.0 (±1.2) 67.7 (±0.7) 69.0 (±1.5) 69.9 (±0.2) 65.2 (±0.3) 67.6 (±1.5) 69.5 (±0.3) 66.6 (±1.3)
SEN 48.9 (±1.8) 57.1 (±2.5) 41.5 (±3.6) 44.3 (±4.5) 55.3 (±1.2) 37.1 (±3.7) 45.9 (±1.9) 52.2 (±0.6) 37.4 (±3.3)
SPE 80.4 (±1.2) 77.3 (±2.8) 85.2 (±3.0) 85.8 (±2.1) 79.7 (±0.7) 84.5 (±3.0) 82.6 (±3.7) 81.4 (±0.3) 86.8 (±3.3)
Table 3: Model performance on the individual tasks. ACC: accuracy, SEN: sensitivity, SPE: specificity.
Performance of the Ensemble Models
CNN BERTe RoBERTae ALBERTe Be + Re Ae + Re Be + Ae + Re
ACC 69.5 (±0.2) 69.9 (±1.1) 71.6 (±1.5) 69.7 (±2.9) 72.2 (±0.7) 71.5 (±1.9) 74.1 (±0.3)
SEN 49.2 (±0.8) 57.6 (±3.4) 48.5 (±6.1) 46.2 (±8.3) 56.5 (±2.5) 51.7 (±1.3) 60.9 (±5.2)
SPE 83.5 (±0.9) 77.4 (±4.8) 87.5 (±1.8) 85.4 (±0.5) 83.1 (±0.9) 86.7 (±3.4) 84.0 (±2.4)
Table 4: Performance of ensemble models. BERTe/RoBERTae/ALBERTe use transcript embeddings from all 3 tasks trained by the BERT/RoBERTa/ALBERT models in Table 3, respectively. Be+Re uses transcript embeddings from both BERTe and RoBERTae (a total of 6 embeddings), Ae+Re uses transcript embeddings from both ALBERTe and RoBERTae (6 embeddings), and Be+Ae+Re uses transcript embeddings from all three models (9 embeddings).
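The ACC/SEN/SPE metrics in Tables 3 and 4 follow the standard confusion-matrix definitions with MCI as the positive class. A minimal Python sketch of how they are computed from gold and predicted labels (the label vectors here are hypothetical, for illustration only):

```python
def acc_sen_spe(gold, pred, positive="MCI"):
    """Accuracy, sensitivity (recall on the MCI class), and
    specificity (recall on the Control class) from parallel label lists."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    tn = sum(g != positive and p != positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    acc = (tp + tn) / len(gold)
    sen = tp / (tp + fn) if tp + fn else 0.0
    spe = tn / (tn + fp) if tn + fp else 0.0
    return acc, sen, spe

# hypothetical predictions over 6 transcripts
gold = ["MCI", "MCI", "MCI", "Control", "Control", "Control"]
pred = ["MCI", "Control", "MCI", "Control", "Control", "MCI"]
print(acc_sen_spe(gold, pred))  # (0.666..., 0.666..., 0.666...)
```

Reporting sensitivity and specificity separately matters here because the classes are imbalanced (385 Control vs. 265 MCI recordings): a model can reach decent accuracy while missing many MCI cases, which is exactly the SEN/SPE gap visible in Table 3.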
9. Conclusion
Introduced the new dataset, B-SHARP, for the detection of Mild Cognitive Impairment (MCI)
Presented a Hierarchical Multi-Content Classification Model to jointly learn multiple documents from different tasks
Achieved state-of-the-art results with an ensemble model using three types of transformer encoders
Please visit our lab webpage: http://nlp.cs.emory.edu