Hierarchical Transformer
for Early Detection of
Alzheimer’s Disease
Renxuan Li
Advisor: Dr. Jinho Choi
Emory University, Department of Computer Science
Content
• Classification task: MCI detection
• Background and Previous Work
• Dataset: B-SHARP
• Models for Individual Tasks
• Hierarchical Transformer Model
• Attention Analysis
• Ensemble Model Analysis
Introduction and motivation
• In short, we aim to support the diagnosis of the early stage of
Alzheimer’s Disease (AD) with Natural Language Processing (NLP) methods.
• This study is essentially an MCI/normal classification task
• The early stage of AD is also known as Mild Cognitive Impairment (MCI)
• AD is neither reversible nor curable, but treatments can slow its
progression; early MCI diagnosis is therefore crucial.
• Our study focuses on predicting subjects’ brain-health condition by processing their
transcribed speech.
General structure for classification tasks
Classification layers
• All classification layers in this study are Multi-Layer Perceptrons (MLPs) with
different input sizes.
• MLP in formula: 𝑜𝑢𝑡 = 𝑖𝑛𝑝𝑢𝑡 ∗ 𝑊 + 𝑏 (see the sketch below)
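A minimal sketch of such a classification layer, assuming PyTorch (the input size and class count below are illustrative):

```python
import torch
import torch.nn as nn

class ClassificationLayer(nn.Module):
    """Single affine layer: out = input @ W + b."""
    def __init__(self, input_size: int, num_classes: int = 2):
        super().__init__()
        self.linear = nn.Linear(input_size, num_classes)  # holds W and b

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)  # logits for MCI vs. normal

# Usage: map a 768-dimensional feature vector to 2 class logits.
layer = ClassificationLayer(input_size=768)
logits = layer(torch.randn(1, 768))
```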
Related works: DementiaBank
• Tasks: Cookie Theft Picture Description
• 99 control subjects, 194 dementia subjects;
243 control recordings and 309 dementia
recordings; 552 in total.
• Yearly visits from subjects
• Includes grammatical features
• Transcripts are manually cleaned
Previous work: CNN
• Convolutional Neural Network (CNN)
• Captures local features well
• May use filters of different sizes
https://arxiv.org/pdf/1408.5882.pdf
Previous work: C-LSTM
• LSTM: Long Short-Term Memory
network, which carries sequential
information over time
• C-LSTM: adds an LSTM layer on top of the CNN
https://www.aclweb.org/anthology/N18-2110.pdf
B-SHARP dataset
• Brain Stress Hypertension Aging
Research Program
• Focus on MCI
• 185 normal subjects, 141 MCI
patients
• 650 transcripts in total: 385 Normal
and 265 MCI
• Transcribed with an online tool,
without manual correction
Speech Protocol
• In this study, we use only three out of the five tasks of the speech protocol for
classification.
• Question 1: Describe the events of the subject’s day
• Question 2: Describe the examination room
• Question 3: Describe the picture “Circus Procession”
Sample Transcript (excerpts for Q1, Q2, and Q3)
Language features
• We count 7 major average measures: # tokens, # sentences, # nouns, # verbs,
# conjunctions, # complex structures, # discourse markers.
• In all categories except # discourse markers, the control subjects have
higher values, indicating longer and more complex speech (a counting sketch follows).
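The slides do not show how these counts are computed; a plausible sketch with spaCy follows. The tagger, the cue-word list used for discourse markers, and the dependency labels used as a proxy for complex structures are all assumptions, not the actual pipeline of this study.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

def language_features(transcript: str) -> dict:
    """Count the seven measures for one transcript (proxies assumed)."""
    doc = nlp(transcript)
    return {
        "tokens": sum(1 for t in doc if not t.is_space),
        "sentences": sum(1 for _ in doc.sents),
        "nouns": sum(t.pos_ == "NOUN" for t in doc),
        "verbs": sum(t.pos_ == "VERB" for t in doc),
        "conjunctions": sum(t.pos_ in ("CCONJ", "SCONJ") for t in doc),
        # Hypothetical proxy: clausal dependents stand in for "complex structures".
        "complex_structures": sum(t.dep_ in ("ccomp", "xcomp", "advcl", "acl") for t in doc),
        # Hypothetical proxy: a small cue-word list stands in for discourse markers.
        "discourse": sum(t.lower_ in {"well", "so", "anyway", "um", "uh"} for t in doc),
    }

print(language_features("Well, we came in this morning and they took some tests."))
```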
Challenge
• 1. Transcription errors: our transcripts are not manually corrected.
• 2. Long sequences: a control transcript has on average 578 tokens and an
MCI transcript 548 tokens, too long to fit into a transformer
model directly
Approach
• CNN Baseline
• On individual Tasks
• BERT
• RoBERTa
• ALBERT
• Pipeline Hierarchical Transformer Model
• Hierarchical Transformer Joint Learning
CNN with larger filters
• We add filters of sizes 3, 5, 7, 9, 11, 13, and 15, meaning we can capture
relations up to 15 words apart (sketch below).
• Static FastText Embedding with embedding size 300
• Channel size 600
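A sketch of this multi-filter CNN in the style of Kim (2014), assuming PyTorch; the static FastText vectors are assumed to be looked up before the module is called:

```python
import torch
import torch.nn as nn

class MultiFilterCNN(nn.Module):
    """Text CNN with one max-pooled convolution per filter size."""
    def __init__(self, embed_size=300, channels=600,
                 filter_sizes=(3, 5, 7, 9, 11, 13, 15), num_classes=2):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_size, channels, k, padding=k // 2)
            for k in filter_sizes)
        self.out = nn.Linear(channels * len(filter_sizes), num_classes)

    def forward(self, emb):               # emb: (batch, seq_len, embed_size)
        x = emb.transpose(1, 2)           # Conv1d expects (batch, channels, seq)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.out(torch.cat(pooled, dim=1))  # class logits

# Usage with pre-looked-up (static) FastText embeddings:
logits = MultiFilterCNN()(torch.randn(4, 120, 300))
```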
Self-attention and
transformer encoder
• Transformer encoder:
• Based entirely on the self-
attention mechanism
• Multi-head attention with 12
heads (a single-head sketch follows)
https://www.kdnuggets.com/2019/03/deconstructing-bert-part-2-visualizing-inner-workings-attention.html
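For reference, the core self-attention computation the encoder stacks rely on, as a minimal single-head sketch (the real encoder adds learned projections, 12 heads, residual connections, and layer normalization):

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # (batch, seq, seq)
    weights = F.softmax(scores, dim=-1)            # attention distribution
    return weights @ V, weights

# Self-attention: queries, keys, and values all come from the same sequence.
x = torch.randn(1, 10, 64)
out, attn = scaled_dot_product_attention(x, x, x)
```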
BERT for sequence classification
• Base model: 12 layers of transformer encoders, 12 attention heads each
• Pretraining tasks: Masked Language Model, Next Sentence Prediction
• Trained on BookCorpus (800M words) and English Wikipedia (2,500M words)
• Sequences over 512 tokens normally cause memory overflow (on our machine)
• Only trained on individual tasks in the speech protocol (fine-tuning sketch below)
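With the Hugging Face transformers library, fine-tuning BERT on a single question's answer might look like the sketch below; the thesis code may differ, and the example text and label convention are illustrative:

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)     # MCI vs. normal

# One question's answer, truncated to the 512-token limit noted above.
enc = tokenizer("we came in this morning and took some tests ...",
                truncation=True, max_length=512, return_tensors="pt")
labels = torch.tensor([1])                 # hypothetical convention: 1 = MCI

out = model(**enc, labels=labels)          # out.loss for training, out.logits
out.loss.backward()
```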
BERT for sequence classification: architecture
https://arxiv.org/pdf/1810.04805.pdf
RoBERTa for sequence classification
• Same architecture as BERT
• Dynamic masked language modeling
• More careful hyperparameter choices
• Longer training and bigger batches
• Sequences over length 512 normally cause memory overflow (on our
machine)
• Only trained on individual tasks in the speech protocol
ALBERT for sequence classification
• Similar architecture to BERT and RoBERTa
• Factorized, compressed embedding size
• Shares parameters across layers
• About 9 times smaller than BERT-base
• We can fit three models onto our machine
Pipeline Hierarchical Transformer Model
• Each layer of the model is kept
independent during training.
• The [CLS] token from each
individual transformer model is
concatenated and fed into an MLP
(sketch below).
• The transformers are trained on their own.
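A sketch of the pipeline head, assuming PyTorch and 768-dimensional [CLS] vectors; the three encoders are trained separately and are not updated here:

```python
import torch
import torch.nn as nn

class PipelineHead(nn.Module):
    """Concatenate one [CLS] vector per question model, then an MLP."""
    def __init__(self, cls_size=768, num_questions=3, num_classes=2):
        super().__init__()
        self.mlp = nn.Linear(cls_size * num_questions, num_classes)

    def forward(self, cls_vectors):        # list of (batch, cls_size) tensors
        return self.mlp(torch.cat(cls_vectors, dim=1))

# cls_q1..cls_q3 would come from the three independently trained
# transformers; their parameters stay frozen in the pipeline setting.
cls_q1, cls_q2, cls_q3 = torch.randn(4, 768), torch.randn(4, 768), torch.randn(4, 768)
logits = PipelineHead()([cls_q1, cls_q2, cls_q3])
```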
Hierarchical Transformer Joint Learning
• The entire model gets trained end to end (sketch below)
• Training starts from the initially saved individual
models
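A sketch of the joint setup: the question-level encoders and the MLP head live in one module, so a single optimizer updates everything end to end. The tiny linear "encoders" below are stand-ins for the saved transformer models:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointHierarchical(nn.Module):
    """Encoders + head in one module; the linear 'encoders' here are
    stand-ins for the saved question-level transformer models."""
    def __init__(self, in_size=300, cls_size=768, num_classes=2):
        super().__init__()
        self.encoders = nn.ModuleList(nn.Linear(in_size, cls_size) for _ in range(3))
        self.head = nn.Linear(cls_size * 3, num_classes)

    def forward(self, qs):                 # qs: one input per question
        cls = [enc(q) for enc, q in zip(self.encoders, qs)]
        return self.head(torch.cat(cls, dim=1))

model = JointHierarchical()                # initialized from saved weights in practice
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)  # one optimizer, whole model
loss = F.cross_entropy(model([torch.randn(4, 300) for _ in range(3)]),
                       torch.randint(0, 2, (4,)))
loss.backward()
optimizer.step()
```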
Data split
• The dataset is too small for a conventional
8:1:1 train:dev:test split,
so we use five-fold cross-validation
(a subject-disjoint split sketch follows the table).
• Each subject appears in only one set
                     Set0   Set1   Set2   Set3   Set4
# MCI                  53     53     53     53     53
# Control              77     77     77     77     77
avg MoCA (MCI)       21.8   20.0   21.4   22.3   22.5
avg MoCA (Control)   26.1   26.5   25.7   26.9   26.7

MoCA: Montreal Cognitive Assessment.
Higher scores mean a better brain condition.
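A subject-disjoint five-fold split can be produced with scikit-learn's GroupKFold; the toy data below is illustrative:

```python
from sklearn.model_selection import GroupKFold

# Toy stand-ins: ten transcripts from five subjects (two each).
transcripts = [f"transcript {i}" for i in range(10)]
labels      = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]     # 0 = control, 1 = MCI
subject_ids = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]     # groups keep subjects together

gkf = GroupKFold(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(
        gkf.split(transcripts, labels, groups=subject_ids)):
    # No subject id ever appears in both the train and test indices.
    print(fold, sorted(test_idx))
```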
Transformer settings
• BERT: “bert-base-uncased”, 12 layers, 12 attention heads, 768 output
length, 768 embed size.
• RoBERTa: “roberta-base”, 12 layers, 12 attention heads, 768 output length,
768 embed size.
• ALBERT: “albert-base-v1”, 12 layers, 12 attention heads, 768 output length,
128 embed size.
Experiment setup
• Every model is trained three times; the average and standard deviation are recorded.
• Three metrics (computed below):
• Accuracy = (TruePositive + TrueNegative) / All
• Sensitivity = TruePositive / (TruePositive + FalseNegative)
• Specificity = TrueNegative / (TrueNegative + FalsePositive)
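The three metrics computed from raw confusion-matrix counts (a sketch; the example counts are made up):

```python
def metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, sensitivity (MCI recall), specificity (control recall)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Made-up counts for one fold: 50 TP, 70 TN, 7 FP, 3 FN.
print(metrics(tp=50, tn=70, fp=7, fn=3))
```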
Performance on Individual tasks
• All three models perform well on Q2
• Overall, RoBERTa does best.
• BERT does better on Q3
Performance on entire transcript
• We see gradual and substantial improvement in classification
performance
• The RoBERTa ensemble does best among the single-model ensembles
• Sensitivity improves
• Specificity remains level
• Joint learning is not as good as the pipeline style
Analysis
• Attention analysis
  • RoBERTa model analysis by samples
  • Finding validation
• Ensemble analysis
  • RoBERTa ensemble analysis
  • B+R+A ensemble analysis
Attention analysis:
measures and methods
• We rank the tokens of a
transcript from the highest
Attention Change Ratio (ACR) to the lowest.
• If a token has a high ACR, it is
attended to more during
training and is more related to
how our models predict.
• Attention Change Ratio: (formula shown on the slide; a ranking sketch follows)
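The ACR formula itself is only on the slide image. Assuming ACR assigns each token a percentage score, producing the ranked lists on the following slides is a simple sort; the values below are copied from the Q1 results for illustration:

```python
# Hypothetical sketch: `acr` maps each token of a transcript to its
# Attention Change Ratio; the values are taken from the Q1 lists below.
acr = {"friend": 182.31, "arrived": 169.92, "Yeah": 142.50, "me": 60.03}

ranked = sorted(acr.items(), key=lambda kv: kv[1], reverse=True)
for rank, (token, ratio) in enumerate(ranked, start=1):
    print(f"{rank} {token} {ratio:.2f}%")
```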
On RoBERTa Q1 model
MCI
• 1 Yeah 142.50%
• 2 ina 129.14%
• 3 tests 124.17%
• 4 Okay 122.25%
• 5 And 112.76%
• 6 ud 80.30%
• 7 now 75.60%
• 8 no 70.89%
• 9 body 66.21%
• 10 me 60.03%
Normal
• 1 friend 182.31%
• 2 arrived 169.92%
• 3 From 145.92%
• 4 Tiffany 140.62%
• 5 visit 137.53%
• 6 taken 122.23%
• 7 tests 114.66%
• 8 downstairs 109.35%
• 9 snack 107.38%
• 10 how 93.17%
By POS tag category
On RoBERTa Q2 model
MCI
• 1 door 147.34%
• 2 picture 131.71%
• 3 nine 121.43%
• 4 dog 118.22%
• 5 elt 115.01%
• 6 OK 114.02%
• 7 is 108.72%
• 8 hole 88.07%
• 9 did 85.81%
• 10 wall 81.19%
Normal
• 1 holder 546.90%
• 2 ah 415.62%
• 3 screen 342.11%
• 4 stool 301.36%
• 5 sink 300.02%
• 6 counter 256.21%
• 7 area 244.19%
• 8 chart 237.63%
• 9 Tiffany 191.25%
• 10 cabinet 180.24%
By POS tag category
On RoBERTa Q3 model
MCI
• 1 ine 168.21%
• 2 anger 164.96%
• 3 stick 163.53%
• 4 glasses 137.93%
• 5 hat 118.84%
• 6 for 116.56%
• 7 view 103.13%
• 8 circle(circus) 94.01%
• 9 Ride 88.52%
• 10 outfit 86.98%
Normal
• 1 vest 184.92%
• 2 glasses 168.59%
• 3 white 167.55%
• 4 (tr)icycle 151.47%
• 5 collar 144.58%
• 6 uff 132.31%
• 7 handc 123.84%
• 8 tr(icycle) 121.67%
• 9 pocket 116.91%
• 10 cap 113.91%
By POS tag category
Q1 on all transcripts
Q2 on all transcripts
Q2 on all transcripts
Q3 on all transcripts
Ensemble analysis:
concept of vote
• Each question model casts one vote: a vote for Q1, a vote for Q2, and a vote for Q3
RoBERTa ensemble: vote distribution
Task Comparison: Measures
• Correct Vote Ratio for question i: among the 466 transcripts that the
ensemble model classifies correctly, the percentage for which
the vote of question i is correct.
• Normalized Correct Vote Ratio (NCV): for each transcript the ensemble
model predicts correctly, 1 “point” is distributed evenly among
the questions whose votes are correct; NCV is the average point a
question gets per transcript.
• % of Dominant Cases: the percentage of cases in which the vote for question i is the
only correct one (see the sketch below).
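A sketch of the three measures, assuming each transcript record holds the ensemble's final correctness plus one correctness flag per question vote:

```python
def vote_measures(records, n_questions=3):
    """records: list of (ensemble_correct, [q1_ok, q2_ok, q3_ok]) pairs."""
    cvr = [0.0] * n_questions   # Correct Vote Ratio numerators
    ncv = [0.0] * n_questions   # Normalized Correct Vote points
    dom = [0.0] * n_questions   # dominant-case counts
    correct = [votes for ok, votes in records if ok]
    for votes in correct:
        k = sum(votes)
        for i, v in enumerate(votes):
            if v:
                cvr[i] += 1
                ncv[i] += 1 / k          # split 1 point among correct votes
                if k == 1:
                    dom[i] += 1          # only question i voted correctly
    n = len(correct)
    return ([c / n for c in cvr], [p / n for p in ncv], [d / n for d in dom])

# Toy usage: three transcripts, the first misclassified by the ensemble.
print(vote_measures([(False, [1, 0, 0]), (True, [1, 1, 0]), (True, [0, 1, 0])]))
```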
RoBERTa ensemble: Task Comparison
B+R+A ensemble
• 86.6% are majority votes
• 6-correct situations: 34.50%
• 5-correct situations: 28.51%
• Rarely do all 9 inputs agree (0.21%)
• There are no situations in which fewer than 4 votes are correct.
B+R+A Ensemble: which input is more
important?
• The top three inputs form a set of transcripts
• Again, individual model accuracy (IMA) does not always match
importance in the ensemble model
Questions?
Bibliography
• [1] James T. Becker, Francois Boller, Oscar L. Lopez, Judith Saxton, and Karen L. McGonigle. The natural history of Alzheimer’s disease: description of study cohort and accuracy of diagnosis. Archives of Neurology, 51(6):585–594, 1994.
• [2] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4171–4186, 2019. URL https://www.aclweb.org/anthology/N19-1423.
• [3] Serge Gauthier, Michel Panisset, Josephine Nalbantoglu, and Judes Poirier. Alzheimer’s disease: current knowledge, management and research. Canadian Medical Association Journal, 157(8):1047–1052, 1997.
• [4] George G. Glenner. Alzheimer’s Disease. Springer, 1990.
Bibliography
• [5] Sweta Karlekar, Tong Niu, and Mohit Bansal. Detecting linguistic characteristics of Alzheimer’s dementia by interpreting neural models. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 701–707, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/N18-2110. URL https://www.aclweb.org/anthology/N18-2110.
• [6] Yoon Kim. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014. URL https://www.aclweb.org/anthology/D14-1181.
• [7] Taku Kudo and John Richardson. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 66–71, 2018. URL https://www.aclweb.org/anthology/D18-2012.
• [8] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. arXiv:1909.11942, 2019.
Bibliography
• [9] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A Robustly Optimized BERT Pretraining Approach. In Proceedings of the International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=SyxS0T4tvS.
• [10] Frank Rudzicz, Rosalie Wang, Momotaz Begum, and Alex Mihailidis. Speech recognition in Alzheimer’s disease with personal assistive robots. In Proceedings of the 5th Workshop on Speech and Language Processing for Assistive Technologies, pages 20–28, Baltimore, Maryland, USA, June 2014. Association for Computational Linguistics. doi: 10.3115/v1/W14-1904. URL https://www.aclweb.org/anthology/W14-1904.
• [11] Richard Suzman and John Beard. Global health and aging. 2011.
• [12] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv:1609.08144, 2016. URL http://arxiv.org/abs/1609.08144.
