2. Background: Main Approaches
1. Linguistic and acoustic feature extraction and generation
a. Extracting pre-defined and/or hand-crafted features from speech and text
b. Generating automatic representations of speech and language
c. Comparing hand-crafted and automatic representations
d. Hybrid approach - combination of the above
2. Multi-modality
a. Predictive power of linguistic and acoustic information on cognitive impairment
b. Vulnerability and predictive power of different linguistic modalities (syntactic vs lexical)
3. Model development
a. Cross-language detection of cognitive impairment
b. Semi-supervised models in the absence of labels
c. Removing age bias from the model
4. Challenges in Model Development: QA and validation
a. Effect of ASR errors on the features and on model predictive power
b. Effect of heterogeneous data on model predictive power
c. Automatic noise removal
4. ● Extracting acoustic features (pitch, energy, pauses) and representations (MFCC) from speech
● Generating transcripts using ASR
● Extracting linguistic features from the transcripts (syntactic / semantic / lexical)
● ML models based on hand-crafted features
the boy is handing the girl &uh cookies and she's
telling him to be quiet i guess.
1a. Extracting hand-crafted features from speech
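As a minimal sketch of pause feature extraction (assuming word-level timestamps from ASR or a forced aligner; the input format and the 0.25 s pause threshold here are illustrative assumptions, not the published setup):

```python
# Hypothetical input: each word as (token, start_sec, end_sec), e.g. from
# a forced aligner. The 0.25 s minimum pause threshold is illustrative.
words = [("the", 0.00, 0.15), ("boy", 0.20, 0.50),
         ("is", 1.40, 1.55), ("handing", 1.60, 2.10)]

def pause_features(words, min_pause=0.25):
    """Simple pause statistics: count, total and mean pause duration."""
    gaps = [nxt[1] - cur[2] for cur, nxt in zip(words, words[1:])]
    pauses = [g for g in gaps if g >= min_pause]
    total = sum(pauses)
    return {"n_pauses": len(pauses),
            "total_pause_sec": round(total, 3),
            "mean_pause_sec": round(total / len(pauses), 3) if pauses else 0.0}

print(pause_features(words))  # one 0.9 s pause between "boy" and "is"
```

Pitch, energy and MFCCs would come from the audio signal itself via standard signal-processing libraries; only the timing-based features are sketched here.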
5. Motivation: use existing domain knowledge to generate new features and improve model performance.
Example: Several publications have shown that healthy (HC) and cognitively impaired (AD+MCI) subjects
pause before different kinds of words. Pausing can signify word-finding difficulty. We investigate the context in
which pauses occur, not just the pauses themselves.
Our method:
1. Identify which words around the pause contain the most distinguishing information.
2. Extract features from the tokens within the range shown to be most distinguishing.
3. Improve transcript-level classification performance.
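The first two steps can be sketched as follows. This is a minimal illustration, not the published method: the `&uh` filled-pause marker is taken from the CHAT-style example transcript above, and the window size is a placeholder for the empirically determined distinguishing range.

```python
# Sketch: collect the tokens in a window around each pause marker.
# "&uh" is the CHAT-style filled-pause code from the example transcript;
# window=2 is illustrative, not the tuned range.
transcript = ("the boy is handing the girl &uh cookies and she's "
              "telling him to be quiet i guess.").split()

def pause_contexts(tokens, pause_marker="&uh", window=2):
    """Return (left, right) token windows around each pause marker."""
    contexts = []
    for i, tok in enumerate(tokens):
        if tok == pause_marker:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            contexts.append((left, right))
    return contexts

print(pause_contexts(transcript))
# → [(['the', 'girl'], ['cookies', 'and'])]
```

Features are then extracted only from these context tokens rather than from the full transcript.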
1a. Generating new hand-crafted features
[Pipeline diagram: Pause Extraction → Feature Extraction]
Eyre, B., Balagopalan, A., & Novikova, J. Fantastic Features and Where to Find Them: Detecting Cognitive Impairment with a Subsequence Classification Guided Approach. In: The 6th Workshop on Noisy User-generated Text (W-NUT) at EMNLP 2020.
6. Motivation: it is difficult to get access to the large amounts of data required for achieving
state-of-the-art performance with ML models. A transfer learning approach achieves higher
accuracy by building on very deep neural models pre-trained on huge amounts of data.
Automatic representations: are extracted from language using pre-trained large
neural language models, such as BERT.
Our method: fine-tune pre-trained models for the tasks of interest, which in our case
is the task of detecting cognitive impairment from language.
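A minimal fine-tuning sketch with the Hugging Face Transformers API; the model choice, placeholder data and hyperparameters are illustrative, not the exact published setup:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # two classes: healthy vs impaired

texts = ["the boy is handing the girl cookies"]  # placeholder transcript
labels = torch.tensor([0])                       # placeholder label

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
batch = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
loss = model(**batch, labels=labels).loss  # cross-entropy computed internally
loss.backward()
optimizer.step()                           # one fine-tuning step
```

In practice this loop runs over mini-batches of transcripts for a few epochs, updating all BERT weights rather than only a classification head.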
Results: evaluated on the ADReSS dataset.
1b. Generating automatic representations of speech and language
[Diagram: features (automatic representations) extracted]
Balagopalan, A., Eyre, B., Rudzicz, F., & Novikova, J. To BERT or Not To BERT: Comparing Speech and Language-based Approaches for Alzheimer’s Disease
Detection. In Proceedings of INTERSPEECH 2020
7. 1c. Comparing hand-crafted and automatic representations
Balagopalan, A., Novikova, J. Comparing Acoustic-based Approaches for Alzheimer’s Disease Detection. In Proceedings of INTERSPEECH 2021
Motivation: it is difficult to get access to the large amounts of audio data
required for achieving state-of-the-art performance with ML models. A
transfer learning approach achieves higher accuracy on acoustic data
classification.
Automatic representations: are extracted from audio of human
speech using transfer learning methods like wav2vec2.
Our method: compare audio classification approaches and decide
which one is the most promising for the AD detection task.
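Encoders such as wav2vec2 emit one vector per short audio frame; before a downstream classifier, frame-level outputs are commonly pooled into a single utterance-level embedding. A sketch of that pooling step, with a random stand-in array instead of real model output:

```python
import numpy as np

# Stand-in for wav2vec2-style output: one 768-dim vector per audio frame.
rng = np.random.default_rng(0)
frames = rng.normal(size=(300, 768))   # (T frames, D dims)

mean_vec = frames.mean(axis=0)         # average pooling over time
pooled = np.concatenate([mean_vec, frames.max(axis=0)])  # mean + max pooling
print(mean_vec.shape, pooled.shape)    # (768,) (1536,)
```

The pooled vector then feeds a conventional classifier, which is what makes such embeddings directly comparable with hand-crafted acoustic feature sets.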
Results: evaluated on the ADReSS dataset.
8. Motivation: some information known to be important for our task of interest is not encoded well in large
pre-trained neural language models.
Our method:
1. Identify linguistic features that are not encoded well, via probing tasks
2. Combine representations from the final BERT layer with these features, and fine-tune
Results: Experiments are done on the AD detection task with the DementiaBank dataset. FS1 denotes the features
identified as under-represented in a BERT model.
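The combination step can be sketched as simple concatenation before the classification head. Stand-in values here, not real BERT output, and the named features are illustrative:

```python
import numpy as np

# Concatenate a sentence embedding (e.g. the final-layer [CLS] vector from
# BERT) with hand-crafted features flagged as under-represented; the joint
# vector then feeds a small classification head.
cls_vector = np.zeros(768)                 # would come from BERT's final layer
handcrafted = np.array([4.2, 0.31, 7.0])   # e.g. pause rate, type-token ratio

combined = np.concatenate([cls_vector, handcrafted])
print(combined.shape)  # (771,)
```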
1d. Hybrid approach to feature extraction
Balagopalan, A., Novikova, J. Augmenting BERT Carefully with Underrepresented Linguistic Features. In: NeurIPS Workshop on Machine Learning for Health
ML4H, 2020
10. Motivation: linguistic features come in different modalities, and each modality
has a specific influence and importance in cognitive impairment classification.
Our method:
1. Divide linguistic features into non-overlapping subsets according to their
modalities
2. Let neural networks learn low-dimensional representations that agree with
each other
3. Pass these representations into a classifier network
Results: we illustrate the effectiveness of modality division when our ML
model (Control vs AD classification) seeks a consensus among modalities.
The modalities presented are acoustic, semantic and syntactic.
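Only the agreement signal is sketched below; the actual model trains per-modality networks jointly with the classifier, whereas here random vectors stand in for learned encoder outputs:

```python
import numpy as np

# Each modality's network yields a low-dimensional vector; a penalty on
# pairwise disagreement pushes the representations toward consensus.
rng = np.random.default_rng(1)
reps = {m: rng.normal(size=16) for m in ("acoustic", "semantic", "syntactic")}

def disagreement(reps):
    """Mean pairwise squared distance between modality representations."""
    vs = list(reps.values())
    pairs = [(x, y) for i, x in enumerate(vs) for y in vs[i + 1:]]
    return float(np.mean([np.sum((x - y) ** 2) for x, y in pairs]))

print(disagreement(reps))                            # > 0 for random vectors
print(disagreement({m: np.ones(16) for m in reps}))  # 0.0 at full consensus
```

Minimising a term like this during training is what "agreeing with each other" means in step 2.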
2a. Multi-modality. Predictive power of linguistic and acoustic
information on cognitive impairment
Zhu, Z., Novikova, J., & Rudzicz, F.. Detecting cognitive impairments by agreeing on interpretations of linguistic features. In Proceedings of the 2019
Conference of the North American Chapter of the Association for Computational Linguistics NAACL 2019
11. Motivation: Understanding the vulnerability of linguistic features extracted from
noisy text is important.
Our method:
1. Analyse the vulnerability of lexical and syntactic features to various levels of
text alterations such as deletion, insertion and substitution
2. Measure feature significance and the impact of alterations on feature predictive
power
3. Compute a coefficient of importance for lexical and syntactic features
separately, for several text classification tasks
Results:
● The values of lexical features are easily affected by even slight changes in the text.
Syntactic features, however, are more robust to such modifications.
● Yet smaller changes in syntactic features result in stronger effects on
classification performance.
2a. Multi-modality. Vulnerability and predictive power of
different linguistic modalities
J. Novikova, A. Balagopalan, K. Shkaruta and F. Rudzicz. Lexical Features Are More Vulnerable, Syntactic Features Have More Predictive Power. In: The 5th
Workshop on Noisy User-generated Text at EMNLP 2019, Hong Kong, 2019
13. Motivation: Most developments are made in
resource-rich languages (especially English).
Multi-language clinical speech datasets are small.
Our method:
Develop cross-language model:
1. We use Optimal Transport (OT) domain
adaptation systems to adapt French and
Mandarin to English.
2. Utilize out-of-domain, single-speaker, healthy
speech data
3. Train aphasia detection models on English data
and test on French and Mandarin
Results: Such a model improves aphasia detection over
unilingual baselines and direct feature transfer.
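A toy sketch of the OT adaptation idea, with random stand-in features instead of clinical speech data (the published systems use OT domain adaptation; the Sinkhorn regularisation and sizes here are illustrative):

```python
import numpy as np

# Estimate an entropy-regularised transport plan between source-language
# and target-language feature sets with Sinkhorn iterations, then move
# source points into the target domain via the barycentric projection.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(20, 5))    # e.g. French features
Xt = rng.normal(1.0, 1.0, size=(30, 5))    # e.g. English features

C = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
K = np.exp(-C / C.mean())                  # Gibbs kernel, reg = mean cost
a = np.full(20, 1 / 20)                    # uniform source weights
b = np.full(30, 1 / 30)                    # uniform target weights

u = np.ones(20)
for _ in range(200):                       # Sinkhorn fixed-point updates
    v = b / (K.T @ u)
    u = a / (K @ v)
T = u[:, None] * K * v[None, :]            # transport plan

Xs_mapped = (T @ Xt) / T.sum(axis=1, keepdims=True)  # barycentric mapping
```

An English-trained detector can then be applied to `Xs_mapped`, since the adapted source features now live in the target feature distribution.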
3a. Models. Cross-language detection of cognitive
impairment
Aparna Balagopalan, Jekaterina Novikova, Matthew B A Mcdermott, Bret Nestor, Tristan Naumann, Marzyeh Ghassemi ; Proceedings of the Machine
Learning for Health NeurIPS Workshop, PMLR 116:202-219, 2020
14. Motivation: acquiring sufficient labeled data can be expensive or difficult,
especially from people with cognitive impairment.
Our method:
Develop Transductive Consensus Networks (TCNs), suitable for
semi-supervised learning:
1. Interpreter networks try to produce indistinguishable representations for each
modality
2. Discriminators recognize modality-specific information retained in the
representations
3. A classifier trains the networks to make correct decisions
Results: TCNs outperform or match the best benchmark algorithms given
20 to 200 labeled samples on the Bank Marketing and DementiaBank
datasets (Control vs AD classification).
3b. Models. Semi-supervised models in the absence of
labels
Z. Zhu, J. Novikova, and F. Rudzicz. Semi-supervised classification by reaching consensus among modalities. In: NeurIPS Workshop on Interpretability and
Robustness in Audio, Speech, and Language IRASL, Montreal, 2018
Model | F1-macro
QDA   | 0.5243
RF    | 0.6184
GP    | 0.6775
MLP   | 0.7528
CN    | 0.7998*
15. Motivation: DNN classifiers are able to estimate age from linguistic features, and
could rely on age as a confound when detecting dementia.
Our method:
We put forward four fair representation learning models that learn low-dimensional
representations of data samples containing as little age information as possible.
Results: Our best models give up as little as 2.56% accuracy (on the
DementiaBank dataset) and 1.54% accuracy (on the FamousPeople dataset).
Moreover, they achieve better fairness scores than statistical adjustment methods.
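For contrast, the classical statistical-adjustment baseline the models are compared against can be sketched as regressing each feature on age and keeping the residuals (the paper's own models instead learn fair representations neurally; synthetic data here):

```python
import numpy as np

# Regress each feature on age and keep the residuals, so the adjusted
# features are linearly uncorrelated with age.
rng = np.random.default_rng(0)
age = rng.uniform(50, 90, size=200)
X = np.outer(age, [0.5, -0.2]) + rng.normal(size=(200, 2))  # age-confounded

A = np.column_stack([np.ones_like(age), age])  # design: intercept + age
beta, *_ = np.linalg.lstsq(A, X, rcond=None)   # per-feature regression
X_adj = X - A @ beta                           # residuals

corrs = [abs(np.corrcoef(age, X_adj[:, j])[0, 1]) for j in range(2)]
print(corrs)  # ~0: residuals carry no linear age signal
```

Unlike this linear adjustment, the learned representations can also remove nonlinear age information, which is where the fairness gains come from.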
3c. Models. Removing age bias from the model
Zhu, Z., Novikova, J., & Rudzicz, F. (2018). Isolating effects of age with fair representation learning when assessing dementia. arXiv preprint arXiv:1807.07217.
17. Motivation: Errors in ASR may affect predictive performance of the ML
models.
Our method:
We introduce three types of artificial errors into the manual transcripts of the
DementiaBank and Healthy Aging datasets:
Deletions: a word is omitted from the transcript.
Insertions: a new word is introduced.
Substitutions: a word is replaced with another one.
Results: Simulated deletion errors have a strong effect on classification
performance when detecting cognitive impairment from speech and
language.
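The three error types can be simulated as below. This is a minimal sketch: the vocabulary for insertions and substitutions is illustrative, whereas a real simulation would sample from the ASR system's actual confusions:

```python
import random

# Apply one type of simulated ASR error to a transcript at a given rate.
def corrupt(tokens, error, rate, vocab=("the", "a", "and"), seed=0):
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if rng.random() < rate:
            if error == "deletion":
                continue                       # word missed by the ASR
            if error == "substitution":
                out.append(rng.choice(vocab))  # word mis-recognised
                continue
            if error == "insertion":
                out.append(tok)
                out.append(rng.choice(vocab))  # spurious extra word
                continue
        out.append(tok)
    return out

sent = "the boy is handing the girl cookies".split()
print(corrupt(sent, "deletion", rate=0.3))
```

Sweeping `rate` per error type and re-running the classifier is how the differential effect of deletions versus insertions and substitutions is measured.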
4a. QA and Validation. Effect of ASR errors
Balagopalan, A., Shkaruta, K., and Novikova, J. Impact of ASR on Alzheimer's Disease Detection: All Errors are Equal, but Deletions are More Equal than Others.
In: The 6th Workshop on Noisy User-generated Text at EMNLP 2020.
18. Motivation:
The datasets we have are small, and we need more data to train more accurate models. Previous work shows
that same-task data from healthy participants helps improve AD detection on a single-task dataset of pathological
speech.
Our method:
Add a large amount of healthy-participant data from different tasks.
Results:
● Increase of up to 9% in F1 scores.
● The effect is especially pronounced when
the data come from healthy subjects
aged over 60.
4b. QA and Validation. Effect of heterogeneous data
A.Balagopalan, J.Novikova, F.Rudzicz and M.Ghassemi. The Effect of Heterogeneous Data for Alzheimer's Disease Detection from Speech. In: NeurIPS Workshop
on Machine Learning for Health ML4H, Montreal, 2018