Revisiting the publication by Giordano, B. L. et al. in Nature Neuroscience (2023): “Intermediate acoustic-to-semantic representations link behavioral and neural responses to natural sounds”
2. Overview
What does “acoustic-to-semantic representation” stand for?
● A functional fingerprint in the Auditory Cortex (AC) related to assigning meaning to sound.
● A behavioural fingerprint, measured by a metric of sound discrimination.
What is the fundamental question?
● How does the brain transform sound waves into meaningful semantic representations?
What are “acoustic-to-semantic representations” important for?
● Identifying the different encoding stages of the acoustic-to-semantic transformation process.
4. Proposed Computational Models
● Biophysical models (acoustic output)
● Psychophysical models (acoustic output)
● Natural Language Processing (NLP) models (semantic output)
● Deep Neural Network (DNN) models
Assessing validity: evaluate each model's ability to explain behavioural and/or neural observations of human listeners.
5. Encoding Natural Sounds: State of the Art
● Biophysical models (variants of time-frequency analysis of the sound) explain fMRI patterns in Heschl's gyrus and early auditory areas.
● The Superior Temporal Gyrus (STG) exhibits preferential responses to predefined categories of sounds (speech, vocalizations, music and action sounds).
● DNNs trained on speech and music-genre recognition explain fMRI patterns in STG better than biophysical models do.
6. Goals
● Build a systematic model-comparison framework to assess biophysical, psychophysical, NLP and DNN models.
● Identify a hierarchy of acoustic-to-semantic representations.
Approach: compare predictions of behavioural responses and 7T fMRI responses to natural sounds, under a cross-validation framework, using three different feature-extraction techniques:
1. Acoustic (Biophysical + Psychophysical)
2. Semantic (NLP)
3. DNN
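The cross-validated comparison can be illustrated in miniature: regress a target (behavioural) dissimilarity vector on a model's dissimilarity vector using half of the stimulus pairs, and score on the held-out half. This is a generic sketch with synthetic data, not the paper's actual pipeline; all names and matrices here are hypothetical.

```python
import numpy as np

def upper_tri(rdm):
    """Vectorize the upper triangle of a between-stimulus distance matrix."""
    i, j = np.triu_indices(rdm.shape[0], k=1)
    return rdm[i, j]

def cv_r2(model_rdm, target_rdm, n_splits=20, seed=0):
    """Split-half cross-validated R2 of a linear fit from model to target distances."""
    x, y = upper_tri(model_rdm), upper_tri(target_rdm)
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(x.size)
        tr, te = idx[: x.size // 2], idx[x.size // 2:]
        slope, intercept = np.polyfit(x[tr], y[tr], 1)   # fit on training pairs
        pred = slope * x[te] + intercept                  # predict held-out pairs
        ss_res = np.sum((y[te] - pred) ** 2)
        ss_tot = np.sum((y[te] - y[te].mean()) ** 2)
        scores.append(1.0 - ss_res / ss_tot)
    return float(np.mean(scores))

def random_rdm(n, rng):
    """Random symmetric distance matrix with a zero diagonal (toy data)."""
    m = np.abs(rng.normal(size=(n, n)))
    m = (m + m.T) / 2
    np.fill_diagonal(m, 0.0)
    return m

# Toy demo: a "behavioural" RDM built from model A plus noise should be
# better predicted by model A than by an unrelated model B.
rng = np.random.default_rng(1)
rdm_a, rdm_b = random_rdm(24, rng), random_rdm(24, rng)
behaviour = rdm_a + 0.1 * random_rdm(24, rng)
score_a, score_b = cv_r2(rdm_a, behaviour), cv_r2(rdm_b, behaviour)
```

Scoring on held-out pairs is what distinguishes the cross-validated R² from an in-sample fit: a model only scores well if its distance structure generalizes.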
7. Feature-Extraction Diagram: Figure 1 - Panel a
Models arranged according to the cerebral sound-processing hierarchy:
● 5 acoustic models
● 3 NLP models
● 3 DNNs
9. fMRI Data: Figure 1 – Panel c
Six auditory cortical ROIs:
● HG = Heschl's gyrus
● PT = Planum Temporale
● PP = Planum Polare
● m/p/aSTG = middle/posterior/anterior Superior Temporal Gyrus
Experiment:
● N = 5 participants
● Incidental one-back sound-repetition detection task
● Six categories of sounds:
1. human non-speech vocalizations
2. speech
3. animal cries
4. musical instruments
5. scenes from nature
6. tool sounds
● 288 sounds: 48 per category
10. Data-Analysis Framework
● Model-component distances: between-stimulus cosine distance
● Behavioural data: distance matrices of perceived between-stimulus dissimilarity
● fMRI data: ROI-specific, between-stimulus Euclidean distance of beta values
● Split-half cross-validation (CV) for the fMRI data
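A minimal sketch of the two distance computations above, using hypothetical feature and beta matrices (the dimensions and variable names are illustrative, not the authors' code):

```python
import numpy as np

def cosine_rdm(features):
    """Between-stimulus cosine distance from a (stimuli x components) matrix."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    return 1.0 - f @ f.T

def euclidean_rdm(betas):
    """Between-stimulus Euclidean distance from a (stimuli x voxels) beta matrix."""
    sq = np.sum(betas ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * betas @ betas.T
    return np.sqrt(np.clip(d2, 0.0, None))   # clip guards against tiny negative values

rng = np.random.default_rng(0)
model_features = rng.normal(size=(288, 40))   # 288 stimuli, hypothetical 40-dim model
roi_betas = rng.normal(size=(288, 500))       # hypothetical 500-voxel ROI
model_rdm = cosine_rdm(model_features)
fmri_rdm = euclidean_rdm(roi_betas)
```

Both functions return a symmetric 288 x 288 matrix with a zero diagonal, the common currency that lets model, behavioural and fMRI representations be compared directly.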
11. Representations in Models and Brain
Results: Figure 2
● MDS plots: points = stimuli
Highlights:
● MDS: the Acoustic and DNN spaces show overlap between categories, except for speech
● MDS: the NLP space shows distinct category clusters
● fMRI: HG and pSTG resemble the Acoustic and DNN spaces, respectively
● fMRI: in STG, speech is dissimilar from the other categories
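Embeddings like those in Figure 2 can be reproduced in spirit with classical (Torgerson) MDS, which recovers low-dimensional stimulus coordinates from a distance matrix by double-centering and eigendecomposition. This is a generic sketch of the technique, not the authors' implementation:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: k-dimensional coordinates from an n x n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared distances
    w, V = np.linalg.eigh(B)                   # eigendecomposition (ascending order)
    order = np.argsort(w)[::-1][:k]            # keep the k largest eigenvalues
    return V[:, order] * np.sqrt(np.clip(w[order], 0.0, None))

# Sanity check: for genuinely 2-D points, classical MDS recovers the
# pairwise distances exactly (up to rotation and reflection).
rng = np.random.default_rng(0)
points = rng.normal(size=(10, 2))
diff = points[:, None, :] - points[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))
emb = classical_mds(D, k=2)
diff_emb = emb[:, None, :] - emb[None, :, :]
D_emb = np.sqrt((diff_emb ** 2).sum(axis=-1))
```

Applied to a model or ROI RDM, the 2-D output is what gets scattered as one point per stimulus, colored by sound category.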
12. Representations in Behavioural Data
Results: Figure 2
● Significant predictions by all single-class models
● Sound-dissimilarity task (top): cross-validated R² higher for the DNN models
● Word-dissimilarity task (bottom): cross-validated R² higher for the NLP models
Commonality-analysis approach: variance partitioning indicates that the DNNs capture a large part of the perceived-sound-dissimilarity variance predicted by the other models.
13. Representations in fMRI Data
Results: Figure 3
● HG: significant predictions by all single-class models
● HG: only the Acoustic models predicted unique variance; Acoustic + DNN predicted common variance
● STG: the single-class DNN models and unique DNN variance gave the better predictions
Commonality-analysis approach: variance partitioning showed that the DNN-predicted variance could not be predicted by the other models.
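For two predictor sets, commonality analysis reduces to comparing the R² of nested regressions: unique variance of A is what the full model explains beyond B alone, and common variance is what A and B explain redundantly. A generic two-set sketch on synthetic data (variable names hypothetical, not the paper's multi-model decomposition):

```python
import numpy as np

def r2(X, y):
    """R2 of an ordinary-least-squares fit with intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def commonality_two(Xa, Xb, y):
    """Unique variance of A, unique variance of B, and their common variance."""
    r_a, r_b = r2(Xa, y), r2(Xb, y)
    r_ab = r2(np.column_stack([Xa, Xb]), y)
    return r_ab - r_b, r_ab - r_a, r_a + r_b - r_ab   # unique_a, unique_b, common

# Synthetic check: y mixes a component shared by A and B with an A-only part,
# so A should carry unique variance and A/B should share common variance.
rng = np.random.default_rng(0)
shared = rng.normal(size=200)
a_only = rng.normal(size=200)
Xa = np.column_stack([shared, a_only])
Xb = np.column_stack([shared + 0.01 * rng.normal(size=200)])
y = shared + a_only + 0.1 * rng.normal(size=200)
unique_a, unique_b, common = commonality_two(Xa, Xb, y)
```

The three components sum to the full-model R², which is what makes the "unique vs common variance" bar plots in Figures 2-3 additive.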
14. Prediction of fMRI Data by Hidden DNN Representations of Perceived Sound: Figure 5
● Behavioural responses, and fMRI responses in HG and pSTG, are best predicted by intermediate DNN layers.
● Late layers contribute incrementally to fMRI responses in STG.
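The layer-by-layer logic can be sketched with a toy random ReLU network: extract activations per layer, build one RDM per layer, and correlate each layer's RDM with a target (behavioural or ROI) RDM to obtain a layer profile. Everything below is synthetic; the paper used trained DNNs, not random weights.

```python
import numpy as np

def cosine_rdm(f, eps=1e-12):
    """Between-stimulus cosine distance; eps guards against all-zero activation rows."""
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + eps)
    return 1.0 - f @ f.T

def layer_rdms(x, weights):
    """Forward a stimulus batch through a toy ReLU MLP; one RDM per hidden layer."""
    rdms, h = [], x
    for W in weights:
        h = np.maximum(h @ W, 0.0)
        rdms.append(cosine_rdm(h))
    return rdms

def rdm_correlation(rdm1, rdm2):
    """Pearson correlation of the upper triangles of two RDMs."""
    i, j = np.triu_indices(rdm1.shape[0], k=1)
    return float(np.corrcoef(rdm1[i, j], rdm2[i, j])[0, 1])

rng = np.random.default_rng(0)
stimuli = rng.normal(size=(30, 16))                     # 30 stimuli, 16 input features
weights = [rng.normal(size=(16, 32)),
           rng.normal(size=(32, 32)),
           rng.normal(size=(32, 8))]                    # hypothetical 3-layer MLP
target_rdm = cosine_rdm(rng.normal(size=(30, 10)))      # hypothetical ROI RDM
profile = [rdm_correlation(r, target_rdm) for r in layer_rdms(stimuli, weights)]
```

Plotting such a profile across layers is what reveals where in the network's depth a given ROI is best matched, i.e. the "intermediate layers peak" reported for HG and pSTG.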
15. Prediction of Behavioural Data by Hidden DNN Representations of fMRI Responses: Figure 6
● Significant predictions of sound-dissimilarity variance by all ROIs, with unique variance for HG and pSTG
16. Conclusions
● DNNs provide the best predictions in both the behavioural and the neural datasets.
● DNNs outperformed the other models on the sound-dissimilarity task and in non-primary STG responses, whereas the NLP models outperformed on the word-dissimilarity task, and the Acoustic models matched the DNNs in HG responses. This dissociation suggests that the DNN representations are neither purely acoustic nor purely semantic, but intermediate.
● The layer-by-layer DNN analysis shows that intermediate layers contribute maximally to the predictions in HG and STG, suggesting the encoding of medium-level auditory features as lower-dimensional manifolds.