MODEL TO
REPLICATE HUMAN
AUDITORY SYSTEM
AN OVERVIEW OF TASK-
OPTIMIZED NEURAL
NETWORK TO REPLICATE
HUMAN AUDITORY
BEHAVIOUR

 Introduction to the article and some insights on it
 Mechanism of hearing by the ear and brain
 Organisation in the auditory cortex
 Earlier models on cortex and need for a new one
 Deep-learning model based on 6 contexts
 The method of generating cortical responses by model and
traditional method with comparisons
 Findings and inferences through different perspectives and
analyses
 Achievements by the model
 Added features in model in comparison to its precursors
 Disadvantages of model and future directions
ROAD MAP

All about the
procedure!!

 This paper describes the development of a
Convolutional Neural Network (CNN) model that
is task-optimised to perform real-world
auditory tasks.
 The model can be used to study the
architecture of the human auditory system, which
is why such a model is needed.
 The model also predicts fMRI voxel responses
throughout auditory cortex better than the standard
spectrotemporal filter model.
 The model provides details about the primary and
non-primary responses.
Introduction

 There are two types of auditory responses, namely
the primary and non-primary responses.
 The PRIMARY RESPONSE is obtained from the
cochlea and the primary pathway (purely auditory).
It corresponds to the primary auditory cortex (A1),
which carries out simpler tasks (reason given later).
 The NON-PRIMARY RESPONSE is obtained from
the non-primary pathway (mixed senses). It
corresponds to the regions beyond the primary
auditory cortex, which carry out complex tasks.
Auditory responses

 The neuronal processing that begins in the human ear
transforms sound into cortical representations.
 These representations make behaviourally important
sound properties explicit (like an amplifier).
 The organisation of the human auditory cortex
remains unresolved, and there are no adequate
models that explain this process of
transforming sounds into representations.
Ear mechanism

 This question has been debated, with researchers
favouring each side (distributed or hierarchical).
 Formisano and Staeren proposed an anatomically
distributed organisation.
 A tripartite hierarchical organisation is seen in
non-human animals, which carry out simpler
tasks.
 However, these findings cannot confirm whether the human
auditory cortex is distributed or hierarchical.
Organisation of auditory
cortex

DEEP LEARNING
TASK

 Some early models applied linear filtering to the
cochleogram (sound in image format) in 1-2
stages.
 However, the process of transformation is non-linear,
so those methods could not capture it.
 It is therefore essential to develop models that carry out
non-linear functions.
 The model has to account for both the organisation
and the transformation.
Early models and cause of failure
Sl. no | Name       | Description | Our case scenario
1      | Data       | Type of data provided as input | Cochleogram
2      | Task       | Operation to be performed on the input | Classification (multiple), Regression (prediction)
3      | Model      | The mathematical relation between input and output. This varies with the task and its complexity and may involve layers | CNN (Convolutional Neural Network)
4      | Error      | A function that measures the error between two different quantities | Comparison of the model’s classification with the human’s classification
5      | Algorithm  | A learning procedure that tries to reduce the error computed before | Stochastic gradient descent
6      | Evaluation | Finding how well the model has performed | Comparison with human behaviour

 The data on which the CNN is trained is known as a
cochleogram.
 The cochleogram is a
visual representation of the
sound signal.
 The cochleogram is a
spectro-temporal
representation of sound.
 A 2-second sound signal is
taken as input.
Data
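As a rough illustration of the spectro-temporal representation described above, the sketch below builds a toy cochleogram-like image from a 2-second tone. It is only an approximation under stated assumptions: a real cochleogram is typically computed with an auditory filterbank, whereas this sketch pools short-time FFT power into log-spaced bands and applies a compressive exponent. The function name, sample rate, band count and exponent are all hypothetical choices, not values taken from the paper.

```python
import numpy as np

def toy_cochleogram(signal, sr, n_bands=32, frame=512, hop=256, fmin=50.0):
    """Pool short-time FFT power into log-spaced frequency bands (toy stand-in)."""
    n_frames = 1 + (len(signal) - frame) // hop
    window = np.hanning(frame)
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop:i * hop + frame] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_bins)
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    edges = np.geomspace(fmin, sr / 2, n_bands + 1)       # log-spaced band edges
    bands = np.zeros((n_bands, n_frames))
    for b in range(n_bands):
        mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
        if mask.any():                                     # very narrow low bands may hold no FFT bin
            bands[b] = power[:, mask].sum(axis=1)
    return bands ** 0.3, edges                             # assumed compressive non-linearity

sr = 16000                                                 # assumed sample rate
t = np.arange(2 * sr) / sr                                 # 2-second input, as on the slide
coch, edges = toy_cochleogram(np.sin(2 * np.pi * 1000.0 * t), sr)
peak_band = int(coch.mean(axis=1).argmax())
print(coch.shape, edges[peak_band], edges[peak_band + 1])
```

The output is a frequency-by-time array that can be treated like an image, which is what makes a CNN a natural choice for the model.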

 There are two tasks to be performed, namely word
identification and music genre recognition.
 The tasks are made difficult by adding
background noise to the music/word sound.
 The task is to find one word out of 587 or to find
one genre out of 41 categories.
 The model is also used to predict cortical responses.
Task

 The model contains Convolution, Pooling, Dense,
Filter response normalisation and Dropout layers.
 It is a hierarchical model. The CNN combines
linear convolutions with non-linear operations
(rectification, normalisation and pooling).
 The model had five convolutional, three pooling,
two normalization, and two fully connected layers.
 The early processing (7 shared layers) is the same for both
tasks, followed by 5 task-specific layers per branch (ending in the FC layers). This
reduces the model's parameters by half.
 The hyperparameters were task-optimized.
Model
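The idea of shared early processing followed by task-specific branches can be sketched as below. This is a deliberately simplified stand-in, not the architecture from the paper: dense layers replace the convolutional stack, the weights are random rather than trained, and every layer size except the 587 word classes and 41 genre classes is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    # Small random weights; a trained network would learn these.
    return rng.standard_normal((n_in, n_out)) * 0.01

def softmax(z):
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

N_INPUT, N_SHARED, N_BRANCH = 256, 128, 64     # hypothetical sizes
shared_w = dense(N_INPUT, N_SHARED)            # stands in for the shared layers
word_w1, word_w2 = dense(N_SHARED, N_BRANCH), dense(N_BRANCH, 587)
genre_w1, genre_w2 = dense(N_SHARED, N_BRANCH), dense(N_BRANCH, 41)

def forward(x):
    shared = np.maximum(x @ shared_w, 0.0)     # computed once, reused by both branches
    word = softmax(np.maximum(shared @ word_w1, 0.0) @ word_w2)
    genre = softmax(np.maximum(shared @ genre_w1, 0.0) @ genre_w2)
    return word, genre

word_probs, genre_probs = forward(rng.standard_normal(N_INPUT))
print(word_probs.shape, genre_probs.shape)
```

Because the shared weights are stored and evaluated once, the two-task network needs fewer parameters than two separate single-task networks of the same depth.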

 This model was selected in two steps.
 The first step involved 180 architectures, each with 12
layers and a single task.
 The second step involved 7 architectures, each with 12 layers
and two tasks.
Model selection

 To evaluate how closely the model’s
responses resemble human responses, the model is
compared with human listeners on the same tasks.
 For WORD IDENTIFICATION, the listener
uses a UI that auto-completes the
word (to ensure that it belongs to one of the 587 classes).
 For GENRE IDENTIFICATION, the listener
lists five genre preferences (top 5).
 The error here is the number of wrong predictions.
 An interesting observation was that the model’s
error pattern resembled that of humans.
Error

 The algorithm used here is stochastic gradient
descent.
 The role of the algorithm is to find the parameter
values for which the loss is as small as possible
(theoretically 0).
 The word stochastic refers to the way the
input is taken: each update uses a randomly chosen sample (or small batch) rather than the whole dataset.
 Gradient descent refers to reducing the
loss by repeatedly stepping against its gradient
towards a (local) minimum.
Algorithm
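The update rule described above can be sketched on a much smaller problem than the CNN. The example below fits a straight line with stochastic gradient descent, one randomly ordered sample per update. The data, learning rate and epoch count are hypothetical and chosen only so that the sketch converges quickly.

```python
import random

def sgd_linear_fit(xs, ys, lr=0.05, epochs=200, seed=0):
    """Fit y = w * x + b by stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)                 # "stochastic": visit samples in random order
        for i in order:
            err = (w * xs[i] + b) - ys[i]  # prediction error on one sample
            w -= lr * err * xs[i]          # gradient of 0.5 * err**2 with respect to w
            b -= lr * err                  # gradient of 0.5 * err**2 with respect to b
    return w, b

xs = [i / 10 for i in range(-10, 11)]
ys = [2 * x + 1 for x in xs]               # noise-free toy data with w = 2, b = 1
w, b = sgd_linear_fit(xs, ys)
print(round(w, 3), round(b, 3))
```

Each step moves the parameters against the gradient of the loss on a single sample, so the loss shrinks on average even though no step sees the full dataset.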

 The confusion matrix is used to evaluate the performance
of the model in the genre recognition task (41 classes).
 The confusion matrix is a matrix
with one row and one column per
class; it compares the truth
with the model prediction (with only 2 classes it has 4 fields).
 The same can be plotted for
word identification, but the plot
would be hard to read because of the 587 classes.
Evaluation
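A confusion matrix of the kind described above can be built in a few lines. The sketch below uses three made-up classes and made-up labels purely for illustration; the genre task would use 41 classes in the same way.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

# Hypothetical labels for a 3-class problem.
truth = [0, 0, 1, 1, 2, 2, 2]
predicted = [0, 1, 1, 1, 2, 0, 2]

cm = confusion_matrix(truth, predicted, n_classes=3)
accuracy = sum(cm[i][i] for i in range(3)) / len(truth)   # diagonal = correct predictions
for row in cm:
    print(row)
print(accuracy)
```

Off-diagonal cells show which classes are confused with which, and this is the error pattern that can be compared between the model and human listeners.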

Cortical response

 The next task for the model is to predict fMRI
voxel responses throughout auditory cortex. In short, it
has to produce cortical responses.
 A voxel is a single unit block in a 3-D image (like a
Minecraft block).
 The data used here are 165 commonly heard natural sounds,
of which 52 were words and music.
 The model’s responses to these sounds were used to
predict the voxel response to each sound.
 These predictions were compared with those of the standard
spectrotemporal filter model.
Cortical responses

 Listening and hearing...
 An important step in the processing of auditory
signals is ‘attention’:
taking in the required signal and eliminating the
unwanted ones.
 Hence a filter is formed inside the auditory cortex, like
neurons that respond maximally to given
input frequencies. It has two functions:
 To incorporate information about both the timing (rhythm)
and the frequency content of the relevant auditory stimulus
stream.
 To enhance the sensory representation of attended stimuli
along these two feature dimensions.
Spectrotemporal filters

 The response of each layer of the model, averaged
over time, was taken into consideration.
 Linear regression was used
to map the unit responses
of each layer onto the measured
voxel responses.
 The responses within each layer
were thus linearly combined to
predict a ‘voxel’.
 As a result, we have a predicted voxel response
to all 165 sounds from every layer.
 The BOLD signal is too slow to follow a 2-s sound,
hence the time average is used.
Method of extraction
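The linear mapping from layer responses to a voxel can be sketched as a regularised regression that is fitted on some sounds and evaluated on held-out sounds. Everything below is synthetic and hypothetical: the features stand in for time-averaged layer activations, the voxel is simulated, and the split, the unit count and the ridge penalty `alpha` are assumptions rather than the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sounds, n_units = 165, 20                           # 165 sounds; the unit count is hypothetical
features = rng.standard_normal((n_sounds, n_units))   # stand-in for time-averaged layer responses
true_w = rng.standard_normal(n_units)
voxel = features @ true_w + 0.5 * rng.standard_normal(n_sounds)  # synthetic voxel response

train, test = np.arange(0, 83), np.arange(83, n_sounds)          # assumed train/test split

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

w, b = ridge_fit(features[train], voxel[train], alpha=1.0)
predicted = features[test] @ w + b
r = np.corrcoef(predicted, voxel[test])[0, 1]
variance_explained = r ** 2            # squared correlation on held-out sounds
print(round(variance_explained, 3))
```

Repeating this for every layer and every voxel gives one variance-explained value per layer and voxel, which is the quantity the comparisons that follow are based on.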

 The comparisons were made using four models:
 The trained model with optimised weights
 The untrained model with random weights
 The traditional spectrotemporal filter model
 A random model from the selection stage
Comparisons

 The comparison must be made against the truth.
 The truth is obtained by presenting the same sounds to listeners in an fMRI
machine to measure
the voxel responses.
 First, the BOLD variance explained was computed for all 4 methods.
 It was corrected for the reliability of both the
measured voxel response and the predicted voxel
response.
 The comparisons were made on all voxels and on some
specified voxels.
 As expected, the trained model explained more variance and was
better than the spectrotemporal model and the untrained one.
BOLD variance

 Then the median variance explained was taken for the same.
 The trained model (70%) explained more variance than the
spectrotemporal filter (55%).
 The filter model was given as many
parameters as it could use,
 and its performance eventually saturated.
 The untrained and random models were worse than
the trained model and the spectrotemporal filter model.
 The trained model explained the most variance in all
ROIs and proved to be better than the traditional one.
Median variance

Findings and Inferences

 The trained deep learning model performed the best
and was far better than the spectrotemporal one.
 The improved voxel prediction is attributed
to the hierarchical organisation of the model.
 The convolution and the pooling layers of the model
produced receptive fields (spectrum of signals)
similar to those of the cortical system.
 The model also performed better than the
spectrotemporal model in the regions of interest.
Findings

 This shows that the model predicts responses to
natural sounds better than the
spectrotemporal filter model throughout the
auditory cortex.
 This is due to the hierarchical organisation of the
model.
 The task optimization has resulted in a good cortical
response.
Inference

 The responses obtained from the later layers of the
network were more non-linear than those of the other
layers.
 To assess this property, it is essential to
compare the response from each layer of the model.
 The median variance explained by individual layers was
taken into consideration for comparison.
 Based on these, there were some important
findings which led to some inferences about the
organisation of the human auditory system.
Procedure to assess the
hierarchical organisation

 The median variance explained increased
across the early layers and then decreased
for the last layers.
 All layers except the first and
last performed better than the
spectrotemporal filter model.
 All layers except the last layer
in the trained model explained more
variance than those of the untrained model,
even though their dependence
on the data was the same.
The intermediate layers made the best predictions whereas the final
layers made poor predictions.
Findings

 The receptive fields of some of the layers in the network
were similar to those in auditory cortex, and this may be the
reason for their high performance.
 The task optimisation has helped in replicating some of
the cortical properties in the model.
 As per the task, the neurons in the final layers are involved in
perceptual decisions.
 Such neurons may be present in the auditory cortex, but
their organisation may not be accessible by conventional
fMRI.
 Or these might lie beyond the auditory cortex, in
other brain lobes.
Inferences

Summary map
 The variance explained by the layers was
plotted as maps.
 The heatmaps of the variance and
predictions of the individual
layers were mapped onto a
probabilistic map of
three anatomically defined
regions of the primary auditory
cortex. This is done for each individual
test subject.
 The average taken over all subjects
is a summary map.
 This relates the model to the
human cortex.
 The black outlines are the
anatomical regions.

Findings
 The intermediate layers best
predicted the voxels that
correspond to the primary
auditory cortex (core).
 The last layer of the network
corresponds to the region
away from the primary auditory cortex
(non-core).
 The same results were not
seen in an untrained model
with random weights.
 The same results were also
seen when words and music
were removed from the training
data.

 This indicates that the intermediate and the last
layers of the network generate primary and non-primary
responses respectively.
 Also, the intermediate layers perform simpler tasks when
compared to the later layers (reason given later).
 The same results were seen, i.e. primary voxels were best predicted
by intermediate layers and non-primary voxels by the last layers,
even when words and music were removed.
 This suggests that the hierarchical structure of the model
helped it in generating better cortical responses for
everyday sounds.
Inference

 There are four functionally defined Regions Of
Interest (ROIs), namely:
 frequency selective
 pitch selective
 word selective
 music selective
Regions of interest

VOXEL TYPE/LAYER    | INTERMEDIATE LAYER | DEEP LAYER
FREQUENCY-SELECTIVE | Yes                | No
PITCH-SELECTIVE     | Yes                | No
MUSIC-SELECTIVE     | Yes                | No
SPEECH-SELECTIVE    | No                 | Yes
Findings

 The frequency voxels, which were best explained by the
intermediate layers, are found early in the hierarchy, and the
speech voxels, which were best explained by the later layers,
are found later in the hierarchy.
 This can be the reason why the intermediate layers perform
simpler functions and the later layers perform complex
functions.
 As before, the untrained network performed worse than the
trained network and also the spectrotemporal model.
 The dependencies did not affect the performance of the
model, suggesting that the task optimization was critical to
map the features in the layers to the auditory cortex.
 The ROI analysis supports a hierarchical organisation.
Inference

HENCE BOTH THE MODEL
AND THE HUMAN CORTEX
ARE ORGANISED
HIERARCHICALLY!!

 The representation of the acoustic features by the
network was compared with that of the
spectrotemporal model.
 The aim was to check whether the representations of both models
were linearly decodable.
 For this, the data was divided into two subsets:
the first was used for mapping and the
second for quality checking.
Acoustic features

 The ability of the network layers to extract spectral
information from the data decreased
as the layers progressed.
 For the spectrotemporal
features, the extraction
ability peaked at the
intermediate layer.
 The prediction of the later layers
is worse than that of the earlier ones, and this
was prominent in the untrained model.
Findings and inference

 It is essential that the model performs well on real-
world tasks in order to replicate the auditory cortex.
 The model was analysed layer-wise on the existing
tasks and on a new speaker identification task for which
the model wasn’t trained.
 This was done by fixing the network weights and training
a softmax classifier on the output of each layer,
so that each layer’s representation could be
read out separately.
Real-world task performance

Findings
 The findings were contrary to
that seen previously
 The performance improved
from early to the deeper
layers of the network.
 The same level of performance
was seen also in the speaker
identification task except
for final layer.
 This suggests that the network
representations are task-
generalised.
(same for most auditory tasks)

 All of the previous findings and analyses portray the process of
transformation from cochlea to cortex.
 The role of the cortex is to transform acoustic features obtained
from the cochlea into meaningful representations, but the role
of this transformation is unknown.
 These analyses suggest that task-related information that
is not clearly expressed at the cochlea (implicit) is
transformed by the auditory cortex into
representations in which it is clearly expressed (explicit).
 In simpler terms, the transformation has provided some
meaning and explanation to the information, using which both
the brain and the model figured out the output.
Inference

 The input data included
background noise added to the sound signal.
 The noise was added at different SNRs (Signal to Noise
Ratios).
 The analysis done on this is the SNC
(Signal to Noise Characteristics).
 The signals were categorised according to the SNR
and were fed to the network for analysis.
 The objective is to find the role of noise in processing
information from the signal.
SNC
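Mixing a signal with background noise at a chosen SNR, as described above, comes down to scaling the noise before adding it. The sketch below assumes the usual definition SNR (dB) = 20·log10(RMS of signal / RMS of noise). The tone, the Gaussian noise and the 3 dB target are hypothetical examples, not the stimuli from the paper.

```python
import math
import random

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def mix_at_snr(signal, noise, snr_db):
    """Scale the noise so the mixture has the requested SNR in dB, then add it."""
    gain = rms(signal) / (rms(noise) * 10 ** (snr_db / 20))
    mixture = [s + gain * n for s, n in zip(signal, noise)]
    return mixture, gain

rng = random.Random(0)
signal = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]  # 1 s tone
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]                       # white noise

mixture, gain = mix_at_snr(signal, noise, snr_db=3.0)
achieved = 20 * math.log10(rms(signal) / rms([gain * n for n in noise]))
print(round(achieved, 6))
```

Grouping the mixtures by their SNR and testing each group separately gives the per-SNR performance that this analysis refers to.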

 The signals with less noise were
well classified by the intermediate
layers as well as the deep layers.
 But, the signals with more noise
were well classified by the
deep layers only.
 The later layers of the model are
insensitive to noise or they are
noise-immune
Findings and inference

 The data used here was the same as for fMRI, but the words and
music were excluded (113 samples).
 These sounds were divided into
two subsets based on stationarity
(the stability of mean, SD etc.).
 The cochleogram was divided
into categories and the
standard deviation over time was taken.
 Then the individual layer
responses to the two sets of
sounds were measured.
Later, the same was compared with the voxel responses
from the fMRI machine.
Noise-stimuli sensitivity
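One simple way to picture a stationarity measure is to ask how much the windowed mean of each frequency band changes over time. The sketch below is an assumption-laden stand-in for the measure used in the paper: it uses only the windowed mean per band, and the toy "cochleograms", the window length and the function name are all hypothetical.

```python
import statistics

def nonstationarity(cochleogram, window):
    """Average, over bands, of the standard deviation of windowed means over time.

    cochleogram: list of frequency bands, each a list of envelope values over time.
    """
    scores = []
    for band in cochleogram:
        window_means = [statistics.fmean(band[i:i + window])
                        for i in range(0, len(band) - window + 1, window)]
        scores.append(statistics.pstdev(window_means))   # spread of the statistic over time
    return statistics.fmean(scores)

# Two toy 2-band "cochleograms": one steady, one that changes halfway through.
steady = [[1.0] * 100, [0.5] * 100]
bursty = [[0.0] * 50 + [2.0] * 50, [1.0] * 50 + [0.0] * 50]

steady_score = nonstationarity(steady, window=10)
bursty_score = nonstationarity(bursty, window=10)
print(steady_score, bursty_score)
```

Sounds with a low score would fall into the stationary subset and sounds with a high score into the non-stationary subset.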

 The deep layers of the network trained on these natural sounds
exhibited a greater
response to the non-stationary
sounds than to
the stationary sounds.
 However, the same effect was not
observed in the untrained network.
 From the fMRI, the responses to
stationary and non-stationary
sounds were similar in the
primary areas (A1), but a greater response
was seen to non-stationary sounds
in the non-primary areas.
Findings

 There is a functional differentiation between the primary and
the non-primary regions, and this evidence
supports the correspondence found earlier (intermediate layers
to primary, deep layers to non-primary).
 There is a suppression of stationary sound in the later layers of
the model and in the non-primary regions, and
this contributed to the better response to non-stationary
sounds by the deep layers and the non-primary cortex.
 This has helped the model to predict responses to
natural sounds even though they were affected by
noise.
Inference

Task-performance
 It was found that
networks with better
performance on a real-
world visual object
recognition task better
predict cortical responses
in the visual stream.
 To test the same for audition, 57
different models from
stage 1 were taken at 14
different training points
(57 × 14 = 798 networks) for either the word or
the genre task.
 The median variance explained was
measured for each layer.

 The performance of a network on a task strongly
correlated with the variance it explained in auditory
cortical responses.
 The word task had a Spearman correlation of 0.87
and the genre task had a Spearman correlation of
0.85
 These results suggest that the task-based
optimization of deep neural networks can help yield
more predictive models of sensory systems.
Continued…
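The Spearman correlation reported above is the Pearson correlation computed on ranks, so it measures whether networks that do better on the task also tend to explain more variance. The sketch below shows the calculation on five made-up networks. The accuracy and variance values are hypothetical and are not data from the paper.

```python
import math

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

# Hypothetical task accuracy and median variance explained for five networks.
task_accuracy = [0.10, 0.30, 0.50, 0.70, 0.90]
variance_explained = [0.20, 0.35, 0.30, 0.50, 0.60]
rho = spearman(task_accuracy, variance_explained)
print(round(rho, 3))
```

A value close to 1 means the ordering of networks by task performance nearly matches their ordering by variance explained.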

Conclusions

 The model performed as well as humans in
the tasks of word recognition and genre identification.
 The model produced human-like error patterns.
 The task optimization resulted in the model
replicating the auditory cortex in one aspect
(branching of layers for specific tasks).
 The model predicted fMRI responses throughout the
auditory cortex far better than the standard
method (spectrotemporal filter).
Achievements

 Task optimization resulted in better cortical responses by
the model, without which the predictions were poor
(untrained model)
 Intermediate layers of model predicted the primary
response and deep layers of model predicted non-
primary response.
 The model supports the view that the organisation of the human
auditory cortex is hierarchical.
 The hierarchical organisation
and task optimization made the model general and powerful.
Continued…

 The model had some non-linear operations like
normalization and pooling, and this is the reason for
its improved response. Research
says that the inner operations in cortex are non-linear,
so the model was better than the filter, which didn’t have
these features.
 The model provided an alternative method for evaluating the
organisation of the cortex (model and
human on the same task; both performed similarly, so the model
architecture is taken to be similar to the human one).
Continued…

 The task optimization resulted in powerful models
which can replicate the visual and auditory system.
 The primary visual responses were best given by the
early layers of the model and the primary auditory
responses were best given by the intermediate layers
of the model.
 This suggests that the auditory cortex is present
deeper in the computational hierarchy compared to
the visual.
 This is in accordance with the fact that the auditory
cortex has more subcortical nuclei.
Comparisons with the visual
system

 This deep learning model (12 layers) is deeper than
its predecessors (2 or 3 layers).
 This depth helped in a good representation of complex
real-world tasks and in better cortical responses.
 The branching of the network in the deep layers as a result of
task optimisation is in accordance with the
functional segregation in the non-primary cortex.
 The model could perform other sound-related tasks even
though it was not trained on them.
 The parameters were fitted on only half of the data, and
the model also performed well on the unseen data.
Advantages

 The individual units used in the model are less
readily understood.
 The choice of task wasn’t so important for the analysis of the
human cortex. The genre task was taken into
consideration because a large dataset was readily available,
but this task has the drawback that it is
culturally biased.
 The model couldn’t replicate the human in terms of
learning; humans learn by experience and feedback
whereas the machine learns from data.
Disadvantages

 The model supported a hierarchical organisation of the
human cortex, but an even better one is
required to show whether it is tripartite or not, as seen in
animals.
 Research says that the auditory cortex has more
subcortical nuclei; this can be tested by predicting the
subcortical responses from the early layers of the model.
 Training the model for additional music-related tasks, or
tasks not specific to speech or music, could yield a more
complete model of human behaviour.
 Improving the model from the learning point of view can
make the model more correlated to that of the human.
Future updates

REFERENCE…
Kell, A. J. E., Yamins, D. L. K., Shook, E. N., Norman-Haignere, S. V., &
McDermott, J. H. (2018). A Task-Optimized Neural Network Replicates Human
Auditory Behavior, Predicts Brain Responses, and Reveals a Cortical Processing
Hierarchy. Neuron, 98(3), 630–644.e16. doi:10.1016/j.neuron.2018.03.044
**All information has been taken from this research article.**
TASK-OPTIMIZED DEEP NEURAL NETWORK TO REPLICATE THE HUMAN AUDITORY CORTEX

National Information Standards Organization (NISO)
 
HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.
deepaannamalai16
 
B. Ed Syllabus for babasaheb ambedkar education university.pdf
B. Ed Syllabus for babasaheb ambedkar education university.pdfB. Ed Syllabus for babasaheb ambedkar education university.pdf
B. Ed Syllabus for babasaheb ambedkar education university.pdf
BoudhayanBhattachari
 
Leveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit InnovationLeveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit Innovation
TechSoup
 
Nutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour TrainingNutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour Training
melliereed
 
Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47
MysoreMuleSoftMeetup
 
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.pptLevel 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Henry Hollis
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
Nicholas Montgomery
 
Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"
National Information Standards Organization (NISO)
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
PECB
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
Nguyen Thanh Tu Collection
 
Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...
PsychoTech Services
 
Stack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 MicroprocessorStack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 Microprocessor
JomonJoseph58
 
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdfمصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
سمير بسيوني
 
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPLAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
RAHUL
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
iammrhaywood
 
Bonku-Babus-Friend by Sathyajith Ray (9)
Bonku-Babus-Friend by Sathyajith Ray  (9)Bonku-Babus-Friend by Sathyajith Ray  (9)
Bonku-Babus-Friend by Sathyajith Ray (9)
nitinpv4ai
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
Celine George
 
math operations ued in python and all used
math operations ued in python and all usedmath operations ued in python and all used
math operations ued in python and all used
ssuser13ffe4
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
mulvey2
 

Recently uploaded (20)

Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
 
HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.HYPERTENSION - SLIDE SHARE PRESENTATION.
HYPERTENSION - SLIDE SHARE PRESENTATION.
 
B. Ed Syllabus for babasaheb ambedkar education university.pdf
B. Ed Syllabus for babasaheb ambedkar education university.pdfB. Ed Syllabus for babasaheb ambedkar education university.pdf
B. Ed Syllabus for babasaheb ambedkar education university.pdf
 
Leveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit InnovationLeveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit Innovation
 
Nutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour TrainingNutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour Training
 
Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47
 
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.pptLevel 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
 
Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
 
Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...
 
Stack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 MicroprocessorStack Memory Organization of 8086 Microprocessor
Stack Memory Organization of 8086 Microprocessor
 
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdfمصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
 
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPLAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UP
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
 
Bonku-Babus-Friend by Sathyajith Ray (9)
Bonku-Babus-Friend by Sathyajith Ray  (9)Bonku-Babus-Friend by Sathyajith Ray  (9)
Bonku-Babus-Friend by Sathyajith Ray (9)
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
 
math operations ued in python and all used
math operations ued in python and all usedmath operations ued in python and all used
math operations ued in python and all used
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 

TASK-OPTIMIZED DEEP NEURAL NETWORK TO REPLICATE THE HUMAN AUDITORY CORTEX

  • 2. AN OVERVIEW OF A TASK-OPTIMIZED NEURAL NETWORK TO REPLICATE HUMAN AUDITORY BEHAVIOUR
  • 3. Road map: introduction to the article and some insights on it; mechanism of hearing by the ear and brain; organisation of the auditory cortex; earlier models of the cortex and the need for a new one; the deep-learning model described through six components; how cortical responses are generated by the model and by the traditional method, with comparisons; findings and inferences from different perspectives and analyses; achievements of the model; features added relative to its precursors; disadvantages of the model and future directions.
  • 5. Introduction: This paper develops a Convolutional Neural Network (CNN) that is task-optimised to perform real-world auditory tasks. The model can be used to study the architecture of the human auditory system (why is a model needed?). It also predicts fMRI voxel responses throughout the auditory cortex better than the standard spectrotemporal filter method. The model provides details about the primary and non-primary responses.
  • 6. Auditory responses: There are two types of auditory response, primary and non-primary. The primary response is obtained from the cochlea and the primary (purely auditory) pathway; it corresponds to the primary auditory cortex (A1) and carries out simpler tasks (reason given later). The non-primary response is obtained from the non-primary pathway (mixed senses); it corresponds to the regions beyond the primary auditory cortex and carries out complex tasks.
  • 7. Ear mechanism: Neuronal processing that starts in the human ear transforms sound into cortical representations. These representations make behaviourally important sounds explicit (like an amplifier). The organisation of the human auditory cortex remains unresolved, and there are no adequate models of how sounds are transformed into these representations.
  • 8. Organisation of auditory cortex: Whether the organisation is distributed or hierarchical has been debated, with research favouring each side. Formisano and Staeren proposed an anatomically distributed organisation. A tripartite hierarchical organisation is seen in non-human animals, which carry out simpler tasks. However, these findings cannot confirm whether the human auditory cortex is distributed or hierarchical.
  • 10. Early models and cause of failure: Some early models applied one or two stages of linear filtering to the cochleogram (sound in image format). However, the transformation is non-linear, so those methods fell short. It is therefore essential to develop models that carry out non-linear functions. A model has to account for both the organisation and the transformation.
  • 11. The six components of the model:
      Sl. no | Name | Description | Our case
      1 | Data | Type of data provided as input | Cochleogram
      2 | Task | Operation to be carried out on the input | Classification (multiple classes); regression (prediction)
      3 | Model | The mathematical relation between input and output; it varies with the task and its complexity and may involve layers | CNN (Convolutional Neural Network)
      4 | Error | A measure of the mismatch between two quantities | Comparison of the model's classification with the human's classification
      5 | Algorithm | A learning procedure that tries to reduce the error computed above | Stochastic gradient descent
      6 | Evaluation | Finding how well the model has performed | Comparison with human behaviour
  • 12. Data: The data on which the CNN is trained are cochleograms. A cochleogram is a visual, spectro-temporal representation of a sound signal. A 2-second sound signal is taken as input.
  • 13. Task: There are two tasks, word identification and music genre recognition. The tasks are made harder by adding background noise to the word or music sound. The model must pick one word out of 587, or one genre out of 41 categories. The model is also used to produce cortical responses.
  • 14.
  • 15. Model: The model contains convolution, pooling, dense (fully connected), filter response normalisation and dropout layers. It is a hierarchical model; taken together, the layers of the CNN (convolution and pooling among them) implement a non-linear transformation. The model had five convolutional, three pooling, two normalisation and two fully connected layers. The first seven layers are shared by both tasks; the remaining five layers, including the fully connected (FC) layers, are separate for each task, so the number of parameters is reduced by about half. The hyperparameters were task-optimised.
  • 16. Model selection: The model was derived in two steps. The first step involved 180 architectures, each with 12 layers and a single task. The second step involved 7 dual-task architectures of 12 layers.
  • 17. Error: To evaluate how closely the model's responses resemble human responses, the model is compared with human listeners on the same tasks. For word identification, the listener uses a UI that auto-completes the word (to ensure that it belongs to one of the 587 classes). For genre identification, the listener lists five preferred genres (top 5). The error here is the wrong predictions. An interesting observation was that the model's error pattern resembled the human one.
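The top-5 scoring described for the genre task can be sketched as follows. This is a minimal illustration, assuming a response counts as correct when the true genre is among the five highest-ranked choices; the function names and the example scores are hypothetical and are not taken from the paper.

```python
def top_k_correct(scores, true_label, k=5):
    """Return True if true_label is among the k highest-scoring labels.

    scores: dict mapping each label to a score (hypothetical model output).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return true_label in ranked[:k]


def top_k_accuracy(all_scores, true_labels, k=5):
    """Fraction of trials on which the true label falls in the top k."""
    hits = sum(top_k_correct(s, t, k) for s, t in zip(all_scores, true_labels))
    return hits / len(true_labels)


# Toy example with three genres and k=2 (the slides use 41 genres and k=5).
example = {"jazz": 0.5, "rock": 0.3, "pop": 0.2}
print(top_k_accuracy([example, example], ["jazz", "pop"], k=2))
```

The same function can score a human listener if the five listed genres are given the highest scores.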
  • 18. Algorithm: The algorithm used here is stochastic gradient descent. Its role is to find the parameter values for which the loss is as small as possible (theoretically 0). The word "stochastic" refers to the way the input is taken: the parameters are updated from one randomly drawn example (or a small batch) at a time. "Gradient descent" refers to reducing the loss by repeatedly stepping against its gradient towards a (local) minimum.
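The idea of updating the parameters from one example at a time can be sketched on a toy problem. This is only an illustration of stochastic gradient descent on a one-variable linear model with a squared-error loss; the function name, learning rate and data are assumptions and have nothing to do with the actual training setup of the CNN.

```python
import random


def sgd_linear_fit(xs, ys, lr=0.05, epochs=200, seed=0):
    """Fit y = w * x + b by stochastic gradient descent.

    The loss for one example is 0.5 * (prediction - target) ** 2, so its
    gradient with respect to w is err * x and with respect to b is err.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)               # visit the examples in random order
        for i in order:                  # one example per update ("stochastic")
            err = (w * xs[i] + b) - ys[i]
            w -= lr * err * xs[i]        # step against the gradient
            b -= lr * err
    return w, b


# Toy data generated from y = 2x + 1.
w, b = sgd_linear_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(round(w, 2), round(b, 2))
```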
  • 19. Evaluation: A confusion matrix is used to evaluate the performance of the model on the genre recognition task (41 classes). It is a square table with one row and one column per class that compares the true label with the model's prediction; in the two-class case it has 4 fields. The same can be plotted for word identification, but with 587 classes the plot becomes too dense to read.
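A confusion matrix of the kind described above can be built in a few lines. This is a minimal sketch with hypothetical genre labels; rows are taken to be the true class and columns the predicted class, which is a common convention but an assumption here.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count how often each true class (row) was predicted as each class (column)."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


# Toy example with two genres; the genre task in the slides has 41.
classes = ["jazz", "rock"]
truth = ["jazz", "jazz", "rock"]
predicted = ["jazz", "rock", "rock"]
for name, row in zip(classes, confusion_matrix(truth, predicted, classes)):
    print(name, row)
```

Correct predictions fall on the diagonal, so off-diagonal cells show which classes are confused with which.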
  • 21. Cortical responses: The next task for the model is to generate fMRI voxel responses throughout the auditory cortex; in short, it has to produce cortical responses. A voxel is a single unit block in a 3-D image (like a Minecraft block). The data used here are 165 natural sounds heard regularly, of which 52 were words and music. The model was run on these sounds and the voxel responses generated for each sound were collected. These were compared with the standard spectrotemporal filter method.
  • 22. Spectrotemporal filters: Listening versus hearing: an important part of processing auditory signals is attention, that is, taking in the required signal and discarding the unwanted rest. Hence a filter is formed inside the auditory cortex, like neurons that respond maximally to given input frequencies. It has two functions: to incorporate information about both the timing (rhythm) and the frequency content of the relevant auditory stimulus stream, and to enhance the sensory representation of attended stimuli along these two feature dimensions.
  • 23. Method of extraction: The time-averaged response of each layer of the model was taken into consideration. Linear regression was used for each layer: the responses of a layer were linearly combined to artificially create a "voxel". As a result, there is a predicted voxel response for all 165 sounds from every layer. The BOLD response changes little over a 2-s sound, hence the time average is used.
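The linear mapping from time-averaged layer responses to a voxel can be sketched as an ordinary least-squares fit. This is a minimal illustration under stated assumptions: the slides do not give the regression details, so plain least squares with an optional ridge penalty is used here, and all names and the toy numbers are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_voxel_weights(features, voxel, ridge=0.0):
    """Weights that linearly combine per-sound layer features into one voxel.

    features: one list of time-averaged unit responses per sound.
    voxel: measured response of a single voxel to the same sounds.
    """
    k = len(features[0])
    gram = [[sum(f[i] * f[j] for f in features) + (ridge if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [sum(f[i] * v for f, v in zip(features, voxel)) for i in range(k)]
    return solve_linear_system(gram, rhs)


def predict_voxel(features, weights):
    """Predicted voxel response for each sound."""
    return [sum(w * x for w, x in zip(weights, f)) for f in features]


# Toy example: 4 sounds, 2 units; the voxel equals 3 * unit0 - 1 * unit1.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
print(fit_voxel_weights(feats, [3.0, -1.0, 2.0, 5.0]))
```

In practice the weights would be fitted on one subset of sounds and evaluated on held-out sounds.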
  • 24. Comparisons: Four models were compared: the trained model with optimised weights; the untrained model with random weights; the traditional spectrotemporal filter model; and a random model from the selection stage.
  • 25. BOLD variance: The comparison must be made against the truth. The truth is obtained by presenting the same sounds in an fMRI scanner and measuring the voxel responses. First, the explained BOLD variance was computed for all 4 methods. It was corrected for the reliability of both the measured and the predicted voxel responses. The comparisons were made on all voxels and on some specified voxels. As expected, the trained model explained more variance than both the spectrotemporal model and the untrained one.
  • 26. Median variance: Then the median explained variance was taken. The trained model (70%) explained more variance than the spectrotemporal filter model (55%). The filter model was given the highest number of parameters it could support, and it eventually saturated. The untrained and random models were worse than both the trained model and the spectrotemporal filter model. The trained model explained the most variance in every ROI and proved better than the traditional model.
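The median explained variance can be sketched as below. This is a simplified illustration: explained variance is taken as the squared Pearson correlation between measured and predicted responses, and the reliability correction mentioned in the slides is omitted. The function names and toy numbers are hypothetical.

```python
from statistics import median


def explained_variance(measured, predicted):
    """Squared Pearson correlation between measured and predicted responses."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_m) * (b - mean_p) for a, b in zip(measured, predicted))
    var_m = sum((a - mean_m) ** 2 for a in measured)
    var_p = sum((b - mean_p) ** 2 for b in predicted)
    if var_m == 0 or var_p == 0:
        return 0.0  # a constant response explains no variance
    return cov * cov / (var_m * var_p)


def median_explained_variance(measured_by_voxel, predicted_by_voxel):
    """Median over voxels of the per-voxel explained variance."""
    return median(explained_variance(m, p)
                  for m, p in zip(measured_by_voxel, predicted_by_voxel))


# Toy example: three voxels, four sounds each.
measured = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
predicted = [[1, 2, 3, 4], [1, 1, 1, 1], [2, 1, 4, 3]]
print(median_explained_variance(measured, predicted))
```

The median is used rather than the mean so that a few very well or very poorly predicted voxels do not dominate the summary.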
  • 28. Findings: The trained deep learning model performed best and was far better than the spectrotemporal model. The improved voxel predictions are attributed to the hierarchical organisation of the model. The convolution and pooling layers of the model produced receptive fields (spectrum of signals) similar to those of the cortical system. The model also performed better than the spectrotemporal model in the regions of interest.
  • 29. Inference: This indicates that the model predicts responses to natural sounds better than the spectrotemporal filter model throughout the auditory cortex. This is attributed to the hierarchical organisation of the model. Task optimisation has resulted in good cortical responses.
  • 30. Procedure to assess the hierarchical organisation: The responses obtained from the later layers of the network were more non-linear than those of the other layers. To assess this property, it is essential to compare the responses from each layer of the model. The median variance of the individual layers was used for comparison. Based on these, there were some important findings that lead to inferences about the organisation of the human auditory system.
  • 31. Findings: The median variance increased across the layers and then decreased for the last layers. All layers except the first and last performed better than the spectrotemporal filter model. All layers except the last in the trained model explained more variance than those of the untrained model, even though their dependencies on the data were the same. The intermediate layers made the best predictions, whereas the final layers made poor predictions.
  • 32. Inferences: The receptive fields of some layers in the network were similar to those in the auditory cortex, and this may be the reason for their high performance. Task optimisation has helped replicate some cortical properties in the model. Because of the task, the neurons in the final layers are involved in perceptual decisions. Such neurons may be present in the auditory cortex, but their organisation may not be accessible to conventional fMRI. Or they might lie beyond the auditory cortex, in other brain lobes.
  • 33. Summary map: The variance explained by the layers was plotted as images. Heatmaps of the variance and predictions of the individual layers were mapped onto a probabilistic map that includes three anatomically defined regions of the primary auditory cortex. This is done for each individual test subject. The average taken over all subjects is the summary map. This relates the model to the human cortex. The black outlines are the anatomical regions.
  • 34. Findings: The intermediate layers best predicted the voxels of the primary auditory cortex (core). The last layer of the network best predicted the regions away from the primary auditory cortex (non-core). The same results were not seen in an untrained model with random weights. The same results were also seen when words and music were removed from the training data.
  • 35. Inference: This indicates that the intermediate and the last layers of the network generate the primary and non-primary responses, respectively. Also, the intermediate layers perform simpler tasks than the later layers (reason given later). The same pattern, with primary voxels best predicted by intermediate layers and non-primary voxels by the last layers, held even when words and music were removed. This suggests that the hierarchical structure of the model helped it generate better cortical responses for everyday sounds.
  • 36. Regions of interest: There are four functionally defined regions of interest (ROIs): frequency-selective, pitch-selective, word-selective and music-selective.
  • 37. Findings: [Table: voxel type (frequency-selective, pitch-selective, music-selective, speech-selective) against best-predicting layer (intermediate layer, deep layer); the cell marks were lost in extraction.]
  • 38. Inference: The frequency-selective voxels, which were best explained by the intermediate layers, are found early in the hierarchy, and the speech-selective voxels, which were best explained by the later layers, are found later in the hierarchy. This may be why the intermediate layers perform simpler functions and the later layers perform complex functions. As before, the untrained network explained less variance than both the trained network and the spectrotemporal model. The dependencies did not affect the performance of the model, suggesting that task optimisation was critical for mapping the features in the layers onto the auditory cortex. The ROI analysis supports a hierarchical organisation.
  • 39.  HENCE BOTH THE MODEL AND THE HUMAN CORTEX ARE ORGANISED HIERARCHICALLY!!
  • 40. Acoustic features: The network's representation of acoustic features was compared with that of the spectrotemporal model. The aim was to check whether the features were linearly decodable from the representations of both models. For this, the data were divided into two subsets: the first was used for mapping and the second for checking quality.
  • 41. Findings and inference: The ability of the network layers to extract spectral information from the data decreased as the layers progressed. The extraction ability was constant for the spectrotemporal model, which peaked at the intermediate layer. The predictions of the later layers were worse than those of the earlier ones, and this was prominent in the untrained model.
  • 42. Real-world task performance: To replicate the auditory cortex, the model must perform well on real-world tasks. The model was analysed layer by layer on the existing tasks and on a new speaker identification task for which it was not trained. This was done by fixing the network weights and optimising a softmax classifier that takes the output of a given layer as its input.
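A softmax read-out on top of fixed layer features can be sketched as follows. This is only an illustration of the forward pass: the read-out weights, biases and feature values are hypothetical, and the training of the read-out is not shown.

```python
import math


def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    top = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def readout_probabilities(features, weights, biases):
    """Class probabilities from a linear read-out of fixed layer features.

    features: activations of one layer for one sound (kept fixed).
    weights: one weight vector per class; biases: one bias per class.
    """
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


# Toy example: 2 features, 2 classes (e.g. two speakers).
probs = readout_probabilities([2.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(probs)
```

Only the read-out parameters would be optimised in such an analysis; the features themselves stay as the trained network produced them.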
  • 43. Findings: The findings were contrary to those seen previously. Performance improved from the early to the deeper layers of the network. The same trend was also seen in the speaker identification task, except for the final layer. This suggests that the network representations are task-general (the same for most auditory tasks).
  • 44. Inference: All of the previous findings and analyses portray the process of transformation from cochlea to cortex. The role of the cortex is to transform acoustic features obtained from the cochlea into meaningful representations, and the purpose of this transformation has been unknown. These analyses suggest that task-related information that is implicit (not clearly expressed) in the cochlea is transformed by the auditory cortex into representations in which it is explicit (clearly expressed). In simpler terms, the transformation gives the information meaning, which both the brain and the model use to work out the output.
  • 45. SNC: The input data combined background noise with the sound signal. The noise was added at different SNRs (signal-to-noise ratios). The analysis of this constitutes the SNC (signal-to-noise characteristics). The signals were grouped by SNR and fed to the network for analysis. The objective is to find the role of noise in processing information from the signal.
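Mixing a signal with background noise at a chosen SNR can be sketched as below. It uses the standard definition SNR (dB) = 10 log10(P_signal / P_noise), where P is mean power; the function name and the toy waveforms are hypothetical, and the SNR levels used in the paper are not reproduced here.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a waveform."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Scale the noise so the mixture has the requested SNR, then add it."""
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [s + gain * n for s, n in zip(signal, noise)]


# Toy waveforms: at 0 dB the scaled noise has the same power as the signal.
signal = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
print(mix_at_snr(signal, noise, 0))
```

Lower SNR values give a larger noise gain and therefore a harder classification problem.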
  • 46. Findings and inference: The signals with less noise were classified well by the intermediate layers as well as the deep layers. The signals with more noise, however, were classified well only by the deep layers. The later layers of the model are therefore comparatively insensitive to noise (noise-robust).
  • 47. Noise-stimuli sensitivity: The data used here were the same as for fMRI, but the words and music were excluded (113 samples). These sounds were divided into two subsets based on stationarity (the stability of the mean, SD, etc.). The cochleograms were divided into categories by taking the standard deviation over time. The individual layer responses to the two sets of sounds were then measured. Later, these were compared with the voxel responses from the fMRI scanner.
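One way to score stationarity from a cochleogram is sketched below. This is an assumption-laden illustration, not the paper's measure: each frequency channel's standard deviation over time is averaged across channels, and the sounds are split at the median score. The function names and toy cochleograms are hypothetical.

```python
from statistics import mean, median, pstdev


def nonstationarity_score(cochleogram):
    """Average over frequency channels of the standard deviation over time.

    cochleogram: list of frequency channels, each a list of values over time.
    A higher score means the sound changes more over time.
    """
    return mean(pstdev(channel) for channel in cochleogram)


def split_by_stationarity(sounds):
    """Split named sounds into (stationary, non_stationary) at the median score."""
    scores = {name: nonstationarity_score(c) for name, c in sounds.items()}
    cutoff = median(scores.values())
    stationary = sorted(n for n, s in scores.items() if s <= cutoff)
    non_stationary = sorted(n for n, s in scores.items() if s > cutoff)
    return stationary, non_stationary


# Toy cochleograms with 2 channels and 4 time steps each.
sounds = {
    "hum": [[1, 1, 1, 1], [2, 2, 2, 2]],       # steady over time
    "clatter": [[0, 4, 0, 4], [0, 0, 0, 0]],   # fluctuating over time
}
print(split_by_stationarity(sounds))
```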
  • 48. Findings: The deep layers of the network trained on these natural sounds showed a greater response to the non-stationary sounds than to the stationary sounds. However, the same effect was not observed in the untrained network. In the fMRI data, the responses to stationary and non-stationary sounds were similar in the primary areas (A1), but the response to non-stationary sounds was stronger in the non-primary areas.
  • 49. Inference: There is a functional differentiation between the primary and the non-primary regions, and this evidence supports the earlier correspondence (intermediate layers with primary, deep layers with non-primary). There is a suppression of sound in the later layers of the model and in the non-primary regions, which contributes to the better response to non-stationary sounds by the deep layers and the non-primary cortex. This has helped the model predict responses to natural sounds even when they were affected by noise.
  • 50. Task-performance: Earlier work found that networks with better performance on a real-world visual object recognition task better predict cortical responses in the visual stream. To test the same for the auditory system, 57 different models from stage 1 were taken at 14 different training points (57 × 14 = 798) for either the word or the genre task. The median variance was measured for each layer.
  • 51. Task-performance (continued): The performance of a network on a task correlated strongly with the variance it explained in auditory cortical responses. The word task had a Spearman correlation of 0.87 and the genre task had a Spearman correlation of 0.85. These results suggest that task-based optimisation of deep neural networks can help yield more predictive models of sensory systems.
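The Spearman correlation quoted above is the Pearson correlation computed on ranks. A minimal sketch follows; the function names and toy numbers are hypothetical and are not the data behind the 0.87 and 0.85 values.

```python
def ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Toy example: task performance against explained variance for four networks.
print(spearman([0.2, 0.4, 0.6, 0.9], [0.10, 0.25, 0.30, 0.70]))
```

Because only ranks matter, the measure captures any monotonic relationship between task performance and explained variance, not just a linear one.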
  • 53. Achievements: The model performed as well as humans on word recognition and genre identification. The model produced human-like error patterns. Task optimisation led the model to replicate the auditory cortex in one respect (branching of layers for specific tasks). It predicted fMRI responses throughout the auditory cortex far better than the standard method (spectrotemporal filter).
  • 54. Achievements (continued): Task optimisation resulted in better cortical responses from the model; without it, the predictions were poor (untrained model). Intermediate layers of the model predicted the primary response and deep layers predicted the non-primary response. The model provides evidence that the organisation of the human auditory cortex is hierarchical. The hierarchical organisation and task optimisation made the model general and powerful.
  • 55. Achievements (continued): The model includes non-linear operations such as normalisation and pooling, and this is the reason for its improved response; research indicates that the internal operations of the cortex are non-linear, and the model was better than the filter model, which lacks these features. The model also provided an alternative method for evaluating cortical organisation: model and human were given the same task, and because both performed similarly, the model architecture is taken to resemble the human one.
  • 56. Comparisons with the visual system: Task optimisation resulted in powerful models that can replicate the visual and auditory systems. The primary visual responses were best predicted by the early layers of the model, whereas the primary auditory responses were best predicted by the intermediate layers. This suggests that the auditory cortex sits deeper in the computational hierarchy than the visual cortex. This is in accordance with the fact that the auditory pathway has more subcortical nuclei.
  • 57. Advantages: This deep learning model (12 layers) is deeper than its predecessors (2 or 3 layers). This depth helped it represent complex real-world tasks and produce better cortical responses. The branching of the network in the deep layers, a result of task optimisation, agrees with the functional segregation seen in the non-primary cortex. The model could perform other sound-related tasks even though it was not trained on them. The parameters were fitted on only half of the data, and the model also performed well on the held-out data.
  • 58. Disadvantages: The individual units used in the model are not readily interpretable. The choice of task was not so important for the analysis of the human cortex: the genre task was chosen because a large dataset was readily available, but it has the drawback of being culturally biased. The model could not replicate human learning; humans learn by experience and feedback, whereas the machine learns from data.
  • 59. Future updates: The model supported a hierarchical organisation of the human cortex, but a better one is required to determine whether it is tripartite, as seen in animals. Research says that the auditory pathway has more subcortical nuclei; this could be tested by predicting the subcortical responses from the early layers of the model. Training the model on additional music-related tasks, or on tasks not specific to speech or music, could yield a more complete model of human behaviour. Improving the model from the learning point of view could make it correspond more closely to the human.
  • 60. Reference: Kell, A. J. E., Yamins, D. L. K., Shook, E. N., Norman-Haignere, S. V., & McDermott, J. H. (2018). A Task-Optimized Neural Network Replicates Human Auditory Behavior, Predicts Brain Responses, and Reveals a Cortical Processing Hierarchy. Neuron, 98(3), 630–644.e16. doi:10.1016/j.neuron.2018.03.044. All information has been taken from this research article.