This presentation covers a research paper on the development of a deep-learning model that replicates the human auditory system. Many interesting findings about the human auditory cortex were obtained through the model. Ultimately, the model replicates the human both task-wise and structure-wise; in other words, a model that performed like a human yielded meaningful information about the brain.
A framework for approaches to transfer of mind substrate (Karlos Svoboda)
This document outlines a framework for discussing approaches to transferring a mind's substrate. It summarizes recent developments in neural prosthesis that could allow functional replacement of brain parts, potentially leading to a form of "mind-substrate transfer." It reviews two main proposed approaches to mind-substrate transfer: 1) Reconstruction from a brain scan, which would involve scanning the brain at high resolution and simulating its functioning. 2) Reconstruction from behavior, which would involve collecting behavioral information about an individual to parametrize a generic substrate. It argues that an underlying question is what constitutes a person's identity and whether identity could be transferred between original and synthetic substrates.
This document describes a brain-computer interface (BCI) design that uses electroencephalogram (EEG) signals from a single mental task. The method extracts spectral power from 4 brainwave bands (delta/theta, alpha, beta, gamma) across 6 electrode channels. It then uses the power and power differences as features for a neural network classifier to detect the mental task versus a resting state. Testing on 4 subjects performing 4 tasks showed classification accuracy up to 97.5% was possible when using each subject's most suitable task. The proposed BCI could potentially be used to move a cursor or select letters to allow communication.
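The band-power features described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the band edges below are common EEG conventions, and the simple FFT periodogram stands in for whatever spectral estimator the authors used.

```python
import numpy as np

def band_powers(signal, fs, bands=None):
    """Mean spectral power of one EEG channel in each brainwave band.

    Band edges are common conventions, not taken from the paper.
    """
    if bands is None:
        bands = {"delta_theta": (0.5, 8), "alpha": (8, 13),
                 "beta": (13, 30), "gamma": (30, 45)}
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2 / len(signal)
    return {name: power[(freqs >= lo) & (freqs < hi)].mean()
            for name, (lo, hi) in bands.items()}

# Example: a 10 Hz sine should concentrate its power in the alpha band.
fs = 256
t = np.arange(fs * 2) / fs
p = band_powers(np.sin(2 * np.pi * 10 * t), fs)
```

Power differences between channels, as used in the paper, would then just be pairwise subtractions of these per-channel dictionaries.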
This document provides an overview of using soft computing techniques for DNA sequence classification. It discusses DNA and DNA sequencing. It then introduces common soft computing techniques used for classification, including neural networks, fuzzy logic, and genetic algorithms. The document proposes using these soft computing methods for DNA sequence classification and describes related studies. It outlines a methodology using neural networks and genetic algorithms and analyzes the advantages of soft computing for this application. In conclusion, it states that soft computing techniques are well-suited for DNA sequence classification problems.
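Before a neural network can classify DNA sequences, the letters must become numbers; one-hot encoding is the usual first step. The sketch below is an assumption about preprocessing, not something stated in the document.

```python
import numpy as np

# One-hot encode a DNA sequence so it can feed a neural-network
# classifier; the 4-letter alphabet and its ordering are conventional.
ALPHABET = "ACGT"

def one_hot(seq):
    idx = {base: i for i, base in enumerate(ALPHABET)}
    out = np.zeros((len(seq), len(ALPHABET)))
    for pos, base in enumerate(seq.upper()):
        out[pos, idx[base]] = 1.0
    return out

x = one_hot("ACGTA")   # shape (5, 4), one row per base
```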
This document compares dynamic brain states measured using electro- and magneto-encephalography (EMEG) as human listeners perceive spoken words, to dynamic machine states generated by an Automatic Speech Recognition (ASR) system analyzing the same words. Using a novel multivariate pattern analysis technique called Spatiotemporal Searchlight Representational Similarity Analysis (ssRSA), it relates the similarity structures of incremental brain states in human superior temporal cortex to similarity structures of ASR-derived phonetic models, finding a significant correspondence. This suggests human and machine solutions to speech recognition relate acoustic input to phonetic labels in similar ways, both representing these regularities in terms of articulatory phonetic features.
The sophisticated signal processing techniques developed in recent years for structural and functional imaging allow us to detect abnormalities of brain connectivity in brain disorders with unprecedented detail. Interestingly, recent work has shed light on both the functional and structural underpinnings of musical anhedonia (i.e., an individual's incapacity to enjoy listening to music). At the same time, computational models based on brain-simulation tools are increasingly being used to map the functional consequences of structural abnormalities. The latter could help us better understand the mechanism that is impaired in people unable to derive pleasure from music, and to formulate hypotheses on how music acquired reward value. The presentation gives an overview of current studies and proposes a possible simulation pipeline to reproduce such a scenario.
This document discusses and compares different feature extraction methods for speaker identification, including Mel Frequency Cepstral Coefficients (MFCC), Inverse Mel Frequency Cepstral Coefficients (IMFCC), and Linear Predictive Cepstral Coefficients (LPCC). It analyzes these features extracted using Gaussian filters and modeled with Gaussian Mixture Models (GMM) and Universal Background Models (UBM). The document finds that fusing MFCC and IMFCC features outperforms using LPCC features alone based on tests on the TIMIT database.
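All of the cepstral features above start from a frequency warping; for MFCC it is the mel scale. The sketch below uses the standard HTK-style mel formula, which is a textbook convention rather than a detail taken from this document.

```python
import math

def hz_to_mel(f_hz):
    """Standard mel-scale warping used when building MFCC filterbanks."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Center points of a small mel-spaced filterbank up to 8 kHz:
# equally spaced in mel, hence logarithmically spaced in Hz.
n_filters = 10
step = (hz_to_mel(8000) - hz_to_mel(0)) / (n_filters + 1)
centers_hz = [mel_to_hz(i * step) for i in range(n_filters + 2)]
```

IMFCC, by contrast, flips the filterbank so that resolution is finer at high frequencies, which is why the paper finds MFCC and IMFCC complementary when fused.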
The document provides an overview of artificial neural networks, including:
1) Biological neural networks can learn patterns and generalize, similar to artificial neural networks. Pigeons can distinguish paintings by artist with high accuracy.
2) Artificial neural networks are modeled after biological neurons and synapses. Feedforward networks use backpropagation to train weights and minimize error through multiple iterations.
3) Recurrent networks allow bidirectional information flow and memory over time. Elman networks and Hopfield networks are examples used for language processing and content-addressable memory.
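The content-addressable memory mentioned in point 3 can be shown in a tiny Hopfield sketch: store one bipolar pattern with the Hebbian outer-product rule, then recover it from a corrupted cue. This is a minimal illustration of the idea, not code from the document.

```python
import numpy as np

# Store one bipolar pattern in a Hopfield network via the Hebbian
# outer-product rule, then recall it from a one-bit-corrupted cue.
pattern = np.array([1, -1, 1, 1, -1, -1, 1, -1])
W = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(W, 0.0)          # no self-connections

cue = pattern.copy()
cue[0] = -cue[0]                  # flip one bit to corrupt the memory

state = cue
for _ in range(5):                # synchronous updates until stable
    state = np.sign(W @ state)

recalled = state                  # converges back to the stored pattern
```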
Parameters Optimization for Improving ASR Performance in Adverse Real World N... (Waqas Tariq)
Existing research shows that many techniques and methodologies are available for every step of an Automatic Speech Recognition (ASR) system, but performance (minimizing the Word Error Rate, WER, and maximizing the Word Accuracy Rate, WAR) does not depend on the chosen technique alone. Performance depends mainly on the category and level of the noise and on variable parameters such as window size, frame size, and frame overlap considered in existing methods. The main aim of this paper is to vary these parameter sizes to observe algorithm performance across noise categories and levels, and to train the system over all parameter sizes and categories of real-world noisy environments to improve speech recognition performance. The paper presents the results of Signal-to-Noise Ratio (SNR) and accuracy tests under variable parameter sizes. Since it is very hard to evaluate the test results and decide on a parameter size for ASR performance improvement, the study further suggests feasible and optimum parameter sizes using a Fuzzy Inference System (FIS) to enhance accuracy in adverse real-world noisy conditions. This work supports discriminative training of ubiquitous ASR systems for better Human-Computer Interaction (HCI). Keywords: ASR Performance, ASR Parameters Optimization, Multi-Environmental Training, Fuzzy Inference System for ASR, ubiquitous ASR system, Human Computer Interaction (HCI)
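The framing parameters the abstract varies (window/frame size and overlap percentage) determine the number of analysis frames directly. A minimal sketch, with function and parameter names of my own choosing:

```python
import numpy as np

def frame_signal(x, frame_len, overlap_pct):
    """Slice a signal into overlapping frames.

    frame_len is in samples, overlap_pct in [0, 100); the frame count
    is 1 + (len(x) - frame_len) // hop, where hop = frame_len * (1 - overlap).
    """
    hop = int(frame_len * (1 - overlap_pct / 100.0))
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])

x = np.arange(16000)                                      # 1 s at 16 kHz
frames = frame_signal(x, frame_len=400, overlap_pct=50)   # 25 ms, 50% overlap
```

Sweeping `frame_len` and `overlap_pct`, as the paper does, changes both the time resolution of each frame and how many frames the recognizer must process.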
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Speaker identification system using close set (eSAT Journals)
Abstract: This paper describes experiments conducted to evaluate the performance of speaker recognition. Experiments using a neural network show that the complexity of speaker recognition increases as the number of speakers to be identified grows in the text-independent situation. In the first experiment, the error rate was zero for a 10-speaker classification and performance was good. As the number of speakers increased, the error rate rose and classification performance became poor; beyond 25 speakers the error rate was very high, and for a 100-speaker classification the MATLAB NN tool could not display the confusion matrix. To overcome this, a second experiment defined a close set of 100 speakers divided into 10 groups of 10, as a cell array in MATLAB; the best speaker-identification result was 100% using 20 continuous features of the speaker's voice, at the cost of higher time complexity. In the third experiment, the speaker's dialect and region were also identified: classification performance was 100% at 97 epochs, validation performance was 0.0035046 at 91 epochs, and the error rate was zero. Index Terms: text dependent, text independent, speaker identification, Neural Network, close set, MFCC.
The document provides information about soft computing techniques and artificial neural networks. It contains the following key points:
1. It introduces soft computing techniques such as neural networks, fuzzy logic, and genetic algorithms. It recommends books on these topics.
2. It discusses the biological neural network in the human brain and its characteristics such as the ability to learn and generalize knowledge.
3. It describes the goal of artificial neural networks is to simulate the human brain for functions like planning, thought, and speech recognition.
4. It outlines the basic biological components of neurons like the cell body, dendrites, axon, and synapse. It also introduces the characteristics of artificial neural networks and different neural network models like
Bat Algorithm: A Novel Approach for Global Engineering Optimization (Xin-She Yang)
The document introduces a new metaheuristic optimization algorithm called the Bat Algorithm (BA) which is inspired by the echolocation behavior of microbats. The BA is formulated based on echolocation characteristics such as loudness variation and pulse emission rates. The BA is tested on eight well-known nonlinear engineering optimization problems and is found to perform better than existing algorithms. The unique search features of the BA are analyzed and its potential for future research is discussed.
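The echolocation-inspired mechanics summarized above (frequency-tuned velocities plus loudness and pulse-rate control of local search) can be sketched on a toy problem. The parameter values below are typical defaults, not the ones benchmarked in the paper, and the sphere function stands in for the engineering problems it tests.

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):
    return float(np.sum(x ** 2))      # toy objective, minimum at the origin

n_bats, dim, iters = 20, 2, 100
f_min, f_max = 0.0, 2.0               # frequency range of pulse emission
loudness, pulse_rate = 0.5, 0.5

pos = rng.uniform(-5, 5, (n_bats, dim))
vel = np.zeros((n_bats, dim))
fit = np.array([sphere(p) for p in pos])
best = pos[fit.argmin()].copy()
best_fit = fit.min()
init_best = best_fit

for _ in range(iters):
    for i in range(n_bats):
        freq = f_min + (f_max - f_min) * rng.random()
        vel[i] += (pos[i] - best) * freq          # frequency-tuned velocity
        cand = pos[i] + vel[i]
        if rng.random() > pulse_rate:             # local walk around best bat
            cand = best + 0.01 * rng.normal(size=dim)
        f_new = sphere(cand)
        if f_new < fit[i] and rng.random() < loudness:
            pos[i], fit[i] = cand, f_new          # accept improved solution
        if f_new < best_fit:
            best, best_fit = cand.copy(), f_new   # update global best
```

The full algorithm also decreases loudness and increases pulse rate as bats close in on prey; those schedules are omitted here for brevity.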
A monkey model of auditory scene analysis (PradeepD32)
My work impacts the half of the world's population who develop age-related hearing loss and have difficulty understanding speech in noise. To understand how the brain solves the cocktail-party problem, I need to record from neurons, which is feasible only in animals. Monkeys are best suited for this, given how similar their auditory brains are to ours. Using sounds without semantics and fMRI, I show that monkeys use brain regions similar to humans' to separate overlapping sounds. This study is the first to show such evidence in any animal. Now I can record from monkey neurons and generalize the results to humans!
A Comparative Study: Gammachirp Wavelets and Auditory Filter Using Prosodic F... (CSCJournals)
Modern automatic speech recognition (ASR) systems typically use a bank of linear filters as the first step in the frequency analysis of speech. The cochlea, which is responsible for frequency analysis in the human auditory system, instead has a compressive non-linear frequency response that depends on the input stimulus level. This paper presents a new method based on the gammachirp auditory filter and continuous wavelet analysis. The essential characteristic of this model is a wavelet-packet-transform analysis over frequency bands that approximate the critical bands of the ear, in contrast with existing models based on short-term Fourier transform (STFT) analysis. Prosodic features such as pitch, formant frequency, jitter, and shimmer are extracted from the fundamental-frequency contour and added to baseline spectral features: Mel Frequency Cepstral Coefficients (MFCC) for human speech, Gammachirp Filterbank Cepstral Coefficients (GFCC), and Gammachirp Wavelet Frequency Cepstral Coefficients (GWFCC). The results show that the gammachirp wavelet gives results comparable to those obtained with MFCC and GFCC, and experiments show the best performance for this architecture. The paper implements the GW and examines its application to a specific example of speech; implications for noise-robust speech analysis are also discussed using the AURORA databases.
An Approach to Reduce Noise in Speech Signals Using an Intelligent System: BE... (CSCJournals)
The two most widespread classes of noise-reduction algorithms are spectral noise subtraction and adaptive filtering. Their disadvantage is that no parameter distinguishes between speech and noise components of the same frequency. In this paper, an intelligent controller, BELBIC, based on the mammalian limbic system's emotional-learning algorithms, is used to improve speech quality in a noisy environment. Its learning ability trains the system to recognize the fundamental frequency of the speech spectrum, reducing the noise level to a minimum. The parameters on which the reduction of noise from the input speech spectrum depends are also studied. The real-time implementations were done in Simulink, and the results of the analysis are included at the end.
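The spectral-subtraction baseline that the paper contrasts itself with can be sketched as follows. This is the classic magnitude-subtraction recipe on a single block, assuming a noise-only reference segment is available; real implementations work frame by frame with smoothing.

```python
import numpy as np

def spectral_subtract(noisy, noise_est, n_fft=256):
    """Classic magnitude spectral subtraction (single-block sketch).

    noise_est is a noise-only reference used to estimate the noise
    magnitude spectrum; negative magnitudes are floored at zero.
    """
    spec = np.fft.rfft(noisy, n=n_fft)
    noise_mag = np.abs(np.fft.rfft(noise_est, n=n_fft))
    clean_mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
    phase = np.angle(spec)                    # keep the noisy phase
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=n_fft)

rng = np.random.default_rng(1)
t = np.arange(256) / 8000.0
speechy = np.sin(2 * np.pi * 200 * t)         # stand-in for voiced speech
noisy = speechy + 0.3 * rng.normal(size=256)
out = spectral_subtract(noisy, noise_est=0.3 * rng.normal(size=256))
```

The weakness the paper points out is visible here: a speech harmonic and a noise component falling in the same FFT bin are attenuated together, with no parameter to tell them apart.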
fMRI Segmentation Using Echo State Neural Network (CSCJournals)
This research work proposes a new intelligent segmentation technique for functional Magnetic Resonance Imaging (fMRI), implemented using an Echo State Neural Network (ESN). Segmentation is an important process that helps in identifying objects in an image, and segmenting every pixel of the fMRI correctly helps locate a tumor properly. Existing methods cannot accurately segment the complicated profile of fMRI data, and the presence of noise and artifacts poses a further challenge. The proposed ESN is an estimation method with energy minimization; the estimation property yields better segmentation of the complicated fMRI profile. The new segmentation method performs better, with a higher peak signal-to-noise ratio (PSNR) of 61 compared to 57 for the existing back-propagation algorithm (BPA) segmentation method.
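The PSNR figure of merit used above has a standard definition; a minimal sketch (this does not reproduce the paper's 61 dB vs 57 dB comparison, only the metric itself):

```python
import numpy as np

def psnr(reference, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB for two images of equal shape."""
    mse = np.mean((reference.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float("inf")       # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.zeros((8, 8))
b = np.ones((8, 8))               # every pixel off by 1 -> MSE = 1
```

With MSE = 1 and 8-bit peak 255, PSNR is 10·log10(255²) ≈ 48.13 dB; each extra dB, as in the ESN-vs-BPA comparison, corresponds to a meaningful drop in mean squared error.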
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
The document discusses an unsupervised learning algorithm that learns visual representations from natural images and videos. When applied to images, the algorithm learns retinal ganglion cell properties, and when applied to sounds, it learns auditory nerve properties. The algorithm is also used to learn hierarchical representations in a model called RICA, which learns simple cell properties in early visual areas. RICA can be used for face and object recognition tasks.
Dr. Miao Kang proposes a new method called Modal Learning that adapts both the neuron activation functions and connection weights in a neural network. This approach aims to overcome limitations of single layer networks, significantly speed up learning, and simplify hardware requirements compared to traditional neural network approaches. Key contributions include ADFUNN, a single layer network that solves linearly inseparable problems faster and more simply than previous methods, and MADFUNN, a multi-layer version that achieves highly efficient results on large datasets compared to related works.
soft computing BTU MCA 3rd SEM unit 1.pptx (naveen356604)
This document discusses hard computing and soft computing. Hard computing uses deterministic algorithms and mathematical models to produce accurate and predictable results, while soft computing can handle imprecision, uncertainty, and ambiguity. Soft computing techniques include fuzzy logic, neural networks, genetic algorithms, probabilistic reasoning, and evolutionary computation. These techniques aim to mimic human-like reasoning by tolerating uncertainty, learning and adapting, and integrating multiple methods. Examples of evolutionary computation algorithms provided are genetic algorithms, genetic programming, evolutionary strategies, differential evolution, and particle swarm optimization. Neural networks, ant colony optimization, and fuzzy logic are also summarized.
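Of the evolutionary techniques listed above, a genetic algorithm is the easiest to show end to end. The sketch below runs the textbook scheme (selection, one-point crossover, bit-flip mutation, elitism) on the OneMax toy problem; all names and parameter values are illustrative choices, not taken from the document.

```python
import random

random.seed(42)

def fitness(bits):
    return sum(bits)          # OneMax: maximize the number of 1-bits

def evolve(n_bits=20, pop_size=30, generations=40, p_mut=0.02):
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = [pop[0]]                       # elitism keeps the best
        while len(next_pop) < pop_size:
            a, b = random.sample(pop[:10], 2)     # truncation selection
            cut = random.randrange(1, n_bits)
            child = a[:cut] + b[cut:]             # one-point crossover
            child = [bit ^ (random.random() < p_mut)  # bit-flip mutation
                     for bit in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

best = evolve()
```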
The document discusses artificial neural networks and machine learning. It describes different learning paradigms like supervised, unsupervised, and reinforcement learning. It also discusses factors that affect neural network performance such as transfer functions, training set size, network topology, and weight adjustment algorithms. Applications of neural networks include function approximation, classification, and data processing.
This presentation is all about counters, focusing on synchronous and asynchronous counters. The unique feature is the incorporation of the circuit images generated from MULTISIM software imparting practical knowledge to the users.
SEQUENTIAL LOGIC CIRCUITS (FLIP FLOPS AND LATCHES) (Sairam Adithya)
This presentation is about sequential logic circuits, mainly concentrating on flip-flops and latches. A unique feature is the incorporation of circuit images generated from Multisim software, imparting practical knowledge to the users. It covers both active-low and active-high versions of the different circuits.
Similar to TASK-OPTIMIZED DEEP NEURAL NETWORK TO REPLICATE THE HUMAN AUDITORY CORTEX
The document discusses an unsupervised learning algorithm that learns visual representations from natural images and videos. When applied to images, the algorithm learns retinal ganglion cell properties, and when applied to sounds, it learns auditory nerve properties. The algorithm is also used to learn hierarchical representations in a model called RICA, which learns simple cell properties in early visual areas. RICA can be used for face and object recognition tasks.
The document discusses an unsupervised learning algorithm that learns visual representations from natural images and videos. When applied to images, the algorithm learns retinal ganglion cell properties, and when applied to sounds, it learns auditory nerve properties. The algorithm is also used to learn hierarchical representations in a model called RICA, which learns simple cell properties in early visual areas. RICA can be used for face and object recognition tasks.
Dr. Miao Kang proposes a new method called Modal Learning that adapts both the neuron activation functions and connection weights in a neural network. This approach aims to overcome limitations of single layer networks, significantly speed up learning, and simplify hardware requirements compared to traditional neural network approaches. Key contributions include ADFUNN, a single layer network that solves linearly inseparable problems faster and more simply than previous methods, and MADFUNN, a multi-layer version that achieves highly efficient results on large datasets compared to related works.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
soft computing BTU MCA 3rd SEM unit 1 .pptxnaveen356604
This document discusses hard computing and soft computing. Hard computing uses deterministic algorithms and mathematical models to produce accurate and predictable results, while soft computing can handle imprecision, uncertainty, and ambiguity. Soft computing techniques include fuzzy logic, neural networks, genetic algorithms, probabilistic reasoning, and evolutionary computation. These techniques aim to mimic human-like reasoning by tolerating uncertainty, learning and adapting, and integrating multiple methods. Examples of evolutionary computation algorithms provided are genetic algorithms, genetic programming, evolutionary strategies, differential evolution, and particle swarm optimization. Neural networks, ant colony optimization, and fuzzy logic are also summarized.
The document discusses artificial neural networks and machine learning. It describes different learning paradigms like supervised, unsupervised, and reinforcement learning. It also discusses factors that affect neural network performance such as transfer functions, training set size, network topology, and weight adjustment algorithms. Applications of neural networks include function approximation, classification, and data processing.
2. AN OVERVIEW OF TASK-
OPTIMIZED NEURAL
NETWORK TO REPLICATE
HUMAN AUDITORY
BEHAVIOUR
3.
Introduction to the article and some insights on it
Mechanism of hearing by the ear and brain
Organisation in the auditory cortex
Earlier models on cortex and need for a new one
Deep-learning model described through six components (data,
task, model, error, algorithm, evaluation)
The method of generating cortical responses by the model and
by the traditional method, with comparisons
Findings and inferences through different perspectives and
analyses
Achievements by the model
Added features of the model in comparison to its precursors
Disadvantages of model and future directions
ROAD MAP
5.
This paper is about the development of a
Convolutional Neural Network (CNN) model which
is task-optimised to perform real-world
auditory tasks.
This model can be used to understand the
architecture of the human auditory system (the
motivation for building the model).
The model also predicts fMRI voxel responses
throughout the auditory cortex better than the
standard method of using a spectrotemporal filter.
The model provides details about the primary and
non-primary responses.
Introduction
6.
There are two types of auditory responses, namely
the primary and the non-primary responses.
The PRIMARY RESPONSE is the response obtained from the
cochlea and the primary pathway (purely auditory).
It represents the primary auditory cortex (A1) and
carries out simpler tasks (reason given later).
The NON-PRIMARY RESPONSE is the response
obtained from the non-primary pathway (mixed
senses). It represents the regions beyond the primary
auditory cortex and carries out complex tasks.
Auditory responses
7.
The neuronal processing occurring in the human ear
transforms sound into cortical representations.
It renders behaviourally important sounds
explicit (like an amplifier).
The organisation of the human auditory cortex
remains unsolved, and there are no competent
models to explain this process of
transforming auditory sounds into representations.
Ear mechanism
8.
This question was debatable, with studies
favouring each side (distributive or hierarchical).
Formisano and Staeren proposed an anatomically
distributive organisation.
A tripartite hierarchical organisation is seen in
non-human animals, which carry out simple
tasks.
However, these findings cannot confirm whether the
auditory cortex is distributive or hierarchical.
Organisation of auditory
cortex
10.
Some early models used linear filtering of the
cochleogram (sound in image format) using 1-2
stages.
However, the process of transformation is non-linear,
so those methods failed to address the purpose.
It is therefore essential to develop models which
carry out non-linear functions.
The model has to provide answers for both the
organisation and the transformation.
Early models and cause of failure
11.
Sl. no | Name | Description | Our case scenario
1 | Data | Type of data provided as input | Cochleogram
2 | Task | Operation required on the input | Classification (multiple classes); Regression (prediction)
3 | Model | The mathematical relation between input and output; varies with the task and complexity and may involve layers | CNN (Convolutional Neural Network)
4 | Error | A kind of comparator which finds the error between two different quantities | Comparison of the model's classification with the human's classification
5 | Algorithm | A learning procedure which tries to reduce the error computed before | Stochastic gradient descent
6 | Evaluation | Finding how well the model has performed | Comparison with human behaviour
12.
The data on which the CNN is trained is known as the
cochleogram.
The cochleogram is a visual representation of the
sound signal; it is a spectro-temporal
representation of speech.
A 2-second sound signal is
taken as input.
Data
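The 2-second clip is turned into a time-frequency "image" before it reaches the CNN. Below is a minimal sketch of such a spectro-temporal representation, using a plain magnitude STFT as a stand-in for the paper's cochleogram (which is built from cochlea-like bandpass filters instead); the window and hop sizes are illustrative assumptions.

```python
import numpy as np

def spectrotemporal_rep(signal, win=512, hop=256):
    """Magnitude STFT: a toy stand-in for the cochleogram, which the
    paper builds from cochlea-like bandpass filters instead."""
    window = np.hanning(win)
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        frame = signal[start:start + win] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)  # shape: (time frames, frequency bins)

sr = 16000                                  # assumed sample rate
t = np.arange(2 * sr) / sr                  # a 2-second clip, as in the paper
clip = np.sin(2 * np.pi * 440 * t)          # a pure 440 Hz tone
image = spectrotemporal_rep(clip)
print(image.shape)                          # (124, 257)
```

The resulting 2-D array is what the convolutional layers consume, exactly as they would an image.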
13.
There are two tasks to be performed, namely word
identification and music genre recognition.
The tasks are made difficult by introducing
background noise along with the word/music sound.
The task is to find one word out of 587, or to find
one genre out of 41 categories.
The model also produces cortical responses.
Task
14.
15.
The model contains convolution, pooling, dense,
filter response normalisation and dropout layers.
It is a hierarchical model. The convolution and
pooling layers of the CNN perform non-linear
operations.
The model had five convolutional, three pooling,
two normalization, and two fully connected layers.
The processing layers (7 shared) are the same for both
tasks, but each task has its own fully connected
layers (5 different). This roughly halves the
model's parameter count.
The hyperparameters were task-optimized.
Model
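The shared-trunk-plus-branches idea can be sketched with toy dense layers (an assumption for brevity; the real model uses convolution and pooling stages, and the layer sizes here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Shared trunk standing in for the jointly trained layers.
W_shared = [rng.standard_normal((257, 128)) * 0.05,
            rng.standard_normal((128, 64)) * 0.05]

# Task-specific heads: 587 word classes and 41 genre classes.
W_word = rng.standard_normal((64, 587)) * 0.05
W_genre = rng.standard_normal((64, 41)) * 0.05

def forward(x):
    h = x
    for W in W_shared:                    # identical processing for both tasks
        h = relu(h @ W)
    return h @ W_word, h @ W_genre        # branch into the two task heads

word_logits, genre_logits = forward(rng.standard_normal(257))
print(word_logits.shape, genre_logits.shape)   # (587,) (41,)
```

Because the trunk is shared, its parameters are counted once rather than once per task, which is how the branching roughly halves the total parameter count.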
16.
The model was derived in two steps.
The first step involved 180 architectures, each being
12-layered and single-tasked.
The second step involved 7 architectures of 12 layers,
dual-tasked.
Model selection
17.
To evaluate how closely the model's responses
match a human's, the model's performance is
compared with human performance.
For WORD IDENTIFICATION, the human is
allowed to use a UI which auto-completes the
word (to ensure it belongs to one of the 587 classes).
For GENRE IDENTIFICATION, the human is
allowed to list five preferences of genre (top 5).
The error here is the wrong predictions.
An interesting feature observed was that the model
made error patterns like a human.
Error
18.
The algorithm used here is stochastic gradient
descent.
The role of the algorithm is to find the optimum
values of the parameters such that the loss is very
small (theoretically 0).
The word "stochastic" refers to the way the input is
taken: one example at a time.
"Gradient descent" refers to reducing the loss by
stepping down the gradient towards a local
minimum.
Algorithm
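A minimal sketch of stochastic gradient descent on a toy one-parameter problem (the data and learning rate are invented for illustration): each update uses the gradient of the loss on a single example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y = 3*x plus a little noise; we recover the slope w.
xs = rng.standard_normal(200)
ys = 3.0 * xs + 0.01 * rng.standard_normal(200)

w, lr = 0.0, 0.05
for epoch in range(5):
    for x, y in zip(xs, ys):
        grad = 2.0 * (w * x - y) * x   # d/dw of the squared error on ONE sample
        w -= lr * grad                 # step against the gradient ("descent")
print(round(w, 2))
```

The same per-example update, applied to millions of CNN weights instead of one slope, is what trains the model.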
19.
The confusion matrix is used to evaluate the performance
of the model in the genre recognition task (41 classes).
The confusion matrix is a matrix
with rows and columns equal to the number of
classes; it compares the ground truth
with the model's prediction (per class, it yields 4 fields:
true/false positives and negatives).
The same can be plotted for
word identification, but the plot
would be unreadable with 587 classes.
Evaluation
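For a small number of classes the confusion matrix is easy to build by hand; here is a sketch with 3 classes standing in for the 41 genres (the labels are made up):

```python
import numpy as np

def confusion_matrix(truth, pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(truth, pred):
        cm[t, p] += 1
    return cm

truth = [0, 0, 1, 2, 2, 2]           # made-up labels for illustration
pred = [0, 1, 1, 2, 2, 0]
cm = confusion_matrix(truth, pred, 3)
accuracy = np.trace(cm) / cm.sum()   # diagonal entries are correct predictions
print(accuracy)                      # 4 of 6 correct
```

Off-diagonal entries show which classes get mistaken for which, the very error patterns that were compared against human mistakes.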
21.
The next task for the model is to generate the fMRI
voxel responses throughout the auditory cortex. In short, it
has to produce cortical responses.
A voxel is a single unit block in a 3-D image (like a
Minecraft block).
The data used here are 165 natural sounds heard
regularly, of which 52 were words and music.
The model was run on these sounds and the voxel
responses generated for each sound were collected.
These were compared with the standard method of the
spectrotemporal filter.
Cortical responses
22.
Listening and hearing...
An important process in the processing of auditory
signals is "attention":
taking in the required signal and eliminating the
rest of the unwanted ones.
Hence a filter is formed inside the auditory cortex,
like neurons which respond maximally to given
input frequencies, with two functions:
To incorporate information about both the timing (rhythm)
and the frequency content of the relevant auditory stimulus
stream.
To enhance the sensory representation of attended stimuli
along these two feature dimensions.
Spectrotemporal filters
23.
The response/prediction from each layer in the time-
averaged model was taken into consideration.
This is done using linear
regression over each
layer's activations.
The predictions from each layer
were linearly combined to artificially
create a "voxel".
As a result, we have a voxel's response
to all 165 sounds from all layers.
The BOLD signal looks inactive for the first 2 s,
hence the time average is used.
Method of extraction
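The regression step can be sketched as follows: given one layer's (time-averaged) activations for the 165 sounds, find the linear combination of features that best matches a measured voxel. All numbers here are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)

n_sounds, n_feats = 165, 40                # 165 sounds; feature count assumed
acts = rng.standard_normal((n_sounds, n_feats))   # one layer's activations
true_w = rng.standard_normal(n_feats)
voxel = acts @ true_w + 0.1 * rng.standard_normal(n_sounds)  # measured voxel

# Least-squares weights that combine the layer's features into a
# synthetic "voxel", as the slide describes.
w, *_ = np.linalg.lstsq(acts, voxel, rcond=None)
predicted = acts @ w
r = np.corrcoef(predicted, voxel)[0, 1]
print(r > 0.9)   # True: the layer's features reconstruct this voxel well
```

Repeating this per layer and per voxel yields a predicted response for every voxel from every layer, which is what the later comparisons operate on.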
24.
The comparisons were made between four models:
The trained model with optimized weights
The untrained model with random weights
The traditional spectrotemporal filter model
A random model from the architecture selection
Comparisons
25.
The comparison must be made against the ground truth.
The truth is obtained by playing the same sounds to
subjects in an fMRI machine to get
the measured voxels.
First, the explained BOLD variance was computed for all 4 methods.
A correction was applied for the reliability of both the
measured voxel response and the predicted voxel
response.
The comparisons were made on all voxels and on some
specified voxels.
As expected, the trained model explained more variance and was
better than the spectrotemporal model and the untrained one.
BOLD variance
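The quantity being compared is the fraction of each voxel's response variance that the prediction explains. A common sketch uses the squared correlation (the reliability correction mentioned above is omitted here for simplicity):

```python
import numpy as np

def variance_explained(pred, measured):
    """Squared Pearson correlation: the share of the measured
    response's variance captured by the prediction."""
    r = np.corrcoef(pred, measured)[0, 1]
    return r ** 2

rng = np.random.default_rng(6)
measured = rng.standard_normal(165)     # one voxel's response to 165 sounds
perfect = 2.0 * measured + 1.0          # scaled/shifted copy: explains all
unrelated = rng.standard_normal(165)    # noise: explains almost none
print(round(variance_explained(perfect, measured), 2))   # 1.0
```

Taking the median of this quantity across voxels gives the "median variance" figures quoted on the next slide.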
26.
Then the median explained variance was computed for the same.
The trained model (~70%) explained more variance than the
spectrotemporal filter (~55%).
The filter model was given the highest number of
parameters it could support,
and it eventually saturated.
The untrained and random models were worse than
the trained model and the spectrotemporal filter model.
The trained model had the highest explained variance in all
ROIs and proved to be better than the traditional one.
Median variance
28.
The trained deep-learning model performed the best
and was far better than the spectrotemporal one.
The improved voxel prediction is due
to the hierarchical organisation of the model.
The convolution and pooling layers of the model
produced receptive fields (spectra of preferred signals)
similar to those of the cortical system.
The model also performed better than the
spectrotemporal model in the regions of interest.
Findings
29.
This shows that the model is able to respond to
natural sounds better than the
spectrotemporal filter model throughout the
auditory cortex.
This is due to the hierarchical organisation of the
model.
The task optimization has resulted in good cortical
responses.
Inference
30.
The responses obtained from the later layers of the
network were more non-linear than those of other
layers.
To assess this property, it is essential to
compare the response from each layer of the model.
The median explained variance of individual layers was
used for the comparison.
Based on these, there were some important
findings leading to inferences about the
organisation of the human auditory system.
Procedure to assess the
hierarchical organisation
31.
The median explained variance increased
across the layers and then decreased
for the last layers.
All layers except the first and
last performed better than the
spectrotemporal filter model.
All layers except the last
in the trained model explained more
variance than in the untrained one,
even though their dependencies
on the data were the same.
The intermediate layers made the best predictions, whereas the final
layers made poor predictions.
Findings
32.
The receptive fields of some of the layers in the network
were similar to those in the auditory cortex, and this may be
the reason for their high performance.
Task optimisation has helped in replicating some of
the cortical properties in the model.
Given the task, the neurons in the final layers are involved in
perceptual decisions.
Such neurons may be present in the auditory cortex, but
their organisation may not be accessible by conventional
fMRI.
Or they might lie beyond the auditory cortex, in
other brain lobes.
Inferences
33.
Summary map
The explained variance of the layers was
plotted using special images.
Heatmaps of the variance and
predictions of the individual
layers were mapped onto a
probabilistic map involving
three anatomically defined
regions of the primary auditory
cortex. This is done for each
individual test subject.
The average taken over all subjects
is the summary map.
This relates the model to the
human cortex.
The black outlines are the
anatomical regions.
34.
Findings
The intermediate layers best
predicted the voxels that
constitute the primary
auditory cortex (core).
The last layer of the network
best predicted the regions
away from the auditory cortex
(non-core).
The same results were not
seen in an untrained model
with random weights.
The same results were also
seen when words and music
were removed from the training
data.
35.
This gives reason to conclude that the intermediate and the last
layers of the network generate the primary and non-primary
responses, respectively.
The intermediate layers also perform simpler tasks
compared to the later layers (reason given later).
The same results were seen (primary voxels best predicted
by the intermediate layers, non-primary voxels by the last)
even when words and music were removed.
This suggests that the hierarchical structure of the model
helped it in generating better cortical responses for
everyday sounds.
Inference
36.
There are four functionally defined Regions Of
Interest (ROIs), namely:
frequency-selective
pitch-selective
word-selective
music-selective
Regions of interest
38.
The frequency-selective voxels, which were best explained by the
intermediate layers, are found early in the hierarchy, and the
speech-selective voxels, which were best explained by the later
layers, are found later in the hierarchy.
This can be the reason the intermediate layers perform
simpler functions and the later layers perform complex
functions.
As before, the untrained network performed worse than the
trained network and the spectrotemporal model.
The dependencies did not affect the performance of the
model, suggesting that the task optimization was critical for
mapping the features in the layers onto the auditory cortex.
The ROI analysis supports a hierarchical organisation.
Inference
39.
HENCE BOTH THE MODEL
AND THE HUMAN CORTEX
ARE ORGANISED
HIERARCHICALLY!!
40.
The representation of acoustic features by the
network was compared with that of the
spectrotemporal model.
The aim was to check whether the representations of
both models were linearly decodable.
For this, the data was divided into two subsets, of
which the first was used for mapping and the
second for quality checking.
Acoustic features
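The decodability check can be sketched like this: fit a linear map from a layer's representation to an acoustic feature on the first subset of sounds, then score it on the held-out subset. The representations and the feature below are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(5)

n, d = 200, 30
reps = rng.standard_normal((n, d))        # a layer's representations
feature = reps @ rng.standard_normal(d)   # a linearly encoded acoustic feature

train, test = slice(0, 150), slice(150, None)  # mapping / quality-check split
w, *_ = np.linalg.lstsq(reps[train], feature[train], rcond=None)
r = np.corrcoef(reps[test] @ w, feature[test])[0, 1]
print(r > 0.99)   # True: this feature is linearly decodable from the layer
```

A low held-out correlation would mean the layer has discarded (or non-linearly entangled) that acoustic feature, which is what the next slide reports for the deeper layers.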
41.
The ability of the network layers to extract spectral
information from the data decreased
as the layers progressed.
The extraction ability was
constant for the spectrotemporal
model, whereas the network's
peaked at the intermediate layers.
The prediction from the later layers
was worse than from the earlier ones, and this
was prominent in the untrained model.
Findings and inference
42.
It is essential that the model performs well on real-
world tasks in order to replicate the auditory cortex.
The model was analysed layer-wise on the existing
tasks and on a new speaker identification task for
which the model wasn't trained.
This was done by fixing the trained weights and
optimizing a new softmax classifier
that took the output of a given layer
as its input.
Real-world task performance
43.
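A minimal sketch of the frozen-features probe described above: the network weights stay fixed, and only a softmax classifier on top of a given layer's activations is trained. The gradient-descent settings and toy data are assumptions, not the paper's setup:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_readout(features, labels, n_classes, lr=0.5, steps=500):
    """Train only a softmax classifier on frozen layer activations
    ('linear probe'). Hyperparameters here are illustrative assumptions."""
    n, d = features.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[labels]            # one-hot targets
    for _ in range(steps):
        P = softmax(features @ W)
        W -= lr * features.T @ (P - Y) / n   # cross-entropy gradient step
    return W
```

How well such a probe classifies, layer by layer, indicates how explicit the task-relevant information is at each stage.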
Findings
The findings were the opposite of
the pattern seen previously:
performance improved
from the early to the deeper
layers of the network.
A similar level of performance
was seen in the speaker-
identification task, except
at the final layer.
This suggests that the network's
representations generalise across
tasks (the same holds for most
auditory tasks).
44.
All of the previous findings and analyses portray the
transformation from cochlea to cortex.
The role of the cortex is to transform the acoustic features
supplied by the cochlea into meaningful representations, but
the nature of this transformation is unknown.
These analyses suggest that task-related information that is
implicit (not easily decodable) in the cochlear representation
becomes explicit (easily decodable) after the transformation
carried out by the auditory cortex.
In simpler terms, the transformation gives structure and
meaning to the information, which both the brain and the
model use to produce the output.
Inference
45.
The input data incorporated background noise mixed with
the sound signal.
The noise was added at different signal-to-noise ratios
(SNRs).
The resulting analysis constitutes the signal-to-noise
characteristics (SNC).
The signals were categorised according to SNR and
fed to the network for analysis.
The objective was to find the role of noise in how
information is extracted from the signal.
SNC
46.
The signals with less noise were
classified well by the intermediate
layers as well as the deep layers.
The signals with more noise,
however, were classified well
only by the deep layers.
The later layers of the model are
therefore relatively insensitive to
noise, i.e. noise-robust.
Findings and inference
47.
The data used here was the same as in the fMRI experiment,
but with the words and music excluded (113 samples).
These sounds were divided into
two subsets based on stationarity
(the stability of the mean, standard deviation, etc.),
computed by dividing the cochleagram
into frequency bands and taking the
standard deviation over time.
The response of each layer to the
two sets of sounds was then measured
and compared with the voxel responses
from the fMRI data.
Noise-stimuli sensitivity
48.
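The stationarity split described above can be illustrated with a crude statistic: the standard deviation of each cochleagram channel over time, averaged across channels (low for steady textures, high for dynamic sounds). This is a sketch of the idea, not the paper's exact measure:

```python
import numpy as np

def stationarity_score(cochleagram):
    """Crude stationarity measure: per-channel standard deviation over
    time, averaged across channels. Lower = more stationary.
    cochleagram: (n_channels, n_timesteps) array of band envelopes.
    Illustrative assumption, not the paper's exact statistic."""
    return np.std(cochleagram, axis=1).mean()
```

Thresholding this score splits the sound set into the stationary and non-stationary subsets compared above.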
The deep layers of the network trained on these natural sounds
exhibited a greater response to the
non-stationary sounds than to the
stationary sounds.
However, the same effect was not
observed in the untrained network.
In the fMRI data, responses to
stationary and non-stationary
sounds were similar in the
primary areas (A1), but the
non-primary areas responded more
to non-stationary sounds.
Findings
49.
There is a functional differentiation between the primary and
non-primary regions, and these results support the
corresponding mapping in the model (intermediate layers to
primary, deep layers to non-primary).
Noise is suppressed in the later layers of the model and in
the non-primary regions, which accounts for the better
response to non-stationary sounds in the deep layers and the
non-primary cortex.
This helped the model predict responses to natural sounds
even when they were corrupted by noise.
Inference
50.
Task-performance
It was previously found that
networks with better
performance on a real-
world visual object
recognition task better
predict cortical responses
in the visual stream.
To test whether the same
holds for audition, 57
different architectures from
stage 1 were evaluated at 14
different training points
(798 networks in total) on
either the word or genre task.
The median explained variance
was measured for each layer.
51.
The performance of a network on a task strongly
correlated with the variance it explained in auditory
cortical responses: the Spearman correlation was 0.87
for the word task and 0.85 for the genre task.
These results suggest that task-based
optimization of deep neural networks can help yield
more predictive models of sensory systems.
Continued…
53.
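The Spearman correlation reported above is simply a Pearson correlation computed on ranks. A numpy-only sketch, with made-up toy numbers (not the paper's data):

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Assumes no ties; a simple illustration, not a full implementation."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

# Hypothetical accuracy / explained-variance pairs for six checkpoints
# (illustration only, not the paper's measurements):
accuracy = np.array([0.20, 0.35, 0.50, 0.62, 0.71, 0.78])
explained_var = np.array([0.05, 0.09, 0.14, 0.18, 0.22, 0.24])
```

For a perfectly monotonic relationship like the toy data above, the rank correlation is 1.0; values near 0.87 and 0.85, as reported, indicate a strong but imperfect monotonic trend.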
The model performed as well as humans on the tasks of
word recognition and genre identification,
and it produced human-like error patterns.
Task optimization resulted in the model
replicating the auditory cortex in one respect
(branching of the layers for specific tasks).
The model predicted fMRI responses throughout the
auditory cortex far better than the standard
method (the spectrotemporal filter model).
Achievements
54.
Task optimization gave the model better predictions of
cortical responses; without it, the predictions were poor
(untrained model).
Intermediate layers of the model predicted the primary
responses and deep layers predicted the non-primary
responses.
The model provides evidence that the organisation of the
human auditory cortex is hierarchical.
The hierarchical organisation and task optimization
together made the model general and powerful.
Continued…
55.
The model contains non-linear operations such as
normalization and pooling, which is one reason for its
improved predictions; research indicates that the
computations in the cortex are non-linear, and the model
outperformed the filter model, which lacked these
features.
The model also provides an alternative method for
evaluating cortical organisation: when the model and a
human perform the same task equally well, the model's
architecture can be treated as a hypothesis about the
human system.
Continued…
56.
Task optimization resulted in powerful models
that can replicate aspects of the visual and auditory systems.
The primary visual responses were best predicted by the
early layers of a model, whereas the primary auditory
responses were best predicted by the intermediate layers.
This suggests that the auditory cortex sits deeper in the
computational hierarchy than the visual cortex.
This is in accordance with the fact that the auditory
system has more subcortical nuclei.
Comparisons with the visual
system
57.
This deep learning model (12 layers) is deeper than its
predecessors (2 or 3 layers).
This depth enabled good representations of complex
real-world tasks and better predictions of cortical responses.
The branching of the network in the deep layers, a result of
task optimisation, is consistent with the functional
segregation observed in the non-primary cortex.
The model could perform other sound-related tasks even
though it was not trained on them.
The parameters were fit on only half of the data, yet the
model also performed well on the held-out data.
Advantages
58.
The individual units used in the model are not readily
interpretable.
The choice of task was not critical for the analysis of the
human cortex: the genre task was included because a
large dataset was readily available, but it has the
drawback of being culturally biased.
The model does not replicate humans in terms of
learning: humans learn by experience and feedback,
whereas the machine learns from data.
Disadvantages
59.
The model provided evidence that the human auditory cortex
has a hierarchical organisation, but an even better model is
required to test whether it is tripartite, as seen in
animals.
Research indicates that the auditory system has many
subcortical nuclei; this could be examined by predicting the
subcortical responses with the early layers of the model.
Training the model for additional music-related tasks, or
tasks not specific to speech or music, could yield a more
complete model of human behaviour.
Improving the model from the learning point of view could
make it correspond more closely to the human.
Future updates
60.
REFERENCE…
Kell, A. J. E., Yamins, D. L. K., Shook, E. N., Norman-Haignere, S. V., &
McDermott, J. H. (2018). A Task-Optimized Neural Network Replicates Human
Auditory Behavior, Predicts Brain Responses, and Reveals a Cortical Processing
Hierarchy. Neuron, 98(3), 630–644.e16. doi:10.1016/j.neuron.2018.03.044
**All information has been taken from this research article.**