Sound is a potent communication tool available to most people. While our sense of hearing lets us continuously gather, analyse, and interpret sounds, people with hearing impairment (PWHI) can find it challenging to perceive their surroundings through sound. Auditory impairment is among the most prevalent sensory deficits in humans today. Fortunately, advances in technology leave room for a solution to this issue. Our project involves capturing ambient sounds from the user's surroundings and notifying the user through a mobile application using IoT and Deep Learning. Its architecture performs sound recognition using a microphone to capture sounds from the user's surroundings. These sounds are identified and categorized as everyday ambient sounds, such as a doorbell, a baby's cry, or a dog barking, as well as emergency-related sounds, such as alarms, sirens, etc.
Mobile Netw Appl
DOI 10.1007/s11036-009-0217-y
NoiseSPY is a mobile phone application that turns phones into noise sensors. It records sound levels using the phone microphone along with GPS data. This allows users to map noise levels encountered during journeys. Initial trials involved cycling couriers collecting noise data in Cambridge. Indications are the functionality engaged users and aspects like personal data, context, and reflection on data collection were important factors in user interest. The system architecture combines sound level measurements on the phone with transmission of data to a server for aggregation and visualization on an online noise map.
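The core client-side computation NoiseSPY performs, reducing microphone samples to a level reading and tagging it with GPS data for upload, can be sketched as follows. This is a hedged illustration, not NoiseSPY's actual code: the function names and the use of an uncalibrated dBFS scale (rather than calibrated dB SPL) are our assumptions.

```python
import math

def buffer_to_dbfs(samples):
    """Reduce a normalized audio buffer (floats in [-1, 1]) to a dBFS level."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-12))  # floor avoids log(0) on silence

def tag_reading(level_db, lat, lon, timestamp):
    """Pair a level reading with GPS coordinates for server-side aggregation."""
    return {"db": round(level_db, 2), "lat": lat, "lon": lon, "t": timestamp}
```

A real deployment would calibrate the microphone against a reference level meter before the dB values are comparable across phones.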
LIP READING - AN EFFICIENT CROSS AUDIO-VIDEO RECOGNITION USING 3D CONVOLUTION... (IRJET Journal)
This document presents a lip reading system that uses 3D convolutional neural networks for audio-video speech recognition. The system recognizes phrases and sentences from silent video input with or without audio. It extracts lip movements from video frames using face tracking and processes audio files separately. 3D CNNs are used to extract spatial and temporal features from the audio and video streams. The system is evaluated on the MIRACLE-VC1 dataset and achieves accuracy comparable to other recent work on visual speech recognition. Future applications of this lip reading system include improved hearing aids, voice generation for those unable to speak, and biometric authentication.
WEARABLE VIBRATION-BASED DEVICE FOR HEARING-IMPAIRED PEOPLE USING ACOUSTIC SC... (IRJET Journal)
This document describes a wearable device designed to help hearing-impaired people by classifying environmental sounds and providing vibration alerts. It uses the Mel Frequency Cepstral Coefficient (MFCC) algorithm to extract features from audio samples. Three classification models are used to classify sounds into categories: a Gaussian Hidden Markov Model (HMM), a GMM-HMM, and an ensemble of the two. The top predicted class is used to activate the corresponding vibration motor. The goal is to inform users of important sounds through tactile feedback without relying on others for assistance. The device is intended to improve quality of life for the hearing impaired by helping them understand common sounds in their daily environment.
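The MFCC feature-extraction stage such a device relies on can be sketched in plain numpy. This is a minimal illustration of the standard MFCC pipeline (framing, windowing, power spectrum, mel filterbank, log, DCT), not the paper's implementation; all parameter values are illustrative defaults.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr, frame_len=400, hop=160, n_filters=26, n_coeffs=13):
    """Frame the signal, window it, and turn each frame into n_coeffs MFCCs."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, frame_len, sr)
    # DCT-II basis: decorrelates the log mel energies, keep first n_coeffs rows.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), (2 * n + 1)) / (2 * n_filters))
    coeffs = np.empty((n_frames, n_coeffs))
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        logmel = np.log(fb @ power + 1e-10)  # log mel-filter energies
        coeffs[t] = basis @ logmel
    return coeffs
```

The resulting per-frame coefficient vectors are what an HMM-style classifier would be trained on.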
PREPROCESSING CHALLENGES FOR REAL WORLD AFFECT RECOGNITION (CSEIJ Journal)
Real world human affect recognition, a significant aspect of human-computer interaction, requires immediate attention. Audio-visual modalities can make a significant contribution by providing rich contextual information. Preprocessing is an important step in which the relevant information is extracted; it has a crucial impact on prominent feature extraction and further processing. The main aim is to highlight the challenges in preprocessing real world data. The research focuses on experimental testing and comparative analysis of preprocessing using the OpenCV, Single Shot MultiBox Detector (SSD), DLib, Multi-Task Cascaded Convolutional Neural Networks (MTCNN), and RetinaFace detectors. The comparative analysis shows that MTCNN and RetinaFace perform better on real world data. The performance of facial affect recognition using a pre-trained CNN model is analysed with the lab-controlled dataset CK+ and a representative in-the-wild dataset, AFEW. This comparative analysis demonstrates the impact of preprocessing issues on the feature engineering framework in real world affect recognition.
Smart Sound Measurement and Control System for Smart City (IRJET Journal)
This document summarizes a research paper that proposes a smart sound measurement and control system for smart cities using Internet of Things technology. The system aims to address issues with existing noise measurement devices, such as only detecting noise in limited nearby areas. The proposed system would use multiple inexpensive sound detection devices connected via WiFi that send sensor data to the cloud to be viewed on mobile devices. This would allow for averaging readings across devices and monitoring noise levels in larger spaces. The system is intended to help authorities better enforce noise regulations and view historical noise data to address noise pollution issues near hospitals, schools and other areas that require quiet environments.
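One detail such a system must get right when averaging readings across devices: decibel values cannot be averaged arithmetically, because the scale is logarithmic. A hedged sketch of the correct power-domain average (the function name is ours, not the paper's):

```python
import math

def average_spl(readings_db):
    """Average sound levels correctly: convert dB to power, average, convert back."""
    powers = [10 ** (db / 10.0) for db in readings_db]
    return 10.0 * math.log10(sum(powers) / len(powers))
```

For example, averaging 60 dB and 70 dB in the power domain gives about 67.4 dB, not 65 dB: the louder source dominates the combined acoustic energy.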
Complete seminar report on the topic Silent Sound Technology, given by Raj Niranjan in the MCA department of BMS Institute of Technology and Management, Avalahalli, Bangalore, Karnataka.
A review of Noise Suppression Technology for Real-Time Speech Enhancement (IRJET Journal)
This document summarizes research on noise suppression technology for real-time speech enhancement. It discusses how noise suppression has gained interest due to advances in deep learning techniques. It describes how noise suppression works by using multiple microphones to capture audio signals, which are then processed using algorithms to separate and suppress background noises while enhancing speech. Deep learning has achieved promising results for noise suppression by training models to detect human voice between different input noises. The document also reviews conventional uses of noise suppression in devices and limitations, and how using deep learning allows for more effective separation of noise from sound signals.
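As a concrete baseline for the classical, pre-deep-learning approach such reviews contrast against, single-channel spectral subtraction estimates the noise magnitude spectrum from a noise-only excerpt and subtracts it from each frame of the noisy signal. This is a generic textbook sketch, not any surveyed system's method; frame size and the zero floor are illustrative choices.

```python
import numpy as np

def spectral_subtract(noisy, noise_profile, frame=256):
    """Classical single-channel spectral subtraction.

    noise_profile: a noise-only excerpt used to estimate the noise spectrum.
    """
    # Average noise magnitude per frequency bin over the noise-only frames.
    n_frames = len(noise_profile) // frame
    noise_mag = np.mean(
        [np.abs(np.fft.rfft(noise_profile[i * frame:(i + 1) * frame]))
         for i in range(n_frames)],
        axis=0,
    )
    out = np.zeros(len(noisy) // frame * frame)
    for i in range(len(noisy) // frame):
        spec = np.fft.rfft(noisy[i * frame:(i + 1) * frame])
        # Subtract the noise magnitude, floor at zero, keep the noisy phase.
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        out[i * frame:(i + 1) * frame] = np.fft.irfft(
            mag * np.exp(1j * np.angle(spec)), n=frame)
    return out
```

Deep-learning suppressors replace the fixed subtraction rule with a learned per-bin gain, which handles non-stationary noise far better.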
Voice Recognition Based Automation System for Medical Applications and for Ph... (IRJET Journal)
This document describes a voice recognition-based automation system for medical applications and physically challenged patients. The system uses a voice recognition model, Arduino microcontroller, relays, LEDs, buzzers, and a motor to control an adjustable bed. Voice commands are recognized using techniques like MFCC and HMM and used to control devices via the Arduino. The system is intended to allow paralyzed patients to control devices like lights, alarms, and their bed using only voice commands for increased independence. Testing showed the system can accurately recognize commands and control devices with 99% accuracy under suitable conditions.
A survey on Enhancements in Speech Recognition (IRJET Journal)
This document discusses enhancements in speech recognition and provides an overview of the history and basic model of speech recognition. It summarizes key enhancements researchers have made to improve speech recognition, especially in noisy environments. The basic model of speech recognition involves speech input, preprocessing using techniques like MFCCs, classification models like RNNs and HMMs, and output of a transcript. Researchers are working to develop robust speech recognition that can understand speech in any environment.
A Deep Learning Model For Crime Surveillance In Phone Calls (vivatechijri)
Public surveillance systems are having a ground-breaking impact on securing lives and ensuring public safety through interventions that curb crime and improve the well-being of communities. Law enforcement agencies deploy tools such as CCTV for video surveillance in banks, residential societies, shopping malls, markets, and roads to determine where an event such as a robbery, reckless driving, or murder occurred and who was responsible. Another surveillance tool, phone tapping, is used voluntarily by authorities to catch a suspect who is threatening an innocent person or conspiring to commit a violent act over a phone call. Analysing both scenarios: with CCTV, the crime (robbery, murder) has already occurred; with phone tapping, authorities must already have advance information about the suspect who is going to commit a crime. To overcome this, a system is proposed that automatically detects, in advance, who is suspected and what conspiracy is being planned over phone calls, and reports it to law enforcement authorities.
Utterance Based Speaker Identification Using ANN (IJCSEA Journal)
This document summarizes a research paper on speaker identification using artificial neural networks. The paper presents a speaker identification system that uses digital signal processing and ANN techniques. Speech features are extracted from utterances using FFT and windowing. These features are used to train a multi-layer perceptron network to classify speakers. The system was tested on Bangla speech and achieved accurate identification of speakers from their utterances.
Utterance Based Speaker Identification Using ANN (IJCSEA Journal)
In this paper we present the implementation of a speaker identification system using an artificial neural network with digital signal processing. The system is designed for text-dependent speaker identification of Bangla speech. The utterances of speakers are recorded for specific Bangla words using an audio wave recorder, and the speech features are acquired through digital signal processing. Speaker identification from the frequency-domain data is performed using the backpropagation algorithm. Hamming and Blackman-Harris windows are used to investigate which gives better speaker identification performance. Endpoint detection of speech is developed in order to achieve high accuracy of the system.
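The windowing and energy-based endpoint detection this abstract describes can be sketched as follows. The frame size and threshold ratio are illustrative choices, not the paper's reported values.

```python
import numpy as np

def detect_endpoints(signal, frame=160, threshold_ratio=0.1):
    """Energy-based endpoint detection: return (start, end) sample indices of speech.

    A frame counts as speech when its windowed energy exceeds threshold_ratio
    times the maximum frame energy (an illustrative heuristic).
    """
    window = np.hamming(frame)
    n = len(signal) // frame
    energies = np.array(
        [np.sum((signal[i * frame:(i + 1) * frame] * window) ** 2) for i in range(n)])
    active = np.where(energies > threshold_ratio * energies.max())[0]
    if active.size == 0:
        return None  # no frame rose above the threshold: treat as silence
    return int(active[0] * frame), int((active[-1] + 1) * frame)
```

Trimming the leading and trailing silence this way means the network is trained only on frames that actually carry the utterance, which is what lets endpoint detection raise accuracy.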
Cochlear implant systems translate acoustic information into electrical impulses to recreate functional hearing. Oticon Medical's new Neuro sound processors use "Coordinated Adaptive Processing" to automatically balance and coordinate advanced sound processing features for optimal speech understanding across environments. This approach captures a wide range of input sounds without compression while using environment detection, directionality, noise reduction, and output compression tailored for each situation. The goal is to provide the richest sound experience possible.
This document describes the PROSAMBA project, a partnership between Brazil and Germany applying microphone array signal processing techniques. The project focuses on fundamental research into blind source separation schemes and their application to safety, social inclusion of the hearing impaired, monitoring the Amazon forest, and developing communication devices. Key areas of research include subband adaptive filtering structures for speech separation, multi-dimensional array processing, and an ICA method called TRINICON that exploits nonwhiteness, nonstationarity, and nongaussianity of signals. Applications developed through the project include a sniper detector, intelligent hearing aids, environmental monitoring of animals and fires, and hands-free communication devices. The project is coordinated in Germany by Prof. Dr. Walter Keller
Teleconferencing plugin for noisy environments (IAMCP MENTORING)
The Conference Denoiser plugin integrates with the client side of a teleconferencing system on notebooks, MacBooks, tablets, and smartphones. It adjusts sound levels according to surrounding noise and the user's personal hearing profile, suppresses noise, and protects hearing. Integration is offered for teleconferencing system vendors.
This document describes a tool for annotating complex soundscapes. The tool segments sound recordings into regions likely from single sources using time-frequency analysis. It then extracts feature vectors from these regions to classify them. During annotation, the tool learns and suggests classifications to speed the process and improve over time. An initial pilot on outdoor recordings showed the tool could correctly classify 75% of regions. The goal is to develop an assisted annotation process that approaches human-level performance for training sound recognition systems.
Sensory Aids for Persons with Auditory Impairments (Damian T. Gordon)
There are two basic approaches to assistive technology for auditory impairments: augmentation of existing pathways and use of alternative pathways. Augmentation involves amplifying sounds to improve reception of certain frequencies for those who are hard of hearing. Alternative pathways include tactile substitution, where sounds are felt through touch, and visual substitution, using displays to represent sounds. Examples of aids discussed include hearing aids, cochlear implants, visual alarms, and computerized speech and text.
This document presents information on silent sound technology. It discusses how the technology was developed at the Karlsruhe Institute of Technology in Germany to detect lip movements and convert them into sound signals without actual sounds. It works by using electromyography to measure facial muscle activity and image processing of lip movements. The technology allows for silent phone calls and could benefit those who have lost their voice or need to communicate quietly. Future applications include use in space and by the military, with continued research seeking to improve the technology.
This document discusses a fast algorithm for noisy speaker recognition using artificial neural networks. It summarizes the following:
- The algorithm first records voice patterns from speakers via a noisy channel and applies noise removal techniques.
- It then extracts features using Mel Frequency Cepstral Coefficients (MFCC) and reduces features using Principal Component Analysis.
- The reduced feature vectors are classified using an artificial neural network classifier.
- Experimental results showed the proposed algorithm achieved about 99% accuracy on average, which is higher than other methods, and was also faster.
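The PCA reduction step in this pipeline can be sketched with numpy's SVD. This is a generic illustration of projecting MFCC-style feature vectors onto their top principal components, not the paper's code.

```python
import numpy as np

def pca_reduce(features, k):
    """Project feature vectors (rows) onto the top-k principal components."""
    mean = features.mean(axis=0)
    centered = features - mean
    # SVD of the centered data: rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]
    return centered @ components.T, components, mean
```

The reduced vectors, not the raw MFCCs, are then fed to the neural network classifier, which both speeds up training and discards noisy low-variance directions.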
Classification of Language Speech Recognition System (ijtsrd)
This paper aims to implement a Classification of Language Speech Recognition System using feature extraction and classification. It is an automatic language speech recognition system: a software architecture that outputs digits from input speech signals, with emphasis on a speaker-dependent isolated word recognition system. To implement this system, a good quality microphone is required to record the speech signals. The system contains two main modules, feature extraction and feature matching. Feature extraction is the process of extracting a small amount of data from the voice signal that can later be used to represent each speech signal. Feature matching involves the actual procedure of identifying an unknown speech signal by comparing its extracted features against a set of known speech signals, followed by the decision-making process. In this system, the Mel Frequency Cepstral Coefficient (MFCC) is used for feature extraction, and Vector Quantization (VQ) with the LBG algorithm is used for feature matching.
Khin May Yee | Moh Moh Khaing | Thu Zar Aung, "Classification of Language Speech Recognition System", International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-5, August 2019.
URL: https://www.ijtsrd.com/papers/ijtsrd26546.pdf
Paper URL: https://www.ijtsrd.com/computer-science/speech-recognition/26546/classification-of-language-speech-recognition-system/khin-may-yee
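The LBG codebook training used for feature matching works by repeatedly splitting each codeword into a perturbed pair and refining the enlarged codebook with k-means-style updates. A minimal numpy sketch (the split factor and iteration count are illustrative, not the paper's settings):

```python
import numpy as np

def lbg_codebook(vectors, size, eps=0.01, iters=20):
    """Train a VQ codebook by LBG splitting.

    Start from the global mean, double the codebook by perturbing each
    codeword by a factor of (1 +/- eps), then refine with k-means updates.
    """
    codebook = vectors.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])  # split
        for _ in range(iters):  # k-means refinement of the enlarged codebook
            d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
            nearest = d.argmin(axis=1)
            for c in range(len(codebook)):
                members = vectors[nearest == c]
                if len(members):
                    codebook[c] = members.mean(axis=0)
    return codebook
```

At recognition time, the average distortion between a speaker's MFCC vectors and each stored codebook decides the match.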
Optimized audio classification and segmentation algorithm by using ensemble m... (Venkat Projects)
The document proposes an optimized audio classification and segmentation algorithm that segments audio streams into four types - pure speech, music, environment sound, and silence - using ensemble methods. It uses a hybrid classification approach of bagged support vector machines and artificial neural networks. The algorithm aims to accurately segment audio with minimum misclassification and requires less training data, making it suitable for real-time applications. It segments non-speech portions into music or environment sound and further divides speech into silence and pure speech. The algorithm achieves approximately 98% accurate segmentation.
A Review Paper on Speech Based Emotion Detection Using Deep Learning (IRJET Journal)
This document reviews techniques for speech-based emotion detection using deep learning. It discusses how deep learning techniques have been proposed as an alternative to traditional methods for speech emotion recognition. Feature extraction is an important part of speech emotion recognition, and deep learning can help minimize the complexity of extracted features. The document surveys related work on speech emotion recognition using techniques like deep neural networks, convolutional neural networks, recurrent neural networks, and more. It examines the limitations of current approaches and the potential for deep learning to improve speech-based emotion detection.
1) Silent Sound Technology allows for communication without speaking by detecting lip movements and converting them to electrical signals that are then translated into sound signals.
2) It uses electromyography to monitor muscle movements in the face during speech and image processing of lip movements. The signals are then converted to speech.
3) Potential applications include silent communication in noisy places, aiding those who have lost their voice, and transmitting confidential information privately. However, it still faces restrictions related to accuracy and practical usability.
Assistive technology for hearing and speech disorders (Alexander Decker)
This document discusses assistive technologies for people with hearing and speech disorders. It describes three categories of assistive technologies: hearing technologies, alerting devices, and communication supports. Hearing technologies include hearing aids, frequency modulation systems, infrared systems, audio loop systems, amplified phones, and amplified stethoscopes. Alerting devices provide visual or tactile alerts for sounds like smoke alarms, alarm clocks, doorbells, and telephones. Communication supports facilitate communication, including telecommunication devices for the deaf, captioned phones, computers with webcams, video phones, and software that converts speech to text or sign language. The goal of these assistive technologies is to improve accessibility to information for those with hearing or speech impairments.
The SOPRANO project aims to develop context-aware smart home services to support independent living for elderly people. A consortium of 25 partners from 8 countries is working on ambient assisted living technologies like sensors, actuators, avatars and a middleware platform. The project will be evaluated through demonstrations and field trials in 80 homes across 3 countries to validate how the services can help with medication compliance, fall detection, exercise and entertainment.
A CONCEPTUAL FRAMEWORK OF A DETECTIVE MODEL FOR SOCIAL BOT CLASSIFICATION (ijasa)
Social media platforms have greatly enhanced human interactive activities in the virtual community. Virtual socialization has positively influenced social bonding among social media users irrespective of their location in the connected global village. There are two types of social media users: human users and social bot users. While human users personally operate their social media accounts, social bot users are software developed to manage a social media account on behalf of a human user called the botmaster. These botmasters are, in most cases, hackers with the bad intention of attacking social media users through various attack modes using social bots. The aim of this research work is to design an intelligent framework that will prevent attacks through social bots on social media network platforms.
DESIGN OF A MINIATURE RECTANGULAR PATCH ANTENNA FOR KU BAND APPLICATIONS (ijasa)
A significant portion of communication devices employs microstrip antennas because of their compact size, low profile, and ability to conform to both planar and non-planar surfaces. To this end, we present a miniature inset-fed rectangular patch antenna with a partial ground plane for Ku band applications. The proposed antenna design uses an operating frequency of 15.5 GHz and an FR4 substrate with a dielectric constant of 4.3 and a thickness of 1.4 mm, and it is fed by a 50 Ω inset feedline. Computer Simulation Technology (CST) software is used to design, simulate, and analyze the antenna. The simulation yields the antenna performance parameters, including return loss (S11), bandwidth, VSWR, gain, directivity, and radiation efficiency. The simulation findings revealed that the proposed antenna resonated at 15.5 GHz, with a return loss of -22.312 dB, a bandwidth of 2.73 GHz (2730 MHz), a VSWR of 1.17, a gain of 3.843 dBi, a directivity of 5.926 dBi, and an antenna efficiency of -2.083 dB (61.901%).
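The reported figures are internally consistent and can be cross-checked: VSWR follows from S11 via the reflection coefficient, and the dB efficiency converts directly to a percentage. A short check using the standard formulas (our own script, not CST output):

```python
import math

def s11_to_vswr(s11_db):
    """Convert return loss S11 (dB, negative) to VSWR via the reflection coefficient."""
    gamma = 10 ** (s11_db / 20.0)        # |Gamma| = 10^(S11/20)
    return (1 + gamma) / (1 - gamma)

def db_to_percent(db):
    """Convert an efficiency given in dB to a percentage (power ratio)."""
    return 100.0 * 10 ** (db / 10.0)
```

Plugging in the abstract's values, S11 = -22.312 dB gives a VSWR of about 1.17 and -2.083 dB gives about 61.9% efficiency, matching the reported 1.17 and 61.901%.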
Similar to SMART SOUND SYSTEM APPLIED FOR THE EXTENSIVE CARE OF PEOPLE WITH HEARING IMPAIRMENT
Voice Recognition Based Automation System for Medical Applications and for Ph...IRJET Journal
This document describes a voice recognition-based automation system for medical applications and physically challenged patients. The system uses a voice recognition model, Arduino microcontroller, relays, LEDs, buzzers, and a motor to control an adjustable bed. Voice commands are recognized using techniques like MFCC and HMM and used to control devices via the Arduino. The system is intended to allow paralyzed patients to control devices like lights, alarms, and their bed using only voice commands for increased independence. Testing showed the system provided accurate voice recognition under various conditions.
Voice Recognition Based Automation System for Medical Applications and for Ph...IRJET Journal
This document describes a voice recognition-based automation system for medical applications and physically challenged patients. The system uses a voice recognition model, Arduino microcontroller, relays, LEDs, buzzers, and a motor to control an adjustable bed. Voice commands are recognized using techniques like MFCC and HMM and used to control devices via the Arduino. The system is intended to allow paralyzed patients to control devices like lights, alarms, and their bed using only voice commands for increased independence. Testing showed the system can accurately recognize commands and control devices with 99% accuracy under suitable conditions.
A survey on Enhancements in Speech RecognitionIRJET Journal
This document discusses enhancements in speech recognition and provides an overview of the history and basic model of speech recognition. It summarizes key enhancements researchers have made to improve speech recognition, especially in noisy environments. The basic model of speech recognition involves speech input, preprocessing using techniques like MFCCs, classification models like RNNs and HMMs, and output of a transcript. Researchers are working to develop robust speech recognition that can understand speech in any environment.
A Deep Learning Model For Crime Surveillance In Phone Calls.vivatechijri
Public surveillance systems are having a ground-breaking impact on securing lives and ensuring public safety, with interventions that curb crime and improve the well-being of communities. Law enforcement agencies deploy tools like CCTV for video surveillance in banks, residential societies, shopping malls, markets, and roads to determine where events such as robbery, rough driving, insolence, or murder occurred and who was responsible. Another surveillance tool, phone tapping, is used deliberately by authorities to catch a suspect who is threatening an innocent person or conspiring toward a violent act over a phone call. Comparing the two scenarios: with CCTV, the crime (robbery, murder) has already occurred, while phone tapping requires advance information about the suspect who is going to commit a crime. To overcome this limitation, a system is proposed that automatically detects, in advance, who is suspected and what conspiracy is being planned over phone calls, and reports it to law enforcement authorities.
Utterance Based Speaker Identification Using ANNIJCSEA Journal
This document summarizes a research paper on speaker identification using artificial neural networks. The paper presents a speaker identification system that uses digital signal processing and ANN techniques. Speech features are extracted from utterances using FFT and windowing. These features are used to train a multi-layer perceptron network to classify speakers. The system was tested on Bangla speech and achieved accurate identification of speakers from their utterances.
Utterance Based Speaker Identification Using ANNIJCSEA Journal
In this paper we present the implementation of speaker identification system using artificial neural network with digital signal processing. The system is designed to work with the text-dependent speaker identification for Bangla Speech. The utterances of speakers are recorded for specific Bangla words using an audio wave recorder. The speech features are acquired by the digital signal processing technique. The identification of speaker using frequency domain data is performed using back propagation algorithm. Hamming window and Blackman-Harris window are used to investigate better speaker identification performance. Endpoint detection of speech is developed in order to achieve high accuracy of the system.
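The Hamming and Blackman-Harris windows the paper compares have standard closed-form definitions, so the tapering step can be sketched without any DSP library:

```python
import math

def hamming(n, N):
    """Hamming window coefficient for sample n of an N-point window."""
    return 0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1))

def blackman_harris(n, N):
    """4-term Blackman-Harris window coefficient (minimum-sidelobe form)."""
    x = 2 * math.pi * n / (N - 1)
    return (0.35875 - 0.48829 * math.cos(x)
            + 0.14128 * math.cos(2 * x) - 0.01168 * math.cos(3 * x))

def apply_window(frame, window_fn):
    """Taper a speech frame before the FFT to reduce spectral leakage."""
    N = len(frame)
    return [s * window_fn(n, N) for n, s in enumerate(frame)]
```

Both windows peak at 1.0 in the center; the Blackman-Harris window's near-zero edges buy lower sidelobes at the cost of a wider main lobe, which is the trade-off such a comparison measures.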
Cochlear implant systems translate acoustic information into electrical impulses to recreate functional hearing. Oticon Medical's new Neuro sound processors use "Coordinated Adaptive Processing" to automatically balance and coordinate advanced sound processing features for optimal speech understanding across environments. This approach captures a wide range of input sounds without compression while using environment detection, directionality, noise reduction, and output compression tailored for each situation. The goal is to provide the richest sound experience possible.
This document describes the PROSAMBA project, a partnership between Brazil and Germany applying microphone array signal processing techniques. The project focuses on fundamental research into blind source separation schemes and their application to safety, social inclusion of the hearing impaired, monitoring the Amazon forest, and developing communication devices. Key areas of research include subband adaptive filtering structures for speech separation, multi-dimensional array processing, and an ICA method called TRINICON that exploits nonwhiteness, nonstationarity, and nongaussianity of signals. Applications developed through the project include a sniper detector, intelligent hearing aids, environmental monitoring of animals and fires, and hands-free communication devices. The project is coordinated in Germany by Prof. Dr. Walter Keller
Teleconferencing plugin for noisy environmentIAMCP MENTORING
The Conference Denoiser plugin integrates with the client side of teleconferencing systems on notebooks, MacBooks, tablets, and smartphones. It adjusts sound levels according to surrounding noise and the user's personal hearing profile, suppresses noise, and protects hearing. We offer integration for teleconferencing system vendors.
This document describes a tool for annotating complex soundscapes. The tool segments sound recordings into regions likely from single sources using time-frequency analysis. It then extracts feature vectors from these regions to classify them. During annotation, the tool learns and suggests classifications to speed the process and improve over time. An initial pilot on outdoor recordings showed the tool could correctly classify 75% of regions. The goal is to develop an assisted annotation process that approaches human-level performance for training sound recognition systems.
Sensory Aids for Persons with Auditory ImpairmentsDamian T. Gordon
There are two basic approaches to assistive technology for auditory impairments: augmentation of existing pathways and use of alternative pathways. Augmentation involves amplifying sounds to improve reception of certain frequencies for those who are hard of hearing. Alternative pathways include tactile substitution, where sounds are felt through touch, and visual substitution, using displays to represent sounds. Examples of aids discussed include hearing aids, cochlear implants, visual alarms, and computerized speech and text.
This document presents information on silent sound technology. It discusses how the technology was developed at the Karlsruhe Institute of Technology in Germany to detect lip movements and convert them into sound signals without actual sounds. It works by using electromyography to measure facial muscle activity and image processing of lip movements. The technology allows for silent phone calls and could benefit those who have lost their voice or need to communicate quietly. Future applications include use in space and by the military, with continued research seeking to improve the technology.
This document discusses a fast algorithm for noisy speaker recognition using artificial neural networks. It summarizes the following:
- The algorithm first records voice patterns from speakers via a noisy channel and applies noise removal techniques.
- It then extracts features using Mel Frequency Cepstral Coefficients (MFCC) and reduces features using Principal Component Analysis.
- The reduced feature vectors are classified using an artificial neural network classifier.
- Experimental results showed the proposed algorithm achieved about 99% accuracy on average, which is higher than other methods, and was also faster.
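The MFCC front end in the pipeline above rests on the mel scale, which warps frequency toward human pitch perception; the standard conversion formulas can be sketched as:

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to the mel scale used by MFCC filterbanks."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping, used to place triangular filter edges back in Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

Filter edges are typically spaced uniformly in mels and mapped back to Hz, so low frequencies get densely packed filters and high frequencies get coarse ones.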
Classification of Language Speech Recognition Systemijtsrd
This paper is aimed at implementing a Classification of Language Speech Recognition System using feature extraction and classification. It is an automatic language speech recognition system: a software architecture which outputs digits from input speech signals, emphasizing a speaker-dependent isolated word recognition system. To implement this system, a good-quality microphone is required to record the speech signals. The system contains two main modules, feature extraction and feature matching. Feature extraction is the process of extracting a small amount of data from the voice signal that can later be used to represent each speech signal. Feature matching involves the actual procedure of identifying an unknown speech signal by comparing its extracted features against a set of known speech signals, together with the decision-making process. In this system, the Mel Frequency Cepstral Coefficient (MFCC) is used for feature extraction, and Vector Quantization (VQ) with the LBG algorithm is used for feature matching. Khin May Yee | Moh Moh Khaing | Thu Zar Aung "Classification of Language Speech Recognition System" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5, August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26546.pdf Paper URL: https://www.ijtsrd.com/computer-science/speech-recognition/26546/classification-of-language-speech-recognition-system/khin-may-yee
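The LBG codebook training the paper relies on for feature matching follows a split-then-refine pattern; this sketch uses 1-D scalars to keep it short (real MFCC vectors would use per-dimension Euclidean distance):

```python
def lbg_codebook(samples, size, epsilon=0.01, iters=10):
    """Train a 1-D LBG vector-quantization codebook.

    Starts from the global mean, doubles the codebook by splitting each
    codeword by (1 +/- epsilon), then refines with k-means passes until
    the requested size is reached.
    """
    codebook = [sum(samples) / len(samples)]
    while len(codebook) < size:
        # Split every codeword into a slightly perturbed pair.
        codebook = [c * (1 + epsilon) for c in codebook] + \
                   [c * (1 - epsilon) for c in codebook]
        for _ in range(iters):
            # Assign each sample to its nearest codeword.
            clusters = [[] for _ in codebook]
            for s in samples:
                nearest = min(range(len(codebook)),
                              key=lambda i: abs(s - codebook[i]))
                clusters[nearest].append(s)
            # Move each codeword to its cluster centroid.
            codebook = [sum(c) / len(c) if c else codebook[i]
                        for i, c in enumerate(clusters)]
    return sorted(codebook)
```

During matching, an utterance's average quantization distortion against each speaker's codebook yields the decision score.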
Optimized audio classification and segmentation algorithm by using ensemble m...Venkat Projects
The document proposes an optimized audio classification and segmentation algorithm that segments audio streams into four types - pure speech, music, environment sound, and silence - using ensemble methods. It uses a hybrid classification approach of bagged support vector machines and artificial neural networks. The algorithm aims to accurately segment audio with minimum misclassification and requires less training data, making it suitable for real-time applications. It segments non-speech portions into music or environment sound and further divides speech into silence and pure speech. The algorithm achieves approximately 98% accurate segmentation.
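The silence/non-silence split that anchors such segmentation pipelines is typically an energy gate over fixed-length frames; a toy version (the classifiers for speech vs. music vs. environment sound would sit on top of this) looks like:

```python
def segment_by_energy(signal, frame_len=50, silence_threshold=0.01):
    """Label fixed-length frames as 'silence' or 'active' by mean energy.

    A minimal stand-in for the first stage of audio segmentation:
    frames whose mean squared amplitude stays below the threshold are
    treated as silence; the rest are passed on for classification.
    """
    labels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        labels.append("active" if energy > silence_threshold else "silence")
    return labels
```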
A Review Paper on Speech Based Emotion Detection Using Deep LearningIRJET Journal
This document reviews techniques for speech-based emotion detection using deep learning. It discusses how deep learning techniques have been proposed as an alternative to traditional methods for speech emotion recognition. Feature extraction is an important part of speech emotion recognition, and deep learning can help minimize the complexity of extracted features. The document surveys related work on speech emotion recognition using techniques like deep neural networks, convolutional neural networks, recurrent neural networks, and more. It examines the limitations of current approaches and the potential for deep learning to improve speech-based emotion detection.
1) Silent Sound Technology allows for communication without speaking by detecting lip movements and converting them to electrical signals that are then translated into sound signals.
2) It uses electromyography to monitor muscle movements in the face during speech and image processing of lip movements. The signals are then converted to speech.
3) Potential applications include silent communication in noisy places, aiding those who have lost their voice, and transmitting confidential information privately. However, it still faces restrictions related to accuracy and practical usability.
Assistive technology for hearing and speech disordersAlexander Decker
This document discusses assistive technologies for people with hearing and speech disorders. It describes three categories of assistive technologies: hearing technologies, alerting devices, and communication supports. Hearing technologies include hearing aids, frequency modulation systems, infrared systems, audio loop systems, amplified phones, and amplified stethoscopes. Alerting devices provide visual or tactile alerts for sounds like smoke alarms, alarm clocks, doorbells, and telephones. Communication supports facilitate communication, including telecommunication devices for the deaf, captioned phones, computers with webcams, video phones, and software that converts speech to text or sign language. The goal of these assistive technologies is to improve accessibility to information for those with hearing or speech impairments
The SOPRANO project aims to develop context-aware smart home services to support independent living for elderly people. A consortium of 25 partners from 8 countries is working on ambient assisted living technologies like sensors, actuators, avatars and a middleware platform. The project will be evaluated through demonstrations and field trials in 80 homes across 3 countries to validate how the services can help with medication compliance, fall detection, exercise and entertainment.
A CONCEPTUAL FRAMEWORK OF A DETECTIVE MODEL FOR SOCIAL BOT CLASSIFICATIONijasa
Social media platforms have greatly enhanced human interaction in the virtual community. Virtual socialization has positively influenced social bonding among social media users irrespective of one's location in the connected global village. There are two types of social media users: human users and social bots. While human users personally operate their social media accounts, social bots are software programs that manage a social media account on behalf of a human user called the botmaster. These botmasters are in most cases hackers intent on attacking social media users through various attack modes using social bots. The aim of this research work is to design an intelligent framework that prevents attacks through social bots on social media network platforms.
DESIGN OF A MINIATURE RECTANGULAR PATCH ANTENNA FOR KU BAND APPLICATIONSijasa
A significant portion of communication devices employs microstrip antennas because of their compact size, low profile, and ability to conform to both planar and non-planar surfaces. To achieve this, we present a miniature inset-fed rectangular patch antenna using a partial ground plane for Ku band applications. The proposed antenna design used an operating frequency of 15.5 GHz, an FR4 substrate with a dielectric constant of 4.3, and a thickness of 1.4 mm. It is fed by a 50 Ω inset feedline. Computer Simulation Technology (CST) software is used to design, simulate, and analyze. The simulation yields the antenna performance parameters, including return loss (S11), bandwidth, VSWR, gain, directivity, and radiation efficiency. The simulation findings revealed that the proposed antenna resonated at 15.5 GHz, with a return loss of -22.312 dB, a bandwidth of 2.73 GHz (2730 MHz), a VSWR of 1.17, a gain of 3.843 dBi, a directivity of 5.926 dBi, and an antenna efficiency of -2.083 dB (61.901%).
AN INTELLIGENT AND DATA-DRIVEN MOBILE VOLUNTEER EVENT MANAGEMENT PLATFORM USI...ijasa
In Lewis and Clark High School's Key Club, meetings are always held in a crowded classroom. The system of event sign-up is inefficient and hinders members from joining events. This has led to students becoming discouraged from joining Key Club and has often resulted in a lack of volunteers for important events. The club needed a more efficient way of connecting volunteers with volunteering opportunities. To solve this problem, we developed a VolunteerMatch mobile application for Key Club using Dart and the Flutter framework. The next steps will be to add a volunteer event recommendation and matching feature, utilizing the results from the research on machine learning models and algorithms in this paper.
A STUDY OF IOT BASED REAL-TIME SOLAR POWER REMOTE MONITORING SYSTEMijasa
We have developed an IoT-based real-time solar power monitoring system in this paper. It seeks an open-source IoT solution that can collect real-time data and continuously monitor the power output and environmental conditions of a photovoltaic (PV) panel. The objective of this work is to continuously monitor the status of various parameters associated with solar systems through sensors without manual visits, saving time and ensuring efficient power output from PV panels while watching for faulty solar panels, weather conditions, and other issues that affect solar effectiveness. Done manually, the user must use a multimeter to determine the measured values of the system, which is difficult for larger systems; the solar energy monitoring system is designed to make this easier for users. The system comprises a microcontroller (NodeMCU), a PV panel, sensors (INA219 current module, digital temperature sensor, LDR), a battery charger module, and a battery. The data from the PV panel and other appliances are sent to the cloud (ThingSpeak) over the internet using IoT technology and a Wi-Fi module (NodeMCU). It also allows users in remote areas to monitor the parameters of the solar power plant using connected devices. The user can view the current, previous, and average parameters of the solar PV system, such as voltage, current, temperature, and light intensity, using a graphical user interface. This facilitates fault detection, makes maintenance of the solar power plant easier, and saves time.
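The upload step in such a system is a plain HTTP GET against ThingSpeak's write endpoint; the sketch below only builds the URL the NodeMCU would request, and the field-to-sensor assignments are assumptions for illustration (ThingSpeak itself just sees numbered fields):

```python
from urllib.parse import urlencode

def thingspeak_update_url(api_key, voltage, current, temperature, lux):
    """Build the ThingSpeak channel-update URL for one sensor reading."""
    params = urlencode({
        "api_key": api_key,
        "field1": voltage,      # PV voltage (e.g. from an INA219)
        "field2": current,      # PV current (e.g. from an INA219)
        "field3": temperature,  # digital temperature sensor
        "field4": lux,          # LDR light-intensity reading
    })
    return "https://api.thingspeak.com/update?" + params
```

A device would issue this request once per sample interval; ThingSpeak then aggregates and charts the fields for the remote GUI described above.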
SENSOR BASED SMART IRRIGATION SYSTEM WITH MONITORING AND CONTROLLING USING IN...ijasa
This document presents a sensor-based smart irrigation system using IoT. The system uses soil moisture, temperature, and humidity sensors connected to a NodeMCU microcontroller. The sensor data is sent to a cloud server (ThingSpeak) and displayed as graphs on a website. A web page allows users to control a water pump remotely. The system was tested on a field over one day, recording sensor data and pump status in the morning, afternoon and night. Test results showed the pump turned on when soil moisture fell below a threshold and off when above a threshold, conserving water. The smart irrigation system allows remote monitoring and control to help farmers irrigate crops efficiently with minimal human effort or water waste.
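The threshold behavior described in the test results is usually implemented with hysteresis so the pump does not chatter around a single moisture value; a minimal sketch (threshold numbers are illustrative, not from the paper):

```python
def pump_command(moisture, pump_on, low=30, high=45):
    """Decide the pump state from soil moisture (%) with hysteresis.

    The gap between `low` and `high` keeps the pump from rapidly
    toggling when the reading hovers near a single threshold.
    """
    if moisture < low:
        return True          # soil too dry: turn pump on
    if moisture > high:
        return False         # soil wet enough: turn pump off
    return pump_on           # in between: keep the current state
```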
COMPARISON OF BIT ERROR RATE PERFORMANCE OF VARIOUS DIGITAL MODULATION SCHEME...ijasa
This document compares the bit error rate (BER) performance of different digital modulation schemes (BPSK, QPSK, 16-QAM) over AWGN and Rayleigh fading channels using Simulink simulations. It finds that BPSK outperforms QPSK and 16-QAM in both channels. The BER is evaluated for these modulation schemes using two equalization techniques: constant modulus algorithm (CMA) and maximum likelihood sequence estimation (MLSE). According to the results, BPSK has better BER performance than QPSK and 16-QAM when using either equalizer, especially at lower SNR values. CMA equalization works better than MLSE equalization for all modulation schemes based on the BER values obtained.
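The ordering the simulations report can be checked against the textbook AWGN curve for BPSK, Pb = Q(sqrt(2·Eb/N0)), written here with `erfc` since Q(x) = 0.5·erfc(x/√2); per-bit, QPSK shares this curve while 16-QAM sits above it:

```python
import math

def ber_bpsk_awgn(ebn0_db):
    """Theoretical BPSK bit error rate over an AWGN channel."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)          # dB -> linear Eb/N0
    return 0.5 * math.erfc(math.sqrt(ebn0))  # Q(sqrt(2*Eb/N0))
```

For example, at Eb/N0 = 0 dB this gives roughly 7.9% bit errors, falling steeply as SNR rises, which is the qualitative shape any Simulink BER sweep should reproduce.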
PERFORMANCE OF CONVOLUTION AND CRC CHANNEL ENCODED V-BLAST 4×4 MIMO MCCDMA WI...ijasa
This document analyzes the performance of a 4x4 Vertical Bell Labs Layered Space-Time (V-Blast) multiple-input multiple-output multi-carrier code division multiple access (MIMO MC-CDMA) wireless communication system using different digital modulation schemes. The system uses minimum mean square error (MMSE) signal detection and 1/2-rated convolution and cyclic redundancy check (CRC) channel encoding. Simulation results show that binary phase-shift keying (BPSK) modulation outperforms differential phase-shift keying (DPSK), quadrature phase-shift keying (QPSK), and 16-quadrature amplitude modulation (QAM), achieving the lowest bit error rate (BER) especially at
A SCRUTINY TO ATTACK ISSUES AND SECURITY CHALLENGES IN CLOUD COMPUTINGijasa
Cloud computing is an anthology in which one or more computers are connected in a network; it is a cluster of lattice computing, autonomic computing, and utility computing. The cloud provides on-demand services to users, and many users access the cloud to utilize its resources. Security is one of the major problems in cloud computing, and providing security is a major requirement. This study covers the security issues and attack issues in cloud computing.
The International Journal of Ambient Systems and Applications (IJASA) ijasa
The International Journal of Ambient Systems and Applications is a quarterly open-access peer-reviewed journal that publishes articles contributing new results in all areas of ambient systems. The journal focuses on all technical and practical aspects of ambient systems, networks, technologies, and applications. Its goal is to bring together researchers and practitioners from academia and industry to focus on advanced ambient systems and to establish new collaborations in these areas. Authors are solicited to contribute to the journal by submitting articles that illustrate research results, projects, surveying works, and industrial experiences describing significant advances in ambient systems.
TOWARD ORGANIC COMPUTING APPROACH FOR CYBERNETIC RESPONSIVE ENVIRONMENTijasa
The development of the Internet of Things (IoT) concept revives Responsive Environments (RE) technologies. Nowadays, the idea of a permanent connection between the physical and digital worlds is technologically possible. The capillary Internet relates to the Internet's extension into daily appliances such that they become actors of the Internet like any human. The parallel development of Machine-to-Machine communications and Artificial Intelligence (AI) techniques starts a new era of cybernetics. This paper presents an approach for a Cybernetic Organism (Cyborg) for RE based on Organic Computing (OC). In this approach, each appliance is part of an autonomic system that controls a physical environment. The underlying idea is that such systems must have self-x properties in order to adapt their behavior to external disturbances with a high degree of autonomy.
A STUDY ON DEVELOPING A SMART ENVIRONMENT IN AGRICULTURAL IRRIGATION TECHNIQUEijasa
Maintaining a good irrigation system is a necessity in today's water-scarce environment. This paper describes a new approach for an automated Smart Irrigation (SIR) system in agricultural management. Using various types of sensors in the crop field area, the temperature and moisture of the soil are monitored. Based on the sensed data, SIR automatically decides on the necessary irrigation action and notifies the user. The system also focuses on reducing the energy consumed by the sensors during communication.
A REVIEW ON DDOS PREVENTION AND DETECTION METHODOLOGYijasa
Denial of Service (DoS) or Distributed Denial of Service (DDoS) attacks are a major threat to network security. A network is a collection of nodes that interconnect to exchange information, and the information held by a node is kept confidential. An attacker on the network captures this confidential information and misuses the network; hence security is one of the major issues. One of the major threats to internet services is the DDoS attack: a malicious attempt to suspend or interrupt services to a target node, making a network resource or machine unavailable to its intended users. Many ideas have been developed for avoiding DoS and DDoS, which can arise naturally or from botnets, and various defense schemes have been developed against these attacks. The main idea of this paper is to present the basics of DDoS attacks: attack types, attack components, and a survey of different mechanisms to prevent DDoS.
The smart mobile terminal operator platform Android is getting popular all over the world with its wide variety of applications and enormous use in numerous spheres of our daily life. Considering the fact of increasing demand of home security and automation, an Android based control system is presented in this paper where the proposed system can maintain the security of home main entrance and also the car door lock. Another important feature of the designed system is that it can control the overall appliances in a room. The mobile to security system or home automation system interface is established through Bluetooth. The hardware part is designed with the PIC microcontroller.
The World Wide Web is booming and radically vibrant thanks to well-established standards and a widely accountable framework which guarantees interoperability at various levels of applications and society as a whole. So far, the web has functioned largely on the basis of human intervention and manual processing, but the next-generation web, which researchers call the Semantic Web, is edging toward automatic processing and machine-level understanding. The well-set notion of the Semantic Web will only become possible if further levels of interoperability prevail among applications and networks. To achieve this interoperability and greater functionality among applications, the W3C has already released well-defined standards such as RDF/RDF Schema and OWL. Using XML as a tool for semantic interoperability has not achieved anything effective and has failed to bring interconnection at a larger level. This leads to the further inclusion of an inference layer at the top of the web architecture and paves the way for proposing a common design for encoding ontology representation languages in data models such as RDF/RDFS. In this research article, we give a clear implication of the Semantic Web's research roots and its ontological background, which may help augment the sheer understanding of named entities on the web.
Wireless sensor networks provide ubiquitous computing systems in various open environments. In such environments, sensor nodes can easily be compromised by adversaries to generate injecting-false-data attacks. The injecting-false-data attack not only consumes unnecessary energy in en-route nodes, but also causes false alarms at the base station. To detect this type of attack, a bandwidth-efficient cooperative authentication (BECAN) scheme was proposed to achieve high filtering probability and high reliability based on random graph characteristics and cooperative bit-compressed authentication techniques. This scheme may waste energy resources in en-route nodes due to the fixed number of forwarding reports. In this paper, our proposed method effectively selects a dynamic number of forwarding reports in the source nodes based on an evaluation function. The experimental results indicate that our proposed method enhances energy savings while maintaining security levels as compared to BECAN.
Classification is one of the most dynamic research and application areas of artificial neural networks (ANN), a branch of Artificial Intelligence (AI). The neural network was trained by the backpropagation algorithm. Different combinations of functions and their effect when using an ANN as a classifier are studied, and the correctness of these functions is analyzed for various kinds of datasets. The backpropagation neural network (BPNN) can be a highly successful tool for dataset classification given a suitable combination of training, learning, and transfer functions. When the maximum likelihood method was compared with the BPNN method, the BPNN was more accurate. A highly predictive, stable, and well-functioning BPNN is possible. A multilayer feed-forward neural network algorithm is also used for classification; however, the BPNN proves more effective than the other classification algorithms.
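The weight update behind backpropagation training can be seen in miniature with a single sigmoid neuron, where the chain rule collapses to the delta rule; this sketch learns the linearly separable AND function (learning rate, epoch count, and the AND task are illustrative choices, not from the paper):

```python
import math

def train_logistic_neuron(inputs, targets, lr=0.5, epochs=2000):
    """Train one sigmoid neuron by gradient descent (delta rule)."""
    w = [0.0] * len(inputs[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            # Forward pass: weighted sum through the sigmoid.
            y = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            # Backward pass: cross-entropy gradient is simply (t - y).
            delta = t - y
            w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
            b += lr * delta
    return w, b

def predict(w, b, x):
    """Threshold the sigmoid output at 0.5 to get a class label."""
    y = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
    return 1 if y >= 0.5 else 0
```

A multilayer BPNN repeats this pattern layer by layer, propagating each neuron's delta backward through the weights.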
Wireless sensor networks (WSNs) are regularly deployed in harsh and unattended environments, and sensor nodes are easily exposed to attacks due to the random arrangement of the sensor field. An attacker can inject fabricated reports from a compromised node carrying false votes, as well as false-vote-based reports. These false report attacks can waste the energy of intermediate nodes, shortening the network lifetime; furthermore, false votes cause legitimate reports to be filtered out. A probabilistic voting-based filtering scheme (PVFS) was proposed by Li and Wu as a countermeasure against this type of attack. PVFS uses a vote threshold, a security threshold, and a verification node, and it uses no additional energy or communication resources because the verification node and threshold values are fixed. A verification node selection method that considers the energy resources of each node is therefore needed. In this paper, we propose a verification path election scheme based on a fuzzy logic system: a node transmits reports through the neighbor in the stronger state, selected by the fuzzy logic system from two surrounding candidates. Experimental results show that the proposed scheme improves energy savings by up to 13% relative to PVFS.
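The fuzzification step such a scheme performs on a node's residual energy can be sketched with triangular membership functions; the label names and ranges below are illustrative assumptions, not PVFS values:

```python
def trimf(x, a, b, c):
    """Triangular fuzzy membership: rises from a to peak b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def energy_state(residual_energy):
    """Fuzzify a node's residual energy (%) into labeled degrees."""
    return {
        "weak":   trimf(residual_energy, -1, 0, 50),
        "medium": trimf(residual_energy, 20, 50, 80),
        "strong": trimf(residual_energy, 50, 100, 101),
    }
```

A rule base then combines these degrees (typically with min/max operators) across the candidate neighbors and picks the one with the strongest "forward through me" score.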
In this paper, a novel intelligent soft-computing-based cryptographic technique based on the synchronization of two chaotic systems (CSCT) between sender and receiver is proposed to generate a session key using the Pecora and Carroll (PC) method. Chaotic systems have unique features such as sensitivity to initial conditions, topological mixing, and dense periodic orbits. By nature, the Lorenz system is very sensitive to initial conditions, meaning that the error between an attacker and the receiver will grow exponentially if there is even a slight difference between their initial conditions. All these features make chaotic systems good candidates for session key generation. In the proposed CSCT, a few parameters (σ, b, r, x1, y2, and z2) are exchanged between sender and receiver. Some of the parameters that take major roles in forming the session key are not transmitted via the public channel; the sender keeps these parameters secret. This way of handling the parameter-passing mechanism prevents attacks during the exchange of parameters, such as sniffing, spoofing, or phishing.
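The Pecora-Carroll synchronization the scheme builds on can be demonstrated numerically: a response (y2, z2) subsystem driven only by the drive system's x signal collapses onto the drive's trajectory. The Euler step size, step count, and initial states below are illustrative choices:

```python
def pecora_carroll_sync(steps=50000, dt=0.002):
    """Euler-integrate a Lorenz drive system and a (y2, z2) response
    subsystem driven by the drive's x signal (Pecora-Carroll scheme).

    Returns (initial_error, final_error); with the classic parameters
    the response converges onto the drive, so the error shrinks.
    """
    sigma, r, b = 10.0, 28.0, 8.0 / 3.0
    x, y, z = 1.0, 1.0, 1.0        # drive system (sender)
    y2, z2 = 5.0, -5.0             # response (receiver), different start
    err0 = abs(y - y2) + abs(z - z2)
    for _ in range(steps):
        dx = sigma * (y - x)
        dy = x * (r - z) - y
        dz = x * y - b * z
        dy2 = x * (r - z2) - y2    # response copies the y, z equations
        dz2 = x * y2 - b * z2      # but is driven by the drive's x
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        y2, z2 = y2 + dt * dy2, z2 + dt * dz2
    return err0, abs(y - y2) + abs(z - z2)
```

Once synchronized, sender and receiver share identical state values from which a session key can be derived, while an eavesdropper lacking the secret parameters diverges exponentially.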
A black-hole attack in a Mobile Ad-hoc NETwork (MANET) is an attack by malicious nodes, which attract data packets by falsely advertising a fresh route to the destination. In this paper, we present a clustering approach in the Ad-hoc On-demand Distance Vector (AODV) routing protocol for the detection and prevention of black-hole attacks in MANETs. In this approach, every member of the cluster pings the cluster head once, to detect any peculiar difference between the number of data packets received and forwarded by a node. If an anomaly is perceived, all the nodes exclude the malicious nodes from the network.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Batteries -Introduction – Types of Batteries – discharging and charging of battery - characteristics of battery –battery rating- various tests on battery- – Primary battery: silver button cell- Secondary battery :Ni-Cd battery-modern battery: lithium ion battery-maintenance of batteries-choices of batteries for electric vehicle applications.
Fuel Cells: Introduction- importance and classification of fuel cells - description, principle, components, applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTjpsjournal1
The rivalry between prominent international actors for dominance over Central Asia's hydrocarbon
reserves and the ancient silk trade route, along with China's diplomatic endeavours in the area, has been
referred to as the "New Great Game." This research centres on the power struggle, considering
geopolitical, geostrategic, and geoeconomic variables. Topics including trade, political hegemony, oil
politics, and conventional and nontraditional security are all explored and explained by the researcher.
Using Mackinder's Heartland, Spykman Rimland, and Hegemonic Stability theories, examines China's role
in Central Asia. This study adheres to the empirical epistemological method and has taken care of
objectivity. This study analyze primary and secondary research documents critically to elaborate role of
china’s geo economic outreach in central Asian countries and its future prospect. China is thriving in trade,
pipeline politics, and winning states, according to this study, thanks to important instruments like the
Shanghai Cooperation Organisation and the Belt and Road Economic Initiative. According to this study,
China is seeing significant success in commerce, pipeline politics, and gaining influence on other
governments. This success may be attributed to the effective utilisation of key tools such as the Shanghai
Cooperation Organisation and the Belt and Road Economic Initiative.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Using recycled concrete aggregates (RCA) for pavements is crucial to achieving sustainability. Implementing RCA for new pavement can minimize carbon footprint, conserve natural resources, reduce harmful emissions, and lower life cycle costs. Compared to natural aggregate (NA), RCA pavement has fewer comprehensive studies and sustainability assessments.
TIME DIVISION MULTIPLEXING TECHNIQUE FOR COMMUNICATION SYSTEMHODECEDSIET
Time Division Multiplexing (TDM) is a method of transmitting multiple signals over a single communication channel by dividing the signal into many segments, each having a very short duration of time. These time slots are then allocated to different data streams, allowing multiple signals to share the same transmission medium efficiently. TDM is widely used in telecommunications and data communication systems.
### How TDM Works
1. **Time Slots Allocation**: The core principle of TDM is to assign distinct time slots to each signal. During each time slot, the respective signal is transmitted, and then the process repeats cyclically. For example, if there are four signals to be transmitted, the TDM cycle will divide time into four slots, each assigned to one signal.
2. **Synchronization**: Synchronization is crucial in TDM systems to ensure that the signals are correctly aligned with their respective time slots. Both the transmitter and receiver must be synchronized to avoid any overlap or loss of data. This synchronization is typically maintained by a clock signal that ensures time slots are accurately aligned.
3. **Frame Structure**: TDM data is organized into frames, where each frame consists of a set of time slots. Each frame is repeated at regular intervals, ensuring continuous transmission of data streams. The frame structure helps in managing the data streams and maintaining the synchronization between the transmitter and receiver.
4. **Multiplexer and Demultiplexer**: At the transmitting end, a multiplexer combines multiple input signals into a single composite signal by assigning each signal to a specific time slot. At the receiving end, a demultiplexer separates the composite signal back into individual signals based on their respective time slots.
### Types of TDM
1. **Synchronous TDM**: In synchronous TDM, time slots are pre-assigned to each signal, regardless of whether the signal has data to transmit or not. This can lead to inefficiencies if some time slots remain empty due to the absence of data.
2. **Asynchronous TDM (or Statistical TDM)**: Asynchronous TDM addresses the inefficiencies of synchronous TDM by allocating time slots dynamically based on the presence of data. Time slots are assigned only when there is data to transmit, which optimizes the use of the communication channel.
### Applications of TDM
- **Telecommunications**: TDM is extensively used in telecommunication systems, such as in T1 and E1 lines, where multiple telephone calls are transmitted over a single line by assigning each call to a specific time slot.
- **Digital Audio and Video Broadcasting**: TDM is used in broadcasting systems to transmit multiple audio or video streams over a single channel, ensuring efficient use of bandwidth.
- **Computer Networks**: TDM is used in network protocols and systems to manage the transmission of data from multiple sources over a single network medium.
### Advantages of TDM
- **Efficient Use of Bandwidth**: TDM all
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
SMART SOUND SYSTEM APPLIED FOR THE EXTENSIVE CARE OF PEOPLE WITH HEARING IMPAIRMENT
International Journal of Ambient Systems and Applications (IJASA) Vol.10, No.1/2/3, September 2022
DOI: 10.5121/ijasa.2022.10301
SMART SOUND SYSTEM APPLIED
FOR THE EXTENSIVE CARE OF PEOPLE
WITH HEARING IMPAIRMENT
Smitha S Maganti, Sahana S, Kriti K, Shravanthi Madhugiri and Priya S
Computer Science and Engineering Department, BNMIT, Bangalore. India
ABSTRACT
Sound is a potent communication tool. While most of us can continuously gather, analyse, and
interpret sounds thanks to our sense of hearing, it is challenging for people with hearing
impairment (PWHI) to perceive their surroundings through sound. Auditory impairment is one of
the most prevalent sensory deficits in humans today. Fortunately, the development of technology
leaves room for a solution to this issue. Our project captures ambient sounds from the user's
surroundings and notifies the user through a mobile application, using IoT and deep learning. Its
architecture performs sound recognition with a microphone that captures sounds from the user's
surroundings. These sounds are identified and categorized as ambient sounds, such as a doorbell,
a baby crying, or a dog barking, and as emergency-related sounds, such as alarms and sirens.
KEYWORDS
Deaf, PWHI, DHH, Sound detection, ResNet50, Deep Learning, Internet of Things, Mobile Application.
1. INTRODUCTION
Being aware of the sounds around us is highly important. Our sense of hearing lets us perceive
the world through sound, and significant events occurring out of our sight are signalled to us via
auditory cues. Society routinely uses sound to convey information: alarms sound to alert us to
critical events and emergencies, and people ring doorbells to let us know they are there. Because
of these conventions, critical information is inaccessible to the deaf and to people with hearing
impairment (PWHI).
Technical solutions are frequently specialized for particular sounds, while non-technical sound
awareness techniques, such as visual inspection, can be tiresome and difficult. For instance, some
deaf individuals connect their doorbell to their house lights so that the lights flash when the
doorbell rings. Such solutions, however, target only specific sounds, and buying a distinct gadget
for each sound can be costly and inconvenient. Because each person's life, and the sounds within
it, is unique, even a large number of devices cannot cover every sound.
Our goal is to design a deep learning model that classifies sounds heard in the user's
environment into the appropriate category, such as a dog barking, a baby crying, or a doorbell
ringing, and then alerts the user via notifications to a mobile application so they can take the
appropriate action. The dataset we employed for testing and training the classification algorithm
is Google AudioSet. The short-time Fourier transform (STFT) is used as a pre-processing tool to
create spectrograms from the raw audio files in WAV format before further processing. The
features of these spectrograms are extracted using the log-mel spectrogram, and the resulting
spectrograms are fed to our ResNet50 deep learning model as input for further analysis and
classification of the audio.
The aim of our project is to implement the suggested methodology to accurately identify and
categorize the sounds in the user's environment while also addressing the shortcomings of the
existing systems, including their accuracy, the frequency of repeated and persistent notifications,
and their error rates.
2. RELATED WORK
João Elison da Rosa Tavares et al. [1] proposed the Apollo SignSound system, which promotes
accessibility for PWHI and deaf individuals in a smart home setting. The use of neural networks
in Apollo SignSound's ambient danger detection represents its scientific contribution. To provide
accessible alerts in Brazilian Sign Language (LIBRAS, from its Portuguese name), SignSound
also considers the user's profile, particularly the degree of deafness. The risk warnings delivered
to a PWHI or deaf user's smartphone may additionally vibrate the device or switch on its
torchlight. The system was implemented as a two-part device: a MAX4466 sound sensor that
records background ambient sounds, and an ESP8266 Wi-Fi module that communicates with the
SignSound server over HTTP. The fast Fourier transform (FFT) is used for data pre-processing.
A neural network (NN) with a Multi-Layer Perceptron (MLP) architecture was built and trained
on 20 examples of each sound type, such as door knocks, dog barks, and boiling kettles. The NN
was tuned throughout the training phase to prevent overfitting and underfitting. A hybrid cloud
architecture was chosen, in which neural network training takes place in the cloud while event
detection takes place locally (fog computing). The numeric evaluation of the Apollo SignSound
system used three scenarios simulating real-world circumstances. Twenty collections were run
for each scenario, and the results showed a mean overall F-score of 0.73. The neural network
could be enhanced to increase its accuracy, and the approach could be integrated with wearable
gadgets such as smartwatches to create personalised notifications for PWHI.
Dhruv Jain et al. [2] introduced HomeSound, an indoor sound detection architecture for DHH
(Deaf and hard of hearing) users. They employed a deep CNN named VGG16, which was
previously trained on 8 million YouTube videos, to build a reliable, real-time sound classification
engine. Transfer learning was used to modify VGG16 for sound classification as it was primarily
created for video classification. 19 typical household sounds, such as dog barking, knocking on
doors, and conversation, were acquired from 6 libraries: BBC, TAU, Network Sound, TUT,
Freesound and UPC. No participants reported false positives. However, false negatives (such as
undetected sounds like the garbage disposal) and misattributions (such as a fan being mistaken
for a microwave) were both problematic. These misclassification problems point to the need for
increased system accuracy. Along with classification errors, overly persistent notifications were
also a problem.
Aurora Polo-Rodriguez et al. [3] proposed using surrounding audio sensors to recognise
incidents that arise from everyday tasks performed by house residents. They created a balanced
dataset that was gathered and classified under carefully controlled circumstances. Spectral
representations of sounds were used as convolutional network inputs, and the results for
identifying ambient noises were promising. To filter the raw audio event detections and
eliminate false positives in real-time event recognition, fuzzy processing of audio event streams,
exploiting the temporal constraints that protoforms offer, was introduced on an IoT board. In the
realm of audio recognition, CNNs and the use of spectrograms for sound representation have
been shown to produce positive outcomes, which may be utilized for music signal analysis and
ambient sound categorization. In particular, the Mel-frequency cepstral coefficient (MFCC) and
the log-mel (LM) spectrogram have been suggested as robust representations for sound
categorization. Mel-frequency spectrograms and convolutional neural networks are employed to
provide a deep learning model for sound detection in commonplace settings. Under controlled
conditions, both CNN models perform very well on the audio dataset, with CNN + MFCC
displaying one of the best results. With the AudioSet dataset, however, recognition of
everyday-activity sounds is poor, because audio snippets derived from YouTube videos contain
noise that overlaps with other sounds and audio generated from a variety of sources.
Chuan-Yu Chang et al. [4] proposed an anomalous sound identification system for monitoring
indoor sounds. Twenty-four characteristics were extracted from each sound frame; sequential
floating forward selection (SFFS) was then used to choose highly discriminative features.
Finally, a support vector machine (SVM) divided the sounds into six categories: screams, baby
cries, coughs, breaking glass, laughs, and doorbell rings. According to the experimental findings,
the suggested system has a high recognition rate and can accurately identify various types of
anomalous sounds. Extraction of sound characteristics is by far the most crucial step in sound
signal detection. Applying the fast Fourier transform (FFT) yields both real and imaginary
values as well as spectral information; fifteen frequency-domain variables, such as intensity,
variance, average, standard deviation, bandwidth, centroid, roll-off, total spectrum power,
maximum, sub-band powers, valley, and peak, in addition to MFCC and LPCC, depend on the
FFT to be retrieved. Using SFFS, the peak, contrast, and valley of the MFCC and octave-based
features were selected as the four discriminative properties. The SVM was finally used for
classification. Experimental results showed that this technique achieved a good precision rate of
86%.
3. PROBLEM STATEMENT
Sounds present in our surroundings play a huge role in our day-to-day lives. People who cannot
perceive these sounds face many hindrances in even the simplest of tasks, such as responding to
a fire alarm, a smoke detector, or a crying baby. The aim of our project is to implement the
suggested methodology to accurately identify and categorize the sounds in the user's environment
while also addressing the shortcomings of the existing systems, including their accuracy, the
frequency of repeated and persistent notifications, and their error rates.
4. PROPOSED SOLUTION
The project's main goal is to create a DL model that would identify and categorize sounds heard
in the user's environment into the appropriate category, such as a dog barking, a baby crying, or a
doorbell ringing, and then alert the user via a mobile application about these sounds so they can
take the appropriate actions.
4.1. Internet of Things – Smart Home
The Internet of Things layer is in charge of gathering and delivering user-generated sounds to
storage for additional processing.
4.2. Server
The server layer is responsible for hosting the DL model for the classification of the sound. It
checks the storage for real-time sounds, processes them, and classifies the sound to its class. It
communicates the result of the classification to the Mobile Application through Rest APIs.
4.3. Mobile App Notification
The Mobile Application sends an API request to the server to check for sounds. If there are
sounds in the storage and they are classified into one of the classes, the server returns the
classification result to the app. The app displays the result and notifies the PWHI user.
Figure 1. System Architecture
4.4. Data Flow Diagram
The user's environment serves as the source of real-time audio in our project. The sounds are
uploaded to and kept in Firebase Storage. They are then loaded with the librosa library by the
Python program, which is hosted on the PythonAnywhere cloud. After their features are
extracted, the sounds are transformed into spectrograms, which are pre-processed into a
model-acceptable format. The spectrograms are then fed to the ResNet50 deep learning model to
classify and identify the sound. The mobile application receives the classification outcome,
notifies the user, and displays the information in the app. The data flow diagram is shown in
Fig. 2.
5. BUILDING THE DEEP LEARNING MODEL
5.1. Dataset Acquisition
The dataset from FreeSound and Google AudioSet was utilized to develop and validate the
categorization model. Noisy and Clean data were both collected. Baby crying, dog barking,
coughing, dishes and pans, doorbell, fire alarm, smoke detector, sneeze, thunder, and water were
taken into account as ambient noises. A website called FreeSound offers sound files free of
background noise for every kind of ambient sound.
Figure 2. Data Flow Diagram
5.2. Data Augmentation
To provide the model more data to train on, we have augmented the clean dataset of each class.
We have used Time Shift, Pitch Shift, and Time Stretch.
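As a rough illustration of the first and third of these augmentations, time shift and time stretch can be sketched directly in NumPy (librosa also provides `librosa.effects.time_stretch` and `librosa.effects.pitch_shift`; the shift amount and stretch rate below are arbitrary example values, not the ones used in the paper):

```python
import numpy as np

def time_shift(audio, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(audio, shift)

def time_stretch(audio, rate):
    """Stretch/compress the waveform by resampling with linear
    interpolation; rate > 1 speeds the clip up (fewer samples)."""
    n_out = int(len(audio) / rate)
    old_idx = np.linspace(0, len(audio) - 1, num=n_out)
    return np.interp(old_idx, np.arange(len(audio)), audio)

rng = np.random.default_rng(0)
clip = rng.standard_normal(48000)     # one second of audio at 48 kHz
shifted = time_shift(clip, 4800)      # shift by 0.1 s
stretched = time_stretch(clip, 1.25)  # play 25% faster
```

Pitch shift additionally requires resampling the stretched signal back to the original length, which is why it is usually left to librosa.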
5.3. Spectrogram Conversion
The short-time Fourier transform (STFT) is used as a preprocessing tool to create spectrograms
from the raw audio files in WAV format [12] before further processing [5][6][7]. The librosa
library was used to accomplish this.
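`librosa.stft` performs this conversion; as a minimal sketch of what that step computes (the Hann window and the frame/hop sizes below are illustrative defaults, not necessarily the paper's settings):

```python
import numpy as np

def stft_magnitude(audio, n_fft=2048, hop=512):
    """Minimal STFT: slice the signal into overlapping Hann-windowed
    frames and take the magnitude of each frame's real FFT."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (n_fft//2+1, n_frames)

sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)     # 440 Hz test tone, one second
spec = stft_magnitude(tone)
peak_bin = spec.mean(axis=1).argmax()  # strongest frequency bin
# bin index converts to Hz as peak_bin * sr / n_fft, which lands near 440
```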
5.4. Feature Extraction
The features are extracted from the spectrograms produced by the short-time Fourier transform
using the log-mel spectrogram [8][9]. The resulting mel spectrograms are then fed to the
classification model as input for labelling.
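librosa provides this step via `librosa.feature.melspectrogram` and `librosa.power_to_db`; the sketch below reconstructs the idea from scratch with simplified triangular filters (edge handling and normalization differ from librosa's implementation):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=22050, n_fft=2048, n_mels=64):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):            # rising edge
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):           # falling edge
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel(power_spec, fb):
    """Project an STFT power spectrogram onto the mel filters, then
    compress with a log (librosa's power_to_db uses 10*log10)."""
    return 10.0 * np.log10(np.maximum(fb @ power_spec, 1e-10))

fb = mel_filterbank()
power = np.abs(np.random.default_rng(1).standard_normal((1025, 40))) ** 2
lm = log_mel(power, fb)   # (64, 40) log-mel spectrogram
```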
5.5. Building the Pre-Trained Model
The Keras library is used to build and train the pre-trained models, which are adapted to our
dataset via transfer learning. The models we trained are ResNet50 [10], VGG16,
MobileNetV1, InceptionV3, InceptionResNetV2, and Xception [13].
5.6. Training and Testing the Models
After being built, the models are trained, validated, and tested, again using the Keras library.
5.7. Dataset Split
We split our dataset into train, test, and validation groups in the following proportions: 80:10:10.
For each class, we had about 60 validation and testing samples and about 500 training samples.
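A minimal sketch of such a split, applied per class (the shuffling and seed are our assumptions; with roughly 620 clips in a class this yields approximately the 500/60/60 counts quoted above):

```python
import random

def split_80_10_10(samples, seed=42):
    """Shuffle one class's samples and split them into
    train/validation/test in the paper's 80:10:10 proportions."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test

# e.g. ~620 clips in a class -> roughly 500 train, 60 val, 60 test
train, val, test = split_80_10_10(range(620))
```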
5.8. Parameters Used
1. Learning Rate (lr): 0.001
2. Optimizer: Adam, which follows the stochastic gradient descent method to optimize the
weights
3. Epochs: 15
4. Spectrogram size: 224 x 224
5.9. Test Accuracies
1. ResNet50: 100%
2. VGG16: 56.29%
3. MobileNetV1: 90.2%
4. InceptionV3: 59.5%
5. InceptionResNetV2: 71.9%
6. Xception: 100%
5.10. Outcome
ResNet50 and Xception models, as can be seen above, classified our test data with an accuracy of
100 percent. ResNet50 has the advantage over Xception in that it is lighter and faster. ResNet50
model loads the data and classifies more quickly, and it takes up less memory than Xception.
Hence, we decided to identify our real-time ambient sounds using the ResNet50 model [10][11].
5.11. Converting the Model to TensorFlow Lite
After building and training the ResNet50 model, it was converted to the smaller, more efficient
TensorFlow Lite format using the TensorFlow library. The model was then hosted on the
PythonAnywhere cloud to process and classify real-time data.
6. SYSTEM IMPLEMENTATION
The project has been implemented by dividing it into 3 modules:
1. Internet of Things - Collects real-time sounds.
2. Server - Pre-processes and classifies the real-time sounds.
3. Mobile Application - Displays and notifies the sounds to the users.
6.1. Internet of Things
This module of our project is implemented using two hardware components: a Raspberry Pi 3
and a USB microphone.
The Raspberry Pi runs headless, without a monitor, keyboard, or mouse. Its USB microphone is
accessed through ALSA, and the recording is made using libportaudio and arecord.
The real-time sounds are stored in Firebase Storage for further processing. The Raspberry Pi is
programmed in Python to connect to Firebase Storage and upload the real-time audio files via
the internet each time a new sound is recorded. This is accomplished using the "pyrebase"
library in Python.
6.2. Server
PythonAnywhere cloud is used to implement this module of our project. The web services,
audio pre-processing, and model classification methods were written in Python. ResNet50, the
chosen deep learning model for classification, and a Python program that performs the various
tasks are hosted on the PythonAnywhere server. The program implements the web services
using the Flask-RESTful framework: every 10 seconds, the mobile application sends a GET
request to the server's API, which routes it to the handler function. For each API call, the
following actions are performed on the server.
1. Loading real-time audio from Firebase Storage - Real-time audio files are downloaded
from Firebase Storage in the first section of the Python application, using the "pyrebase"
and "firebase_admin" libraries. The 'librosa' and 'wavfile' libraries are then used to load
the audio files for further processing. The audio file is removed from Firebase Storage
after it has been loaded. The program ends and returns "None" if there are no audio files
in Firebase Storage.
2. Spectrogram conversion - The program's second section extracts audio features and turns
them into spectrograms. This procedure makes use of the Librosa library. The required
features are extracted from the audio using the Short-Time Fourier Transform, and the
audio is then transformed into a Mel Spectrogram.
3. Pre-processing - The spectrogram is pre-processed in the third section of the program in
accordance with the model's input, using 'tensorflow.keras.preprocessing.image' and
'tensorflow.keras.applications.resnet50.preprocess_input'.
4. Classification of sound - The pre-processed spectrogram is fed into the model in the
fourth section of the program to predict the sound. The model assigns the input audio to
the appropriate class according to its composition. The Keras library is used to
implement this.
5. Returning the result to the Mobile Application - The program's final section sends the
classification outcome back to the mobile application. The mobile app sends an API
request, and once the model categorizes the sound, the result is returned in JSON
format. The result is "None" if the sound does not fit into any of the classes.
To summarize, every time the mobile app makes an API call, the server retrieves the real-time
input from Firebase Storage, processes it, categorizes it using the ResNet50 model, and then
sends the classified data back to the mobile app.
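The per-call workflow above can be condensed into a single handler. The sketch below shows only the control flow and the JSON contract: `fetch_audio` and `classify` are hypothetical stand-ins for the pyrebase download and the ResNet50 prediction, and the confidence threshold is our assumption, not something the paper specifies:

```python
import json

CLASSES = ["Baby Cry", "Dog Bark", "Cough", "Dishes and Pots", "Doorbell",
           "Fire Alarm", "Smoke Detector", "Sneeze", "Thunder", "Water"]

def handle_request(fetch_audio, classify, threshold=0.5):
    """One GET request: fetch the newest clip, classify it, reply as JSON.
    fetch_audio returns a waveform or None (empty storage);
    classify returns per-class probabilities for a waveform."""
    audio = fetch_audio()                         # step 1: load from storage
    if audio is None:
        return json.dumps({"result": "None"})     # nothing to classify
    probs = classify(audio)                       # steps 2-4: spectrogram + model
    best = max(range(len(CLASSES)), key=lambda i: probs[i])
    if probs[best] < threshold:                   # no confident class
        return json.dumps({"result": "None"})
    return json.dumps({"result": CLASSES[best]})  # step 5: JSON reply

# Stub demo: storage holds one clip the model scores as "Doorbell".
reply = handle_request(lambda: [0.0] * 100,
                       lambda a: [0.01] * 4 + [0.9] + [0.01] * 5)
```

In the real system this function body sits behind a Flask-RESTful resource; the stubs here exist only so the flow can be exercised without Firebase or TensorFlow.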
6.3. Mobile Application for Notification
Using the Java programming language and the Android Studio IDE, the mobile application is
created for devices running the Android operating system. Every 10 seconds, the app sends API
GET requests to the PythonAnywhere server. To make the API requests, we used the 'volley'
library. The results of the real-time sound classification are returned in JSON format by the
server in response to requests. Using the "json" library, the result is extracted and put into String
format. If the outcome is "None," either Firebase Storage is empty or the sound has not been
assigned to any class. As a result, nothing occurs in the app. If the sound is classified into a class,
the app displays that class, and the user receives a warning alerting them to a sound they should
be aware of in their immediate surroundings.
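The app itself is written in Java with Volley, but the decision logic it applies to each polled JSON reply can be sketched language-neutrally; here it is in Python, with the "result" field and "None" handling following the description above, and duplicate suppression mirroring the app's card behaviour:

```python
import json

def on_server_reply(body, mode="vibration", seen=None):
    """Decide what the app does with one polled JSON reply.
    Returns None when there is nothing to show, else (sound, alert mode)."""
    seen = set() if seen is None else seen
    sound = json.loads(body).get("result")
    if sound is None or sound == "None":
        return None          # empty storage or unclassified sound
    if sound in seen:
        return None          # card still on screen: don't re-notify
    seen.add(sound)
    return sound, mode       # show card + vibration/torch alert

seen = set()
first = on_server_reply('{"result": "Baby Cry"}', mode="torch", seen=seen)
repeat = on_server_reply('{"result": "Baby Cry"}', mode="torch", seen=seen)
# repeat is suppressed until the user deletes the first card
```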
The user interface of the app has the following features as explained below.
1. A dynamic sound display view for each sound. The sound's name is written in English on
the card, along with a picture representing the sound and a button that lets the user delete
the sound after viewing it. The same sound is not displayed twice while it remains
undeleted, which prevents the user from receiving repeated notifications for the same
sound within a short period.
2. A choice of notification format. Users can select either a torch light or a vibration as
their notification mode. Every time the model detects a new sound, the app adds it to the
display and, depending on the user's choice, also triggers a vibration or the torch light to
alert them.
Figure 3 shows how the UI of the app looks. The sounds are displayed as cards with notification
settings at the bottom of the screen.
Figure 3. Mobile Application to Display the Sounds
7. TESTING AND VALIDATION
7.1. Testing Results of Different Models on Our Dataset
Table 1 displays the testing outcomes for several models based on Accuracy, Recall, Precision,
and F1Score. ResNet50 and Xception architectures showed the best accuracy and F1 score out of
all the models. Since ResNet50 is lighter, faster, and easily deployable as compared to Xception,
we finalized ResNet50 as the architecture. Xception poses a problem when there is a memory
constraint, and is slow to load the data.
Table 1. Test Result
Models Accuracy Recall Precision F1Score
ResNet50 1.0 1.0 1.0 1.0
MobileNet 0.902 0.905 0.908 0.892
InceptionResnetV2 0.719 0.697 0.766 0.712
InceptionV3 0.595 0.598 0.593 0.571
Vgg16 0.563 0.558 0.596 0.536
Xception 1.0 1.0 1.0 1.0
7.2. Confusion Matrices
The 10 classes are represented as the numbers 0-9 in the matrices, as shown in Table 2.

Table 2. Classes

0 - Baby Cry, 1 - Dog Bark, 2 - Cough, 3 - Dishes and Pots, 4 - Doorbell,
5 - Fire Alarm, 6 - Smoke Detector, 7 - Sneeze, 8 - Thunder, 9 - Water
Figure 6 shows the confusion matrices of all the models to visualize the accuracies.
8. RESULTS AND DISCUSSION
The real-time sounds are recorded using Raspberry Pi in WAV format. The sounds are then
classified using the ResNet50 model, and the result is sent to the Mobile Application. The app
displays the sound and notifies the user through vibration or torch light. The process is shown
below.
1. Real-time sounds are recorded through the Raspberry Pi command-line interface for a
duration of 5 seconds at 48000 Hz in WAV format. The sound recorded here is a baby
cry. This process is shown in Fig. 4.
2. The sound recorded above is processed and classified. The result is sent to the Mobile
Application. The app notifies the user and displays the sound in a dynamic card view
with English Text, Pictorial Representation, and a discard button to remove the sound
after it is seen. It also allows the user to set the type of notifications to either vibration or
torch light. This process is shown in Fig. 5.
Figure 4. Recording Real-Time Baby Cry Sound Using Raspberry CLI
Figure 5. Baby Crying Sound Detected on the Application
Figure 6. Confusion Matrices (True Label vs Predicted Label) of all the Models
9. CONCLUSION AND FUTURE SCOPE
The scientific contribution of our project is the detection of ambient sound for people with
hearing impairment (PWHI) using neural networks with notifications in multiple modes. Our
deep learning model, ResNet50, is trained to recognize 10 sounds: Baby Crying, Barking, Cough,
Dishes and Pots, Doorbell, Fire Alarm, Smoke Detector, Sneeze, Thunder, and Water. The goal
of our project is to detect sounds in the user’s ambiance in real-time and notify the user when one
of the sounds mentioned above occurs in the surroundings so that they can take the respective
action. This allows them to interact with their surroundings, be aware of their environment, and
carry out day-to-day tasks with ease.
When evaluated on our test dataset, the ResNet50 architecture achieved an overall F1-score of
1.0. It gave the best accuracy on our training, validation, and test datasets compared to the other
models that were trained, namely VGG16, Xception, MobileNet, InceptionV3, and
InceptionResNetV2. Hence, the ResNet50 model was used to classify real-time sounds from the
surroundings and send the result as a notification to users through a mobile application.
Although the ResNet50 model achieved 100% accuracy on our test dataset, real-time
classification still leaves room for improvement. Better audio-recording hardware can improve
audio clarity and aid the classification process. In addition, Indian Sign Language (ISL) support
can be incorporated into the mobile application for better comprehension by PWHI.
All in all, it is important to ensure that people with hearing impairment are guaranteed the same
comfort of living in their homes as those without hearing loss.
REFERENCES
[1] J. E. da Rosa Tavares and J. L. Victória Barbosa, “Apollo SignSound: an intelligent system applied to
ubiquitous healthcare of deaf people,” Journal of Reliable Intelligent Environments, vol. 7, no. 2, pp.
157–170, Jun. 2021, doi: 10.1007/s40860-020-00119-w.
[2] D. Jain et al., “HomeSound: An Iterative Field Deployment of an In-Home Sound Awareness System
for Deaf or Hard of Hearing Users,” in Proceedings of the 2020 CHI Conference on Human Factors
in Computing Systems, Honolulu, HI, USA, Apr. 2020, pp. 1–12, doi: 10.1145/3313831.3376758.
[3] A. Polo-Rodriguez, J. M. VilchezChiachio, C. Paggetti, and J. Medina-Quero, “Ambient Sound
Recognition of Daily Events by Means of Convolutional Neural Networks and Fuzzy Temporal
Restrictions,” Appl. Sci., vol. 11, no. 15, p. 6978, Jul. 2021, doi: 10.3390/app11156978.
[4] Chuan-Yu Chang and Yi-Ping Chang, “Application of abnormal sound recognition system for indoor
environment,” in 2013 9th International Conference on Information, Communications & Signal
Processing, Tainan, Taiwan, Dec. 2013, pp. 1–5. doi: 10.1109/ICICS.2013.6782772.
[5] S. Dass, M. S. Holi, and K. S. Rajan, “A Comparative Study on FFT,STFT and WT for the Analysis
of Auditory Evoked Potentials,” Int. J. Eng. Res., vol. 2, no. 11, p. 6, 2013.
[6] D. D. Jayasree, “Classification of Power Quality Disturbance Signals Using FFT, STFT, Wavelet
Transforms and Neural Networks - A Comparative Analysis,” in International Conference on
Computational Intelligence and Multimedia Applications (ICCIMA 2007), Sivakasi, Tamil Nadu,
India, Dec. 2007, pp. 335–340. doi: 10.1109/ICCIMA.2007.279.
[7] N. Mehala and R. Dahiya, “A Comparative Study of FFT, STFT and Wavelet Techniques for
Induction Machine Fault Diagnostic Analysis,” p. 6.
[8] B. Suhas et al., “Speech task based automatic classification of ALS and Parkinson’s Disease and their
severity using log Mel spectrograms,” in 2020 International Conference on Signal Processing and
Communications (SPCOM), Bangalore, India, Jul. 2020, pp. 1–5. doi:
10.1109/SPCOM50965.2020.9179503.
[9] A. Meghanani, A. C. S., and A. G. Ramakrishnan, “An Exploration of Log-Mel Spectrogram and
MFCC Features for Alzheimer’s Dementia Recognition from Spontaneous Speech,” in 2021 IEEE
Spoken Language Technology Workshop (SLT), Shenzhen, China, Jan. 2021, pp. 670–677. doi:
10.1109/SLT48900.2021.9383491.
[10] M. Sankupellay and D. Konovalov, “Bird Call Recognition using Deep Convolutional Neural
Network, ResNet-50,” p. 8, 2018.
[11] C. Giuseppe, “A ResNet-50-Based Convolutional Neural Network Model for Language ID
Identification from Speech Recordings,” in Proceedings of the Third Workshop on Computational
Typology and Multilingual NLP, Online, 2021, pp. 136–144. doi: 10.18653/v1/2021.sigtyp-1.13.
[12] Y. Kumar, M. Vyas, and S. Garg, “From Image Classification to Audio Classification,” p. 8.
[13] S. Hershey et al., “CNN Architectures for Large-Scale Audio Classification,” arXiv:1609.09430,
Jan. 2017, Accessed: Jan. 25, 2022.
AUTHORS
Smitha S Maganti, student at BNM Institute of Technology, pursuing a B.E. in Computer
Science and Engineering. Enjoys coding and is enthusiastic about making an impact and
giving back to the community. Specializes in Machine Learning and Software
Development, with a keen interest in learning new technologies.
Sahana S, student at BNM Institute of Technology, pursuing a B.E. in Computer Science and
Engineering. Keen to solve real-life problems and automate daily tasks in the fields of
Healthcare, Agriculture, and IT using technologies like AI, ML, and IoT.
Kriti K, student at BNM Institute of Technology, pursuing a B.E. in Computer Science and
Engineering. An engineer with a deep-rooted passion for coding and the intent to
translate ideas into software. A major proponent of revolutionizing the world, and thus
equipping herself with the necessary tools and knowledge to contribute to the movement.
Shravanthi Madhugiri, student at BNM Institute of Technology, pursuing a B.E. in
Computer Science and Engineering. A UI/UX designer with a passion for design and a
keen eye for detail.
Priya S, Assistant Professor at BNM Institute of Technology, specializing in Network
Security, Data Mining, and Machine Learning.