A feature extraction for musical instrument tones that based on a transform domain approach was
proposed in this paper. The aim of the proposed feature extraction was to get the lower feature extraction
coefficients. In general, the proposed feature extraction was carried out as follow. Firstly, the input signal
was transformed using FFT (Fast Fourier Transform). Secondly, the left half of the transformed signal was
divided into a number of segments. Finally, the averaging results of that segments, was the feature
extraction of the input signal. Based on the test results, the proposed feature extraction was highly efficient
for the tones, which have many significant local peaks in the Fourier transform domain, because it only
required at least four feature extraction coefficients, in order to represent every tone.
The influence of sampling frequency on tone recognition of musical instrumentsTELKOMNIKA JOURNAL
Sampling frequency of musical instruments tone recognition generally follows the Shannon sampling theorem. This paper explores the influence of sampling frequency that does not follow the Shannon sampling theorem, in the tone recognition system using segment averaging for feature extraction and template matching for classification. The musical instruments we used were bellyra, flute, and pianica, where each of them represented a musical instrument that had one, a few, and many significant local peaks in the Discrete Fourier Transform (DFT) domain. Based on our experiments, until the sampling frequency is as low as 312 Hz, recognition rate performance of bellyra and flute tones were influenced a little since it reduced in the range of 5%. However, recognition rate performance of pianica tones was not influenced by that sampling frequency. Therefore, if that kind of reduced recognition rate could be accepted, the sampling frequency as low as 312 Hz could be used for tone recognition of musical instruments.
Digital signal processing through speech, hearing, and PythonMel Chua
Slides from PyCon 2013 tutorial reformatted for self-study. Code at https://github.com/mchua/pycon-sigproc, original description follows: Why do pianos sound different from guitars? How can we visualize how deafness affects a child's speech? These are signal processing questions, traditionally tackled only by upper-level engineering students with MATLAB and differential equations; we're going to do it with algebra and basic Python skills. Based on a signal processing class for audiology graduate students, taught by a deaf musician.
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...cscpconf
Analysis of speech for recognition of stress is important for identification of emotional state of
person. This can be done using ‘Linear Techniques’, which has different parameters like pitch,
vocal tract spectrum, formant frequencies, Duration, MFCC etc. which are used for extraction
of features from speech. TEO-CB-Auto-Env is the method which is non-linear method of
features extraction. Analysis is done using TU-Berlin (Technical University of Berlin) German
database. Here emotion recognition is done for different emotions like neutral, happy, disgust,
sad, boredom and anger. Emotion recognition is used in lie detector, database access systems, and in military for recognition of soldiers’ emotion identification during the war.
Construction of The Sampled Signal Up To Any Frequency While Keeping The Samp...CSCJournals
In this paper we will try to develop a method that will let us construct up to any frequency by some additional work we propose. With this method we can construct up to any frequency by adding more hardware to the system with the same sampling rate. By increasing the hardware complexity and keeping the same sampling rate we can reduce the information loss in a proportional manner.
Depth Estimation of Sound Images Using Directional Clustering and Activation...奈良先端大 情報科学研究科
The document proposes two methods for estimating the depth of sound images using direction of arrival (DOA) information extracted from stereo signals.
Method 1 estimates depth based on the shape of the DOA distribution modeled using a generalized Gaussian distribution (GGD), where a smoother distribution indicates a source is farther away.
Method 2 applies activation-shared nonnegative matrix factorization (NMF) to extract features while maintaining directional information and reduce noise interfering with DOA estimation. Conventional NMF generates artificial fluctuations, while shared activation preserves source direction.
An experiment calculates the correlation between estimated and reference depths from 6 datasets containing varying source combinations. Results show the proposed method achieves high correlation, confirming its efficacy over conventional methods
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...奈良先端大 情報科学研究科
This document proposes a sound field reproduction system that utilizes an image sensor to estimate a listener's position and orientation. It summarizes challenges with existing spectral division methods that rely on a fixed reference listening position. The proposed system applies an "equiangular filter" to the spectral division method to locally synthesize sound fields around the listener by limiting the spatial bandwidth, as estimated from the image sensor. Simulation experiments and subjective assessments show the proposed system can accurately reproduce sound fields at the listener's position regardless of frequency, with improved directional perception over conventional methods. However, it did not clearly outperform alternatives in terms of sound quality. Overall, the system aims to address limitations of existing approaches by adapting sound field synthesis based on estimated listener parameters
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
(2012) Rigaud, Falaize, David, Daudet - Does Inharmonicity Improve an NMF-Bas...François Rigaud
This document investigates whether explicitly modeling inharmonicity improves piano transcription accuracy when using non-negative matrix factorization (NMF). It compares three models for the note spectra dictionary in NMF-based piano transcription: 1) strictly harmonic, 2) strictly following theoretical inharmonicity, and 3) relaxed inharmonicity constraints. Experimental results found the inharmonic models improved transcription accuracy compared to the harmonic model, but only when provided a good initialization. The paper aims to better understand how precisely a model needs to capture characteristics like inharmonicity for robust NMF analysis of piano recordings.
The influence of sampling frequency on tone recognition of musical instrumentsTELKOMNIKA JOURNAL
Sampling frequency of musical instruments tone recognition generally follows the Shannon sampling theorem. This paper explores the influence of sampling frequency that does not follow the Shannon sampling theorem, in the tone recognition system using segment averaging for feature extraction and template matching for classification. The musical instruments we used were bellyra, flute, and pianica, where each of them represented a musical instrument that had one, a few, and many significant local peaks in the Discrete Fourier Transform (DFT) domain. Based on our experiments, until the sampling frequency is as low as 312 Hz, recognition rate performance of bellyra and flute tones were influenced a little since it reduced in the range of 5%. However, recognition rate performance of pianica tones was not influenced by that sampling frequency. Therefore, if that kind of reduced recognition rate could be accepted, the sampling frequency as low as 312 Hz could be used for tone recognition of musical instruments.
Digital signal processing through speech, hearing, and PythonMel Chua
Slides from PyCon 2013 tutorial reformatted for self-study. Code at https://github.com/mchua/pycon-sigproc, original description follows: Why do pianos sound different from guitars? How can we visualize how deafness affects a child's speech? These are signal processing questions, traditionally tackled only by upper-level engineering students with MATLAB and differential equations; we're going to do it with algebra and basic Python skills. Based on a signal processing class for audiology graduate students, taught by a deaf musician.
ANALYSIS OF SPEECH UNDER STRESS USING LINEAR TECHNIQUES AND NON-LINEAR TECHNI...cscpconf
Analysis of speech for recognition of stress is important for identification of emotional state of
person. This can be done using ‘Linear Techniques’, which has different parameters like pitch,
vocal tract spectrum, formant frequencies, Duration, MFCC etc. which are used for extraction
of features from speech. TEO-CB-Auto-Env is the method which is non-linear method of
features extraction. Analysis is done using TU-Berlin (Technical University of Berlin) German
database. Here emotion recognition is done for different emotions like neutral, happy, disgust,
sad, boredom and anger. Emotion recognition is used in lie detector, database access systems, and in military for recognition of soldiers’ emotion identification during the war.
Construction of The Sampled Signal Up To Any Frequency While Keeping The Samp...CSCJournals
In this paper we will try to develop a method that will let us construct up to any frequency by some additional work we propose. With this method we can construct up to any frequency by adding more hardware to the system with the same sampling rate. By increasing the hardware complexity and keeping the same sampling rate we can reduce the information loss in a proportional manner.
Depth Estimation of Sound Images Using Directional Clustering and Activation...奈良先端大 情報科学研究科
The document proposes two methods for estimating the depth of sound images using direction of arrival (DOA) information extracted from stereo signals.
Method 1 estimates depth based on the shape of the DOA distribution modeled using a generalized Gaussian distribution (GGD), where a smoother distribution indicates a source is farther away.
Method 2 applies activation-shared nonnegative matrix factorization (NMF) to extract features while maintaining directional information and reduce noise interfering with DOA estimation. Conventional NMF generates artificial fluctuations, while shared activation preserves source direction.
An experiment calculates the correlation between estimated and reference depths from 6 datasets containing varying source combinations. Results show the proposed method achieves high correlation, confirming its efficacy over conventional methods
Robust Sound Field Reproduction against Listener’s Movement Utilizing Image ...奈良先端大 情報科学研究科
This document proposes a sound field reproduction system that utilizes an image sensor to estimate a listener's position and orientation. It summarizes challenges with existing spectral division methods that rely on a fixed reference listening position. The proposed system applies an "equiangular filter" to the spectral division method to locally synthesize sound fields around the listener by limiting the spatial bandwidth, as estimated from the image sensor. Simulation experiments and subjective assessments show the proposed system can accurately reproduce sound fields at the listener's position regardless of frequency, with improved directional perception over conventional methods. However, it did not clearly outperform alternatives in terms of sound quality. Overall, the system aims to address limitations of existing approaches by adapting sound field synthesis based on estimated listener parameters
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
(2012) Rigaud, Falaize, David, Daudet - Does Inharmonicity Improve an NMF-Bas...François Rigaud
This document investigates whether explicitly modeling inharmonicity improves piano transcription accuracy when using non-negative matrix factorization (NMF). It compares three models for the note spectra dictionary in NMF-based piano transcription: 1) strictly harmonic, 2) strictly following theoretical inharmonicity, and 3) relaxed inharmonicity constraints. Experimental results found the inharmonic models improved transcription accuracy compared to the harmonic model, but only when provided a good initialization. The paper aims to better understand how precisely a model needs to capture characteristics like inharmonicity for robust NMF analysis of piano recordings.
This document summarizes a research paper that introduces a probabilistic model for analyzing line spectra, which are sets of prominent frequency components, in musical instrument sounds. The model assumes observations in a time frame are generated by a mixture of notes composed of partials and noise. For piano music specifically, the model introduces fundamental frequency and inharmonicity coefficient as parameters for each note that can be estimated from line spectra using an Expectation-Maximization algorithm. The paper applies this technique to unsupervised estimation of tuning and inharmonicity across the range of a piano from a recorded musical piece.
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...奈良先端大 情報科学研究科
This document summarizes a research presentation on an online divergence switching method for a hybrid source separation technique. The hybrid method combines directional clustering for spatial separation with supervised nonnegative matrix factorization (SNMF) for spectral separation. The proposed method switches between KL divergence and Euclidean distance for the SNMF, depending on the amount of spectral gaps from the directional clustering. When there are many gaps, Euclidean distance is better for basis extrapolation. When gaps are fewer, KL divergence gives better separation. In experiments, the proposed online switching method outperformed using only one divergence, achieving higher signal-to-distortion ratios for music source separation.
Improvement of minimum tracking in Minimum Statistics noise estimation methodCSCJournals
Noise spectrum estimation is a fundamental component of speech enhancement and speech recognition systems. In this paper we propose a new method for minimum tracking in Minimum Statistics (MS) noise estimation method. This noise estimation algorithm is proposed for highly nonstationary noise environments. This was confirmed with formal listening tests which indicated that the proposed noise estimation algorithm when integrated in speech enhancement was preferred over other noise estimation algorithms.
This paper describe the morphing concept in which we convert the voice of any person into pre -analyzed or pre-recorded voice of any animals.As the user generate a pre-established voice, his pitch, timbre, vibrato and articulation can be modified to resemble those of a pre-recorded and pre-analyzed voice of animal. This technique is based on SMS. Thus using this concept we can develop many funny application and we can used this type of application in mobile device, personal computer etc. for enjoying the sometime of period.
Audio Steganography Using Tone Insertion TechniqueEditor IJCATR
This paper presents a new technique of embedding text data into an audio file using tone insertion method. The new technique generates two frequency f1 and f2 and inserts them into audio file in a suitable power level according to specific table (stego-table). The proposed technique aimed to increase the payload capacity of the audio file using two bits in the frame without increases the number of the inserted frequencies to four frequencies, as well as using another convoy frequency (CF) for specific pattern. The proposed method conceals the English text into the .wave audio. The performance of the proposed method has been checked by spectrogram, MSE and PSNR.
(2012) Rigaud, David, Daudet - Piano Sound Analysis Using Non-negative Matrix...François Rigaud
This document presents a method for estimating the tuning (fundamental frequency F0) and inharmonicity coefficient (B) of piano tones from single note or chord recordings. The method is based on non-negative matrix factorization with a parametric model for the dictionary atoms that includes the inharmonicity law as a relaxed constraint. The model is optimized using multiplicative update rules to estimate the parameters (B, F0) for each note, even in polyphonic recordings. Applications show the method can accurately estimate tuning and inharmonicity from single notes or chords.
Speaker Recognition System using MFCC and Vector Quantization Approachijsrd.com
This paper presents an approach to speaker recognition using frequency spectral information with Mel frequency for the improvement of speech feature representation in a Vector Quantization codebook based recognition approach. The Mel frequency approach extracts the features of the speech signal to get the training and testing vectors. The VQ Codebook approach uses training vectors to form clusters and recognize accurately with the help of LBG algorithm.
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...ijceronline
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...CSCJournals
This paper proposes a new voicing detection and pitch estimation method that is particularly robust for noisy speech. This method is based on the spectral analysis of the speech multi-scale product. The multi-scale product (MP) consists of making the product of wavelet transform coefficients. The wavelet used is the quadratic spline function. We argue that the spectral of Multi-scale Product Analysis is capable of revealing an estimate of a pitch-harmonic more accurately even in a heavy noisy scenario. We evaluate our approach on the Keele database. The experimental results show the robustness of our method for noisy speech, and the good performance for clean speech in comparison with state-of-the-art algorithms.
Introductory Lecture to Audio Signal ProcessingAngelo Salatino
The document provides an introduction to audio signal processing and related topics. It discusses analog and digital audio signals, the waveform audio file format (WAV) specification including its header structure, and tools for audio processing like FFmpeg and MATLAB. Example code is given to read header metadata and audio samples from a WAV file in C++. While useful for understanding audio formats and processing, the solution contains an error and FFmpeg is noted as a better library for audio tasks.
Blind audio source separation based on time-frequency structure modelsKitamura Laboratory
Daichi Kitamura, "Blind audio source separation based on time-frequency structure models," Invited Overview Session in Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2021), Tokyo, Japan, December 2021.
(2013) Rigaud et al. - A parametric model and estimation techniques for the i...François Rigaud
This document proposes a parametric model to jointly model the inharmonicity and tuning of pianos across their entire pitch range. It uses a small number of parameters to represent both the specific design characteristics of different piano types and tuner practices. An estimation algorithm is presented that can estimate the parameters from recordings of isolated notes or chords, assuming the played notes are known. The model aims to provide a synthetic description of a particular piano's tuning/inharmonicity pattern that can highlight tuner choices and be useful for applications like piano synthesis or transcription of piano music.
Data Compression using Multiple Transformation Techniques for Audio Applicati...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
This document compares the main lobe and side lobes of the frequency response curves for different types of FIR filters designed using the Fourier Series Expansion Method. It analyzes low pass, high pass, and band pass FIR filters with sampling frequencies of 4000Hz, 8000Hz, 12000Hz, 16000Hz, and 20000Hz. The maximum magnitudes in the main lobe and side lobes are observed and compared for each case. The results show that as the sampling frequency increases, the response curves move closer to the ideal response curves with smaller deviations in the magnitudes of the different lobes.
Prior distribution design for music bleeding-sound reduction based on nonnega...Kitamura Laboratory
Yusaku Mizobuchi, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Yu Takahashi, and Kazunobu Kondo, "Prior distribution design for music bleeding-sound reduction based on nonnegative matrix factorization," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2021), pp. 651–658, Tokyo, Japan, December 2021.
Speaker recognition systems aim to automatically identify or verify a speaker's identity based on characteristics of their voice. There are two main types: speaker identification determines which registered speaker is speaking, while speaker verification accepts or rejects a speaker's claimed identity. All systems contain modules for feature extraction and feature matching. Feature extraction represents the voice signal with parameters like MFCCs that can distinguish speakers. Feature matching compares extracted features from an unknown voice to known speaker models. The document describes the process of MFCC feature extraction in detail, including framing the speech signal, windowing frames, taking the FFT, mapping to the mel scale, and finally the DCT to produce MFCC coefficients.
This document discusses speech signal processing and speech recognition. It begins by defining speech processing and its relationship to digital signal processing. It then outlines several disciplines related to speech processing including signal processing, physics, pattern recognition, and computer science. The document discusses aspects of speech signals including phonemes, the speech waveform, and spectral envelope. It covers various aspects of speech processing including pre-processing, feature extraction, and recognition. It provides details on techniques for pre-processing, feature extraction including filtering, linear predictive coding, and cepstrum. Finally, it summarizes the main steps in a speech recognition procedure including endpoint detection, framing and windowing, feature extraction, and distortion measure calculations for recognition.
PhD Thesis Marius Miron - Source Separation Methods for Orchestral MusicMarius Miron
Video from PhD thesis defence: Marius Miron - Source Separation Methods for Orchestral Music: Timbre-Informed and Score-Informed Strategies
More info here: http://mariusmiron.com/research/thesis/
http://mtg.upf.edu/node/3863
Video here: https://youtu.be/FDrFTTVOtr0
In this paper, the performances of adaptive noise cancelling system employing Least Mean Square (LMS) algorithm are studied considering both white Gaussian noise (Case 1) and colored noise (Case 2)
situations. Performance is analysed with varying number of iterations, Signal to Noise Ratio (SNR) and tap size with considering Mean Square Error (MSE) as the performance measurement criteria. Results show that the noise reduction is better as well as convergence speed is faster for Case 2 as compared with Case 1. It is also observed that MSE decreases with increasing SNR with relatively faster decrease of MSE in Case 2 as compared with Case 1, and on average MSE increases linearly with increasing number of filter
coefficients for both type of noise situations. All the experiments have been done using computer
simulations implemented on MATLAB platform.
This document summarizes a technique called mixed single frequency delay line filtering that is proposed to optimize computational complexity and processing delay for multichannel audio crosstalk cancellation. The technique combines mixed filtering and single frequency delay line filtering. Mixed filtering allows all filtering operations to be performed in a single equation in the frequency domain, reducing computations. Single frequency delay line filtering partitions long filter impulse responses into shorter partitions that are filtered using overlap-save, reducing delay. The proposed technique partitions impulse responses and applies mixed filtering to further reduce computations over existing methods. It is shown to provide less computational complexity and delay than overlap-save filtering while maintaining performance for long impulse responses.
Speech Analysis and synthesis using VocoderIJTET Journal
Abstract— In this paper, I proposed a speech analysis and synthesis using a vocoder. Voice conversion systems do not create new speech signals, but just transform existing one. The proposed speech vocoding is different from speech coding. To analyze the speech signal and represent it with less number of bits, so that bandwidth efficiency can be increased. The Synthesis of speech signal from the received bits of information. In this paper three aspects of analysis have been discussed: pitch refinement, spectral envelope estimation and maximum voiced frequency estimation. A Quasi-harmonic analysis model can be used to implement a pitch refinement algorithm which improves the accuracy of the spectral estimation. Harmonic plus noise model to reconstruct the speech signal from parameter. Finally to achieve the highest possible resynthesis quality using the lowest possible number of bits to transmit the speech signal. Future work aims at incorporating the phase information into the analysis and modeling process and also synthesis these three aspects in different pitch period.
Room Transfer Function Estimation and Room Equalization in Noise EnvironmentsIJERA Editor
Audio quality in listening situation is degraded by indoor room reverberation. Room equalization can be used to
increase the audio quality by applying the inverse transfer function to the input audio signals. In noise
environments, however, it is hard to exactly measure the room transfer function. In this work, we developed the
techniques to measure the room transfer function in indoor noise environments and to enhance the audio quality
by room equalization. From the experimental results, we showed that the proposed techniques can be
successfully used in indoor noise environments
This document summarizes a research paper that introduces a probabilistic model for analyzing line spectra, which are sets of prominent frequency components, in musical instrument sounds. The model assumes observations in a time frame are generated by a mixture of notes composed of partials and noise. For piano music specifically, the model introduces fundamental frequency and inharmonicity coefficient as parameters for each note that can be estimated from line spectra using an Expectation-Maximization algorithm. The paper applies this technique to unsupervised estimation of tuning and inharmonicity across the range of a piano from a recorded musical piece.
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Fa...奈良先端大 情報科学研究科
This document summarizes a research presentation on an online divergence switching method for a hybrid source separation technique. The hybrid method combines directional clustering for spatial separation with supervised nonnegative matrix factorization (SNMF) for spectral separation. The proposed method switches between KL divergence and Euclidean distance for the SNMF, depending on the amount of spectral gaps from the directional clustering. When there are many gaps, Euclidean distance is better for basis extrapolation. When gaps are fewer, KL divergence gives better separation. In experiments, the proposed online switching method outperformed using only one divergence, achieving higher signal-to-distortion ratios for music source separation.
Improvement of minimum tracking in Minimum Statistics noise estimation methodCSCJournals
Noise spectrum estimation is a fundamental component of speech enhancement and speech recognition systems. In this paper we propose a new method for minimum tracking in Minimum Statistics (MS) noise estimation method. This noise estimation algorithm is proposed for highly nonstationary noise environments. This was confirmed with formal listening tests which indicated that the proposed noise estimation algorithm when integrated in speech enhancement was preferred over other noise estimation algorithms.
This paper describe the morphing concept in which we convert the voice of any person into pre -analyzed or pre-recorded voice of any animals.As the user generate a pre-established voice, his pitch, timbre, vibrato and articulation can be modified to resemble those of a pre-recorded and pre-analyzed voice of animal. This technique is based on SMS. Thus using this concept we can develop many funny application and we can used this type of application in mobile device, personal computer etc. for enjoying the sometime of period.
Audio Steganography Using Tone Insertion TechniqueEditor IJCATR
This paper presents a new technique of embedding text data into an audio file using tone insertion method. The new technique generates two frequency f1 and f2 and inserts them into audio file in a suitable power level according to specific table (stego-table). The proposed technique aimed to increase the payload capacity of the audio file using two bits in the frame without increases the number of the inserted frequencies to four frequencies, as well as using another convoy frequency (CF) for specific pattern. The proposed method conceals the English text into the .wave audio. The performance of the proposed method has been checked by spectrogram, MSE and PSNR.
(2012) Rigaud, David, Daudet - Piano Sound Analysis Using Non-negative Matrix...François Rigaud
This document presents a method for estimating the tuning (fundamental frequency F0) and inharmonicity coefficient (B) of piano tones from single note or chord recordings. The method is based on non-negative matrix factorization with a parametric model for the dictionary atoms that includes the inharmonicity law as a relaxed constraint. The model is optimized using multiplicative update rules to estimate the parameters (B, F0) for each note, even in polyphonic recordings. Applications show the method can accurately estimate tuning and inharmonicity from single notes or chords.
Speaker Recognition System using MFCC and Vector Quantization Approachijsrd.com
This paper presents an approach to speaker recognition using frequency spectral information with Mel frequency for the improvement of speech feature representation in a Vector Quantization codebook based recognition approach. The Mel frequency approach extracts the features of the speech signal to get the training and testing vectors. The VQ Codebook approach uses training vectors to form clusters and recognize accurately with the help of LBG algorithm.
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...ijceronline
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...CSCJournals
This paper proposes a new voicing detection and pitch estimation method that is particularly robust for noisy speech. This method is based on the spectral analysis of the speech multi-scale product. The multi-scale product (MP) consists of making the product of wavelet transform coefficients. The wavelet used is the quadratic spline function. We argue that the spectral of Multi-scale Product Analysis is capable of revealing an estimate of a pitch-harmonic more accurately even in a heavy noisy scenario. We evaluate our approach on the Keele database. The experimental results show the robustness of our method for noisy speech, and the good performance for clean speech in comparison with state-of-the-art algorithms.
Introductory Lecture to Audio Signal ProcessingAngelo Salatino
The document provides an introduction to audio signal processing and related topics. It discusses analog and digital audio signals, the waveform audio file format (WAV) specification including its header structure, and tools for audio processing like FFmpeg and MATLAB. Example code is given to read header metadata and audio samples from a WAV file in C++. While useful for understanding audio formats and processing, the solution contains an error and FFmpeg is noted as a better library for audio tasks.
Blind audio source separation based on time-frequency structure modelsKitamura Laboratory
Daichi Kitamura, "Blind audio source separation based on time-frequency structure models," Invited Overview Session in Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2021), Tokyo, Japan, December 2021.
(2013) Rigaud et al. - A parametric model and estimation techniques for the i...François Rigaud
This document proposes a parametric model to jointly model the inharmonicity and tuning of pianos across their entire pitch range. It uses a small number of parameters to represent both the specific design characteristics of different piano types and tuner practices. An estimation algorithm is presented that can estimate the parameters from recordings of isolated notes or chords, assuming the played notes are known. The model aims to provide a synthetic description of a particular piano's tuning/inharmonicity pattern that can highlight tuner choices and be useful for applications like piano synthesis or transcription of piano music.
Data Compression using Multiple Transformation Techniques for Audio Applicati...iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
This document compares the main lobe and side lobes of the frequency response curves for different types of FIR filters designed using the Fourier Series Expansion Method. It analyzes low pass, high pass, and band pass FIR filters with sampling frequencies of 4000Hz, 8000Hz, 12000Hz, 16000Hz, and 20000Hz. The maximum magnitudes in the main lobe and side lobes are observed and compared for each case. The results show that as the sampling frequency increases, the response curves move closer to the ideal response curves with smaller deviations in the magnitudes of the different lobes.
Prior distribution design for music bleeding-sound reduction based on nonnega...Kitamura Laboratory
Yusaku Mizobuchi, Daichi Kitamura, Tomohiko Nakamura, Hiroshi Saruwatari, Yu Takahashi, and Kazunobu Kondo, "Prior distribution design for music bleeding-sound reduction based on nonnegative matrix factorization," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2021), pp. 651–658, Tokyo, Japan, December 2021.
Speaker recognition systems aim to automatically identify or verify a speaker's identity based on characteristics of their voice. There are two main types: speaker identification determines which registered speaker is speaking, while speaker verification accepts or rejects a speaker's claimed identity. All systems contain modules for feature extraction and feature matching. Feature extraction represents the voice signal with parameters like MFCCs that can distinguish speakers. Feature matching compares extracted features from an unknown voice to known speaker models. The document describes the process of MFCC feature extraction in detail, including framing the speech signal, windowing frames, taking the FFT, mapping to the mel scale, and finally the DCT to produce MFCC coefficients.
This document discusses speech signal processing and speech recognition. It begins by defining speech processing and its relationship to digital signal processing. It then outlines several disciplines related to speech processing including signal processing, physics, pattern recognition, and computer science. The document discusses aspects of speech signals including phonemes, the speech waveform, and spectral envelope. It covers various aspects of speech processing including pre-processing, feature extraction, and recognition. It provides details on techniques for pre-processing, feature extraction including filtering, linear predictive coding, and cepstrum. Finally, it summarizes the main steps in a speech recognition procedure including endpoint detection, framing and windowing, feature extraction, and distortion measure calculations for recognition.
PhD Thesis Marius Miron - Source Separation Methods for Orchestral MusicMarius Miron
Video from PhD thesis defence: Marius Miron - Source Separation Methods for Orchestral Music: Timbre-Informed and Score-Informed Strategies
More info here: http://mariusmiron.com/research/thesis/
http://mtg.upf.edu/node/3863
Video here: https://youtu.be/FDrFTTVOtr0
In this paper, the performances of adaptive noise cancelling system employing Least Mean Square (LMS) algorithm are studied considering both white Gaussian noise (Case 1) and colored noise (Case 2)
situations. Performance is analysed with varying number of iterations, Signal to Noise Ratio (SNR) and tap size with considering Mean Square Error (MSE) as the performance measurement criteria. Results show that the noise reduction is better as well as convergence speed is faster for Case 2 as compared with Case 1. It is also observed that MSE decreases with increasing SNR with relatively faster decrease of MSE in Case 2 as compared with Case 1, and on average MSE increases linearly with increasing number of filter
coefficients for both type of noise situations. All the experiments have been done using computer
simulations implemented on MATLAB platform.
This document summarizes a technique called mixed single frequency delay line filtering that is proposed to optimize computational complexity and processing delay for multichannel audio crosstalk cancellation. The technique combines mixed filtering and single frequency delay line filtering. Mixed filtering allows all filtering operations to be performed in a single equation in the frequency domain, reducing computations. Single frequency delay line filtering partitions long filter impulse responses into shorter partitions that are filtered using overlap-save, reducing delay. The proposed technique partitions impulse responses and applies mixed filtering to further reduce computations over existing methods. It is shown to provide less computational complexity and delay than overlap-save filtering while maintaining performance for long impulse responses.
Speech Analysis and synthesis using VocoderIJTET Journal
Abstract— In this paper, I proposed a speech analysis and synthesis using a vocoder. Voice conversion systems do not create new speech signals, but just transform existing one. The proposed speech vocoding is different from speech coding. To analyze the speech signal and represent it with less number of bits, so that bandwidth efficiency can be increased. The Synthesis of speech signal from the received bits of information. In this paper three aspects of analysis have been discussed: pitch refinement, spectral envelope estimation and maximum voiced frequency estimation. A Quasi-harmonic analysis model can be used to implement a pitch refinement algorithm which improves the accuracy of the spectral estimation. Harmonic plus noise model to reconstruct the speech signal from parameter. Finally to achieve the highest possible resynthesis quality using the lowest possible number of bits to transmit the speech signal. Future work aims at incorporating the phase information into the analysis and modeling process and also synthesis these three aspects in different pitch period.
Room Transfer Function Estimation and Room Equalization in Noise EnvironmentsIJERA Editor
Audio quality in listening situation is degraded by indoor room reverberation. Room equalization can be used to
increase the audio quality by applying the inverse transfer function to the input audio signals. In noise
environments, however, it is hard to exactly measure the room transfer function. In this work, we developed the
techniques to measure the room transfer function in indoor noise environments and to enhance the audio quality
by room equalization. From the experimental results, we showed that the proposed techniques can be
successfully used in indoor noise environments
This document summarizes a research paper on pitch detection of speech synthesis using MATLAB. It discusses using an adaptable filter and peak-valley decision method to determine pitch marks for speech synthesis. Low-pass filtering and autocorrelation are used to detect pitch periods. An adaptive filter is designed to flatten spectral peaks. Peak and valley costs are calculated over each pitch period to determine pitch marks. Dynamic programming is then used to obtain the optimal pitch mark locations for high quality speech synthesis.
GENDER RECOGNITION SYSTEM USING SPEECH SIGNALIJCSEIT Journal
In this paper, a system, developed for speech encoding, analysis, synthesis and gender identification is
presented. A typical gender recognition system can be divided into front-end system and back-end system.
The task of the front-end system is to extract the gender related information from a speech signal and
represents it by a set of vectors called feature. Features like power spectrum density, frequency at
maximum power carry speaker information. The feature is extracted using First Fourier Transform (FFT)
algorithm. The task of the back-end system (also called classifier) is to create a gender model to recognize
the gender from his/her speech signal in recognition phase. This paper also presents the digital processing
of a speech signals (pronounced “A” and “B”) which are taken from 10 persons, 5 of them are Male and
the rest of them are Female. Power Spectrum Estimation of the signal is examined .The frequency at
maximum power of the English Phonemes is extracted from the estimated power spectrum. The system uses
threshold technique as identification tool. The recognition accuracy of this system is 80% on average.
Wavelet Based Feature Extraction for the Indonesian CV Syllables SoundTELKOMNIKA JOURNAL
This paper proposes the combined methods of Wavelet Transform (WT) and Euclidean Distance
(ED) to estimate the expected value of the possibly feature vector of Indonesian syllables. This research
aims to find the best properties in effectiveness and efficiency on performing feature extraction of each
syllable sound to be applied in the speech recognition systems. This proposed approach which is the
state-of-the-art of the previous study consist of three main phase. In the first phase, the speech signal is
segmented and normalized. In the second phase, the signal is transformed into frequency domain by using
the WT. In the third phase, to estimate the expected feature vector, the ED algorithm is used. Th e result
shows the list of features of each syllables can be used for the next research, and some recommendations
on the most effective and efficient WT to be used in performing syllable sound recognition.
Research: Applying Various DSP-Related Techniques for Robust Recognition of A...Roman Atachiants
This paper approaches speaker recognition in a new way. A speaker recognition system has been realized that works on adult and child speakers, both male and female. Furthermore, the system employs text-dependent and text-independent algorithms, which makes robust speaker recognition possible in many applications. Single-speaker classication is achieved by age/sex pre-classication and is implemented using classic text-dependent techniques, as well as a novel technology for text-independent recognition. This new research uses Evolutionary Stable Strategies to model human speech and allows speaker recognition by analyzing just one vowel.
Frequency based criterion for distinguishing tonal and noisy spectral componentsCSCJournals
A frequency-based criterion for distinguishing tonal and noisy spectral components is proposed. For considered spectral local maximum two instantaneous frequency estimates are determined and the difference between them is used in order to verify whether component is noisy or tonal. Since one of the estimators was invented specially for this application its properties are deeply examined. The proposed criterion is applied to the stationary and nonstationary sinusoids in order to examine its efficiency.
Research on VoIP Acoustic Echo Cancelation Algorithm Based on SpeexTELKOMNIKA JOURNAL
Echo cancellation has been a major problem to be solved in VoIP, although the integrated echo cancellation module in Speex, it does not consider thread synchronization issues. The frequency domain echo cancellation algorithm MDF of speex is analyzed, and then a synchronization method of playing thread and recording thread is proposed. The results show that the acoustic echo canceller which achieved by the proposed method meet the requirements of voice communication, implementation is easier and therefore provides a reference for the VoIP voice communication and mobile communication terminal.
A NOVEL APPROACH TO CHANNEL DECORRELATION FOR STEREO ACOUSTIC ECHO CANCELLATI...a3labdsp
This document proposes a novel approach to decorrelating stereo acoustic signals for acoustic echo cancellation based on the psychoacoustic phenomenon of the "missing fundamental". The approach tracks and removes the pitch from one channel of the stereo signal using an adaptive notch filter, which greatly reduces inter-channel coherence in the lower spectrum without affecting signal quality. Experimental results show the proposed approach provides significant coherence reduction and faster convergence speed of adaptive filters compared to a masked noise injection method, while better preserving the stereo quality.
Single Channel Speech Enhancement using Wiener Filter and Compressive Sensing IJECEIAES
The speech enhancement algorithms are utilized to overcome multiple limitation factors in recent applications such as mobile phone and communication channel. The challenges focus on corrupted speech solution between noise reduction and signal distortion. We used a modified Wiener filter and compressive sensing (CS) to investigate and evaluate the improvement of speech quality. This new method adapted noise estimation and Wiener filter gain function in which to increase weight amplitude spectrum and improve mitigation of interested signals. The CS is then applied using the gradient projection for sparse reconstruction (GPSR) technique as a study system to empirically investigate the interactive effects of the corrupted noise and obtain better perceptual improvement aspects to listener fatigue with noiseless reduction conditions. The proposed algorithm shows an enhancement in testing performance evaluation of objective assessment tests outperform compared to other conventional algorithms at various noise type conditions of 0, 5, 10, 15 dB SNRs. Therefore, the proposed algorithm significantly achieved the speech quality improvement and efficiently obtained higher performance resulting in better noise reduction compare to other conventional algorithms.
Handling Ihnarmonic Series with Median-Adjustive TrajectoriesMatthieu Hodgkinson
This document summarizes a new method for analyzing inharmonic instrumental tones called Median-Adjustive Trajectories (MAT). The method exploits an equation that relates the inharmonicity coefficient to the frequencies and numbers of any two partials from an inharmonic series. It estimates the frequencies of the first two prominent peaks to calculate an initial inharmonicity coefficient. This is then used along with the partial frequencies in iterative steps to estimate subsequent partial frequencies, refining the coefficient at each step. The estimates are based on medians of arrays calculated from the relevant equations to improve accuracy. The method allows efficient analysis of inharmonic spectra without exhaustive searches over parameter ranges.
Feature Extraction Analysis for Hidden Markov Models in Sundanese Speech Reco...TELKOMNIKA JOURNAL
Sundanese language is one of the popular languages in Indonesia. Thus, research in Sundanese language becomes essential to be made. It is the reason this study was being made. The vital parts to get the high accuracy of recognition are feature extraction and classifier. The important goal of this study was to analyze the first one. Three types of feature extraction tested were Linear Predictive Coding (LPC), Mel Frequency Cepstral Coefficients (MFCC), and Human Factor Cepstral Coefficients (HFCC). The results of the three feature extraction became the input of the classifier. The study applied Hidden Markov Models as its classifier. However, before the classification was done, we need to do the quantization. In this study, it was based on clustering. Each result was compared against the number of clusters and hidden states used. The dataset came from four people who spoke digits from zero to nine as much as 60 times to do this experiments. Finally, it showed that all feature extraction produced the same performance for the corpus used.
This document summarizes and compares different speech enhancement methods for noisy Punjabi speech at the phoneme level. It describes spectral subtraction, Wiener filtering, Kalman filtering, and RASTA methods. It also proposes a hybrid method using features from Wiener and Kalman filtering. Phonemes of Punjabi language and the methodology are explained. Results applying various noise types at different signal-to-noise ratios on phonemes show the proposed method produces the best enhancement compared to other methods.
This document proposes a noise reduction method for audio signals based on an LMS adaptive filter. It segments the noisy audio signal into frames and uses an NLMS adaptive algorithm to estimate the filter coefficients and minimize the mean square error between the clean signal and filter output. Simulation results show the proposed method significantly reduces noise and improves the signal to noise ratio by adaptively filtering the noisy audio signal in the time domain. Analysis of the output signal variance indicates the noise level is substantially decreased compared to the original noisy signal.
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...IRJET Journal
This document presents a speech enhancement method based on spectral subtraction involving the magnitude and phase components. The proposed method aims to improve noisy speech quality by estimating and subtracting noise from the noisy speech signal in the frequency domain. It involves segmenting the noisy speech into overlapping frames, estimating the noise spectrum during non-speech periods, subtracting the noise magnitude from the noisy speech magnitude, and reconstructing the enhanced speech using the inverse FFT and overlap-add processing. The method was tested on different noise types using MATLAB simulations. Results showed the proposed method achieved better noise reduction compared to conventional spectral subtraction while introducing less speech distortion.
Design of dfe based mimo communication system for mobile moving with high vel...Made Artha
The document discusses the design of a decision feedback equalizer (DFE) based multiple-input multiple-output (MIMO) communication system for mobile receivers moving at high velocities of up to 250km/hr. It analyzes the time dispersive and frequency dispersive effects of fading channels on signals. A DFE is proposed whose weights are periodically updated using the least mean squares (LMS) algorithm based on statistical channel parameters to combat the effects of fading. Simulation results show that the proposed MIMO system with a DFE achieves bit error rates below 10-3 at signal-to-noise ratios of 4dB and 10-4 at 6dB, even when the receiver is moving at 250km/hr.
Signal Processing of Radar Echoes Using Wavelets and Hilbert Huang Transformsipij
Atmospheric Radar Signal Processing is one field of Signal Processing where there is a lot of scope for development of new and efficient tools for spectrum cleaning, detection and estimation of desired parameters. The wavelet transform and HHT (Hilbert-Huang transform) are both signal processing methods. This paper is based on comparing HHT and Wavelet transform applied to Radar signals. Wavelet analysis is one of the most important methods for removing noise and extracting signal from any data. The de-noising application of the wavelets has been used in spectrum cleaning of the atmospheric radar signals. HHT can be used for processing non-stationary and nonlinear signals. HHT is one of the timefrequency analysis techniques which consists of two parts: Empirical Mode Decomposition (EMD) and instantaneous frequency solution. EMD is a numerical sifting process to decompose a signal into its fundamental intrinsic oscillatory modes, namely intrinsic mode functions (IMFs). A series of IMFs can be obtained after the application of EMD. In this paper wavelets and EMD has been applied to the time series data obtained from the mesosphere-stratosphere-troposphere (MST) region near Gadanki, Tirupati for 6 beam directions. The Algorithm is developed and tested using Matlab. Moments were estimated and analysis has brought out improvement in some of the characteristic features like SNR, Doppler width, Noise power of the atmospheric signals. The results showed that the proposed algorithm is efficient for dealing non-linear and non- stationary signals contaminated with noise. The results were compared using ADP (Atmospheric Data Processor) and plotted for validation of the proposed algorithm.
A NOVEL ENF EXTRACTION APPROACH FOR REGION-OF-RECORDING IDENTIFICATION OF MED...cscpconf
The electric network frequency (ENF) of power lines leaves its trace in nearby media recordings. The ENF signals vary in a consistent way in a given power grid. Therefore, it is possible to develop signal processing and machine learning techniques to identify the grid of origin by extracting attributes of the embedded ENF signal in recorded audio. This paper presents a model based on a novel ENF extraction technique with training on audio and power recordings from different grids. The proposed approach is based on correcting erroneously selected peaks from the Short Time Fourier Transform (STFT) by leveraging time correlations. These peaks are mistakenly taken for the frequency component belonging to the embedded ENF signal of the power grid and are corrected by the algorithm. Results on a test set of 50 recordings from nine different locations demonstrate the effectiveness of the proposed approach with an overall accuracy of 88%.
This document discusses audio compression using multiple transformation techniques for audio applications. It compares the Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT) for compressing audio signals. The DCT and DWT are applied to audio signals to generate new data sets with smaller values, achieving compression. Performance is evaluated using metrics like compression ratio, peak signal-to-noise ratio, signal-to-noise ratio, and normalized root mean square error. The results show that DWT provides a lower compression ratio but higher performance metrics compared to DCT. Overall, the document examines using DCT and DWT transforms to compress audio signals and compares their performance.
Similar to Feature Extraction of Musical Instrument Tones using FFT and Segment Averaging (20)
Amazon products reviews classification based on machine learning, deep learni...TELKOMNIKA JOURNAL
In recent times, the trend of online shopping through e-commerce stores and websites has grown to a huge extent. Whenever a product is purchased on an e-commerce platform, people leave their reviews about the product. These reviews are very helpful for the store owners and the product’s manufacturers for the betterment of their work process as well as product quality. An automated system is proposed in this work that operates on two datasets D1 and D2 obtained from Amazon. After certain preprocessing steps, N-gram and word embedding-based features are extracted using term frequency-inverse document frequency (TF-IDF), bag of words (BoW) and global vectors (GloVe), and Word2vec, respectively. Four machine learning (ML) models support vector machines (SVM), logistic regression (RF), logistic regression (LR), multinomial Naïve Bayes (MNB), two deep learning (DL) models convolutional neural network (CNN), long-short term memory (LSTM), and standalone bidirectional encoder representations (BERT) are used to classify reviews as either positive or negative. The results obtained by the standard ML, DL models and BERT are evaluated using certain performance evaluation measures. BERT turns out to be the best-performing model in the case of D1 with an accuracy of 90% on features derived by word embedding models while the CNN provides the best accuracy of 97% upon word embedding features in the case of D2. The proposed model shows better overall performance on D2 as compared to D1.
Design, simulation, and analysis of microstrip patch antenna for wireless app...TELKOMNIKA JOURNAL
In this study, a microstrip patch antenna that works at 3.6 GHz was built and tested to see how well it works. In this work, Rogers RT/Duroid 5880 has been used as the substrate material, with a dielectric permittivity of 2.2 and a thickness of 0.3451 mm; it serves as the base for the examined antenna. The computer simulation technology (CST) studio suite is utilized to show the recommended antenna design. The goal of this study was to get a more extensive transmission capacity, a lower voltage standing wave ratio (VSWR), and a lower return loss, but the main goal was to get a higher gain, directivity, and efficiency. After simulation, the return loss, gain, directivity, bandwidth, and efficiency of the supplied antenna are found to be -17.626 dB, 9.671 dBi, 9.924 dBi, 0.2 GHz, and 97.45%, respectively. Besides, the recreation uncovered that the transfer speed side-lobe level at phi was much better than those of the earlier works, at -28.8 dB, respectively. Thus, it makes a solid contender for remote innovation and more robust communication.
Design and simulation an optimal enhanced PI controller for congestion avoida...TELKOMNIKA JOURNAL
This document describes using a snake optimization algorithm to tune the gains of an enhanced proportional-integral controller for congestion avoidance in a TCP/AQM system. The controller aims to maintain a stable and desired queue size without noise or transmission problems. A linearized model of the TCP/AQM system is presented. An enhanced PI controller combining nonlinear gain and original PI gains is proposed. The snake optimization algorithm is then used to tune the parameters of the enhanced PI controller to achieve optimal system performance and response. Simulation results are discussed showing the proposed controller provides a stable and robust behavior for congestion control.
Improving the detection of intrusion in vehicular ad-hoc networks with modifi...TELKOMNIKA JOURNAL
Vehicular ad-hoc networks (VANETs) are wireless-equipped vehicles that form networks along the road. The security of this network has been a major challenge. The identity-based cryptosystem (IBC) previously used to secure the networks suffers from membership authentication security features. This paper focuses on improving the detection of intruders in VANETs with a modified identity-based cryptosystem (MIBC). The MIBC is developed using a non-singular elliptic curve with Lagrange interpolation. The public key of vehicles and roadside units on the network are derived from number plates and location identification numbers, respectively. Pseudo-identities are used to mask the real identity of users to preserve their privacy. The membership authentication mechanism ensures that only valid and authenticated members of the network are allowed to join the network. The performance of the MIBC is evaluated using intrusion detection ratio (IDR) and computation time (CT) and then validated with the existing IBC. The result obtained shows that the MIBC recorded an IDR of 99.3% against 94.3% obtained for the existing identity-based cryptosystem (EIBC) for 140 unregistered vehicles attempting to intrude on the network. The MIBC shows lower CT values of 1.17 ms against 1.70 ms for EIBC. The MIBC can be used to improve the security of VANETs.
Conceptual model of internet banking adoption with perceived risk and trust f...TELKOMNIKA JOURNAL
Understanding the primary factors of internet banking (IB) acceptance is critical for both banks and users; nevertheless, our knowledge of the role of users’ perceived risk and trust in IB adoption is limited. As a result, we develop a conceptual model by incorporating perceived risk and trust into the technology acceptance model (TAM) theory toward the IB. The proper research emphasized that the most essential component in explaining IB adoption behavior is behavioral intention to use IB adoption. TAM is helpful for figuring out how elements that affect IB adoption are connected to one another. According to previous literature on IB and the use of such technology in Iraq, one has to choose a theoretical foundation that may justify the acceptance of IB from the customer’s perspective. The conceptual model was therefore constructed using the TAM as a foundation. Furthermore, perceived risk and trust were added to the TAM dimensions as external factors. The key objective of this work was to extend the TAM to construct a conceptual model for IB adoption and to get sufficient theoretical support from the existing literature for the essential elements and their relationships in order to unearth new insights about factors responsible for IB adoption.
Efficient combined fuzzy logic and LMS algorithm for smart antennaTELKOMNIKA JOURNAL
The smart antennas are broadly used in wireless communication. The least mean square (LMS) algorithm is a procedure that is concerned in controlling the smart antenna pattern to accommodate specified requirements such as steering the beam toward the desired signal, in addition to placing the deep nulls in the direction of unwanted signals. The conventional LMS (C-LMS) has some drawbacks like slow convergence speed besides high steady state fluctuation error. To overcome these shortcomings, the present paper adopts an adaptive fuzzy control step size least mean square (FC-LMS) algorithm to adjust its step size. Computer simulation outcomes illustrate that the given model has fast convergence rate as well as low mean square error steady state.
Design and implementation of a LoRa-based system for warning of forest fireTELKOMNIKA JOURNAL
This paper presents the design and implementation of a forest fire monitoring and warning system based on long range (LoRa) technology, a novel ultra-low power consumption and long-range wireless communication technology for remote sensing applications. The proposed system includes a wireless sensor network that records environmental parameters such as temperature, humidity, wind speed, and carbon dioxide (CO2) concentration in the air, as well as taking infrared photos.The data collected at each sensor node will be transmitted to the gateway via LoRa wireless transmission. Data will be collected, processed, and uploaded to a cloud database at the gateway. An Android smartphone application that allows anyone to easily view the recorded data has been developed. When a fire is detected, the system will sound a siren and send a warning message to the responsible personnel, instructing them to take appropriate action. Experiments in Tram Chim Park, Vietnam, have been conducted to verify and evaluate the operation of the system.
Wavelet-based sensing technique in cognitive radio networkTELKOMNIKA JOURNAL
Cognitive radio is a smart radio that can change its transmitter parameter based on interaction with the environment in which it operates. The demand for frequency spectrum is growing due to a big data issue as many Internet of Things (IoT) devices are in the network. Based on previous research, most frequency spectrum was used, but some spectrums were not used, called spectrum hole. Energy detection is one of the spectrum sensing methods that has been frequently used since it is easy to use and does not require license users to have any prior signal understanding. But this technique is incapable of detecting at low signal-to-noise ratio (SNR) levels. Therefore, the wavelet-based sensing is proposed to overcome this issue and detect spectrum holes. The main objective of this work is to evaluate the performance of wavelet-based sensing and compare it with the energy detection technique. The findings show that the percentage of detection in wavelet-based sensing is 83% higher than energy detection performance. This result indicates that the wavelet-based sensing has higher precision in detection and the interference towards primary user can be decreased.
A novel compact dual-band bandstop filter with enhanced rejection bandsTELKOMNIKA JOURNAL
In this paper, we present the design of a new wide dual-band bandstop filter (DBBSF) using nonuniform transmission lines. The method used to design this filter is to replace conventional uniform transmission lines with nonuniform lines governed by a truncated Fourier series. Based on how impedances are profiled in the proposed DBBSF structure, the fractional bandwidths of the two 10 dB-down rejection bands are widened to 39.72% and 52.63%, respectively, and the physical size has been reduced compared to that of the filter with the uniform transmission lines. The results of the electromagnetic (EM) simulation support the obtained analytical response and show an improved frequency behavior.
Deep learning approach to DDoS attack with imbalanced data at the application...TELKOMNIKA JOURNAL
A distributed denial of service (DDoS) attack is where one or more computers attack or target a server computer, by flooding internet traffic to the server. As a result, the server cannot be accessed by legitimate users. A result of this attack causes enormous losses for a company because it can reduce the level of user trust, and reduce the company’s reputation to lose customers due to downtime. One of the services at the application layer that can be accessed by users is a web-based lightweight directory access protocol (LDAP) service that can provide safe and easy services to access directory applications. We used a deep learning approach to detect DDoS attacks on the CICDDoS 2019 dataset on a complex computer network at the application layer to get fast and accurate results for dealing with unbalanced data. Based on the results obtained, it is observed that DDoS attack detection using a deep learning approach on imbalanced data performs better when implemented using synthetic minority oversampling technique (SMOTE) method for binary classes. On the other hand, the proposed deep learning approach performs better for detecting DDoS attacks in multiclass when implemented using the adaptive synthetic (ADASYN) method.
The appearance of uncertainties and disturbances often effects the characteristics of either linear or nonlinear systems. Plus, the stabilization process may be deteriorated thus incurring a catastrophic effect to the system performance. As such, this manuscript addresses the concept of matching condition for the systems that are suffering from miss-match uncertainties and exogeneous disturbances. The perturbation towards the system at hand is assumed to be known and unbounded. To reach this outcome, uncertainties and their classifications are reviewed thoroughly. The structural matching condition is proposed and tabulated in the proposition 1. Two types of mathematical expressions are presented to distinguish the system with matched uncertainty and the system with miss-matched uncertainty. Lastly, two-dimensional numerical expressions are provided to practice the proposed proposition. The outcome shows that matching condition has the ability to change the system to a design-friendly model for asymptotic stabilization.
Implementation of FinFET technology based low power 4×4 Wallace tree multipli...TELKOMNIKA JOURNAL
Many systems, including digital signal processors, finite impulse response (FIR) filters, application-specific integrated circuits, and microprocessors, use multipliers. The demand for low power multipliers is gradually rising day by day in the current technological trend. In this study, we describe a 4×4 Wallace multiplier based on a carry select adder (CSA) that uses less power and has a better power delay product than existing multipliers. HSPICE tool at 16 nm technology is used to simulate the results. In comparison to the traditional CSA-based multiplier, which has a power consumption of 1.7 µW and power delay product (PDP) of 57.3 fJ, the results demonstrate that the Wallace multiplier design employing CSA with first zero finding logic (FZF) logic has the lowest power consumption of 1.4 µW and PDP of 27.5 fJ.
Evaluation of the weighted-overlap add model with massive MIMO in a 5G systemTELKOMNIKA JOURNAL
The flaw in 5G orthogonal frequency division multiplexing (OFDM) becomes apparent in high-speed situations. Because the doppler effect causes frequency shifts, the orthogonality of OFDM subcarriers is broken, lowering both their bit error rate (BER) and throughput output. As part of this research, we use a novel design that combines massive multiple input multiple output (MIMO) and weighted overlap and add (WOLA) to improve the performance of 5G systems. To determine which design is superior, throughput and BER are calculated for both the proposed design and OFDM. The results of the improved system show a massive improvement in performance ver the conventional system and significant improvements with massive MIMO, including the best throughput and BER. When compared to conventional systems, the improved system has a throughput that is around 22% higher and the best performance in terms of BER, but it still has around 25% less error than OFDM.
Reflector antenna design in different frequencies using frequency selective s...TELKOMNIKA JOURNAL
In this study, it is aimed to obtain two different asymmetric radiation patterns obtained from antennas in the shape of the cross-section of a parabolic reflector (fan blade type antennas) and antennas with cosecant-square radiation characteristics at two different frequencies from a single antenna. For this purpose, firstly, a fan blade type antenna design will be made, and then the reflective surface of this antenna will be completed to the shape of the reflective surface of the antenna with the cosecant-square radiation characteristic with the frequency selective surface designed to provide the characteristics suitable for the purpose. The frequency selective surface designed and it provides the perfect transmission as possible at 4 GHz operating frequency, while it will act as a band-quenching filter for electromagnetic waves at 5 GHz operating frequency and will be a reflective surface. Thanks to this frequency selective surface to be used as a reflective surface in the antenna, a fan blade type radiation characteristic at 4 GHz operating frequency will be obtained, while a cosecant-square radiation characteristic at 5 GHz operating frequency will be obtained.
Reagentless iron detection in water based on unclad fiber optical sensorTELKOMNIKA JOURNAL
A simple and low-cost fiber based optical sensor for iron detection is demonstrated in this paper. The sensor head consist of an unclad optical fiber with the unclad length of 1 cm and it has a straight structure. Results obtained shows a linear relationship between the output light intensity and iron concentration, illustrating the functionality of this iron optical sensor. Based on the experimental results, the sensitivity and linearity are achieved at 0.0328/ppm and 0.9824 respectively at the wavelength of 690 nm. With the same wavelength, other performance parameters are also studied. Resolution and limit of detection (LOD) are found to be 0.3049 ppm and 0.0755 ppm correspondingly. This iron sensor is advantageous in that it does not require any reagent for detection, enabling it to be simpler and cost-effective in the implementation of the iron sensing.
Impact of CuS counter electrode calcination temperature on quantum dot sensit...TELKOMNIKA JOURNAL
In place of the commercial Pt electrode used in quantum sensitized solar cells, the low-cost CuS cathode is created using electrophoresis. High resolution scanning electron microscopy and X-ray diffraction were used to analyze the structure and morphology of structural cubic samples with diameters ranging from 40 nm to 200 nm. The conversion efficiency of solar cells is significantly impacted by the calcination temperatures of cathodes at 100 °C, 120 °C, 150 °C, and 180 °C under vacuum. The fluorine doped tin oxide (FTO)/CuS cathode electrode reached a maximum efficiency of 3.89% when it was calcined at 120 °C. Compared to other temperature combinations, CuS nanoparticles crystallize at 120 °C, which lowers resistance while increasing electron lifetime.
In place of the commercial Pt electrode used in quantum sensitized solar cells, the low-cost CuS cathode is created using electrophoresis. High resolution scanning electron microscopy and X-ray diffraction were used to analyze the structure and morphology of structural cubic samples with diameters ranging from 40 nm to 200 nm. The conversion efficiency of solar cells is significantly impacted by the calcination temperatures of cathodes at 100 °C, 120 °C, 150 °C, and 180 °C under vacuum. The fluorine doped tin oxide (FTO)/CuS cathode electrode reached a maximum efficiency of 3.89% when it was calcined at 120 °C. Compared to other temperature combinations, CuS nanoparticles crystallize at 120 °C, which lowers resistance while increasing electron lifetime.
A progressive learning for structural tolerance online sequential extreme lea...TELKOMNIKA JOURNAL
This article discusses the progressive learning for structural tolerance online sequential extreme learning machine (PSTOS-ELM). PSTOS-ELM can save robust accuracy while updating the new data and the new class data on the online training situation. The robustness accuracy arises from using the householder block exact QR decomposition recursive least squares (HBQRD-RLS) of the PSTOS-ELM. This method is suitable for applications that have data streaming and often have new class data. Our experiment compares the PSTOS-ELM accuracy and accuracy robustness while data is updating with the batch-extreme learning machine (ELM) and structural tolerance online sequential extreme learning machine (STOS-ELM) that both must retrain the data in a new class data case. The experimental results show that PSTOS-ELM has accuracy and robustness comparable to ELM and STOS-ELM while also can update new class data immediately.
Electroencephalography-based brain-computer interface using neural networksTELKOMNIKA JOURNAL
This study aimed to develop a brain-computer interface that can control an electric wheelchair using electroencephalography (EEG) signals. First, we used the Mind Wave Mobile 2 device to capture raw EEG signals from the surface of the scalp. The signals were transformed into the frequency domain using fast Fourier transform (FFT) and filtered to monitor changes in attention and relaxation. Next, we performed time and frequency domain analyses to identify features for five eye gestures: opened, closed, blink per second, double blink, and lookup. The base state was the opened-eyes gesture, and we compared the features of the remaining four action gestures to the base state to identify potential gestures. We then built a multilayer neural network to classify these features into five signals that control the wheelchair’s movement. Finally, we designed an experimental wheelchair system to test the effectiveness of the proposed approach. The results demonstrate that the EEG classification was highly accurate and computationally efficient. Moreover, the average performance of the brain-controlled wheelchair system was over 75% across different individuals, which suggests the feasibility of this approach.
Adaptive segmentation algorithm based on level set model in medical imagingTELKOMNIKA JOURNAL
For image segmentation, level set models are frequently employed. It offer best solution to overcome the main limitations of deformable parametric models. However, the challenge when applying those models in medical images stills deal with removing blurs in image edges which directly affects the edge indicator function, leads to not adaptively segmenting images and causes a wrong analysis of pathologies wich prevents to conclude a correct diagnosis. To overcome such issues, an effective process is suggested by simultaneously modelling and solving systems’ two-dimensional partial differential equations (PDE). The first PDE equation allows restoration using Euler’s equation similar to an anisotropic smoothing based on a regularized Perona and Malik filter that eliminates noise while preserving edge information in accordance with detected contours in the second equation that segments the image based on the first equation solutions. This approach allows developing a new algorithm which overcome the studied model drawbacks. Results of the proposed method give clear segments that can be applied to any application. Experiments on many medical images in particular blurry images with high information losses, demonstrate that the developed approach produces superior segmentation results in terms of quantity and quality compared to other models already presented in previeous works.
Automatic channel selection using shuffled frog leaping algorithm for EEG bas...TELKOMNIKA JOURNAL
Drug addiction is a complex neurobiological disorder that necessitates comprehensive treatment of both the body and mind. It is categorized as a brain disorder due to its impact on the brain. Various methods such as electroencephalography (EEG), functional magnetic resonance imaging (FMRI), and magnetoencephalography (MEG) can capture brain activities and structures. EEG signals provide valuable insights into neurological disorders, including drug addiction. Accurate classification of drug addiction from EEG signals relies on appropriate features and channel selection. Choosing the right EEG channels is essential to reduce computational costs and mitigate the risk of overfitting associated with using all available channels. To address the challenge of optimal channel selection in addiction detection from EEG signals, this work employs the shuffled frog leaping algorithm (SFLA). SFLA facilitates the selection of appropriate channels, leading to improved accuracy. Wavelet features extracted from the selected input channel signals are then analyzed using various machine learning classifiers to detect addiction. Experimental results indicate that after selecting features from the appropriate channels, classification accuracy significantly increased across all classifiers. Particularly, the multi-layer perceptron (MLP) classifier combined with SFLA demonstrated a remarkable accuracy improvement of 15.78% while reducing time complexity.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELijaia
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Null Bangalore | Pentesters Approach to AWS IAMDivyanshu
#Abstract:
- Learn more about the real-world methods for auditing AWS IAM (Identity and Access Management) as a pentester. So let us proceed with a brief discussion of IAM as well as some typical misconfigurations and their potential exploits in order to reinforce the understanding of IAM security best practices.
- Gain actionable insights into AWS IAM policies and roles, using hands on approach.
#Prerequisites:
- Basic understanding of AWS services and architecture
- Familiarity with cloud security concepts
- Experience using the AWS Management Console or AWS CLI.
- For hands on lab create account on [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
# Scenario Covered:
- Basics of IAM in AWS
- Implementing IAM Policies with Least Privilege to Manage S3 Bucket
- Objective: Create an S3 bucket with least privilege IAM policy and validate access.
- Steps:
- Create S3 bucket.
- Attach least privilege policy to IAM user.
- Validate access.
- Exploiting IAM PassRole Misconfiguration
-Allows a user to pass a specific IAM role to an AWS service (ec2), typically used for service access delegation. Then exploit PassRole Misconfiguration granting unauthorized access to sensitive resources.
- Objective: Demonstrate how a PassRole misconfiguration can grant unauthorized access.
- Steps:
- Allow user to pass IAM role to EC2.
- Exploit misconfiguration for unauthorized access.
- Access sensitive resources.
- Exploiting IAM AssumeRole Misconfiguration with Overly Permissive Role
- An overly permissive IAM role configuration can lead to privilege escalation by creating a role with administrative privileges and allow a user to assume this role.
- Objective: Show how overly permissive IAM roles can lead to privilege escalation.
- Steps:
- Create role with administrative privileges.
- Allow user to assume the role.
- Perform administrative actions.
- Differentiation between PassRole vs AssumeRole
Try at [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...Transcat
Join us for this solutions-based webinar on the tools and techniques for commissioning and maintaining PV Systems. In this session, we'll review the process of building and maintaining a solar array, starting with installation and commissioning, then reviewing operations and maintenance of the system. This course will review insulation resistance testing, I-V curve testing, earth-bond continuity, ground resistance testing, performance tests, visual inspections, ground and arc fault testing procedures, and power quality analysis.
Fluke Solar Application Specialist Will White is presenting on this engaging topic:
Will has worked in the renewable energy industry since 2005, first as an installer for a small east coast solar integrator before adding sales, design, and project management to his skillset. In 2022, Will joined Fluke as a solar application specialist, where he supports their renewable energy testing equipment like IV-curve tracers, electrical meters, and thermal imaging cameras. Experienced in wind power, solar thermal, energy storage, and all scales of PV, Will has primarily focused on residential and small commercial systems. He is passionate about implementing high-quality, code-compliant installation techniques.
Digital Twins Computer Networking Paper Presentation.pptxaryanpankaj78
A Digital Twin in computer networking is a virtual representation of a physical network, used to simulate, analyze, and optimize network performance and reliability. It leverages real-time data to enhance network management, predict issues, and improve decision-making processes.
Height and depth gauge linear metrology.pdfq30122000
Height gauges may also be used to measure the height of an object by using the underside of the scriber as the datum. The datum may be permanently fixed or the height gauge may have provision to adjust the scale, this is done by sliding the scale vertically along the body of the height gauge by turning a fine feed screw at the top of the gauge; then with the scriber set to the same level as the base, the scale can be matched to it. This adjustment allows different scribers or probes to be used, as well as adjusting for any errors in a damaged or resharpened probe.
Generative AI Use cases applications solutions and implementation.pdfmahaffeycheryld
Generative AI solutions encompass a range of capabilities from content creation to complex problem-solving across industries. Implementing generative AI involves identifying specific business needs, developing tailored AI models using techniques like GANs and VAEs, and integrating these models into existing workflows. Data quality and continuous model refinement are crucial for effective implementation. Businesses must also consider ethical implications and ensure transparency in AI decision-making. Generative AI's implementation aims to enhance efficiency, creativity, and innovation by leveraging autonomous generation and sophisticated learning algorithms to meet diverse business challenges.
https://www.leewayhertz.com/generative-ai-use-cases-and-applications/
Software Engineering and Project Management - Introduction, Modeling Concepts...Prakhyath Rai
Introduction, Modeling Concepts and Class Modeling: What is Object orientation? What is OO development? OO Themes; Evidence for usefulness of OO development; OO modeling history. Modeling
as Design technique: Modeling, abstraction, The Three models. Class Modeling: Object and Class Concept, Link and associations concepts, Generalization and Inheritance, A sample class model, Navigation of class models, and UML diagrams
Building the Analysis Models: Requirement Analysis, Analysis Model Approaches, Data modeling Concepts, Object Oriented Analysis, Scenario-Based Modeling, Flow-Oriented Modeling, class Based Modeling, Creating a Behavioral Model.
Home security is of paramount importance in today's world, where we rely more on technology, home
security is crucial. Using technology to make homes safer and easier to control from anywhere is
important. Home security is important for the occupant’s safety. In this paper, we came up with a low cost,
AI based model home security system. The system has a user-friendly interface, allowing users to start
model training and face detection with simple keyboard commands. Our goal is to introduce an innovative
home security system using facial recognition technology. Unlike traditional systems, this system trains
and saves images of friends and family members. The system scans this folder to recognize familiar faces
and provides real-time monitoring. If an unfamiliar face is detected, it promptly sends an email alert,
ensuring a proactive response to potential security threats.
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...shadow0702a
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
2. TELKOMNIKA ISSN: 1693-6930
Feature Extraction of Musical Instrument Tones using FFT and Segment… (Linggo Sumarno)
1281
match them with the characteristic of the tones which had heard in the past [2]. Today, by using
computers, it can be developed a tone recognition system, which is based on the characteristic
of the tone. In this case that characteristic is called as feature extraction.
An approach to get the feature extraction is to use a transform domain. FFT and DCT
(Discrete Cosine Transform) are two examples of the transform methods that can be used. In
the transform domain, there are two ways to get the feature extraction. The first one is based on
fundamental signals [3-6] and the second one is not based on fundamental signals [7-11].
There was a problem in the field of feature extraction, which using the transform domain
approach that was not based on fundamental signals. It still showing a relatively large number
of the feature extraction coefficients. For example, based on the references, the feature
extraction for musical instrument sounds gave 34 coefficients [7], 32 coefficients [8], 12
coefficients [9], 9 coefficients [10], and 8 coefficients [11]. Therefore, there is still an opportunity
to further reduce the number of the feature extraction coefficients.
This paper proposes a feature extraction for musical instrument tones, with the
approach of the Fourier transform domain, which is not based on fundamental signals. The
proposed feature extraction has the ability to further reduce the number of feature extraction
coefficients that described above. In this paper, it will be discussed also the performance of the
proposed feature extraction, if it deals with the signals that have multiple and single significant
local peaks in the Fourier transform domain.
2. Research Methodology
2.1. Overall System Development
In order to explore the proposed feature extraction in this research, it had been
developed a tone recognition system, which is shown in Figure 2. The input is an isolated signal
tone in wav format, while the output is a text that indicates a recognized tone. Here is an
explanation of the blocks shown in Figure 2.
Figure 2. Overall system block diagram
2.1.1. Input
The input of the recognition system is a signal tone of a musical instrument (pianica,
tenor recorder, or bellyra) in an isolated wav format. From each musical instrument there are
eight tones (C, D, E, F, G, A, B, and C') used. That signal tones were obtained by recording the
sounds that coming out from the musical instrument, by using a sampling frequency of 5000 Hz.
That magnitude of the sampling frequency was chosen since it had met the Shannon sampling
theorem [12] as follow:
max2ffs (1)
where maxf is the highest frequency of the sound, which is coming out from a musical
instrument, sf is the sampling frequency. The magnitude of the sampling frequency (5000 Hz)
3. ISSN: 1693-6930
TELKOMNIKA Vol. 15, No. 3, September 2017 : 1280 – 1289
1282
has already exceeded the highest tone C’ on pianica, tenor recorder, and bellyra. Based on the
test spectrum, the highest tones C' on pianica, tenor recorder, and bellyra, were 1584 Hz, 547
Hz, and 2097 Hz respectively. Based on the evaluation, recording duration of 1.5 seconds was
sufficient, in order to obtain the middle area data which was already in the steady state
condition. This area was required for the purpose of frame blocking.
In order to generate the tone sounds, three musical instruments were used. They were
pianica Yamaha P-37D, tenor recorder Yamaha YRT-304B II, and bellyra Isuzu ZBL-27 (see
Figure 3). In order to capture the tone sounds, a USB microphone AKG Perception 120 was
used.
(a) Pianica (b)Tenor
recorder
(c) Bellyra
Figure 3. Pianica, tenor recorder, and bellyra, which were used for the research
2.1.2. Initial Normalization
Initial normalization is a process of setting the maximum value to one, from a series of
signal data, which comes from recording the sound that coming out from a musical instrument.
By setting the maximum value to one, it will eliminate the differences in the maximum values on
a number of the series of signal data.
2.1.3. Initial Cutting
Initial cutting is a process of cutting the silence area followed by cutting the transition
area of the tone signal. Silence area is an area that has no tone information, whereas the
transition area is the area where the tone signal is not in a steady state condition yet. In this
research, based on the graphical observation of the signals, the cutting of silence area could be
carried out by using an amplitude threshold |0.5|, while the cutting of the transition region could
be carried out by cutting for 0.1 seconds at the beginning area of the tone signal.
2.1.4. Frame Blocking
Frame blocking is a process of taking a frame signal from a long series of signal [13].
The purpose of frame blocking is to reduce the number of data signals to be processed. The
effect of this reduction is a reduction in computing time. In this research, a frame signal was
taken from the beginning area of the tone signal that had already in the steady state condition.
In this research, the length of a frame signal was equal to the length of the FFT that used in the
next process.
2.1.5. Windowing
Windowing is the process of reducing discontinuities at the edges of the signal [13].
This reduction is necessary in order to reduce the appearance of harmonic signals after the
signal is transformed using FFT. In this research, windowing used a Hamming window.
Hamming window w(n) with a width of N points defined [14].
4. TELKOMNIKA ISSN: 1693-6930
Feature Extraction of Musical Instrument Tones using FFT and Segment… (Linggo Sumarno)
1283
10,
1
2cos46.054.0)(
Nn
N
n
nw (2)
Hamming window is a window that is widely used in the field of signal processing. In this
research, the length of the windows was the same with the length of the frame blocking.
2.1.6. FFT
FFT is a process of changing the signal from the time domain to the Fourier transform
domain. Basically FFT is an efficient method for calculating the DFT (Discrete Fourier
Transform). DFT is a form of Fourier transforms that used for discrete signals. Mathematically,
DFT is formulated as follows.
1,,0,)()(
1
0
2
NkenxkX
N
n
kn
N
j
(3)
where x(n) = input series, X(k) = output series, and N = number of samples. In this research
FFT radix-2 [15] was used. This FFT is widely used in the field of signal processing. In this
research, the length of the FFT was evaluated with values of 16, 32, 64, 128, and 256 points.
2.1.7. Symmetry Cutting
Symmetry cutting is cutting the half of the right side of the FFT result. This cutting is
necessary, because basically FFT will give the symmetry result. Therefore, if only the half side
(right or left) is used, it is sufficient.
2.1.8. Segment Averaging
Segment averaging is a process for reducing the size of the data signal. Basically, the
result of segment averaging can represent the basic shape of the data signal. This research
made use of the segment averaging that had been inspired by Setiawan [16]. The algorithm of
the segment averaging is shown as follows.
1. Let a series }10|)({ MkkX , where pM 2 for 0p .
2. Calculate the segment length L, where qL 2 for pq 0 .
3. Cut uniformly the series X(k) by using segment length L. So, it will give a number of
S segments.
L
M
S , (4)
and also a series }1|)({ LrrF in every segment.
4. Calculate the mean value of every segment Z(v) as follows.
SvrF
L
vZ
L
r
v
1,)(
1
)(
1
. (5)
In this research, the segment lengths in the segment averaging were evaluated with the values
1, 2, 4, ..., and )2/(log22 N
points, where N is the length of the FFT used.
2.1.9. Final Normalization
Final normalization is a process of setting the maximum value to one, from a series of
signal data, which comes from segment averaging. As described above, by setting the
maximum value to one, it will eliminate the differences in the maximum values on a number of
the series of signal data.
5. ISSN: 1693-6930
TELKOMNIKA Vol. 15, No. 3, September 2017 : 1280 – 1289
1284
2.1.10. Distance Calculation
Distance calculation is a process for comparing the feature extraction of an input signal
and the feature extraction of a number of the tone signals, which is stored in a tone database.
Distance calculation is an indication of a pattern recognition method known as template
matching [17]. Template matching is basically trying to find the degree of similarity between two
vectors in the feature space. In order to find the degree of similarity, it can be used a distance
function. If using a distance function, there is a general guideline, the degree of similarity of the
two vectors will be increased away from zero (getting more similar) if the value of the distance
function decreases toward zero. On the contrary, the degree of similarity of the two vectors will
be decreased toward zero (getting not similar) if the value of the distance function is increased
away from zero.
Euclidean distance is a distance function that can be used to find the degree of
similarity between two vectors in the feature space. Euclidean distance is a distance that is
commonly used in pattern recognition [18]. Euclidean distance is defined as follows.
m
i ii yxE 1
2)(),( yx (6)
where x and y are two vectors of the same length, and m is the length of the vector x and y. In a
pattern recognition system which using template matching method, one of the vector (x or y) is a
vector that will be determined its class pattern, whereas the other vector is a vector that stored
in a pattern class.
2.1.11. Tone Decision
Tone decision is a process for determining the output tone of the input signal, which is
in the form of an isolated tone signal in wav format. Tone decision is carried out by finding a
minimum value from a number of distance calculation results. These results come from a
number of distance calculation, between the feature extraction of an input signal and the feature
extraction of a number of the tone signals, which is stored in the tone database. A tone, which
has a minimum distance, will be determined as the output tone.
2.2. Feature Extraction and Tone Database
The above distance calculation process requires a tone database. This database is
generated by using the tone feature extraction shown in Figure 4. From Figure 4, the input is an
isolated tone signal in wav format. The output is the feature extraction of the input. The output is
a vector of length S, where S value is obtained from the Equation (4).
Figure 4. Tone feature extraction block diagram
Based on Figure 4, the process of segment averaging gets a signal from the symmetry
cutting process. In this cutting symmetry process, the signal from the FFT process is divided
into two equal lengths. If the signal from the FFT process has length N, then the Equation (4)
can be rewritten to be as follows:
L
N
S
2/
(7)
6. TELKOMNIKA ISSN: 1693-6930
Feature Extraction of Musical Instrument Tones using FFT and Segment… (Linggo Sumarno)
1285
where S is the length of the feature extraction vector, N is the FFT length, and L is the segment
length in the segment averaging process.
By using the process illustrated in Figure 4, for each instrument (pianica, recorder tenor,
or bellyra), it is generated the feature extraction from a number of K samples for each tone (C,
D, E, F, G, A, B, or C'). Next, for each tone of a musical instrument, it is calculated the average
value of the following:
K
K
T
ZZZ
R
21
(8)
where Z1, Z2, … ZK are the feature extraction vectors of a tone of a musical instrument, which
have a number of K samples. In this research it was evaluated the value K with the values 1, 5,
10, and 15. Therefore, for this evaluation, it was necessary to provide a number of 120 tones for
each musical instrument, or number of 360 tones for all three instruments (pianica, tenor
recorder, and bellyra). The addition operation Z1 + Z2 + … + ZK is the addition operation of
vector elements. Vector { RT |T = C, D, E, F, G, A, B, or C’} is called as the reference vector.
In the generation of a tone database, every musical instrument has a tone database,
which consists of eight reference vectors, namely RC, RD. RE. RF. RG, RA, RB, and RC’. Thus, for
the three instruments (pianica, tenor recorder, and bellyra), there are three tone databases. On
the tone recognition system in this research, the tone database that is used depends on the
tone input, namely, whether the tone comes from pianica, tenor recorder, or bellyra.
2.3. Test Tones
Test tones are used to evaluate the performance of the developed tone recognition
system. In this research, for each instrument (pianica, tenor recorder, or bellyra), a number of
10 samples for each tone (C, D, E, F, G, A, B, and C ') were used. Thus, there were a number
of 80 test tones for each musical instrument, or a number 240 test tones for all three musical
instruments (pianica, tenor recorder, and bellyra).
2.4. Recognition Rate
The recognition rate is the performance a recognition system when given a number
particular inputs. The recognition rate is calculated using the following equation.
%100
tonestestofNumber
tonesrecognizedofNumber
ratenRecognitio (9)
In this research, as described above, a number of 80 test tones were used for each musical
instrument (pianica, tenor recorder, or bellyra).
3. Results and Analysis
3.1. Test Results
Test results of the tone recognition system for a variety of the FFT length, the segment
length, and the number of feature extraction vectors, are shown in Table 1, 2, and 3. As the first
note, the number of feature extraction vectors refers to the Equation (8). As the second note,
the segment length refers to the segment length used for the segment averaging (see Equation
(4) and (5)).
3.2. The Smallest Number of feature Extraction Coefficients
The main objective of this research is to find the smallest number of the feature
extraction coefficients, which can represent a tone. Therefore, by using Table 1, 2, and 3, it can
be explored the combination of the values of the FFT length, the segment length, and the
number of feature extraction vectors that can give the smallest number of the feature extraction
coefficients, at the highest recognition rate (in this case 100%). The exploration results of the
combination of these values, along with the number of the feature extraction coefficients are
shown in Table 4. As a note, in order to calculate the number of the feature extraction
coefficients, the equation (7) was used.
7. ISSN: 1693-6930
TELKOMNIKA Vol. 15, No. 3, September 2017 : 1280 – 1289
1286
As shown in Table 4, the features of a pianica tone can be represented by at least four
feature extraction coefficients, while a tone of tenor recorder or bellyra can be represented by at
least 16 feature extraction coefficients. Therefore, in general, it can be said that, for the tones of
musical instrument (e.g. pianica) which have many significant local peaks in the Fourier
transform domain, the feature extraction which using FFT and segment averaging is highly
efficient. The reason is that it can give at least four feature extraction coefficients.
Table 1. Test results of pianica musical instrument, in various combinations of the FFT length,
the segment length, and the number of feature extraction vectors
Results shown: Recognition rate (%)
FFT length
(points)
Number of feature extraction
vectors
Segment length (points)
1 2 4 8 16 32 64 128
16
1 86.25 71.25 45.00 12.50 - - - -
5 86.25 75.00 50.00 12.50 - - - -
10 87.25 76.25 50.00 12.50 - - - -
15 87.50 78.75 52.50 12.50 - - - -
32
1 100 95.00 78.75 46.25 12.50 - - -
5 100 97.50 86.25 52.50 12.50 - - -
10 100 100 92.50 53.75 12.50 - - -
15 100 100 95.00 56.25 12.50 - - -
64
1 100 100 100 81.25 57.50 12.50 - -
5 100 100 100 81.25 58.75 12.50 - -
10 100 100 100 85.00 58.75 12.50 - -
15 100 100 100 91.25 61.25 12.50 - -
128
1 100 100 100 98.75 81.25 52.50 12.50 -
5 100 100 100 98.75 88.75 55.00 12.50 -
10 100 100 100 98.75 92.50 55.00 12.50 -
15 100 100 100 98.75 98.75 60.00 12.50 -
256
1 100 100 100 100 100 98.75 62.50 12.50
5 100 100 100 100 100 100 65.00 12.50
10 100 100 100 100 100 100 65.00 12.50
15 100 100 100 100 100 100 67.50 12.50
Table 2. Test results of tenor recorder musical instrument, in various combinations of the FFT
length, the segment length, and the number of feature extraction vectors
Results shown: Recognition rate (%)
FFT length
(points)
Number of feature extractions
vectors
Segment length (points)
1 2 4 8 16 32 64 128
16
1 57.50 43.75 11.25 12.50 - - - -
5 60.00 48.75 16.25 12.50 - - - -
10 63.75 53.75 17.50 12.50 - - - -
15 63.75 58.75 27.50 12.50 - - - -
32
1 98.75 86.25 62.50 30.00 12.50 - - -
5 98.75 88.75 62.50 35.50 12.50 - - -
10 98.75 88.75 71.25 37.00 12.50 - - -
15 98.75 90.00 77.00 43.75 12.50 - - -
64
1 100 98.75 73.75 58.75 37.50 12.50 - -
5 100 100 85.00 56.25 42.50 12.50 - -
10 100 100 87.50 62.50 42.50 12.50 - -
15 100 100 87.50 71.25 43.75 12.50 - -
128
1 100 100 97.50 68.75 46.25 31.25 12.50 -
5 100 100 100 81.25 47.50 36.25 12.50 -
10 100 100 100 85.00 58.75 38.75 12.50 -
15 100 100 100 85.00 60.00 45.00 12.50 -
256
1 100 100 100 96.25 71.25 48.75 31.25 12.50
5 100 100 100 98.75 81.25 53.75 33.75 12.50
10 100 100 100 100 82.50 61.25 36.25 12.50
15 100 100 100 100 78.75 65.00 40.00 12.50
8. TELKOMNIKA ISSN: 1693-6930
Feature Extraction of Musical Instrument Tones using FFT and Segment… (Linggo Sumarno)
1287
Table 3. Test results of bellyra musical instrument, in various combinations of the FFT
length, the segment length, and the number of feature extractions vectors.
Results shown: Recognition rate (%).
FFT length
(points)
Number of
feature extraction
vectors
Segment length (points)
1 2 4 8 16 32 64 128
16
1 57.50 40.00 25.00 12.50 - - - -
5 81.25 60.00 27.50 12.50 - - - -
10 85.00 62.50 35.00 12.50 - - - -
15 88.75 65.00 36.25 12.50 - - - -
32
1 73.75 71.25 42.50 23.75 12.50 - - -
5 95.00 91.25 53.75 27.50 12.50 - - -
10 97.50 95.00 70.00 32.50 12.50 - - -
15 96.25 95.00 72.50 35.00 12.50 - - -
64
1 100 93.75 83.75 65.00 28.75 12.50 - -
5 100 97.50 91.25 76.25 32.50 12.50 - -
10 100 97.50 92.50 83.75 33.75 12.50 - -
15 100 98.75 92.50 80.00 36.25 12.50 - -
128
1 100 98.75 93.75 83.75 77.50 30.00 12.50 -
5 100 100 97.50 95.00 83.75 33.75 12.50 -
10 100 100 98.75 97.50 92.50 36.25 12.50 -
15 100 100 98.75 97.50 90.00 42.50 12.50 -
256
1 100 100 100 92.50 85.00 73.75 26.25 12.50
5 100 100 100 100 96.25 83.75 38.75 12.50
10 100 100 100 100 98.75 87.50 38.75 12.50
15 100 100 100 100 98.75 87.50 42.50 12.50
Table 4. The combination of the values of FFT length, segment length, number of feature
extraction vectors, which resulted in the smallest number of feature extraction coefficients
Parameter
Musical Instrument
Pianica Tenor
Recorder
Bellyra
FFT length (points) 256 64 256
Segment length (points) 32 2 8
Number of feature extraction vectors 5 5 5
Number of feature extraction coefficients 4 16 16
For the tone of a musical instrument that has only one significant local peak in the
Fourier transform domain (e.g. bellyra), or the tone of a musical instrument that has a few
significant local peaks in the Fourier transform domain (e.g. tenor recorder), the feature
extraction is relatively inefficient. The reason is that it can give at least 16 coefficients feature
extraction to represent each tone. Figure 5 and Table 5 shows the reason of this case. As a
note to Figure 5, the results of feature extraction are described by linear interpolation line, for
easy visual comparison.
As shown in Figure 5, the feature extractions of tone C and D on tenor recorder and
also bellyra are lookalike (because they look almost overlap), when compared with pianica. In
order to be more exact, the feature extraction coefficients of tone C and D, on tenor recorder
and also bellyra, as shown in Table 5, has Euclidean distances that are much smaller, when
compared to pianica. It means, the feature extraction coefficients from the adjacent tones on
tenor recorder and also bellyra, are more difficult to be distinguished each other, compared with
pianica. Furthermore, in order that adjacent tones of tenor recorder and also bellyra can be
distinguished easier, there is a need in increasing the number of feature extraction coefficients.
Therefore, tenor recorder and also bellyra require a greater number of feature extraction
coefficients, when compared with pianica.
9. ISSN: 1693-6930
TELKOMNIKA Vol. 15, No. 3, September 2017 : 1280 – 1289
1288
Figure 5. The examples of feature extraction differences between pianica, tenor recorder,
and pianica, when using an FFT 256 points and a segment length 16 points
Table 5. Euclidean distances for the feature extractions of tone C and D, for pianica, tenor
recorder, and bellyra, in Figure 5
Musical Instrument Euclidean distance of tone C and D
Pianica 0.966
Tenor Recorder 0.262
Bellyra 0.097
3.3. Comparison with other Feature Extractions
As has been described above, the feature extraction that used FFT and segment
averaging is highly efficient for use in the musical instrument tones that have many significant
local peaks in the Fourier transform domain. Table 6 compares the performance of several
feature extraction for pianica tone recognition. As shown in Table 6, the feature extraction used
in this research, when compared with the other feature extraction that have been published
previously [7-11], is highly efficient for use in the musical instrument tones that have many
significant local peaks in the Fourier transform domain. The reason is that it can give the
number of feature extraction coefficients that are significantly much smaller.
Table 6. The comparison of the number of coefficients of the smallest feature extraction results,
which resulted in a 100% recognition rate, of several feature extraction for pianica tone
recognition
Feature extraction Number of feature extraction coefficients
Spectral Features [7] 34
DCT and windowing coefficients [8] 32
FFT and windowing coefficients [9] 12
Musical Surface [10] 9
DCT and segment averaging [11] 8
FFT and segment averaging (this research) 4
4. Conclusions and Future Work
Based on the things that have been described above, it can be concluded the following:
1. A feature extraction with FFT and segment averaging is highly efficient for musical
instrument tones, which have many significant local peaks in the Fourier transform domain. The
reason is that it only required at least four feature extraction coefficients, in order to represent
every tone.
2. A feature extraction with FFT and segment averaging is not efficient for musical
instrument tones, which have one or several significant local peaks in the Fourier transform
domain. The reason is that it required at least 16 feature extraction coefficients, in order to
represent every tone.
10. TELKOMNIKA ISSN: 1693-6930
Feature Extraction of Musical Instrument Tones using FFT and Segment… (Linggo Sumarno)
1289
Here are some suggestions for future work:
1. Further studies of the feature extractions that are not based on a fundamental signal,
which can produce a number of feature extraction coefficients less than 16 coefficients, for the
tone of musical instruments that have one or several significant local peaks in the Fourier
transform domain.
2. Further studies about the performance of feature extraction, which uses FFT and
segment averaging, when using the tone recognition methods other than the template matching.
Acknowledgements
This work has been supported by The Institution of Research and Community Service
of Sanata Dharma University, Yogyakarta.
References
[1] Forster C. Musical Mathematics: On the Art of Science and Acoustic Instruments. California: Chronicle
Books LLC. 2010. 7-8.
[2] McAdams S. Recognition of Auditory Sound Sources and Events. Thinking in Sound: The Cognitive
Psychology of Human Audition. Oxford: Oxford University Press. 1993: 146-198.
[3] Noll M. Pitch Determination of Human Speech by the Harmonic Product Spectrum, the Harmonic Sum
Spectrum and a Maximum Likelihood Estimate. Proceedings of the Symposium on Computer
Processing in Communications. Brooklyn, New York. 1970; 19: 779-797.
[4] Mitre A, Queiroz M, Faria R. Accurate and Efficient Fundamental Frequency Determination from
Precise Partial Estimates. Proceedings of the 4th AES Brazil Conference. 2006: 113-118.
[5] Izzudin A, Santoso TB, Dutono T. Recognition of Guitar Single Tones using Digital Signal Processing.
EEPIS Journal Online System. 2005; 10(1).
[6] Gaffar I, Hidayatno A, Zahra AA. An Application to Convert Single Instrument Tones become a Chord
using Pitch Class Profile Method. Transient. 2012; 1(3): 121-127.
[7] Tjahyanto A, Suprapto YK, Wulandari DP, Spectral-based Features Ranking for Gamelan Instruments
Identification using Filter Techniques. TELKOMNIKA Indonesian Journal of Electrical Engineering.
2013; 11(1): 95-106.
[8] Sumarno L. Recognition of Pianica Tones using Gaussian Window, DCT, and Cosine Distance.
Research Journal. 2013; 17(1): 8-15.
[9] Sumarno L. Recognition of Pianica Tones using Blackman Window and Fast Fourier Transform
Feature Extraction (in Indonesian). Media Teknika. 2014; 9(2): 84-93.
[10] Mutiara AB, Refianti R, Mukarromah NRA, Musical Genre Classification Using Support Vector
Machine and Audio Features. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2016;
14(3): 1024-1034.
[11] Sumarno L. On The Performace of Segment Averaging of Discrete Cosine Transform Coefficients on
Musical Instruments Tone Recognition. ARPN Journal of Engineering and Applied Sciences. 2016;
11(9): 5644-5649.
[12] Tan L, Jiang J. Digital Signal Processing Fundamentals and Applications. Second Edition. Oxford:
Elsevier Inc. 2013: 15-56.
[13] Meseguer NA. Speech Analysis for Automatic Speech Recognition. MSc Thesis. Trondheim: NTNU;
2009.
[14] Harris FJ. On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform.
Proceedings of the IEEE. 1978; 66(1): 51-83.
[15] Proakis JG, Manolakis DG. Digital Signal Processing: Principles, Algorithm, and Applications. Fourth
Edition. New Jersey: Prentice Hall Inc. 2007: 511-562.
[16] Setiawan YR. Numbers Speech Recognition using Fast Fourier Transform and Cosine Similarity.
Undergraduate Thesis. Yogyakarta: Sanata Dharma University. 2015.
[17] Jain AK, Duin RPW, Mao J. Statistical Pattern Recognition: A Review. IEEE Transactions on Pattern
Analysis and Machine Intelligence. 2000; 22(1): 4-37.
[18] Wilson DR, Martinez TR. Improved Heterogeneous Distance Function. Journal of Artificial Intelligence
Research. 1997; 6: 1-34.