This document proposes using robust principal component analysis (RPCA) to separate singing voices from music accompaniment in monaural recordings. RPCA decomposes an audio signal matrix into a low-rank matrix representing the repetitive music accompaniment and a sparse matrix representing the more variable singing voice. RPCA is applied to the short-time Fourier transform (STFT) of an audio signal. A binary time-frequency mask is then used to separate the estimated singing voice and accompaniment based on their STFT magnitudes in the low-rank and sparse matrices. Evaluations on the MIR-1K dataset show this RPCA-based method achieves 1-1.4 dB higher source-to-distortion ratio than other state
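The decomposition and masking steps described above can be sketched in a few lines of NumPy. This is a minimal illustration only, not the authors' implementation: the solver (an inexact augmented Lagrangian loop), the default weight `lam`, the `gain` parameter and all function names are assumptions introduced here. The input `M` stands for the magnitude of the STFT.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise shrinkage: pull every entry toward zero by tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rpca_ialm(M, lam=None, tol=1e-7, max_iter=500):
    """Split a real matrix M into low-rank L plus sparse S.

    Assumed solver: inexact augmented Lagrangian iterations for
    min ||L||_* + lam * ||S||_1  subject to  L + S = M.
    """
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))       # common default weight (assumption)
    norm_M = np.linalg.norm(M, "fro")
    spec = np.linalg.norm(M, 2)              # largest singular value
    Y = M / max(spec, np.max(np.abs(M)) / lam)   # dual variable
    mu, rho = 1.25 / spec, 1.5               # penalty and its growth factor
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # Low-rank update: threshold the singular values.
        U, sig, Vh = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vh
        # Sparse update: threshold the entries.
        S = soft_threshold(M - L + Y / mu, lam / mu)
        R = M - L - S                        # constraint residual
        Y = Y + mu * R
        mu *= rho
        if np.linalg.norm(R, "fro") <= tol * norm_M:
            break
    return L, S

def binary_mask(L, S, gain=1.0):
    """True where the sparse (voice) part dominates the low-rank (music) part."""
    return np.abs(S) > gain * np.abs(L)
```

In use, the mask would be multiplied with the complex STFT to estimate the voice, its complement would give the accompaniment, and each result would be inverted back to the time domain.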
Blind Source Separation using Dictionary Learning - Davide Nardone
Sparse decomposition of images and signals has found wide use in compression, noise removal, and source separation. It represents a signal as a linear combination of a few elements of a redundant dictionary. The dictionary may be fixed (Fourier, wavelet, etc.) or learned from a set of samples. Algorithms based on dictionary learning can be applied to a broad class of signals and achieve better compression performance than methods based on a fixed dictionary. Here we present a Compressed Sensing (CS) approach with an adaptive dictionary for solving a Determined Blind Source Separation (DBSS) problem. The proposed method reformulates DBSS as a Sparse Coding (SC) problem. The algorithm consists of a few steps: mixing matrix estimation, sparse source separation, and source reconstruction. A sparse mixture of the original source signals is used to estimate the mixing matrix, which is then used to reconstruct the source signals. A 'block signal representation' of the mixture greatly improves the computational efficiency of the 'mixing matrix estimation' and 'signal recovery' processes without a significant loss of separation accuracy. Experimental results compare the computational and separation performance of the method for different dictionaries, fixed or adaptive. Finally, a real case study in the field of Wireless Sensor Networks (WSN) is illustrated, in which a set of sensor nodes relay data to a multi-receiver node. Since several nodes transmit messages simultaneously, the mixture of information must be separated at the receiver, which amounts to solving a BSS problem.
On the Performance Analysis of Multi-antenna Relaying System over Rayleigh Fa... - IDES Editor
In this work, the end-to-end performance of an
amplify-and-forward multi-antenna infrastructure-based relay
(fixed relay) system over flat Rayleigh fading channel is
investigated. New closed form expressions for the statistics of
the received signal-to-noise ratio (SNR) are presented and
applied for studying the outage probability and the average
bit error rate of the digital receivers. The results reveal that
the system performance improves significantly (roughly 3 dB)
for M=2 over that for M=1 in both low and high signal-to-noise
ratio. However, little additional performance
improvement can be achieved for M>2 relative to M=2 at high
SNR.
Performance Analysis of Image Enhancement Using Dual-Tree Complex Wavelet Tra... - IJERD Editor
Resolution enhancement (RE) schemes that are not based on wavelets have the major
drawback of losing high-frequency content, which results in blurring. The discrete-wavelet-transform-based
(DWT) resolution enhancement scheme generates artifacts (due to the shift-variant property of the DWT).
A wavelet-domain approach based on the dual-tree complex wavelet transform (DT-CWT) and nonlocal means (NLM) is
proposed for RE of satellite images. A satellite input image is decomposed by DT-CWT (which is nearly
shift invariant) to obtain high-frequency subbands. The Lanczos interpolator is used to interpolate the
high-frequency subbands and the low-resolution (LR) input image. The high-frequency subbands are passed through
an NLM filter to suppress the artifacts generated by DT-CWT (despite its near shift invariance). The
filtered high-frequency subbands and the LR input image are combined using the inverse DT-CWT to obtain a
resolution-enhanced image. Objective and subjective analyses show the superiority of the proposed technique
over conventional and state-of-the-art RE techniques.
Continuous Markov random fields are a general formalism to model joint probability distributions over events with continuous outcomes. We prove that marginal computation for constrained continuous MRFs is #P-hard in general and present a polynomial-time approximation scheme under mild assumptions on the structure of the random field. Moreover, we introduce a sampling algorithm to compute marginal distributions and develop novel techniques to increase its efficiency. Continuous MRFs are a general-purpose probabilistic modeling tool, and we demonstrate how they can be applied to statistical relational learning. On the problem of collective classification, we evaluate our algorithm and show that the standard deviation of marginals serves as a useful measure of confidence.
Dynamic Spectrum Derived Mfcc and Hfcc Parameters and Human Robot Speech Inte... - IDES Editor
Mel-frequency cepstral coefficients
(MFCC), Human Factor cepstral coefficients (HFCC) and
their new parameters derived from the log dynamic spectrum and
the dynamic log spectrum are widely used for
speech recognition in various applications. However, speech
recognition systems based on these features do not perform
well in noisy conditions, in mobile environments, or
under speech variation between users of different genders and
ages. To maximize the recognition rate of a speaker-independent
isolated word recognition system, we combine both of the above
features and propose a hybrid feature set. We tested
the system with this hybrid feature vector and obtained
accuracies of 86.17% in clean conditions (closed window),
82.33% in a classroom with an open window, and 73.67%
in a noisy outdoor environment.
Designing an Efficient Multimodal Biometric System using Palmprint and Speech... - IDES Editor
This paper proposes a multimodal biometric system
using palmprint and speech signals. We propose
novel approaches for both modalities. We extract the
features using Subband Cepstral Coefficients for the speech signal
and the Modified Canonical method for the palmprint. The individual
feature scores are passed to the fusion level. We also
propose a new fusion method called weighted score. The
system is tested on clean and degraded databases collected by
the authors from more than 300 subjects. The results show
significant improvement in the recognition rate.
Transfer Learning, Soft Distance-Based Bias, and the Hierarchical BOA - Martin Pelikan
An automated technique has recently been proposed to transfer learning in the hierarchical Bayesian optimization algorithm (hBOA) based on distance-based statistics. The technique enables practitioners to improve hBOA efficiency by collecting statistics from probabilistic models obtained in previous hBOA runs and using the obtained statistics to bias future hBOA runs on similar problems. The purpose of this paper is threefold: (1) test the technique on several classes of NP-complete problems, including MAXSAT, spin glasses and minimum vertex cover; (2) demonstrate that the technique is effective even when previous runs were done on problems of different size; (3) provide empirical evidence that combining transfer learning with other efficiency enhancement techniques can often yield nearly multiplicative speedups.
Performance Analysis of M-ary Optical CDMA in Presence of Chromatic Dispersion - IDES Editor
The performance of M-ary optical code division
multiple access (OCDMA) is analytically investigated in the
presence of chromatic dispersion. The study is carried out for
single-mode dispersion-shifted and non-dispersion-shifted
fibers. A Walsh code is used as the user address. A p-i-n
photodetector is used for the optoelectronic conversion process.
In the proposed model, 16 different symbols are modulated
with different intensity levels and detected by direct detection.
The numerical results show that the reconstruction
of the transmitted symbol depends strongly on the received
symbol's magnitude, which is reduced by fiber length and
symbol rate. The proposed OCDMA system
performs better when dispersion-shifted fiber is
used as the communication medium.
A New Approach for Speech Enhancement Based On Eigenvalue Spectral Subtraction - CSCJournals
In this paper, a phase space reconstruction-based method is proposed for speech enhancement. The method embeds the noisy signal into a high-dimensional reconstructed phase space and applies the idea of spectral subtraction. The advantages of the proposed method are fast performance, high SNR, and good MOS. To evaluate the method, ten signals from the TIMIT database were mixed with additive white Gaussian noise and then processed with the method. Its efficiency was evaluated using qualitative and quantitative criteria.
Optimal Synthesis of Array Pattern for Concentric Circular Antenna Array Usin... - IDES Editor
In this paper one optimization heuristic search
technique, Hybrid Evolutionary Programming (HEP) is
applied to the process of synthesizing three-ring Concentric
Circular Antenna Array (CCAA) focused on maximum
sidelobe-level reduction. This paper assumes non-uniform
excitations and uniform spacing of excitation elements in each
three-ring CCAA design. Experimental results reveal that the
design of non-uniformly excited CCAAs with optimal current
excitations using the method of HEP provides a considerable
sidelobe level reduction with respect to the uniform current
excitation with d=λ/2 element-to-element spacing. Among the
various CCAA designs, the design containing a central element
and 4, 6 and 8 elements in three successive concentric rings
proves to be the globally optimal design, with a global minimum
SLL of -40.22 dB as determined by HEP.
From nebular to pynebular: a new package for the analysis of emission lines - AstroAtom
Poster presented by . Luridiana, C. Morisset, R.A. Shaw, D. Díaz-González at the IAU Symposium 283, Planetary Nebulae: an Eye to the Future, 25-29 July 2011, Tenerife, Spain.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
Comparative Analysis of Distortive and Non-Distortive Techniques for PAPR Red... - IDES Editor
OFDM is a popular and widely accepted modulation
and multiplexing technique in the area of wireless
communication. IEEE 802.15, a wireless specification defined
for WPAN is an emerging wireless technology for short range
multimedia applications. Two general categories of 802.15
are the low rate 802.15.4 (ZigBee) and high rate 802.15.3
(UWB). In their physical (PHY) layer design, OFDM is a
competing technique due to the various advantages it renders
in the practical wireless media. OFDM has been a popular
technique for many years and adopted as the core technique
in a number of wireless standards. It makes the system more
immune to interference like InterSymbol Interference (ISI)
and InterCarrier Interference (ICI) and dispersive effects of
the channel. It is also a spectrally efficient scheme since the
spectra of the signal are overlapping in nature. Despite these
advantages OFDM suffers from a serious problem of high
Peak to Average Power. This limits the system’s capabilities
and increases the complexity. This paper compares the signal
distortion technique of Amplitude Clipping and the
distortionless technique of SLM for Peak to Average Power
reduction.
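To make the quantities in this abstract concrete, the following is a minimal sketch of how the peak-to-average power ratio of one OFDM symbol can be measured and how amplitude clipping limits it. The QPSK mapping, the 64 subcarriers, the clipping ratio and all function names are assumptions chosen for illustration; they are not taken from the paper, and SLM is not shown.

```python
import numpy as np

def papr_db(x):
    """Peak-to-average power ratio of a complex baseband signal, in dB."""
    power = np.abs(x) ** 2
    return 10.0 * np.log10(power.max() / power.mean())

def ofdm_symbol(n_sub, rng):
    """One OFDM symbol: random QPSK on n_sub subcarriers, then an IFFT."""
    bits = rng.integers(0, 2, size=(2, n_sub))
    qpsk = ((2 * bits[0] - 1) + 1j * (2 * bits[1] - 1)) / np.sqrt(2)
    # Scale so that the time-domain samples have unit average power.
    return np.fft.ifft(qpsk) * np.sqrt(n_sub)

def clip_amplitude(x, clip_ratio):
    """Limit |x| to clip_ratio times the RMS value, keeping each sample's phase."""
    rms = np.sqrt(np.mean(np.abs(x) ** 2))
    limit = clip_ratio * rms
    scale = np.minimum(1.0, limit / np.maximum(np.abs(x), 1e-12))
    return x * scale

rng = np.random.default_rng(0)
symbol = ofdm_symbol(64, rng)
clipped = clip_amplitude(symbol, clip_ratio=1.4)
print(f"PAPR before clipping: {papr_db(symbol):.2f} dB")
print(f"PAPR after clipping:  {papr_db(clipped):.2f} dB")
```

Clipping never raises the PAPR of the symbol it is applied to, but it distorts the signal, which is the trade-off against a distortionless technique that the abstract refers to.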
Resource-Oriented Architecture offers advantages over other web-service architectures. It is based on a simple, scalable and highly standardised application-level protocol. Multimedia content is commonly managed using MPEG-7, a standard for representing audiovisual information that satisfies specific requirements on syntax, semantics and decoding. Content descriptions under MPEG-7 can be organised and characterized without ambiguity. The MPEG-7 eXperimental Model (XM) includes the best-performing tools for MPEG-7 normative and non-normative elements. In this paper, multimedia content is managed using the MPEG-7 eXperimental Model functionalities and provided using web-services technology. RESTful principles are the guidelines for achieving multimedia content storage and retrieval. Quantitative evaluation of the proposed web services has shown that this approach performs better in terms of retrieval speed and storage space.
International Journal of Engineering Research and Development (IJERD) - IJERD Editor
Performance of Matching Algorithms for Signal Approximation - iosrjce
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) is a double-blind peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of electronics and communication engineering and its applications. The journal welcomes high-quality papers on theoretical developments and practical applications in electronics and communication engineering. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
Eigen Subspace based Direction of Arrival Estimation for Coherent Sources - INFOGAIN PUBLICATION
Direction of arrival (DOA) estimation technology plays an important role in enhancing the performance of adaptive arrays for mobile communication. This paper presents a comparative performance analysis of eigen-subspace-based DOA estimation for coherent sources. A number of DOA estimation algorithms based on the eigen subspace method have been developed. Among these, the MUSIC algorithm is considered to give exceptionally good results. The focus of this paper is to unveil the performance characteristics of the MUSIC algorithm and its improved version for coherent sources. The simulation results show that the improved MUSIC algorithm performs best. It can also be observed that the resolution of DOA estimation improves as the number of snapshots and the signal-to-noise ratio increase.
performance analysis of MUSIC and ESPRIT DOA estimation used in adaptive arra... - CSCJournals
MUSIC and ESPRIT DOA estimation algorithms are widely used in adaptive arrays to locate the desired signal. MUSIC is found to be more stable and accurate and is widely used in the design of adaptive arrays.
Kernal based speaker specific feature extraction and its applications in iTau... - TELKOMNIKA JOURNAL
Extraction and classification algorithms based on kernel nonlinear features are a popular new direction of research in machine learning. This research paper considers their practical application in the iTaukei automatic speaker recognition (ASR) system for cross-language speech recognition. Nonlinear speaker-specific extraction methods such as kernel principal component analysis (KPCA), kernel independent component analysis (KICA), and kernel linear discriminant analysis (KLDA) are summarized. The effects of these transformations on subsequent classification were tested in conjunction with Gaussian mixture modeling (GMM) learning algorithms; in most cases, they were found to have a beneficial effect on classification performance. The best results were achieved by the KLDA algorithm. The performance of the ASR system is evaluated over a wide range of speech quality, starting from clear speech, using the ATR Japanese C language corpus and a self-recorded iTaukei corpus. For 6 sec of the ATR Japanese C language corpus, the ASR efficiencies of the KLDA, KICA, and KPCA techniques are 99.7%, 99.6%, and 99.1%, and the equal error rates (EER) are 1.95%, 2.31%, and 3.41%, respectively. The EER improvement of the KLDA-based ASR system compared with KICA and KPCA is 4.25% and 8.51% respectively.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
Demixing Commercial Music Productions via Human-Assisted Time-Frequency Maskingdhia_naruto
This convention paper has been reproduced from the author’s advance manuscript, without editing, corrections, or
consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be
obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York
10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is
not permitted without direct permission from the Journal of the Audio Engineering Society.
STATE SPACE POINT DISTRIBUTION PARAMETER FOR SUPPORT VECTOR MACHINE BASED CV ...cscpconf
In this paper we extend Support Vector Machines (SVM) for speaker-independent Consonant-Vowel (CV) unit classification. Here we adopt the technique known as the Decision Directed Acyclic Graph (DDAG), which is used to combine many two-class classifiers into a multiclass classifier. Using Reconstructed State Space (RSS) based State Space Point Distribution (SSPD) parameters, we obtain an average speaker-independent phoneme recognition accuracy of 90% on the Malayalam V/CV speech unit database. The recognition results indicate that this method is efficient and can be adopted for developing a complete speech recognition system for the Malayalam language.
Direction of Arrival Estimation Based on MUSIC Algorithm Using Uniform and No...IJERA Editor
In signal processing, direction of arrival (DOA) estimation denotes the direction from which a propagating wave arrives at a point where a set of antennas is located. An antenna array has an advantage over a single antenna in achieving improved performance by applying the Multiple Signal Classification (MUSIC) algorithm. This paper focuses on estimating the DOA using a uniform linear array (ULA) and a non-uniform linear array (NLA) of antennas to analyze the performance factors that affect the accuracy and resolution of the system based on the MUSIC algorithm. The direction of arrival estimation is simulated on a MATLAB platform with a set of input parameters such as array elements, signal-to-noise ratio, number of snapshots, and number of signal sources. An extensive simulation has been conducted, and the results show that the NLA with DOA estimation for a co-prime array can achieve an accurate and efficient DOA estimation.
Performance Analysis of Adaptive DOA Estimation Algorithms For Mobile Applica...IJERA Editor
Spatial filtering for mobile communications has attracted a lot of attention over the last decade and is currently considered a very promising technique that will help future cellular networks achieve their ambitious goals. One way to accomplish this is via array signal processing with algorithms which estimate the Direction-Of-Arrival (DOA) of the received waves from the mobile users. This paper evaluates the performance of a number of DOA estimation algorithms. In all cases a linear antenna array at the base station is assumed to be operating in a typical cellular environment.
Matrix Padding Method for Sparse Signal ReconstructionCSCJournals
Compressive sensing has evolved into a very useful technique for sparse reconstruction of signals that are sampled at sub-Nyquist rates. Compressive sensing helps to reconstruct signals from a few linear projections of the sparse signal. This paper presents a technique for sparse signal reconstruction by padding the compression matrix for solving the underdetermined system of simultaneous linear equations, followed by an iterative least mean square approximation. The performance of this method has been compared with widely used compressive sensing recovery algorithms such as l1_ls, l1-magic, YALL1, Orthogonal Matching Pursuit (OMP), and Compressive Sampling Matching Pursuit (CoSaMP). Sounds generated by a 3-blade engine, music, speech, etc. have been used to validate and compare the performance of the proposed technique with the other existing compressive sensing algorithms in ideal and noisy environments. The proposed technique is found to outperform l1_ls, l1-magic, YALL1, OMP, CoSaMP, etc., as elucidated in the results.
Similar to SINGING-VOICE SEPARATION FROM MONAURAL RECORDINGS USING ROBUST PRINCIPAL COMPONENT ANALYSIS (20)
2. ROBUST PRINCIPAL COMPONENT ANALYSIS
Candès et al. [12] proposed RPCA, which is a convex program, for recovering low-rank matrices when a fraction of their entries have been corrupted by errors, i.e., when the error matrix is sufficiently sparse. The approach, Principal Component Pursuit, suggests solving the following convex optimization problem:

$$\text{minimize} \quad \|L\|_* + \lambda \|S\|_1 \qquad \text{subject to} \quad L + S = M$$

where $M \in \mathbb{R}^{n_1 \times n_2}$, $L \in \mathbb{R}^{n_1 \times n_2}$, $S \in \mathbb{R}^{n_1 \times n_2}$. $\|\cdot\|_*$ and $\|\cdot\|_1$ denote the nuclear norm (sum of singular values) and the L1-norm (sum of absolute values of matrix entries), respectively. $\lambda > 0$ is a trade-off parameter between the rank of L and the sparsity of S. As suggested in [12], using a value of $\lambda = 1/\sqrt{\max(n_1, n_2)}$ is a good rule of thumb, which can then be adjusted slightly to obtain the best possible result. We explore $\lambda_k = k/\sqrt{\max(n_1, n_2)}$ with different values of k, thus testing different tradeoffs between the rank of L and the sparsity of S.
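As a rough illustration of the optimization problem above, the following Python sketch solves Principal Component Pursuit with an inexact ALM iteration in the spirit of [13]. It is not the authors' released code: the function name `rpca_inexact_alm`, the starting values of `mu` and `rho`, the stopping tolerance, and the iteration cap are assumptions chosen for illustration. Only the objective, the constraint L + S = M, and the square-root rule of thumb for λ from [12], scaled by k, come from the text.

```python
import numpy as np


def rpca_inexact_alm(M, k=1.0, tol=1e-7, max_iter=500):
    """Sketch of Principal Component Pursuit via an inexact ALM iteration.

    Returns (L, S) with L + S approximately equal to M, where L is
    encouraged to be low-rank and S to be sparse.
    """
    M = np.asarray(M, dtype=float)
    n1, n2 = M.shape
    lam = k / np.sqrt(max(n1, n2))            # lambda_k from the text

    fro_M = np.linalg.norm(M, "fro")
    spec_M = np.linalg.norm(M, 2)             # largest singular value of M
    # Assumed initialisation of the multiplier Y and penalty mu.
    Y = M / max(spec_M, np.abs(M).max() / lam)
    mu = 1.25 / spec_M
    mu_max = mu * 1e7
    rho = 1.5

    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # L-step: singular value thresholding (shrinks the nuclear norm).
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-step: entrywise soft thresholding (shrinks the L1-norm).
        R = M - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        # Multiplier update on the constraint residual, then grow the penalty.
        Z = M - L - S
        Y = Y + mu * Z
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(Z, "fro") <= tol * fro_M:
            break
    return L, S
```

Calling `rpca_inexact_alm(np.abs(M), k=1.0)` on a magnitude spectrogram corresponds to the λ1 setting; larger k weights the L1 term more heavily and so pushes S toward being sparser.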
Since music instruments can reproduce the same sounds each time they are played and music has, in general, an underlying repeating musical structure, we can think of music as a low-rank signal. Singing voices, on the contrary, have more variation (higher rank) but are relatively sparse in the time and frequency domains. We can then think of singing voices as components making up the sparse matrix. By RPCA, we expect the low-rank matrix L to contain music accompaniment and the sparse matrix S to contain vocal signals.
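The low-rank versus sparse intuition can be checked on a toy example. The sketch below builds two synthetic "spectrograms": an accompaniment made of two spectral templates whose activations repeat every eight frames, and a "voice" with energy in a small fraction of time-frequency bins. Everything here (the function name `toy_spectrograms`, the sizes, the repeating pattern, the 3% density) is an assumption for illustration and is not data from the paper.

```python
import numpy as np


def toy_spectrograms(n_freq=64, n_frames=200, density=0.03, seed=0):
    """Return (music, voice) toy magnitude spectrograms of equal shape."""
    rng = np.random.default_rng(seed)

    # Accompaniment: two fixed spectral templates ("notes") whose on/off
    # activations repeat every 8 frames, so the matrix has rank 2.
    templates = rng.random((n_freq, 2))
    pattern = np.array([[1, 0, 1, 0, 1, 1, 0, 0],
                        [0, 1, 0, 1, 0, 0, 1, 1]], dtype=float)
    activations = np.tile(pattern, (1, n_frames // 8 + 1))[:, :n_frames]
    music = templates @ activations

    # Voice: energy in only a small, irregular set of time-frequency bins.
    voice = np.zeros((n_freq, n_frames))
    active = rng.random(voice.shape) < density
    voice[active] = 5.0 * rng.random(active.sum())
    return music, voice


music, voice = toy_spectrograms()
print("rank of accompaniment:", np.linalg.matrix_rank(music))
print("rank of voice:", np.linalg.matrix_rank(voice))
print("fraction of non-zero voice bins:", np.mean(voice != 0))
```

The accompaniment fills most bins but has very low rank, while the voice matrix has far higher rank but few non-zero entries, which is the structure RPCA is expected to exploit.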
We perform the separation as follows: First, we compute the spectrogram of music signals as matrix M, calculated from the Short-Time Fourier Transform (STFT). Second, we use the inexact Augmented Lagrange Multiplier (ALM) method [13], which is an efficient algorithm for solving the RPCA problem, to solve L + S = |M|, given the input magnitude of M. Then by RPCA, we can obtain two output matrices L and S. From the example spectrogram, Figure 2, we can observe that there are formant structures in the sparse matrix S, which indicates vocal activity, and musical notes in the low-rank matrix L.

Fig. 2. Example RPCA results for yifen_2_01 at SNR=5 for (a) the original matrix, (b) the low-rank matrix, and (c) the sparse matrix.

Note that in order to obtain waveforms of the estimated components, we record the phase of original signals P = phase(M), append the phase to matrix L and S by $L(m, n) = L\,e^{jP(m,n)}$, $S(m, n) = S\,e^{jP(m,n)}$, for $m = 1 \ldots n_1$ and $n = 1 \ldots n_2$, and calculate the inverse STFT (ISTFT). The source codes and sound examples are available at http://mickey.ifp.uiuc.edu/wiki/Software.

3. TIME-FREQUENCY MASKING
Given the separation results of the low-rank L and sparse S matrices, we can further apply binary time-frequency masking methods for better separation results. We define binary time-frequency masking $M_b$ as follows:

$$M_b(m, n) = \begin{cases} 1 & |S(m, n)| > \text{gain} \cdot |L(m, n)| \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

for all $m = 1 \ldots n_1$ and $n = 1 \ldots n_2$.

Once the time-frequency mask $M_b$ is computed, it is applied to the original STFT matrix M to obtain the separation matrix $X_{singing}$ and $X_{music}$, as shown in Equation (2).

$$X_{singing}(m, n) = M_b(m, n)\,M(m, n)$$
$$X_{music}(m, n) = (1 - M_b(m, n))\,M(m, n) \qquad (2)$$

for all $m = 1 \ldots n_1$ and $n = 1 \ldots n_2$.

To examine the effectiveness of the binary mask, we assign $X_{singing} = S$ and $X_{music} = L$ directly as the case with no mask.
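The masking and phase steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the released implementation: the function names `binary_mask`, `apply_mask`, and `attach_phase` are hypothetical, `M` is assumed to be the complex STFT of the mixture, and `L` and `S` are assumed to be the real-valued RPCA outputs computed from |M|. Computing the STFT and its inverse is left to whichever signal-processing library is in use.

```python
import numpy as np


def binary_mask(L, S, gain=1.0):
    """Equation (1): 1 where |S| > gain * |L|, 0 otherwise."""
    return (np.abs(S) > gain * np.abs(L)).astype(float)


def apply_mask(M, mask):
    """Equation (2): split the complex mixture STFT M with a binary mask."""
    X_singing = mask * M
    X_music = (1.0 - mask) * M
    return X_singing, X_music


def attach_phase(A, M):
    """No-mask case: give the real matrix A the phase of the mixture STFT M."""
    return A * np.exp(1j * np.angle(M))
```

With a binary mask, every time-frequency bin of M is assigned to exactly one source, so `X_singing + X_music` reproduces M. In the no-mask case, `attach_phase(S, M)` and `attach_phase(L, M)` play the roles of the singing and music estimates before the inverse STFT.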
4. EXPERIMENTAL RESULTS

4.1. Dataset
We evaluate our system using the MIR-1K dataset¹. There are 1000 song clips encoded with a sample rate of 16 kHz, with a duration from 4 to 13 sec. The clips were extracted from 110 Chinese karaoke pop songs performed by both male and female amateurs. The dataset includes manual annotations of the pitch contours, lyrics, indices and types for unvoiced frames, and indices of the vocal and non-vocal frames.
¹ https://sites.google.com/site/unvoicedsoundseparation/mir-1k

4.2. Evaluation
Following the evaluation framework in [8, 10], we create three sets of mixtures using the 1000 clips of the MIR-1K dataset. For each clip, the singing voice and the music accompaniment were mixed at -5, 0, and 5 dB SNRs, respectively. Zero indicates that the singing voice and the music are at the same energy levels, negative values indicate the energy of the music accompaniment is larger than the singing voice, and so on.
For source separation evaluation, in addition to evaluating the Global Normalized Source to Distortion Ratio (GNSDR) as [8, 10], we also evaluate our performance in terms of Source to Interference Ratio (SIR), Source to Artifacts Ratio (SAR), and Source to Distortion Ratio (SDR) by BSS-EVAL metrics [14]. The Normalized SDR (NSDR) is defined as

$$\mathrm{NSDR}(\hat{v}, v, x) = \mathrm{SDR}(\hat{v}, v) - \mathrm{SDR}(x, v) \qquad (3)$$

where $\hat{v}$ is the resynthesized singing voice, $v$ is the original clean singing voice, and $x$ is the mixture. NSDR is for estimating the improvement of the SDR between the preprocessed mixture $x$ and the separated singing voice $\hat{v}$. The GNSDR is calculated by taking the mean of the NSDRs over all mixtures of each set, weighted by their length.

$$\mathrm{GNSDR}(\hat{v}, v, x) = \frac{\sum_{n=1}^{N} w_n\,\mathrm{NSDR}(\hat{v}_n, v_n, x_n)}{\sum_{n=1}^{N} w_n} \qquad (4)$$

where $n$ is the index of a song and $N$ is the total number of the songs, and $w_n$ is the length of the nth song. Higher values of SDR, SAR, SIR, and GNSDR represent better separation quality².
² The suppression of noise is reflected in SIR. The artifacts introduced by the denoising process are reflected in SAR. The overall performance is reflected in SDR.

4.3. Experiments
In the separation process, the spectrogram of each mixture is computed using a window size of 1024 and a hop size of 256 (at Fs=16,000). Three experiments were run: (1) an evaluation of the effect of $\lambda_k$ for controlling low rankness and sparsity, (2) an evaluation of the effect of the gain factor with a binary mask, and (3) a comparison of our results with the previous literature [8, 10] in terms of GNSDR.

(1) The effect of $\lambda_k$
The value $\lambda_k = k/\sqrt{\max(n_1, n_2)}$ can be used for trading off the rank of L with the sparsity of S. The matrix S is sparser with higher $\lambda_k$, and vice versa. Intuitively, for the source separation problem, if the matrix S is sparser, there is less interference in the matrix S; however, deletions of original signal components might also result in artifacts. On the other hand, if S matrix is less sparse, the signal contains less artifacts, but there is more interference from the other sources that exist in matrix S. Experimental results (Figure 3) show these trends for both the case of no mask and the case of binary mask (gain=1) at different SNR values.

Fig. 3. Comparison between the case using (a) no mask and (b) a binary mask at SNR={-5,0,5}, k (of $\lambda_k$)={0.1, 0.5, 1, 1.5, 2, 2.5}, and gain=1.

(2) The effect of the gain factor with a binary mask
The gain factor adjusts the energy between the sparse matrix and the low-rank matrix. As shown in Figure 4, we examine different gain factors {0.1, 0.5, 1, 1.5, 2} at $\lambda_1$ where SNR={-5, 0, 5}. Similar to the effect of $\lambda_k$, a higher gain factor results in a lower power sparse matrix S. Hence, there is larger interference and fewer artifacts at high gain, and vice versa.

(3) Comparison with previous systems
From previous observations, moderate values for $\lambda_k$ and the gain factor balance the separation results in terms of SDR, SAR, and SIR. We empirically choose $\lambda_1$ (also suggested in [12]) and gain = 1 to compare with previous literature on singing-voice separation in terms of GNSDR using the MIR-1K dataset [8, 10].
Hsu and Jang [8] performed singing-voice separation using a pitch-based inference separation method on the MIR-1K dataset. Their method combines the singing-voice separation method [7], the separation of the unvoiced singing-voice frames, and a spectral subtraction method. Rafii and Pardo proposed a singing-voice separation method by extracting the
repeating musical structure estimated by the autocorrelation function [10].

Fig. 4. Comparison between various gain factors using a binary mask with $\lambda_1$.

As shown in Figure 5, both our approach using binary mask and our approach using no mask achieve better GNSDR compared with previous approaches [8, 10], with the no mask approach providing the best overall results. These methods are compared to an ideal situation where an ideal binary mask is used as the upper-bound performance of the singing-voice separation task with algorithms based on binary masking technique.

Fig. 5. Comparison between Hsu [8], Rafii [10], the binary mask, the no mask, and the ideal binary mask cases in terms of GNSDR.

5. CONCLUSION AND FUTURE WORK
In this paper, we proposed an unsupervised approach which applies robust principal component analysis on singing-voice separation from music accompaniment for monaural recordings. We also examined the parameter $\lambda_k$ in RPCA and the gain factor with a binary mask in detail. Without using prior training or requiring particular features, we are able to achieve around 1 to 1.4 dB higher GNSDR compared with two state-of-the-art approaches, by taking into account the rank of music accompaniment and the sparsity of singing voices.
There are several ways in which this work could be extended, for example, (1) to investigate dynamic parameter selection methods according to different contexts, or (2) to expand current work to speech noise reduction, since in many situations, the noise spectrogram is relatively low-rank, while the speech spectrogram is relatively sparse.

6. REFERENCES
[1] C.-K. Wang, R.-Y. Lyu, and Y.-C. Chiang, “An automatic singing transcription system with multilingual singing lyric recognizer and robust melody tracker,” in Proc. of Interspeech, 2003, pp. 1197–1200.
[2] H. Fujihara, M. Goto, J. Ogata, K. Komatani, T. Ogata, and H. G. Okuno, “Automatic synchronization between lyrics and music CD recordings based on Viterbi alignment of segregated vocal signals,” in Proc. of ISM, Dec. 2006, pp. 257–264.
[3] A. Berenzweig, D. P. W. Ellis, and S. Lawrence, “Using voice segments to improve artist classification of music,” in AES 22nd International Conference: Virtual, Synthetic, and Entertainment Audio, 2002.
[4] H. Fujihara and M. Goto, “A music information retrieval system based on singing voice timbre,” in ISMIR, 2007, pp. 467–470.
[5] S. Vembu and S. Baumann, “Separation of vocals from polyphonic audio recordings,” in ISMIR, 2005, pp. 337–344.
[6] A. Ozerov, P. Philippe, F. Bimbot, and R. Gribonval, “Adaptation of Bayesian models for single-channel source separation and its application to voice/music separation in popular songs,” Audio, Speech, and Language Processing, IEEE Transactions on, vol. 15, no. 5, pp. 1564–1578, July 2007.
[7] Yipeng Li and DeLiang Wang, “Separation of singing voice from music accompaniment for monaural recordings,” Audio, Speech, and Language Processing, IEEE Transactions on, vol. 15, no. 4, pp. 1475–1487, May 2007.
[8] C.-L. Hsu and J.-S.R. Jang, “On the improvement of singing voice separation for monaural recordings using the MIR-1K dataset,” Audio, Speech, and Language Processing, IEEE Transactions on, vol. 18, no. 2, pp. 310–319, Feb. 2010.
[9] J.-L. Durrieu, G. Richard, B. David, and C. Fevotte, “Source/filter model for unsupervised main melody extraction from polyphonic audio signals,” Audio, Speech, and Language Processing, IEEE Transactions on, vol. 18, no. 3, pp. 564–575, March 2010.
[10] Z. Rafii and B. Pardo, “A simple music/voice separation method based on the extraction of the repeating musical structure,” in ICASSP, May 2011, pp. 221–224.
[11] H. Schenker, Harmony, University of Chicago Press, 1954.
[12] Emmanuel J. Candès, Xiaodong Li, Yi Ma, and John Wright, “Robust principal component analysis?,” J. ACM, vol. 58, pp. 11:1–11:37, Jun. 2011.
[13] Z. Lin, M. Chen, L. Wu, and Y. Ma, “The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices,” Tech. Rep. UILU-ENG-09-2215, UIUC, Nov. 2009.
[14] E. Vincent, R. Gribonval, and C. Fevotte, “Performance measurement in blind audio source separation,” Audio, Speech, and Language Processing, IEEE Transactions on, vol. 14, no. 4, pp. 1462–1469, July 2006.