Improving the global parameter signal to distortion value in music signals
 


International Journal of Electronics and Communication Engineering & Technology (IJECET)
ISSN 0976 – 6464 (Print), ISSN 0976 – 6472 (Online)
Volume 4, Issue 1, January-February (2013), pp. 01-10
© IAEME: www.iaeme.com/ijecet.asp
Journal Impact Factor (2012): 3.5930 (Calculated by GISI), www.jifactor.com

IMPROVING THE GLOBAL PARAMETER SIGNAL TO DISTORTION VALUE IN MUSIC SIGNALS USING PANNING TECHNIQUE AND DISCRETE WAVELET TRANSFORMS

VENKATESH KUMAR.N (1), RAGHAVENDRA.N (2), SUBASH KUMAR T. G (3), MANOJ KUMAR.K (4)

(1) SET, Asst. Professor, Department of ECE, Jain University, Jakkasandra, Ramanagar Taluk, Karnataka, India, kumarsparadise@yahoo.com
(2) Principal Staff Engineer, Google Inc [formerly Motorola Mobility], Bangalore, India
(3) Project Leader, Jasmin Infotech Pvt Ltd, Velacherry, Chennai, India
(4) Consultant, Java Mentor, Bangalore, India

ABSTRACT

In this paper, an attempt is made to alleviate the effect of distortion during feature extraction of a music signal. The proposed method is compared with the existing methods for performance evaluation and improves the signal to distortion value.

Keywords: Blind Source Separation; DWT; FFT; Panning; STFT; Signal to Distortion Ratio

1. INTRODUCTION

The singing voice, in addition to being the oldest musical instrument, is also one of the most complex from an acoustic standpoint [1]. Research on the perception of singing is not as developed as in the closely related field of speech research [2]. Some of the existing work is surveyed in this section.

Chou and Gu [3] utilized a Gaussian mixture model (GMM) to detect the vocal regions. The feature vectors used for the GMM include 4 Hz modulation energy, harmonic coefficients, 4 Hz harmonic coefficients, delta mel frequency cepstral coefficients (MFCC) and delta log energy. Berenzweig and Ellis [4] used a speech recognizer's classifier to distinguish vocal segments from accompaniment.
Kim and Whitman [5] developed a system for singer identification in popular music recordings using voice coding features. Another system for automatic singer identification was proposed by Zhang [6]. Maddage et al. [7] proposed a framework for music structure analysis based on repeated chord pattern analysis and vocal content analysis. Cobos and Lopez [8] proposed a system for extracting the singing voice from stereo recordings. This system combines panning information and pitch tracking, which allows the time-frequency mask applied for extracting a vocal segment to be refined, thus improving the separation.

1.1 Motivation

In real time applications of sound separation like lyrics recognition and music remixing, music information retrieval requires accurate extraction of features from the music signal. Existing methods result in a poor signal to distortion value. Hence, it is necessary to enhance music quality by improving the signal to distortion value.

1.2 Problem Statement

Real time applications of music separation algorithms demand better signal to noise and distortion ratios (SINAD). These parameters depend on the technique used for feature extraction. In the literature, similarity measures between the Short Time Fourier Transforms (STFTs) of the input signals were used to identify the time-frequency (TF) regions occupied by each source based on the panning coefficient. In this work, we instead implement audio source separation using similarity measures between the Discrete Wavelet Transforms (DWTs) of the input signals to identify the time-frequency regions occupied by each source based on the panning coefficient, hence improving the Signal to Noise and Distortion ratio.

2. PROPOSED SOURCE SEPARATION TECHNIQUE

2.1 Music Source Separation Model

The source separation problem can be stated as follows: given M linear mixtures of N sources mixed via an unknown M × N mixing matrix A, estimate the underlying sources from the mixtures. When M = N, this can be achieved by estimating an un-mixing matrix W, which allows the original sources to be estimated up to a permutation and a scale factor. Independent Component Analysis (ICA) algorithms are able to perform the separation if some conditions are satisfied: the sources must be non-Gaussian and statistically independent [9]. Moreover, the number of sources must be equal to the number of available mixtures, M = N, in which case the problem is said to be even-determined. When M > N, the mixing process is overdetermined and the underlying sources can be estimated by least-squares optimization using matrix pseudo-inversion. If M < N, the mixing process is underdetermined and the estimation of the sources becomes much more difficult [10].

When dealing with stereo commercial music recordings, only the information of the left and right channels is available, and thus the mixture is generally underdetermined [11]. Sparse methods provide a powerful approach to the separation of several signals when there are more sources than sensors [12]. The sparsity property of audio signals means that in most time-frequency bins, all sources but one, at most, will have a time-frequency coefficient of zero or close to zero [13][14]. The DUET algorithm [15], originally conceived for separating underdetermined speech mixtures, assumes that because of the sparsity of speech in the Short Time Fourier Transform (STFT) domain, almost all mixture time-frequency points with significant magnitude are in fact due to only one of the original sources. In the ideal case, when each time-frequency point belongs to only one source, the sources are said to be W-Disjoint Orthogonal (W-DO).

2.2 Overview of the Proposed Model

Figure 1. Overview of the proposed separation model. The flowchart stages are, in order: stereo input (X1, X2); define the similarity measure; calculate the partial similarity measures; calculate the ambiguity-resolving function; panning index analysis and Gaussian windowing (set the window width ζ); obtain the DWTs of the foreground streams; apply the DWT−1 operator to obtain the target signal, i = 1, 2.
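The three mixing regimes of Section 2.1 can be summarized in a few lines of notation. This is only a notational sketch: the symbols x(t), s(t), the estimate ŝ(t), the permutation P, the diagonal scaling D and the pseudo-inverse A⁺ are introduced here for illustration and do not appear in the paper, and the W-DO condition is written in the form commonly associated with DUET [15]. The overdetermined expression also assumes that A is known (or has been estimated) and has full column rank.

```latex
\begin{align}
\mathbf{x}(t) &= \mathbf{A}\,\mathbf{s}(t),
  \qquad \mathbf{A} \in \mathbb{R}^{M \times N} \\
\hat{\mathbf{s}}(t) &= \mathbf{W}\,\mathbf{x}(t),
  \qquad \mathbf{W}\mathbf{A} \approx \mathbf{P}\mathbf{D}
  \qquad (M = N) \\
\hat{\mathbf{s}}(t) &= \mathbf{A}^{+}\mathbf{x}(t)
  = \left(\mathbf{A}^{\mathsf{T}}\mathbf{A}\right)^{-1}
    \mathbf{A}^{\mathsf{T}}\mathbf{x}(t)
  \qquad (M > N) \\
S_i(\tau,\omega)\,S_j(\tau,\omega) &= 0
  \quad \forall\, \tau, \omega,\; i \neq j
  \qquad \text{(W-DO)}
\end{align}
```

For a stereo recording M = 2, so any mix with three or more sources falls outside the first two cases, which is why the time-frequency masking approach is used instead.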
2.3 Panning Index Windowing

An initial segregation of the singing voice is obtained by applying the source identification technique developed by Avendano [16]. This technique is based on a comparison of the left and right signals in the TF plane, obtaining a two-dimensional map that identifies different source components related to the panning gains used in the stereo mix-down. Firstly, a similarity measure is defined by equation (1), where * denotes complex conjugation. If the source is panned to the center, the function attains its maximum value of one, and if the source is panned completely to either side, the function attains its minimum value of zero. A quadratic dependence on the panning knob Φ makes the function (1) multi-valued, and an ambiguity appears in knowing the lateral direction of the source. The ambiguity is resolved using the partial similarity measures of equation (2) and their difference, equation (3). The ambiguity-resolving function is given by equation (4). Finally, the panning index Ψ(k,m) is obtained from equation (5), which identifies the time-frequency components of the sources in the stereo mix when they are all panned to different positions.

If several sources are equally panned, they will appear in the PI map as a single source. Due to the overlap with other sources, selecting only bins with Ψ = Ψ0 will exclude bins where the source might still have significant energy but whose panning index has been altered by the presence of the interference.
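The panning index computation described above can be sketched for a single pair of transform coefficients. The equation bodies are not reproduced in this text, so the sketch assumes the forms used in the cited technique of Avendano [16]: similarity 2|X1·X2*| / (|X1|² + |X2|²), partial similarities |X1·X2*| / |Xi|², an ambiguity-resolving function equal to the sign of their difference, and panning index (1 − similarity) × sign. The function name `panning_index`, the guard `eps` and the sign convention (positive when channel 2 dominates) are illustrative assumptions, not taken from the paper.

```python
def panning_index(x1, x2, eps=1e-12):
    """Return (similarity, panning index) for one coefficient pair.

    x1, x2 are the channel-1 and channel-2 transform coefficients of the
    same time-frequency point (real or complex). Formulas are ASSUMED to
    follow Avendano's panning-index technique; eps avoids division by zero.
    """
    cross = abs(x1 * x2.conjugate())          # |X1 X2*|
    e1 = abs(x1) ** 2                         # channel-1 energy
    e2 = abs(x2) ** 2                         # channel-2 energy

    similarity = 2.0 * cross / (e1 + e2 + eps)    # 1 at centre, 0 at a side
    partial_1 = cross / (e1 + eps)                # partial similarity, ch. 1
    partial_2 = cross / (e2 + eps)                # partial similarity, ch. 2
    delta = partial_1 - partial_2

    # Ambiguity-resolving function: +1, 0 or -1 depending on the side.
    direction = (delta > 0) - (delta < 0)
    return similarity, (1.0 - similarity) * direction


if __name__ == "__main__":
    for pair in [(1 + 1j, 1 + 1j), (0.1, 1.0), (1.0, 0.1)]:
        print(pair, panning_index(*pair))
```

A centre-panned point yields an index of zero, while mirrored gain pairs yield indices of equal magnitude and opposite sign. A point that is exactly silent in one channel has zero cross term, so both partial similarities vanish and the direction is left undetermined; a practical implementation would need to handle that case separately.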
A Gaussian window is proposed to let components with values equal to Ψ0 pass unmodified and to weight TF points with a PI value near Ψ0 according to equation (6), where Ψ0 is the panning index value for extracting a given source, ζ controls the width of the window, and v is a floor value necessary to avoid setting DWT values to zero, which might result in musical-noise artifacts. The Ψ0 value must be specified for centering the separating window. Most vocal removers exploit the fact that the singing voice is usually panned to the center. This is true for most music recordings, so Ψ0 = 0 is normally used. A supervised exploration along different PI values can be used to locate the pan location of the vocals more exactly. The value of ζ, which sets the window width, can be obtained using equation (7), where Ψc is the PI value at which the window reaches a small value A, for example A = −60 dB.

Once the parameters of the window have been set up, the DWTs of the initial foreground streams are obtained simply by applying the window to each of the mixture channels, as in equation (8). These are converted back to the time domain by applying the DWT−1 operator, obtaining the foreground stream of each channel i = 1, 2 as a function of time t; the notation also marks the corresponding step of the separation method. The recovered target signal is obtained by adding the foreground streams of both channels, as in equation (9).

2.4 Performance Evaluation

Separation algorithms can be evaluated by using a set of measures under some allowed distortions. These distortions depend on the kind of application considered. In [17], four numerical performance criteria are defined: the Signal to Distortion Ratio (10), the Signal to Interferences Ratio (11), the Signal to Noise Ratio (12), and the Signal to Artifacts Ratio (13),
where, in the notation of [17], s_target is a version of the true source modified by an allowed distortion, and where e_interf, e_noise and e_artif are respectively the interference, noise and artifact error terms resulting from the decomposition (14) of the estimated source. The SIR and the SAR are indicators of the rejection of the interferences and the absence of "burbling" artifacts, respectively. The SNR is a measure of the rejection of the sensor noise, and the SDR can be seen as a global performance measure.

3. IMPLEMENTATION

The model discussed above is implemented in MATLAB R2010a, and the BSS EVAL toolbox [18] for MATLAB is used for performance evaluation. Later in this section, the features extracted using our method are compared with two other feature extraction techniques, STFT and FFT.

3.1 Design parameters for source separation

Table 1 Design parameters for source separation

Parameter                    Value
Frame size                   1000
Overlap                      0.75%
Panning Index (C)            -
Panning Index (0)            0
Smallest window value (A)    0.001
Floor value                  0.0005

Using the design parameters listed in Table 1, we calculate the performance evaluation parameter SDR for the proposed model.

To evaluate the extracted features, they were compared in two classification experiments with two feature sets that have been proposed in the literature. The first feature set consists of features extracted using the STFT. The second feature set consists of features extracted using the Fast Fourier Transform (FFT).

The source separation method is applied over several wave files, where each audio file is approximately 30 seconds long, with a frame size of 1000 samples at a 44100 Hz sampling rate.
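The window parameters in Table 1 can be tied together with a short numerical sketch. The window and width equations are not reproduced in this text, so the code assumes the Gaussian form used in the panning-index literature [16]: window = v + (1 − v)·exp(−(Ψ − Ψ0)² / (2ζ)), with ζ chosen so that the Gaussian term equals A at Ψ = Ψc. Table 1 gives no value for Ψc, so the value 0.2 below is purely hypothetical, as are the function names.

```python
import math


def window_width(psi_c, psi_0, a):
    """Width zeta for which the Gaussian term drops to `a` at psi_c.

    ASSUMED form: zeta = -(psi_c - psi_0)^2 / (2 ln a), with 0 < a < 1.
    """
    return -((psi_c - psi_0) ** 2) / (2.0 * math.log(a))


def gaussian_window(psi, psi_0, zeta, floor):
    """Weight for a point with panning index psi (ASSUMED Gaussian form)."""
    return floor + (1.0 - floor) * math.exp(-((psi - psi_0) ** 2) / (2.0 * zeta))


def foreground(coeffs, psis, psi_0, zeta, floor):
    """Apply the window to one channel's transform coefficients."""
    return [gaussian_window(p, psi_0, zeta, floor) * c
            for c, p in zip(coeffs, psis)]


# Values from Table 1; PSI_C is a hypothetical choice (not given in the table).
A = 0.001        # smallest window value, i.e. -60 dB in amplitude
FLOOR = 0.0005   # floor value v
PSI_0 = 0.0      # centre-panned vocals
PSI_C = 0.2      # HYPOTHETICAL

if __name__ == "__main__":
    zeta = window_width(PSI_C, PSI_0, A)
    print("zeta =", zeta)
    print("weight at psi_0:", gaussian_window(PSI_0, PSI_0, zeta, FLOOR))
    print("weight at psi_c:", gaussian_window(PSI_C, PSI_0, zeta, FLOOR))
```

Under these assumptions, coefficients whose panning index equals Ψ0 pass with unit weight, and the weight never falls below the floor, so no coefficient is set exactly to zero.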
4. RESULTS

A similarity measure between the Discrete Wavelet Transforms (DWTs) of the input signals is used to identify the time-frequency regions occupied by each source, based on the panning coefficient assigned to it during the mix. Individual music components are identified and manipulated by clustering the time-frequency components with a given panning coefficient. After modification, an inverse DWT (IDWT) is used to synthesize a time-domain processed signal. The figures below show plots of both the input signal and the extracted voice signal for several wave files, plotted using MATLAB.

The performance evaluation parameter, SDR, can be obtained using the BSS EVAL toolbox in MATLAB. The music separation method is applied over several wave files, where each audio file is approximately 30 seconds long, with a frame size of 1000 samples at a 44100 Hz sampling rate. The experiments are listed below.

Experiment 1:

Fig. 2 shows the input signal and the extracted voice signal from the wave file "boyfriend.wav", composed by Ashley Simpson, which is 20 seconds long with a 44100 Hz sampling rate. MATLAB software is used to plot the results. The results are tabulated in Table 2.

Figure 2. Input signal and the extracted voice signal from the wave file "boyfriend.wav"

Table 2 SDRs of "boyfriend.wav"

I/P wave file / Composer        SDR (FFT)    SDR (STFT)    SDR (DWT)
Boyfriend - Ashley Simpson      44.1153      51.1288       84.0558
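To make the tabulated quantity concrete, the following is a minimal sketch of a signal to distortion ratio for a single source. It is not the BSS EVAL implementation used in the paper: it assumes the only allowed distortion is a constant gain, takes the target as the projection of the estimate onto the reference, and lumps interference, noise and artifacts into one error term, so that SDR = 10·log10(‖s_target‖² / ‖error‖²). The function name `sdr_db` is illustrative.

```python
import math


def sdr_db(estimate, reference):
    """Signal to distortion ratio in dB for equal-length sample sequences.

    Simplified sketch (ASSUMPTION): the allowed distortion is a single gain,
    and all error components are combined into one term.
    """
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    gain = dot / ref_energy                       # least-squares gain

    target = [gain * r for r in reference]        # s_target
    error = [e - t for e, t in zip(estimate, target)]

    target_energy = sum(t * t for t in target)
    error_energy = sum(x * x for x in error)
    if error_energy == 0.0:
        return math.inf                           # perfect reconstruction
    return 10.0 * math.log10(target_energy / error_energy)


if __name__ == "__main__":
    reference = [1.0, -1.0, 1.0, -1.0]
    # Reference plus a small error that is orthogonal to it.
    estimate = [1.1, -0.9, 0.9, -1.1]
    print(sdr_db(estimate, reference))
```

Because the gain is fitted before the error is measured, rescaling the estimate leaves the result unchanged, which reflects the scale ambiguity of blind separation noted in Section 2.1.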
Experiment 2:

Fig. 3 shows the input signal and the extracted voice signal from the wave file "chammak challo.wav", composed by Vishal Shekhar, which is 35 seconds long with a 44100 Hz sampling rate. MATLAB software is used to plot the results. The results are tabulated in Table 3.

Figure 3. Input signal and the extracted voice signal from the wave file "chammak challo.wav"

Table 3 SDRs of "chammak challo.wav"

I/P wave file / Composer            SDR (FFT)    SDR (STFT)    SDR (DWT)
Chammak challo - Vishal Shekhar     35.5603      40.5822       83.47656607

Experiment 3:

Fig. 4 shows the input signal and the extracted voice signal from the wave file "toxic.wav", composed by Britney Spears, which is 27 seconds long with a 44100 Hz sampling rate. MATLAB software is used to plot the results. The results are tabulated in Table 4.

Figure 4. Input signal and the extracted voice signal from the wave file "toxic.wav"

Table 4 SDRs of "toxic.wav"

I/P wave file / Composer     SDR (FFT)    SDR (STFT)    SDR (DWT)
Toxic - Britney Spears       56.2875      59.7563       85.7679
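The three tables can be compared side by side with a few lines of arithmetic. The text does not state how an overall improvement figure is derived from the per-file values, so the sketch below shows only one possible summary, the per-file difference between the DWT value and each baseline, and its mean. The dictionary layout and helper names are illustrative; the numbers are copied from Tables 2 to 4.

```python
# SDR values copied from Tables 2, 3 and 4.
RESULTS = {
    "boyfriend.wav":      {"FFT": 44.1153, "STFT": 51.1288, "DWT": 84.0558},
    "chammak challo.wav": {"FFT": 35.5603, "STFT": 40.5822, "DWT": 83.47656607},
    "toxic.wav":          {"FFT": 56.2875, "STFT": 59.7563, "DWT": 85.7679},
}


def gain_over(baseline):
    """Per-file difference between the DWT value and the given baseline."""
    return {name: row["DWT"] - row[baseline] for name, row in RESULTS.items()}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    for baseline in ("FFT", "STFT"):
        gains = gain_over(baseline)
        for name, gain in gains.items():
            print(f"{name:20s} DWT - {baseline}: {gain:.4f}")
        print(f"mean difference over {baseline}: {mean(gains.values()):.4f}")
```

Other summaries, such as a ratio relative to the baseline or to the DWT value, give different numbers, so any single percentage should be read together with the definition used to compute it.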
5. CONCLUSION

Audio source separation using DWT is presented. The Discrete Wavelet Transforms (DWTs) of the input signals were used to identify the time-frequency regions occupied by each source based on the panning coefficient, which improved the Signal to Noise and Distortion ratios. From Tables 2, 3 and 4, it is evident that the results obtained using DWT as the feature extractor are approximately 38% better than those of the other two feature extractors, indicating that the proposed method provides better Signal to Noise and Distortion Ratios.

6. ACKNOWLEDGEMENTS

Authors 1 and 4 gratefully acknowledge the technical support provided by Jasmin Infotech, India and Google India.

REFERENCES

[1] Kim, Y. and Whitman, B., "Singer identification in popular music recordings using voice coding features," Proc. ISMIR 2002.
[2] P. Comon, "Independent component analysis, a new concept?," Signal Processing, vol. 36, no. 3, pp. 287–314, April 1994.
[3] Chou, W. and Gu, L., "Robust singing detection in speech/music discriminator design," Proc. ICASSP 2001.
[4] Berenzweig, A. and Ellis, D.P.W., "Locating singing voice segments within music signals," Proc. WASPAA 2001.
[5] Kim, Y. and Whitman, B., "Singer identification in popular music recordings using voice coding features," Proc. ISMIR 2002.
[6] Zhang, T., "System and method for automatic singer identification," Proc. ICME 2003.
[7] Maddage, N.C., et al., "Content-based music structure analysis with applications to music semantic understanding," Proc. ACM Multimedia 2004.
[8] Maximo Cobos and Jose J. Lopez, "Singing Voice Separation Combining Panning Information and Pitch Tracking," Audio Engineering Society Convention Paper, presented at the 124th Convention, 2008 May 17–20, Amsterdam, The Netherlands.
[9] J. F. Cardoso, "Blind signal separation: statistical principles," in Proceedings of the IEEE, vol. 86, no. 10, pp. 2009-2025.
[10] T. W. Lee, M. S. Lewicki, M. Girolami and T. J. Sejnowski, "Blind source separation of more sources than mixtures using overcomplete representations," in IEEE Signal Processing Letters, vol. 6, no. 4, pp. 87-90, April 1999.
[11] A. S. Master, "Stereo Music Source Separation via Bayesian Modeling," Ph.D. Dissertation, Stanford University, June 2006.
[12] P. D. O'Grady, B. A. Pearlmutter and S. T. Rickard, "Survey of sparse and non-sparse methods in source separation," International Journal of Imaging Systems and Technology (IJIST), vol. 15, no. 1, pp. 18-33, 2005.
[13] C. Jutten and M. Babaie-Zadeh, "Source separation principles, current advances and applications," presented at the 2006 German-French Institute for Automation and Robotic Annual Meeting, IAR 2006, Nancy, France, November 2006.
[14] K. Torkkola, "Blind separation for audio signals: are we there yet?," in Proceedings of the Workshop on Independent Component Analysis and Blind Signal Separation (ICA 1999), 1999.
[15] O. Yilmaz and S. Rickard, "Blind separation of speech mixtures via time-frequency masking," in IEEE Transactions on Signal Processing, vol. 52, no. 7, pp. 1830-1847, July 2004.
[16] C. Avendano, "Frequency-domain source identification and manipulation in stereo mixes for enhancement, suppression and re-panning applications," in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, New York, October 2003.
[17] E. Vincent, R. Gribonval and C. Févotte, "Performance Measurement in Blind Audio Source Separation," in IEEE Transactions on Speech and Audio Processing, vol. 14, no. 4, pp. 1462-1469, 2006.
[18] C. Févotte, R. Gribonval and E. Vincent, "BSS EVAL Toolbox User Guide," IRISA, Rennes, France, 2006.
[19] Ravindra M. Malkar, Vaibhav B. Magdum and Darshan N. Karnawat, "An Adaptive Switched Active Power Line Conditioner Using Discrete Wavelet Transform (Dwt)," International Journal of Electrical Engineering & Technology (IJEET), Volume 2, Issue 1, 2011, pp. 14-24, Published by IAEME.