This document presents a new approach to modeling speech spectra and pitch for text-independent speaker identification using Gaussian mixture models based on multi-space probability distribution (MSD-GMM). The MSD-GMM allows modeling of continuous pitch values for voiced frames and discrete symbols representing unvoiced frames in a unified framework. Spectral and pitch features are jointly modeled by a two-stream MSD-GMM. Maximum likelihood estimation formulae are derived for the MSD-GMM parameters. Experimental results show the MSD-GMM can efficiently model spectral and pitch features of each speaker and outperforms conventional speaker models.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
This paper presents a new algorithm based on the Baum-Welch algorithm for estimating the parameters of a hidden Markov model that allows each state to be observed using a different set of features rather than relying on a common feature set. The algorithm uses likelihood ratios calculated from low-dimensional sufficient statistics for each state rather than high-dimensional probability density functions. This allows more accurate and robust parameter estimation compared to standard feature-based HMMs using a common feature set. The algorithm is mathematically equivalent to the standard Baum-Welch algorithm but avoids estimating high-dimensional densities.
Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...CSCJournals
The Fractional Fourier Transform (FrFT) can provide significant interference suppression (IS) over other techniques in real-life non-stationary environments because it can operate with very few samples. However, the optimum rotational parameter ‘a’ must first be estimated. Recently, a new method to estimate ‘a’ based on the value that minimizes the projection of the product of the Wigner Distributions (WDs) of the signal-of-interest (SOI) and interference was proposed. This is more easily calculated by recognizing its equivalency to choosing ‘a’ for which the product of the energies of the SOI and interference in the FrFT domain is minimized, termed the WD-FrFT algorithm. The algorithm was shown to estimate ‘a’ more accurately than minimum mean square error FrFT (MMSE-FrFT) methods and perform far better than MMSE Fast Fourier Transform (MMSE-FFT) methods, which only operate in the frequency domain. The WD-FrFT algorithm significantly improves interference suppression (IS) capability, even at low signal-to-noise ratio (SNR). In this paper, we apply the proposed WD-FrFT technique to recovering a speech signal in non-stationary co-channel interference. Using mean-square error (MSE) between the SOI and its estimate as the performance metric, we show that the technique greatly outperforms the conventional methods, MMSE-FrFT and MMSE-FFT, which fail with just one non-stationary interferer, and continues to perform well in the presence of severe co-channel interference (CCI) consisting of multiple, equal power, non-stationary interferers. This method therefore has great potential for separating co-channel signals in harsh, noisy, non-stationary environments.
This document discusses several topics related to Fourier transforms including:
1) Representing polynomials in value representation by evaluating them at roots of unity allows for faster multiplication using the Discrete Fourier Transform (DFT).
2) The DFT reduces the complexity of the Discrete Fourier Transform (DFT) from O(n2) to O(n log n) by formulating it recursively.
3) Converting images from the spatial to frequency domain using techniques like the Discrete Cosine Transform (DCT) allows for image compression by retaining only low frequency components with large coefficients.
Comparison and analysis of combining techniques for spatial multiplexing spac...IAEME Publication
This document compares different combining techniques for space-time block coded systems in Rayleigh fading channels. It finds that maximum ratio combining outperforms other techniques like equal gain combining and selection combining for any space-time block code configuration, providing the best bit error rate. The document provides background on space-time block codes, describes the Alamouti space-time code, and discusses various receive diversity combining techniques.
Lesson 18: Maximum and Minimum Values (Section 021 slides)Matthew Leingang
The document is a lecture on finding maximum and minimum values of functions. It begins with announcements about an upcoming quiz. It then outlines the objectives of understanding the Extreme Value Theorem and Fermat's Theorem, and using the Closed Interval Method. The document discusses the Extreme Value Theorem, which states that a continuous function on a closed interval attains absolute maximum and minimum values. It also covers Fermat's Theorem, which states that if a function has a local extremum where it is differentiable, the derivative is equal to 0 at that point. Examples are provided to illustrate the importance of the hypotheses in the Extreme Value Theorem.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
This paper presents a new algorithm based on the Baum-Welch algorithm for estimating the parameters of a hidden Markov model that allows each state to be observed using a different set of features rather than relying on a common feature set. The algorithm uses likelihood ratios calculated from low-dimensional sufficient statistics for each state rather than high-dimensional probability density functions. This allows more accurate and robust parameter estimation compared to standard feature-based HMMs using a common feature set. The algorithm is mathematically equivalent to the standard Baum-Welch algorithm but avoids estimating high-dimensional densities.
Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...CSCJournals
The Fractional Fourier Transform (FrFT) can provide significant interference suppression (IS) over other techniques in real-life non-stationary environments because it can operate with very few samples. However, the optimum rotational parameter ‘a’ must first be estimated. Recently, a new method to estimate ‘a’ based on the value that minimizes the projection of the product of the Wigner Distributions (WDs) of the signal-of-interest (SOI) and interference was proposed. This is more easily calculated by recognizing its equivalency to choosing ‘a’ for which the product of the energies of the SOI and interference in the FrFT domain is minimized, termed the WD-FrFT algorithm. The algorithm was shown to estimate ‘a’ more accurately than minimum mean square error FrFT (MMSE-FrFT) methods and perform far better than MMSE Fast Fourier Transform (MMSE-FFT) methods, which only operate in the frequency domain. The WD-FrFT algorithm significantly improves interference suppression (IS) capability, even at low signal-to-noise ratio (SNR). In this paper, we apply the proposed WD-FrFT technique to recovering a speech signal in non-stationary co-channel interference. Using mean-square error (MSE) between the SOI and its estimate as the performance metric, we show that the technique greatly outperforms the conventional methods, MMSE-FrFT and MMSE-FFT, which fail with just one non-stationary interferer, and continues to perform well in the presence of severe co-channel interference (CCI) consisting of multiple, equal power, non-stationary interferers. This method therefore has great potential for separating co-channel signals in harsh, noisy, non-stationary environments.
This document discusses several topics related to Fourier transforms including:
1) Representing polynomials in value representation by evaluating them at roots of unity allows for faster multiplication using the Discrete Fourier Transform (DFT).
2) The DFT reduces the complexity of the Discrete Fourier Transform (DFT) from O(n2) to O(n log n) by formulating it recursively.
3) Converting images from the spatial to frequency domain using techniques like the Discrete Cosine Transform (DCT) allows for image compression by retaining only low frequency components with large coefficients.
Comparison and analysis of combining techniques for spatial multiplexing spac...IAEME Publication
This document compares different combining techniques for space-time block coded systems in Rayleigh fading channels. It finds that maximum ratio combining outperforms other techniques like equal gain combining and selection combining for any space-time block code configuration, providing the best bit error rate. The document provides background on space-time block codes, describes the Alamouti space-time code, and discusses various receive diversity combining techniques.
Lesson 18: Maximum and Minimum Values (Section 021 slides)Matthew Leingang
The document is a lecture on finding maximum and minimum values of functions. It begins with announcements about an upcoming quiz. It then outlines the objectives of understanding the Extreme Value Theorem and Fermat's Theorem, and using the Closed Interval Method. The document discusses the Extreme Value Theorem, which states that a continuous function on a closed interval attains absolute maximum and minimum values. It also covers Fermat's Theorem, which states that if a function has a local extremum where it is differentiable, the derivative is equal to 0 at that point. Examples are provided to illustrate the importance of the hypotheses in the Extreme Value Theorem.
Computational Tools and Techniques for Numerical Macro-Financial ModelingVictor Zhorin
A set of numerical tools used to create and analyze non-linear macroeconomic models with financial sector is discussed. New methods and results for computing Hansen-Scheinkman-Borovicka shock-price and shock-exposure elasticities for variety of models are presented. Spectral approximation technology (chebfun):
numerical computation in Chebyshev functions piece-wise smooth functions
breakpoints detection
rootfinding
functions with singularities
fast adaptive quadratures continuous QR, SVD, least-squares linear operators
solution of linear and non-linear ODE
Frechet derivatives via automatic differentiation PDEs in one space variable plus time
Stochastic processes:
(quazi) Monte-Carlo simulations, Polynomial Expansion (gPC), finite-differences (FD) non-linear IRF
Boroviˇcka-Hansen-Sc[heinkman shock-exposure and shock-price elasticities Malliavin derivatives
Many states:
Dimensionality Curse Cure
low-rank tensor decomposition
sparse Smolyak grids
Pricing Exotics using Change of NumeraireSwati Mital
The intention of this essay is to show how change of numeraire technique is used in pricing derivatives with complex payoffs. In the first instance, we apply the technique to pricing European Call Options and then use the same method to price an exotic Power Option.
The document discusses Approximate Bayesian Computation (ABC), a computational technique for Bayesian inference when the likelihood function is intractable. ABC allows sampling from the likelihood and making inferences based on simulated data without calculating the actual likelihood. The technique originated in population genetics models where likelihoods for genetic polymorphism data cannot be calculated in closed form. ABC is presented as both an inference machine with its own legitimacy compared to classical Bayesian approaches, as well as a way to address computational issues with intractable likelihoods.
This document describes an uncertain volatility model for pricing equity option trading strategies when the volatilities are uncertain. It uses the Black-Scholes Barenblatt equation developed by Avellaneda et al. to derive price bounds. The model is implemented in C++ using recombining trinomial trees to discretize the asset prices over time and space. The code computes the upper and lower price bounds by solving the Black-Scholes Barenblatt PDE using numerical techniques, with the volatility set based on the sign of the option gamma.
This document compares the Black-Scholes model and Merton jump diffusion model for accurately estimating option prices. It estimates parameters for both models using historical stock price and option data from several large companies. The results do not conclusively show that Merton jump diffusion more accurately prices derivatives, but it may better approximate moments from the data. The document provides background on geometric Brownian motion, the Black-Scholes option pricing formula, and the maximum likelihood estimation and EM algorithm used to estimate Merton jump diffusion model parameters.
On the stability and accuracy of finite difference method for options pricingAlexander Decker
1) The document discusses finite difference methods for pricing options, specifically the implicit and Crank-Nicolson methods.
2) It analyzes the stability and accuracy of each method. The Crank-Nicolson method is found to be more accurate and converges faster than the implicit method.
3) Numerical examples are provided to demonstrate the convergence properties of each method compared to the true Black-Scholes value. The results show that the Crank-Nicolson method converges to the solution faster than the implicit method as the time and price steps approach zero.
The document discusses a model of thermal sneutrino dark matter in the FD-term model of hybrid inflation. The FD-term model extends the MSSM with an inflaton-waterfall sector generating an effective μ-term and neutrino masses. During hybrid inflation, the inflaton field acquires a vacuum expectation value that breaks supersymmetry, giving masses to the right-handed sneutrino. Thermal production of right-handed sneutrinos in the early universe can account for the observed dark matter abundance if the sneutrinos annihilate via Yukawa interactions.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
11.the comparative study of finite difference method and monte carlo method f...Alexander Decker
This document compares the finite difference method and Monte Carlo method for pricing European options. It provides an overview of these two primary numerical methods used in financial modeling. The Monte Carlo method simulates asset price paths and averages discounted payoffs to estimate option value. It is well-suited for path-dependent options but converges slower than finite difference. The finite difference method solves the Black-Scholes PDE by approximating it on a grid. Specifically, it discusses the Crank-Nicolson scheme, which is unconditionally stable and converges faster than Monte Carlo for standard options.
From L to N: Nonlinear Predictors in Generalized Modelshtstatistics
1) Generalized nonlinear models (GNM) extend generalized linear models (GLM) by allowing the predictor to be a nonlinear function of the parameters. This allows for more parsimonious and interpretable models.
2) Estimating GNMs can be challenging due to difficulties generating starting values and the potential for multiple optima in the likelihood. The gnm package in R uses random starting values and a one-parameter-at-a-time Newton method.
3) The stereotype model is a special case of the multinomial logistic model that is suitable for ordered categorical data where only the scale of the relationship with covariates varies between categories. It can be fit as a GNM using a "Po
This summary provides the key details from the document in 3 sentences:
The document proposes a new method for encrypting two images into a single encrypted image using generalized weighted fractional Fourier transform (GWFRFT) with double random phase encoding. The encryption process involves applying pixel scrambling, phase encoding, and two rounds of GWFRFT with random phase masks on the combined image signal. This technique is shown to provide comparable security to the Advanced Encryption Standard (AES) with a 232-bit key size through a high number of possible permutations in the GWFRFT parameters and orders.
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)Jia-Bin Huang
This document presents a physics-based approach for detecting moving cast shadows in video sequences. It develops a new physical model to characterize the variation in background appearance caused by cast shadows, without making assumptions about the spectral power distributions of light sources and ambient illumination. It uses a Gaussian mixture model to learn and update the shadow model parameters over time in an unsupervised manner. Experimental results on three challenging sequences demonstrate the effectiveness of the proposed method.
This slide is my presentation for a reading circle "Machine Learning Professional Series".
Japanese version is here.
http://www.slideshare.net/matsukenbook/ss-50545587
Multiplicative Interaction Models in Rhtstatistics
The document discusses multiplicative interaction models and the gnm R package for fitting these models. The gnm package allows fitting of generalized nonlinear models, including models with standard multiplicative interactions. It uses an over-parameterized representation and a two-stage procedure involving a one-parameter-at-a-time Newton method and full Newton-Raphson to estimate model parameters. The document provides an example analysis fitting a homogeneous row-column association model to occupational status data.
System 1 and System 2 were basic early systems for image matching that used color and texture matching. Descriptor-based approaches like SIFT provided more invariance but not perfect invariance. Patch descriptors like SIFT were improved by making them more invariant to lighting changes like color and illumination shifts. The best performance came from combining descriptors with color invariance. Representing images as histograms of visual word occurrences captured patterns in local image patches and allowed measuring similarity between images. Large vocabularies of visual words provided more discriminative power but were costly to compute and store.
gnm: a Package for Generalized Nonlinear Modelshtstatistics
The gnm package provides functions for fitting generalized nonlinear models in R. It allows specification of models with multiplicative and other nonlinear terms. Models are fitted using maximum likelihood, with standard arguments that make gnm behave similarly to glm. The package handles over-parameterized representations of models and provides tools for estimating identifiable parameter combinations. An example analysis fits row-column association and homogeneous association models to occupational mobility data.
Learning Moving Cast Shadows for Foreground Detection (VS 2008)Jia-Bin Huang
The document summarizes a research paper about learning moving cast shadows for foreground detection. It presents a proposed algorithm that uses a confidence-rated Gaussian mixture learning approach and Bayesian framework with Markov random fields to model local and global shadow features. This exploits the complementary nature of local and global features to improve shadow detection. The algorithm is evaluated on outdoor and indoor video sequences, showing improved accuracy over previous methods especially in adaptability to different lighting conditions. Future work could incorporate additional features and more powerful models.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise boosts blood flow, releases endorphins, and promotes changes in the brain which help regulate emotions and stress levels.
This document provides information about asexuality. It defines asexuality as lacking sexual attraction rather than having a fear of sex, lack of libido, or being celibate. It discusses that asexual people can experience romantic attraction and form romantic relationships without sexual attraction or activity. It seeks to dispel common misconceptions that asexuality is a choice, medical condition, or phase by explaining that sexual orientation involves inherent feelings rather than behaviors or choices. The document also notes that while some asexual people do not desire relationships, others can experience romantic love, and that asexuality is not inherently anti-sex but rather advocates for sexual freedom and understanding for all orientations.
This document outlines social media and marketing recommendations for the Miami Beach Hair Institute. It discusses setting up social media accounts, engaging with online communities, building credibility through reviews and presentations, creating a video blog and chat, hosting an open house, chronicling a patient's hair restoration process through video diaries, partnering with a wish-granting organization for PR, and using augmented reality to demonstrate the hair restoration process to potential clients. The recommendations aim to build the institute's online presence, engage with their target audience, and help potential clients understand the hair restoration process.
Computational Tools and Techniques for Numerical Macro-Financial ModelingVictor Zhorin
A set of numerical tools used to create and analyze non-linear macroeconomic models with financial sector is discussed. New methods and results for computing Hansen-Scheinkman-Borovicka shock-price and shock-exposure elasticities for variety of models are presented. Spectral approximation technology (chebfun):
numerical computation in Chebyshev functions piece-wise smooth functions
breakpoints detection
rootfinding
functions with singularities
fast adaptive quadratures continuous QR, SVD, least-squares linear operators
solution of linear and non-linear ODE
Frechet derivatives via automatic differentiation PDEs in one space variable plus time
Stochastic processes:
(quazi) Monte-Carlo simulations, Polynomial Expansion (gPC), finite-differences (FD) non-linear IRF
Boroviˇcka-Hansen-Sc[heinkman shock-exposure and shock-price elasticities Malliavin derivatives
Many states:
Dimensionality Curse Cure
low-rank tensor decomposition
sparse Smolyak grids
Pricing Exotics using Change of NumeraireSwati Mital
The intention of this essay is to show how change of numeraire technique is used in pricing derivatives with complex payoffs. In the first instance, we apply the technique to pricing European Call Options and then use the same method to price an exotic Power Option.
The document discusses Approximate Bayesian Computation (ABC), a computational technique for Bayesian inference when the likelihood function is intractable. ABC allows sampling from the likelihood and making inferences based on simulated data without calculating the actual likelihood. The technique originated in population genetics models where likelihoods for genetic polymorphism data cannot be calculated in closed form. ABC is presented as both an inference machine with its own legitimacy compared to classical Bayesian approaches, as well as a way to address computational issues with intractable likelihoods.
This document describes an uncertain volatility model for pricing equity option trading strategies when the volatilities are uncertain. It uses the Black-Scholes Barenblatt equation developed by Avellaneda et al. to derive price bounds. The model is implemented in C++ using recombining trinomial trees to discretize the asset prices over time and space. The code computes the upper and lower price bounds by solving the Black-Scholes Barenblatt PDE using numerical techniques, with the volatility set based on the sign of the option gamma.
This document compares the Black-Scholes model and Merton jump diffusion model for accurately estimating option prices. It estimates parameters for both models using historical stock price and option data from several large companies. The results do not conclusively show that Merton jump diffusion more accurately prices derivatives, but it may better approximate moments from the data. The document provides background on geometric Brownian motion, the Black-Scholes option pricing formula, and the maximum likelihood estimation and EM algorithm used to estimate Merton jump diffusion model parameters.
On the stability and accuracy of finite difference method for options pricingAlexander Decker
1) The document discusses finite difference methods for pricing options, specifically the implicit and Crank-Nicolson methods.
2) It analyzes the stability and accuracy of each method. The Crank-Nicolson method is found to be more accurate and converges faster than the implicit method.
3) Numerical examples are provided to demonstrate the convergence properties of each method compared to the true Black-Scholes value. The results show that the Crank-Nicolson method converges to the solution faster than the implicit method as the time and price steps approach zero.
The document discusses a model of thermal sneutrino dark matter in the FD-term model of hybrid inflation. The FD-term model extends the MSSM with an inflaton-waterfall sector generating an effective μ-term and neutrino masses. During hybrid inflation, the inflaton field acquires a vacuum expectation value that breaks supersymmetry, giving masses to the right-handed sneutrino. Thermal production of right-handed sneutrinos in the early universe can account for the observed dark matter abundance if the sneutrinos annihilate via Yukawa interactions.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
11.the comparative study of finite difference method and monte carlo method f...Alexander Decker
This document compares the finite difference method and Monte Carlo method for pricing European options. It provides an overview of these two primary numerical methods used in financial modeling. The Monte Carlo method simulates asset price paths and averages discounted payoffs to estimate option value. It is well-suited for path-dependent options but converges slower than finite difference. The finite difference method solves the Black-Scholes PDE by approximating it on a grid. Specifically, it discusses the Crank-Nicolson scheme, which is unconditionally stable and converges faster than Monte Carlo for standard options.
From L to N: Nonlinear Predictors in Generalized Modelshtstatistics
1) Generalized nonlinear models (GNM) extend generalized linear models (GLM) by allowing the predictor to be a nonlinear function of the parameters. This allows for more parsimonious and interpretable models.
2) Estimating GNMs can be challenging due to difficulties generating starting values and the potential for multiple optima in the likelihood. The gnm package in R uses random starting values and a one-parameter-at-a-time Newton method.
3) The stereotype model is a special case of the multinomial logistic model that is suitable for ordered categorical data where only the scale of the relationship with covariates varies between categories. It can be fit as a GNM using a "Po
This summary provides the key details from the document in 3 sentences:
The document proposes a new method for encrypting two images into a single encrypted image using generalized weighted fractional Fourier transform (GWFRFT) with double random phase encoding. The encryption process involves applying pixel scrambling, phase encoding, and two rounds of GWFRFT with random phase masks on the combined image signal. This technique is shown to provide comparable security to the Advanced Encryption Standard (AES) with a 232-bit key size through a high number of possible permutations in the GWFRFT parameters and orders.
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)Jia-Bin Huang
This document presents a physics-based approach for detecting moving cast shadows in video sequences. It develops a new physical model to characterize the variation in background appearance caused by cast shadows, without making assumptions about the spectral power distributions of light sources and ambient illumination. It uses a Gaussian mixture model to learn and update the shadow model parameters over time in an unsupervised manner. Experimental results on three challenging sequences demonstrate the effectiveness of the proposed method.
This slide is my presentation for a reading circle "Machine Learning Professional Series".
Japanese version is here.
http://www.slideshare.net/matsukenbook/ss-50545587
Multiplicative Interaction Models in Rhtstatistics
The document discusses multiplicative interaction models and the gnm R package for fitting these models. The gnm package allows fitting of generalized nonlinear models, including models with standard multiplicative interactions. It uses an over-parameterized representation and a two-stage procedure involving a one-parameter-at-a-time Newton method and full Newton-Raphson to estimate model parameters. The document provides an example analysis fitting a homogeneous row-column association model to occupational status data.
System 1 and System 2 were basic early systems for image matching that used color and texture matching. Descriptor-based approaches like SIFT provided more invariance but not perfect invariance. Patch descriptors like SIFT were improved by making them more invariant to lighting changes like color and illumination shifts. The best performance came from combining descriptors with color invariance. Representing images as histograms of visual word occurrences captured patterns in local image patches and allowed measuring similarity between images. Large vocabularies of visual words provided more discriminative power but were costly to compute and store.
gnm: a Package for Generalized Nonlinear Modelshtstatistics
The gnm package provides functions for fitting generalized nonlinear models in R. It allows specification of models with multiplicative and other nonlinear terms. Models are fitted using maximum likelihood, with standard arguments that make gnm behave similarly to glm. The package handles over-parameterized representations of models and provides tools for estimating identifiable parameter combinations. An example analysis fits row-column association and homogeneous association models to occupational mobility data.
Learning Moving Cast Shadows for Foreground Detection (VS 2008)Jia-Bin Huang
The document summarizes a research paper about learning moving cast shadows for foreground detection. It presents a proposed algorithm that uses a confidence-rated Gaussian mixture learning approach and Bayesian framework with Markov random fields to model local and global shadow features. This exploits the complementary nature of local and global features to improve shadow detection. The algorithm is evaluated on outdoor and indoor video sequences, showing improved accuracy over previous methods especially in adaptability to different lighting conditions. Future work could incorporate additional features and more powerful models.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise boosts blood flow, releases endorphins, and promotes changes in the brain which help regulate emotions and stress levels.
This document provides information about asexuality. It defines asexuality as lacking sexual attraction rather than having a fear of sex, lack of libido, or being celibate. It discusses that asexual people can experience romantic attraction and form romantic relationships without sexual attraction or activity. It seeks to dispel common misconceptions that asexuality is a choice, medical condition, or phase by explaining that sexual orientation involves inherent feelings rather than behaviors or choices. The document also notes that while some asexual people do not desire relationships, others can experience romantic love, and that asexuality is not inherently anti-sex but rather advocates for sexual freedom and understanding for all orientations.
This document outlines social media and marketing recommendations for the Miami Beach Hair Institute. It discusses setting up social media accounts, engaging with online communities, building credibility through reviews and presentations, creating a video blog and chat, hosting an open house, chronicling a patient's hair restoration process through video diaries, partnering with a wish-granting organization for PR, and using augmented reality to demonstrate the hair restoration process to potential clients. The recommendations aim to build the institute's online presence, engage with their target audience, and help potential clients understand the hair restoration process.
The document describes a game that involves office chairs and blindfolds. The game is similar to capture the flag. There are two teams of attackers trying to reach the flag without getting tagged by the blindfolded defenders. A ninth person acts as the referee to start the game and ensure players follow the rules. When testing the game, it was discovered that a larger space was needed due to the number of players, and it was difficult to control blindfolded players who did not listen to their instructors or the extra noise from other players. A referee with a whistle would help maintain better control.
Raising the Chorus: CLA Presentation Nov2010alsonnie
This document summarizes the challenges faced by a New Jersey public library in removing the LGBTQ youth anthology Revolutionary Voices from its shelves due to pressure from a local conservative group. The library director personally removed all copies of the book citing vague claims of "child pornography". In response, students organized public readings of excerpts from the book to advocate for intellectual freedom and inclusion. The document outlines the controversy and different perspectives, and provides resources for further information.
A New Approach for Speech Enhancement Based On Eigenvalue Spectral SubtractionCSCJournals
In this paper, a phase space reconstruction-based method is proposed for speech enhancement. The method embeds the noisy signal into a high dimensional reconstructed phase space and uses Spectral Subtraction idea. The advantages of the proposed method are fast performance, high SNR and good MOS. In order to evaluate the proposed method, ten signals of TIMIT database mixed with the white additive Gaussian noise and then the method was implemented. The efficiency of the proposed method was evaluated by using qualitative and quantitative criteria.
This document proposes a frame-based adaptive compressed sensing technique for speech signals. It divides speech sequences into non-overlapping frames and processes each frame independently. It then classifies frames based on the correlation between the current and previous frames. Depending on the classification, different sampling strategies are applied - fewer measurements for highly correlated frames and more measurements for frames with more new information. Experimental results show the proposed adaptive technique provides better speech reconstruction quality compared to non-adaptive compressed sensing.
This document proposes novel uses of a hypothetical higher-dimensional rasterizer in future graphics hardware. It discusses six potential applications: 1) Continuous collision detection by exploiting the volume swept by moving triangles, 2) Caustic rendering using defocused triangle extents for sampling, 3) Using the rasterizer for flexible higher-dimensional sampling. It also discusses 4) Rendering glossy reflections/refractions by rendering from multiple viewpoints simultaneously, 5) Motion blurred soft shadows through appropriate use of time and sampling, and 6) Stereoscopic and multi-view rendering with motion and defocus blur. The document presents a model for an efficient 5D stochastic rasterization pipeline to support these and other potential uses.
Performance Analysis of Adaptive DOA Estimation Algorithms For Mobile Applica...IJERA Editor
Spatial filtering for mobile communications has attracted a lot of attention over the last decade and is cur-rently considered a very promising technique that will help future cellular networks achieve their ambi-tious goals. One way to accomplish this is via array signal processing with algorithms which estimate the Direction-Of-Arrival (DOA) of the received waves from the mobile users. This paper evaluates the per-formance of a number of DOA estimation algorithms. In all cases a linear antenna array at the base station is assumed to be operating typical cellular environment.
In this paper, we analyzed a numerical evaluation of the performance of MIMO radio systems in the LTE network environment. Downlink physical layer of the OFDM-MIMO based radio interface is considered for system model and a theoretical analysis of the bit error rate of the two space-time codes (SFBC 2×1 and FSTD 4×2 codes are adopted by the LTE norm as a function of the signal to noise ratio. Analytical expressions are given for transmission over a Rayleigh channel without spatial correlation which is then compared with Monte-Carlo simulations. Further evaluated channel capacity and simulation results show throughput almost reaches to the capacity limit.
MULTI RESOLUTION LATTICE DISCRETE FOURIER TRANSFORM (MRL-DFT)csandit
In many imaging applications, including sensor arrays, MRI and CT,data is often sampled on
non-rectangular point sets with non-uniform density. Moreover, in image and video processing,
a mix of non-rectangular sampling structures naturally arise. Multirate processing typically
utilizes a normalized integer indexing scheme, which masks the true physical dimensions of the
points. However, the spatial correlation of such signals often contains important
information.This paper presents a theory of signals defined on regular discrete sets called
lattices, and presents an associated form of a finite Fourier transform denoted here as
multiresolution lattice discrete Fourier transform (MRL-DFT). Multirate processing techniques
such as decimation, interpolation and polyphase representations are presented in a context
which preserves the true spatial dimensions of the sampling structure.Moreover, the polyphase
formulation enables systematic representation and processing for sampling patterns with
variable spatial density, and provides a framework for developing generalized FFT and
regridding algorithms.
Терагерцовая камера MIT сканирует закрытые книгиAnatol Alizar
Terahertz time-domain spectroscopy is used to extract occluded textual content from between the pages of a stacked paper sample. The technique uses time resolution to isolate reflections from each paper layer and a time-gated spectral analysis to enhance contrast of the textual content on each layer. This allows extraction of characters written on each of nine paper pages packed together, overcoming issues of low signal-to-noise ratio, contrast between content and layers, and occlusion from upper layers. The method provides over an order of magnitude better signal contrast than conventional amplitude mapping and could enable non-destructive inspection of layered structures.
Comparison and analysis of combining techniques for spatial multiplexing spac...IAEME Publication
This document compares different combining techniques for space-time block coded systems in Rayleigh fading channels. It finds that maximum ratio combining outperforms other techniques like equal gain combining and selection combining for any space-time block code configuration, providing the best bit error rate. The document provides background on space-time block codes, describes the Alamouti space-time code, and discusses various receive diversity combining techniques.
Comparison and analysis of combining techniques for spatial multiplexingspace...IAEME Publication
This document compares different combining techniques for space-time block coded systems in Rayleigh fading channels. It finds that maximum ratio combining outperforms other techniques like equal gain combining and selection combining for any space-time block code configuration, providing the best bit error rate. The document provides background on space-time block codes, describes the Alamouti space-time code, and discusses various receive diversity combining techniques.
Comparison and analysis of combining techniques for spatial multiplexingspace...IAEME Publication
This document compares different combining techniques for space-time block coded systems in Rayleigh fading channels. It finds that maximum ratio combining outperforms other techniques like equal gain combining and selection combining for any type of space-time block code configuration, even in Rayleigh faded environments. The document provides background on space-time block codes and describes techniques like Alamouti coding and maximum likelihood decoding. It also discusses receive diversity and combining techniques at the receiver including selection combining, maximal ratio combining, and equal gain combining.
A Text-Independent Speaker Identification System based on The Zak TransformCSCJournals
This paper presents a novel text-independent speaker identification system based on the discrete Zak transform. The system uses the Zak transform coefficients as features to model 23 speakers from the ELSDSR database. During identification, the Euclidean distance between the Zak transform of the test speech and each speaker model is calculated. The speaker with the minimum distance is identified. The system achieves an identification efficiency of 91.3% using a single test file and 100% using two test files. The Zak-based method is also faster and has comparable accuracy to MFCC-based speaker identification. The paper also explores dividing signals into segments and averaging the Zak transforms, which improves efficiency while only slightly increasing modeling time.
The accuracy of the speaker identification system reduces severely in the presence of noise. The auditory based features called gammatone frequency ceptral coefficient (GFCC) which resembles to the ability of human ear to identify the speaker identity's in noisy environment. The GFCC is based on gammatone filter bank which models the basilar membrane as a sequence of overlapping band pass filters. The system uses GFCC features and Gaussian Mixture Model - Universal Background Model (GMM - UBM ) for modeling of features of the speaker. The experiment are conducted on the English Language Speech Database for Speaker Recognition (ELSDR) databases. In order to find out the performance of the system, the test utterances are mixed with noises at various SNR levels to simulate the channel change. The results shows that the GFCC features has a good identification accuracy not only in clean sample, but also for noisy condition.
IJERA (International journal of Engineering Research and Applications) is International online, ... peer reviewed journal. For more detail or submit your article, please visit www.ijera.com
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Steganographic Scheme Based on Message-Cover matchingIJECEIAES
Steganography is one of the techniques that enter into the field of information security, it is the art of dissimulating data into digital files in an imperceptible way that does not arise the suspicion. In this paper, a steganographic method based on the FaberSchauder discrete wavelet transform is proposed. The embedding of the secret data is performed in Least Significant Bit (LSB) of the integer part of the wavelet coefficients. The secret message is decomposed into pairs of bits, then each pair is transformed into another based on a permutation that allows to obtain the most matches possible between the message and the LSB of the coefficients. To assess the performance of the proposed method, experiments were carried out on a large set of images, and a comparison to prior works is accomplished. Results show a good level of imperceptibility and a good trade-off imperceptibility-capacity compared to literature.
This document summarizes an academic paper that proposes modifying well-known local linear models for system identification by replacing their original recursive learning rules with outlier-robust variants based on M-estimation. It describes three existing local linear models - local linear map (LLM), radial basis function network (RBFN), and local model network (LMN) - and then introduces the concept of M-estimation as a way to make the learning rules of these models more robust to outliers. The performance of the proposed outlier-robust variants is evaluated on three benchmark datasets and is found to provide considerable improvement in the presence of outliers compared to the original models.
SEGMENTATION OF POLARIMETRIC SAR DATA WITH A MULTI-TEXTURE PRODUCT MODELgrssieee
1) The document describes a segmentation algorithm for polarimetric SAR (PolSAR) data that can model both scalar-texture and multi-texture scattering.
2) The algorithm uses log-cumulants and hypothesis testing to determine whether a scalar-texture or dual-texture model best fits the data within each segment.
3) The algorithm is tested on simulated multi-texture PolSAR data and is shown to accurately segment the classes and estimate their texture parameters. However, when applied to real data sets, the algorithm only finds the simpler scalar-texture case.
This paper aims, a 3D-Pilot Aided Multi-Input Multi-Output Orthogonal Frequency Division Multiplexing (MIMO-OFDM) Channel Estimation (CE) for Digital Video Broadcasting -T2 (DVB-T2) for the 5 different proposed block and comb pilot patterns model and performed on different antenna configuration. The effects of multi-transceiver antenna on channel estimation are addressed with different pilot position in frequency, time and the vertical direction of spatial domain framing. This paper first focus on designing of 5-different proposed spatial correlated pilot pattern model with optimization of pilot overhead. Then it demonstrates the performance comparison of Least Square (LS) & Linear Minimum Mean Square Error (LMMSE), two linear channel estimators for 3D-Pilot Aided patterns on different antenna configurations in terms of Bit Error Rate. The simulation results are shown for Rayleigh fading noise channel environments. Also, 3x4 MIMO configuration is recommended as the most suitable configuration in this noise channel environments.
This document proposes using a hybrid model and structured sparsity for under-determined convolutive audio source separation. It presents a mathematical model that combines a convex cost function with sparse regularization terms. A hybrid model is introduced using a union of two Gabor frames, each adapted to a different "morphological layer" of the signal. Structured sparsity is incorporated using a windowed group lasso operator to better exploit time-frequency structure. Experiments on speech and music mixtures show improved source separation performance compared to baseline methods, confirming the benefits of the proposed hybrid and structured sparsity approaches.
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...NU_I_TODALAB
IEEE International Workshop on Machine Learning for Signal Processing (MLSP2017)
Nominated For Best Student Paper Award (student: Shogo Seki)
Shogo Seki, Hirokazu Kameoka, Tomoki Toda, Kazuya Takeda: Missing Component Restoration for Masked Speech Signals based on Time-Domain Spectrogram Factorization,Sep. 2017
Toda Laboratory, Department of Intelligent Systems, Graduate School of Informatics, Nagoya University
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
10.1.1.11.1180
1. SPEAKER IDENTIFICATION USING GAUSSIAN MIXTURE MODELS
BASED ON MULTI-SPACE PROBABILITY DISTRIBUTION
Chiyomi Miyajima†, Yosuke Hattori†,∗, Keiichi Tokuda†, Takashi Masuko‡, Takao Kobayashi‡, and Tadashi Kitamura†
†
Nagoya Institute of Technology, Nagoya 466-8555, Japan
‡
Tokyo Institute of Technology, Yokohama 226-8502, Japan
∗
Present address: DENSO Corporation, Kariya 448-8661, Japan
†
{chiyomi, tokuda, kitamura}@ics.nitech.ac.jp, ‡ {masuko, tkobayas}@ip.titech.ac.jp, ∗ hattori@rd.denso.co.jp
ABSTRACT In this paper a new speaker modeling technique using a
GMM based on multi-space probability distribution (MSD) [8] is
This paper presents a new approach to modeling speech spectra introduced. The MSD-GMM allows us to model feature vectors
and pitch for text-independent speaker identification using Gaus- with variable dimensionality including zero-dimensional vectors,
sian mixture models based on multi-space probability distribution i.e., discrete symbols. Consequently, continuous pitch values
(MSD-GMM). The MSD-GMM allows us to model continuous for voiced frames and discrete symbols representing “unvoiced”
pitch values for voiced frames and discrete symbols represent- can be modeled using an MSD-GMM in a unified framework,
ing unvoiced frames in a unified framework. Spectral and pitch and spectral and pitch features are jointly modeled by a multi-
features are jointly modeled by a two-stream MSD-GMM. We stream MSD-GMM, i.e., each speaker is modeled by a single
derive maximum likelihood (ML) estimation formulae for the statistical model. We derive maximum likelihood (ML) esti-
MSD-GMM parameters, and the MSD-GMM speaker models mation formulae and evaluate the MSD-GMM speaker models
are evaluated for text-independent speaker identification tasks. for text-independent speaker identification tasks comparing with
Experimental results show that the MSD-GMM can efficiently conventional GMM speaker models.
model spectral and pitch features of each speaker and outper- The rest of the paper is organized as follows. In Section
forms conventional speaker models. 2, we introduce a speaker modeling technique based on MSD-
GMM. Section 3 presents the ML estimation procedure for MSD-
1. INTRODUCTION GMM parameters. Section 4 reports experimental results, and
Section 5 gives conclusions and future works.
Gaussian mixture models (GMMs) have been successfully ap-
plied to speaker modeling in text-independent speaker identifica- 2. MULTI-STREAM MSD-GMM
tion [1]. Such identification systems mainly use spectral features
represented by cepstral coefficients as speaker features. Pitch fea- 2.1. Likelihood Calculation
Let us assume that a given observation Ót at time t consists
tures as well as spectral features contain much speaker specific
of S information sources (streams). The s-th stream Óts has
information [2, 3]. However, most of speaker recognition studies
in recent years have focused on using only spectral features. The
a set of space indices Xts and a random vector with variable
dimensionality Üts , that is
main reasons for this are i) the use of pitch features alone could
not give enough recognition performance and ii) pitch values are
not defined in unvoiced segments and this complicates speaker
modeling and feature integration.
Ót = (Ót1 , Ót2 , . . . , ÓtS ), (1)
Several works have reported that speaker recognition accu- Óts = (Xts , Üts ) . (2)
racy can be improved by the use of pitch features in addition
Note here that Xts is a subset of all possible space indices
to spectral features [4, 5, 6, 7]. There are essentially two ap-
{1, 2, . . . , Gs }, and all the spaces represented by the indices in
Xts have the same dimensionality as Üts .
proaches to integrating spectral and pitch information: i) two
separate models are used for spectral and pitch features and their
We define the output probability distribution of an S-stream
MSD-GMM λ for Ót as
scores are combined [4, 5], ii) two separate models for voiced
and unvoiced parts are trained and their scores are combined
[6, 7]. In [7], two separate GMMs are used, where the input ob- M S
servations are concatenations of cepstral coefficients and log F0 b(Ót | λ) = cm pms (Óts ), (3)
for voiced frames and cepstral coefficients alone for unvoiced m=1 s=1
frames. Since the probability distribution of the conventional
GMM is defined on a single vector space, these two kinds of where cm is the mixture weight for the m-th mixture component.
vectors require their respective models. The observation probability of Óts for mixture m is given by the
multi-space probability distribution (MSD) [8] :
A part of this work was supported by Research Fellowships from
pms (Óts ) = wmsg Nmsg (Üts ),
sg D
the Japan Society for the Promotion of Science for Young Scientists, (4)
No. 199808177. g∈Xts
2. Stream 1 (G1 = 4) t
/ashi/
Frequency (Hz) Frequency (kHz)
Observation at time t 0
Spectrum
5
wm11 1
N m11 ( ) 2
o t = (o t1, o t2, o t3 )
5
wm12 3
Nm12 (xt1 ) 4
5
wm13 pm1 (ot1 )
5 UV V UV V UV
5-dimensional Nm13 (xt1 ) 200
wm14 150
Pitch
8
Nm14 ( ) 100
ot1 = ( { 2, 3 } , xt1 )
50
ot2 = ( { 1 } , xt2 ) 7-dimensional Stream 2 (G2 = 1)
wm21 = 1
ot3 = ( { 2 } , xt3 ) 7
Nm21 (xt2 ) Fig. 2. An example of spectral and pitch sequences of a word
“/ashi/” spoken by a male speaker.
Stream 3 (G3 = 2) pm2 (ot2 )
0-dimensional
8 wm31 o t = (o t1, o t2 ) Spectrum (G1 = 1)
Nm31 ( )
pm3 (ot3 ) D-dimensional D wm11 = 1 Nm11 (xt1 )
D
0
Nm32 (xt3 ) w Nm11 (xt1)
m32
ot1 = ( { 1 } , x t1 )
Pitch (G2 = 2)
pm1 (ot1 ) = wm12 Nm12 (xt1 )
5
+ wm13 Nm13 (xt1 )
5 ot2 = ( { 1 } , x t2 ) V 1 wm21
Stream 1: Nm21 (xt2)
Stream 2: pm2 (ot2 ) = Nm21 (xt2 )
7
0 wm21 Nm21 (xt2 )
1
1-dimensional Nm22 ( ) w
or UV m22 or
Stream 3: pm3 (ot3 ) = wm32 0-dimensional wm22
Fig. 3. The m-th mixture component in a two-stream MSD-
Fig. 1. An example of the m-th mixture component of a three-
GMM based on spectra and pitch.
stream MSD-GMM.
where wmsg is the weight for the g-th vector space of the s-th voiced frames and discrete symbols representing “unvoiced” in
Dsg unvoiced frames because pitch values are defined only in voiced
stream and Nmsg ( · ) is the Dsg -variate Gaussian function with
segments.
mean vector msg and covariance matrix Σmsg (for the case
0 As shown in Fig. 3, each speaker can be modeled by a
Dsg > 0). For simplicity of notation, we define Nmsg ( · ) ≡ 1 two-stream MSD-GMM (S = 2); the first stream is for the
(for the case Dsg = 0). Note here that the multi-space probabil- spectral feature and the second stream is for the pitch feature.
ity distribution (MSD) is equivalent to continuous probability dis- The spectral stream has a D-dimensional space (G1 = 1), and
tribution and discrete probability distribution when Dsg ≡ n > 0 the pitch stream has two spaces (G2 = 2) : a one-dimensional
and Dsg ≡ 0, respectively. Also, an MSD-GMM is assumed to space and a zero-dimensional space for voiced and unvoiced
be a generalized GMM, which includes the traditional GMM as parts, respectively.
a special case when S = 1, G1 = 1, and D11 > 0.
For an observation sequence Ç = (Ó1 , Ó2 , . . . , ÓT ), the
likelihood of MSD-GMM λ is given by 3. ML-ESTIMATION FOR MSD-GMM
T In a similar way to the ML-estimation procedure in [8], P (Ç | λ)
P (Ç | λ) = b(Ót | λ). (5) is increased by iterating the maximization of an auxiliary function
t=1 Q(λ , λ) over λ to improve current parameters λ based on the
expectation maximization (EM) algorithm.
Figure 1 illustrates an example of the m-th mixture compo-
nent of a three-stream MSD-GMM (S = 3). The sample space of
the first stream consists of four spaces (G1 = 4), among which 3.1. Definition of Q-Function
the second and third spaces are triggered by the space indices The log-likelihood of λ for an observation sequence Ç , a se-
and pm1 (Ót1 ) becomes the sum of the two weighted Gaussians. quence of mixture components and a sequence of space indices
The second stream has only one space (G2 = 1) and always out- Ð can be written as
puts its Gaussian as pm2 (Ót2 ). The third stream consists of two
spaces (G3 = 2), where a zero-dimensional space is selected, T
and outputs its space weight wm32 (a discrete probability) as log P (Ç, , Ð | λ) = log cit
pm3 (Ót3 ). t=1
T S T S
log Nit slts (Üts ),
D slts
2.2. Speaker Modeling Based on MSD-GMM + log wit slts + (6)
t=1 s=1 t=1 s=1
Figure 2 shows an example of spectral and pitch sequences of
a Japanese word “/ashi/” spoken by a Japanese male speaker. where
Generally, spectral features are represented by multi-dimensional = (i1 , i2 , . . . , iT ), (7)
vectors of cepstral coefficients with continuous values. On the
other hand, pitch features are represented by one-dimensional Ð = (Ð1 , Ð2 , . . . , ÐT ), (8)
continuous values of log fundamental frequencies (log F0 ) in Ðt = (lt1 , lt2 , . . . , ltS ). (9)
3. Hence the Q-function is defined as Similarly, the second term is maximized as
Q(λ , λ) = P ( , Ð | Ç, λ ) log P (Ç , , Ð | λ) ξts (m, g)
all i,l t∈T (O ,s,g)
T wmsg = Gs
, (15)
= P ( , Ð | Ç, λ ) log cit ξts (m, l)
all i,l t=1 l=1 t∈T (O ,s,l)
T S
+ P ( , Ð | Ç, λ ) log wit slts where ξts (m, g) is the posterior probability of being in the g-th
all i,l t=1 s=1
space of stream s in the m-th mixture component at time t:
T S ξts (m, g) = P (it = m, lts = g | Ç, λ)
P ( , Ð | Ç, λ ) log Nit slts (Üts )
D
slts
+
all i,l t=1 s=1 = P (it = m | Ç, λ)P (lts = g | it = m, Ç, λ)
wmsg Nmsg (Üts )
sg D
= γt (m) .
pms (Óts )
M T (16)
= P (it = m | Ç , λ ) log cm
m=1 t=1 The third term is maximized by solving following equations:
∂
P (it = m, lts = g | Ç, λ )
M S Gs
+ ∂ msg t∈T (O ,s,g)
· log Nmsg (Üts ) = 0,
m=1 s=1 g=1 t∈T (O ,s,g) D
sg
(17)
P (it = m, lts = g | Ç, λ ) log wmsg ∂
P (it = m, lts = g | Ç, λ )
M S Gs ∂Σ−1
msg
t∈T (O ,s,g)
· log Nmsg (Üts ) = 0,
D
+ sg
(18)
m=1 s=1 g=1 t∈T (O ,s,g)
resulting in
P (it = m, lts = g | Ç, λ ) log Nmsg (Üts ),
sg D
(10)
where ξts (m, g)Üts
t∈T (O ,s,g)
T (Ç, s, g) = {t | g ∈ Xts } . (11) msg = Gs
, (19)
ξts (m, l)
È
3.2. Maximization of Q-Function l=1 t∈T (O ,s,l)
The first two terms of (10) have the form N ui log yi , which
i=1
attains a global maximum at the single point ξts (m, g)(Üts − msg )( Üts − msg )
ui t∈T (O ,s,g)
yi = N , for i = 1, 2, . . . , N, (12) Σmsg = .
Gs
uj ξts (m, l)
È
j=1 l=1 t∈T (O ,s,l)
(20)
N
under the constraints yi = 1 and yi ≥ 0. The maximiza-
i=1 The re-estimation is repeated iteratively using λ in place of λ
tion of the first term of (10) leads to the re-estimate of cm : and the final result is an ML estimation of the MSD-GMM.
T
P (it = m | Ç , λ ) 4. EXPERIMENTAL EVALUATION
t=1
cm = M T 4.1. Experimental Conditions
P (it = m | Ç , λ )
m=1 t=1
First, a speaker identification experiment was carried out using
T
the ATR Japanese speech database. We used word data spoken by
1
P (it = m | Ç , λ )
80 speakers (40 males and 40 females). Phonetically-balanced
=
T t=1
216 words are used for training each speaker model, and 520
T
common words per speaker are used for testing. The number of
1 tests was 41600 in total.
= γt (m), (13)
T t=1
Second, to evaluate the robustness of the MSD-GMM speaker
model against inter-session variability, we also conducted a speaker
where γt (m) is the posterior probability of being in the m-th identification experiment using the NTT database. The database
mixture component at time t, that is consists of sentence data uttered by 35 Japanese speakers (22
S males and 13 females) on five sessions over ten months (Aug.,
cm pms (Óts ) Sept., Dec. 1990, Mar., June 1991). In each session, 15 sen-
tences were recorded for each speaker. Ten sentences are com-
γt (m) = P (it = m | Ç , λ) =
s=1
.
b(Ót )
(14) mon to all speakers and all sessions (A-set), and five sentences
4. 10 better performance than the conventional GMM system using
ATR 95% CI NTT a spectral feature alone. Among the three systems, the MSD-
Speaker identification error rate (%)
database database
GMM GMM system gave the best results, and achieved 16 % and 18 %
8 S+P-GMMs error reductions (for the ATR database) and 38 % and 51 % error
V+UV-GMMs reductions (for the NTT database) over the GMM system when
MSD-GMM using 32 and 64 mixture models, respectively. It is also noted
6 that the MSD-GMM system requires no combination parameter
6.2
(such as α) which has to be chosen or tuned heuristically.
5.8
5.4
5.2
5.1
6.4
4
4.5
4.4
5. CONCLUSION
4.2
5.6
4.4
4.3
2 This paper has introduced a new technique for modeling speakers
3.7
3.4
3.4
based on MSD-GMM for text-independent speaker identification.
3.1
The MSD-GMM can model continuous pitch values of voiced
0 frames and discrete symbols representing “unvoiced” in a unified
32
32+32
24+8
32
64
64+64
48+16
64
32
32+32
24+8
32
64
64+64
48+16
64
framework. Spectral and pitch features can be jointly modeled
by a multi-stream MSD-GMM. We derived the ML estimation
Number of mixture components
formulae for the MSD-GMM parameters and evaluated the MSD-
GMM speaker models for text-independent speaker identification
tasks. The experimental results demonstrated the high utility of
Fig. 4. Comparison of MSD-GMM speaker models with con- the MSD-GMM speaker model and also proved its robustness
ventional GMM and two-separate GMM speaker models. against the inter-session variability.
Introduction of stream weights to the multi-stream MSD-
GMM and application of this framework to speaker verification
are different for each speaker and each session (B-set). The dura- systems will be subjects for future works.
tion of each sentence is approximately four second. We used 15
sentences (A-set + B-set from the first session) per speaker for 6. REFERENCES
training, and 20 sentences (B-set from the other four sessions)
per speaker for testing. The number of tests was 700 in total. [1] D.A. Reynolds and R.C. Rose, “Robust text-independent
The speech data were down-sampled to 10 kHz, windowed speaker identification using Gaussian mixture speaker mod-
at a 10-ms frame rate with a 25.6-ms Blackman window, and pa- els,” IEEE Trans. Speech and Audio Process., 3 (1), 72–83,
rameterized into 13 mel-cepstral coefficients using a mel-cepstral Jan. 1995.
estimation technique [9]. The 12 static parameters excluding the [2] B.S. Atal, “Automatic speaker recognition based on pitch
zero-th coefficient were used as a spectral feature. Fundamen- contours,” J. Acoust. Soc. Amer., 52 (6), 1687–1697, 1972.
tal frequencies (F0 ) were estimated at a 10-ms frame rate using [3] S. Furui, “Research on individuality features in speech
the RAPT method [10] with a 7.5-ms correlation window, and waves and automatic speaker recognition techniques,”
log F0 for the voiced frames and discrete symbols for unvoiced Speech Communication, 5 (2), 183–197, 1986.
frames were used as a pitch feature. Speakers were modeled by [4] M.J. Carey, E.S. Parris, H. Lloyd-Thomas, and S. Bennett,
GMMs or multi-stream MSD-GMMs with diagonal covariances. “Robust prosodic features for speaker identification,” Proc.
ICSLP’96, vol.3, pp.1800–1804, Oct. 1996.
4.2. Experimental Results [5] M.K. S¨ nmez, L. Heck, M. Weintraub, and E. Shriberg,
o
“A lognormal tied mixture model of pitch for prosody-
The MSD-GMM speaker identification system was compared based speaker recognition,” Proc. EUROSPEECH’97, vol.3,
with three kinds of conventional systems. Figure 4 shows speaker pp.1391–1394, Sept. 1997.
identification error rates with 95 % confidence intervals (CIs) [6] T. Matsui and S. Furui, “Text-independent speaker recog-
when using 32 and 64 component speaker models. The left and nition using vocal tract and pitch information,” Proc.
right halves of the figure correspond to the results for the ATR ICSLP’90, vol.1, pp.137–140, Nov. 1990.
and NTT databases, respectively. In the figure, “GMM” denotes [7] K.P. Markov and S. Nakagawa, “Integrating pitch and
a conventional GMM speaker model using a spectral feature LPC-residual information with LPC-cepstrum for text-
alone. “S+P-GMMs” represents a speaker model consisting of independent speaker recognition,” J. Acoust. Soc. Jpn., (E),
two GMMs for spectra and pitch. “V+UV-GMMs” is a speaker 20 (4), 281–291, July 1999.
model consisting of two GMMs for voiced (V) and unvoiced [8] K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi,
(UV) parts [7] with the optimum numbers of mixture compo- “Hidden Markov models based on multi-space probability
nents for the V-GMM and the UV-GMM, i.e., 24 (V)+8 (UV) or distribution for pitch pattern modeling,” Proc. ICASSP’99,
48 (V)+16 (UV), and a linear combination parameter α = 0.5 vol.1, pp.229–232, May 1999.
(α is the weight for the likelihood of the UV-GMM). “MSD- [9] T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, “An adap-
GMM” denotes the proposed model based on the multi-stream tive algorithm for mel-cepstral analysis of speech,” Proc.
MSD-GMM. ICASSP’92, vol.1, pp.137–140, Mar. 1992.
As shown in the figure, the additional use of pitch infor- [10] D. Talkin, “A robust algorithm for pitch tracking,” in
mation significantly improved the system performance, and the Speech Coding and Synthesis, W.B. Kleijn and K.K. Pali-
three systems using both spectral and pitch features gave much wal eds., Elsevier Science, pp.495–518, 1995.