A REGRESSION APPROACH TO SPEECH ENHANCEMENT BASED
ON DEEP NEURAL NETWORKS
CHAPTER 1
1.1 ABSTRACT:
In contrast to the conventional minimum mean square error (MMSE)-based noise reduction
techniques, we propose a supervised method to enhance speech by means of finding a mapping
function between noisy and clean speech signals based on deep neural networks (DNNs). In
order to be able to handle a wide range of additive noises in real-world situations, a large training
set that encompasses many possible combinations of speech and noise types is first designed. A DNN architecture is then employed as a nonlinear regression function to ensure a powerful
modeling capability. Several techniques have also been proposed to improve the DNN-based
speech enhancement system, including global variance equalization to alleviate the over-
smoothing problem of the regression model, and the dropout and noise-aware training strategies
to further improve the generalization capability of DNNs to unseen noise conditions.
Experimental results demonstrate that the proposed framework can achieve significant
improvements in both objective and subjective measures over the conventional MMSE based
technique. It is also interesting to observe that the proposed DNN approach can well suppress
highly nonstationary noise, which is tough to handle in general. Furthermore, the resulting DNN
model, trained with artificially synthesized data, is also effective in dealing with noisy speech data
recorded in real-world scenarios without the generation of the annoying musical artifact
commonly observed in conventional enhancement methods.
1.2 INTRODUCTION:
SPEECH ENHANCEMENT:
Speech enhancement aims to improve speech quality by using various algorithms. The
objective of enhancement is improvement in intelligibility and/or overall perceptual quality of
degraded speech signals using audio signal processing techniques. The enhancement of speech degraded by noise, or noise reduction, is the most important field of speech enhancement, and it is used in many applications such as mobile phones, VoIP, teleconferencing systems, speech recognition, and hearing aids.
SPEECH RECOGNITION:
Speech recognition (SR) is the inter-disciplinary sub-field of computational linguistics which
incorporates knowledge and research in the linguistics, computer science, and electrical
engineering fields to develop methodologies and technologies that enable the recognition
and translation of spoken language into text by computers and computerized devices such as
those categorized as Smart Technologies and robotics. It is also known as "automatic speech
recognition" (ASR), "computer speech recognition", or just "speech to text" (STT).
Some SR systems use "training" (also called "enrollment") where an individual speaker reads
text or isolated vocabulary into the system. The system analyzes the person's specific voice and
uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy.
Systems that do not use training are called "speaker independent"[1] systems. Systems that use
training are called "speaker dependent".
Speech recognition applications include voice user interfaces such as voice dialing (e.g. "Call
home"), call routing (e.g. "I would like to make a collect call"), domotic appliance control,
search (e.g. find a podcast where particular words were spoken), simple data entry (e.g., entering
a credit card number), preparation of structured documents (e.g. a radiology report), speech-to-
text processing (e.g., word processors or emails), and aircraft (usually termed Direct Voice
Input).
The term voice recognition or speaker identification refers to identifying the speaker, rather than
what they are saying. Recognizing the speaker can simplify the task of translating speech in
systems that have been trained on a specific person's voice or it can be used to authenticate or
verify the identity of a speaker as part of a security process.
From the technology perspective, speech recognition has a long history with several waves of
major innovations. Most recently, the field has benefited from advances in deep learning and big
data. The advances are evidenced not only by the surge of academic papers published in the
field, but more importantly by the world-wide industry adoption of a variety of deep learning
methods in designing and deploying speech recognition systems. These speech industry players
include Microsoft, Google, IBM, Baidu (China), Apple, Amazon, Nuance, and iFlyTek (China), many of which have publicized that the core technology in their speech recognition systems is based on deep learning.
MODELS, METHODS, AND ALGORITHMS:
HIDDEN MARKOV MODELS
Modern general-purpose speech recognition systems are based on Hidden Markov Models.
These are statistical models that output a sequence of symbols or quantities. HMMs are used in
speech recognition because a speech signal can be viewed as a piecewise stationary signal or a
short-time stationary signal. In a short time-scale (e.g., 10 milliseconds), speech can be
approximated as a stationary process. Speech can be thought of as a Markov model for many
stochastic purposes.
Another reason why HMMs are popular is that they can be trained automatically and are
simple and computationally feasible to use. In speech recognition, the hidden Markov model
would output a sequence of n-dimensional real-valued vectors (with n being a small integer, such
as 10), outputting one of these every 10 milliseconds. The vectors would consist
of cepstral coefficients, which are obtained by taking a Fourier transform of a short time window
of speech and decorrelating the spectrum using a cosine transform, then taking the first (most
significant) coefficients. The hidden Markov model will tend to have in each state a statistical
distribution that is a mixture of diagonal covariance Gaussians, which will give a likelihood for
each observed vector. Each word, or (for more general speech recognition systems),
each phoneme, will have a different output distribution; a hidden Markov model for a sequence
of words or phonemes is made by concatenating the individual trained hidden Markov models
for the separate words and phonemes.
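To make the cepstral front end described above concrete, the following is a minimal C# sketch (C# being the front end named in the software requirements) of cepstral coefficient extraction for a single frame. The 256-sample frame, the Hamming window, the 13 retained coefficients, and the use of a naive O(N^2) DFT instead of a fast FFT are illustrative assumptions chosen for readability, not the exact front end of any particular recognizer.

using System;

class CepstralFrontEnd
{
    // First numCoeffs cepstral coefficients of one speech frame:
    // window -> log power spectrum -> cosine transform -> truncate.
    static double[] CepstralCoefficients(double[] frame, int numCoeffs)
    {
        int n = frame.Length;

        // Hamming window to reduce spectral leakage.
        double[] windowed = new double[n];
        for (int i = 0; i < n; i++)
            windowed[i] = frame[i] * (0.54 - 0.46 * Math.Cos(2 * Math.PI * i / (n - 1)));

        // Log power spectrum via a naive DFT over the first n/2 bins.
        int bins = n / 2;
        double[] logPower = new double[bins];
        for (int k = 0; k < bins; k++)
        {
            double re = 0.0, im = 0.0;
            for (int i = 0; i < n; i++)
            {
                double angle = 2 * Math.PI * k * i / n;
                re += windowed[i] * Math.Cos(angle);
                im -= windowed[i] * Math.Sin(angle);
            }
            logPower[k] = Math.Log(re * re + im * im + 1e-10);
        }

        // Decorrelate the spectrum with a type-II DCT and keep the
        // first (most significant) coefficients.
        double[] cepstra = new double[numCoeffs];
        for (int c = 0; c < numCoeffs; c++)
            for (int k = 0; k < bins; k++)
                cepstra[c] += logPower[k] * Math.Cos(Math.PI * c * (k + 0.5) / bins);
        return cepstra;
    }

    static void Main()
    {
        double[] frame = new double[256];          // one short-time frame of samples
        for (int i = 0; i < frame.Length; i++)     // synthetic test tone
            frame[i] = Math.Sin(2 * Math.PI * 8 * i / frame.Length);
        double[] c = CepstralCoefficients(frame, 13);
        Console.WriteLine("c[0] = " + c[0]);
    }
}

In practice an FFT library routine and mel-scaled filter banks would replace the naive DFT.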
Described above are the core elements of the most common, HMM-based approach to speech
recognition. Modern speech recognition systems use various combinations of a number of
standard techniques in order to improve results over the basic approach described above. A
typical large-vocabulary system would need context dependency for the phonemes (so phonemes
with different left and right context have different realizations as HMM states); it would
use cepstral normalization to normalize for different speaker and recording conditions; for
further speaker normalization it might use vocal tract length normalization (VTLN) for male-
female normalization and maximum likelihood linear regression (MLLR) for more general
speaker adaptation.
The features would have so-called delta and delta-delta coefficients to capture speech dynamics
and in addition might use heteroscedastic linear discriminant analysis (HLDA); or might skip the
delta and delta-delta coefficients and use splicing and an LDA-based projection followed perhaps
by heteroscedastic linear discriminant analysis or a global semi-tied covariance transform (also
known as maximum likelihood linear transform, or MLLT). Many systems use so-called
discriminative training techniques that dispense with a purely statistical approach to HMM
parameter estimation and instead optimize some classification-related measure of the training
data. Examples are maximum mutual information (MMI), minimum classification error (MCE)
and minimum phone error (MPE).
Decoding of the speech (the term for what happens when the system is presented with a new
utterance and must compute the most likely source sentence) would probably use the Viterbi
algorithm to find the best path, and here there is a choice between dynamically creating a
combination hidden Markov model, which includes both the acoustic and language model
information, and combining it statically beforehand (the finite state transducer, or FST,
approach).
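The following is a minimal C# sketch of the Viterbi algorithm described above, run over a hypothetical two-state, two-symbol HMM; real decoders search over composed acoustic and language models with pruning, which this sketch omits.

using System;

class ViterbiDemo
{
    // Most likely state sequence through an HMM for an observation sequence.
    // logInit[s], logTrans[s,t], and logEmit[s,o] hold log-probabilities.
    static int[] Viterbi(int[] obs, double[] logInit, double[,] logTrans, double[,] logEmit)
    {
        int S = logInit.Length, T = obs.Length;
        double[,] score = new double[T, S];
        int[,] back = new int[T, S];

        for (int s = 0; s < S; s++)
            score[0, s] = logInit[s] + logEmit[s, obs[0]];

        for (int t = 1; t < T; t++)
            for (int s = 0; s < S; s++)
            {
                score[t, s] = double.NegativeInfinity;
                for (int p = 0; p < S; p++)
                {
                    double cand = score[t - 1, p] + logTrans[p, s] + logEmit[s, obs[t]];
                    if (cand > score[t, s]) { score[t, s] = cand; back[t, s] = p; }
                }
            }

        int[] path = new int[T];                      // trace back the best path
        for (int s = 1; s < S; s++)
            if (score[T - 1, s] > score[T - 1, path[T - 1]]) path[T - 1] = s;
        for (int t = T - 1; t > 0; t--)
            path[t - 1] = back[t, path[t]];
        return path;
    }

    static void Main()
    {
        // Toy two-state HMM with two observation symbols.
        double[] init = { Math.Log(0.6), Math.Log(0.4) };
        double[,] trans = { { Math.Log(0.7), Math.Log(0.3) },
                            { Math.Log(0.4), Math.Log(0.6) } };
        double[,] emit = { { Math.Log(0.9), Math.Log(0.1) },
                           { Math.Log(0.2), Math.Log(0.8) } };
        int[] path = Viterbi(new int[] { 0, 0, 1 }, init, trans, emit);
        Console.WriteLine(string.Join(",", Array.ConvertAll(path, x => x.ToString())));
    }
}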
A possible improvement to decoding is to keep a set of good candidates instead of just keeping
the best candidate, and to use a better scoring function (re-scoring) to rate these good candidates
so that we may pick the best one according to this refined score. The set of candidates can be
kept either as a list (the N-best list approach) or as a subset of the models (a lattice). Re-scoring
is usually done by trying to minimize the Bayes risk (or an approximation thereof): Instead of
taking the source sentence with maximal probability, we try to take the sentence that minimizes
the expectation of a given loss function with regard to all possible transcriptions (i.e., we take
the sentence that minimizes the average distance to other possible sentences weighted by their
estimated probability). The loss function is usually the Levenshtein distance, though it can be
different distances for specific tasks; the set of possible transcriptions is, of course, pruned to
maintain tractability. Efficient algorithms have been devised to re-score lattices represented as
weighted finite state transducers with edit distances represented themselves as a finite state
transducer verifying certain assumptions.
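As a small illustration of the loss function used in such Bayes-risk re-scoring, the sketch below computes the word-level Levenshtein distance between a hypothesis and a reference in C#; the two word sequences are hypothetical examples.

using System;

class EditDistance
{
    // Levenshtein distance: the loss typically minimized when re-scoring
    // N-best lists or lattices under a Bayes-risk criterion.
    static int Levenshtein(string[] a, string[] b)
    {
        int[,] d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;   // deletions only
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;   // insertions only
        for (int i = 1; i <= a.Length; i++)
            for (int j = 1; j <= b.Length; j++)
            {
                int sub = d[i - 1, j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
                d[i, j] = Math.Min(sub, Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1));
            }
        return d[a.Length, b.Length];
    }

    static void Main()
    {
        string[] hyp = { "call", "home", "now" };
        string[] reference = { "call", "home" };
        Console.WriteLine(Levenshtein(hyp, reference)); // 1 (one extra word)
    }
}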
DYNAMIC TIME WARPING (DTW)-BASED SPEECH RECOGNITION
Dynamic time warping is an approach that was historically used for speech recognition but has
now largely been displaced by the more successful HMM-based approach.
Dynamic time warping is an algorithm for measuring similarity between two sequences that may
vary in time or speed. For instance, similarities in walking patterns would be detected, even if in
one video the person was walking slowly and if in another he or she were walking more quickly,
or even if there were accelerations and decelerations during the course of one observation. DTW
has been applied to video, audio, and graphics – indeed, any data that can be turned into a linear
representation can be analyzed with DTW.
A well-known application has been automatic speech recognition, to cope with different
speaking speeds. In general, it is a method that allows a computer to find an optimal match
between two given sequences (e.g., time series) with certain restrictions. That is, the sequences
are "warped" non-linearly to match each other. This sequence alignment method is often used in
the context of hidden Markov models.
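A minimal C# sketch of DTW over two one-dimensional sequences follows; real systems align multi-dimensional feature vectors and usually add slope constraints, which are omitted here for brevity.

using System;

class DtwDemo
{
    // Dynamic time warping distance between two sequences of feature values.
    static double Dtw(double[] x, double[] y)
    {
        int n = x.Length, m = y.Length;
        double[,] d = new double[n + 1, m + 1];
        for (int i = 0; i <= n; i++)
            for (int j = 0; j <= m; j++)
                d[i, j] = double.PositiveInfinity;
        d[0, 0] = 0.0;

        for (int i = 1; i <= n; i++)
            for (int j = 1; j <= m; j++)
            {
                double cost = Math.Abs(x[i - 1] - y[j - 1]);
                // Allow match, insertion, or deletion: non-linear warping.
                d[i, j] = cost + Math.Min(d[i - 1, j - 1],
                                 Math.Min(d[i - 1, j], d[i, j - 1]));
            }
        return d[n, m];
    }

    static void Main()
    {
        double[] slow = { 1, 1, 2, 3, 3, 4 };   // same pattern, spoken slowly
        double[] fast = { 1, 2, 3, 4 };
        Console.WriteLine(Dtw(slow, fast));     // 0: perfectly warpable
    }
}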
NEURAL NETWORKS
Neural networks emerged as an attractive acoustic modeling approach in ASR in the late 1980s.
Since then, neural networks have been used in many aspects of speech recognition such as
phoneme classification, isolated word recognition, and speaker adaptation.
In contrast to HMMs, neural networks make no assumptions about feature statistical properties
and have several qualities making them attractive recognition models for speech recognition.
When used to estimate the probabilities of a speech feature segment, neural networks allow
discriminative training in a natural and efficient manner. Few assumptions on the statistics of
input features are made with neural networks. However, in spite of their effectiveness in
classifying short-time units such as individual phones and isolated words, neural networks are
rarely successful for continuous recognition tasks, largely because of their lack of ability to
model temporal dependencies.
However, recently recurrent neural networks (RNNs) and time delay neural networks (TDNNs)[44] have been used, and these have been shown to be able to identify latent temporal dependencies and use this information to perform the task of speech recognition. This, however, enormously increases the computational cost involved and hence makes the process of speech recognition slower. A lot of research is still going on in this field to ensure that TDNNs and RNNs can be used in a more computationally affordable way to substantially improve speech recognition accuracy.
Deep Neural Networks and Denoising Autoencoders[45] are also being experimented with to
tackle this problem in an effective manner.
Due to the inability of traditional neural networks to model temporal dependencies, an alternative approach is to use neural networks as a pre-processing step, e.g., for feature transformation or dimensionality reduction, before HMM-based recognition.
DEEP NEURAL NETWORKS AND OTHER DEEP LEARNING MODELS
A deep neural network (DNN) is an artificial neural network with multiple hidden layers of units
between the input and output layers. Similar to shallow neural networks, DNNs can model
complex non-linear relationships. DNN architectures generate compositional models, where
extra layers enable composition of features from lower layers, giving a huge learning capacity
and thus the potential of modeling complex patterns of speech data.[47] The DNN is the most popular type of deep learning architecture and has been successfully used as an acoustic model for speech recognition since 2010.
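To illustrate the layer-by-layer composition described above, here is a minimal C# sketch of a DNN forward pass with two sigmoid hidden layers and a linear output layer. The dimensions and random weights are placeholders; in a trained system the weights would come from back-propagation (and, for the enhancement task in this report, the output would be the estimated clean log-power spectra).

using System;

class DnnForward
{
    // One layer: affine transform followed by an optional sigmoid.
    // Each hidden layer composes features computed by the layer below it.
    static double[] Layer(double[] x, double[,] w, double[] b, bool sigmoid)
    {
        int outDim = b.Length, inDim = x.Length;
        double[] y = new double[outDim];
        for (int o = 0; o < outDim; o++)
        {
            double a = b[o];
            for (int i = 0; i < inDim; i++) a += w[o, i] * x[i];
            y[o] = sigmoid ? 1.0 / (1.0 + Math.Exp(-a)) : a; // linear output layer
        }
        return y;
    }

    static double[,] RandomMatrix(Random rnd, int rows, int cols)
    {
        double[,] m = new double[rows, cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++) m[r, c] = rnd.NextDouble() - 0.5;
        return m;
    }

    static void Main()
    {
        Random rnd = new Random(0);
        int dim = 8;
        double[] x = new double[dim];                 // e.g., a noisy feature frame
        for (int i = 0; i < dim; i++) x[i] = rnd.NextDouble();

        // Placeholder weights; a trained system learns these.
        double[,] w1 = RandomMatrix(rnd, dim, dim);
        double[,] w2 = RandomMatrix(rnd, dim, dim);
        double[,] w3 = RandomMatrix(rnd, dim, dim);
        double[] b = new double[dim];

        double[] h1 = Layer(x, w1, b, true);
        double[] h2 = Layer(h1, w2, b, true);
        double[] yOut = Layer(h2, w3, b, false);
        Console.WriteLine("output[0] = " + yOut[0]);
    }
}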
The success of DNNs in large vocabulary speech recognition came in 2010 through industrial researchers working in collaboration with academic researchers, where large output layers of the DNN based on context-dependent HMM states constructed by decision trees were adopted. See
comprehensive reviews of this development and of the state of the art as of October 2014 in the
recent Springer book from Microsoft Research. See also the related background of automatic
speech recognition and the impact of various machine learning paradigms including notably deep
learning in a recent overview article.
One fundamental principle of deep learning is to do away with hand-crafted feature
engineering and to use raw features. This principle was first explored successfully in the
architecture of deep autoencoder on the "raw" spectrogram or linear filter-bank features, showing
its superiority over the Mel-Cepstral features which contain a few stages of fixed transformation
from spectrograms. The true "raw" features of speech, waveforms, have more recently been
shown to produce excellent larger-scale speech recognition results.
Since the initial successful debut of DNNs for speech recognition around 2009-2011, there has been substantial new progress. This progress (as well as future directions) has been summarized into the following eight major areas:
1. Scaling up/out and speedup DNN training and decoding;
2. Sequence discriminative training of DNNs;
3. Feature processing by deep models with solid understanding of the underlying
mechanisms;
4. Adaptation of DNNs and of related deep models;
5. Multi-task and transfer learning by DNNs and related deep models;
6. Convolutional neural networks and how to design them to best exploit domain knowledge
of speech;
7. Recurrent neural networks and their rich LSTM variants;
8. Other types of deep models including tensor-based models and integrated deep
generative/discriminative models.
Large-scale automatic speech recognition is the first and the most convincing successful case of deep learning in recent history, embraced by both industry and academia across the board.
Between 2010 and 2014, the two major conferences on signal processing and speech recognition,
IEEE-ICASSP and Interspeech, have seen nearly exponential growth in the number of accepted papers on the topic of deep learning for speech recognition. More importantly, all major commercial speech recognition systems (e.g., Microsoft
Cortana, Xbox, Skype Translator, Google Now, Apple Siri, Baidu and iFlyTek voice search, and
a range of Nuance speech products, etc.) nowadays are based on deep learning methods.
APPLICATIONS:
IN-CAR SYSTEMS
Typically a manual control input, for example by means of a finger control on the steering-
wheel, enables the speech recognition system and this is signalled to the driver by an audio
prompt. Following the audio prompt, the system has a "listening window" during which it may
accept a speech input for recognition.
Simple voice commands may be used to initiate phone calls, select radio stations or play music
from a compatible smartphone, MP3 player or music-loaded flash drive. Voice recognition
capabilities vary between car make and model. Some of the most recent car models offer natural-language speech recognition in place of a fixed set of commands, allowing the driver to use full sentences and common phrases. With such systems there is, therefore, no need for the user to
memorize a set of fixed command words.
HEALTH CARE
Medical documentation
In the health care sector, speech recognition can be implemented in the front end or the back end of the medical documentation process. Front-end speech recognition is where the provider dictates into
a speech-recognition engine, the recognized words are displayed as they are spoken, and the
dictator is responsible for editing and signing off on the document. Back-end or deferred speech
recognition is where the provider dictates into a digital dictation system, the voice is routed
through a speech-recognition machine and the recognized draft document is routed along with
the original voice file to the editor, where the draft is edited and the report finalized. Deferred speech
recognition is widely used in the industry currently.
One of the major issues relating to the use of speech recognition in healthcare is that the
American Recovery and Reinvestment Act of 2009 (ARRA) provides for substantial financial
benefits to physicians who utilize an EMR according to "Meaningful Use" standards. These
standards require that a substantial amount of data be maintained by the EMR (now more
commonly referred to as an Electronic Health Record or EHR). The use of speech recognition is
more naturally suited to the generation of narrative text, as part of a radiology/pathology
interpretation, progress note or discharge summary: the ergonomic gains of using speech
recognition to enter structured discrete data (e.g., numeric values or codes from a list or
a controlled vocabulary) are relatively minimal for people who are sighted and who can operate a
keyboard and mouse.
A more significant issue is that most EHRs have not been expressly tailored to take advantage of
voice-recognition capabilities. A large part of the clinician's interaction with the EHR involves
navigation through the user interface using menus, and tab/button clicks, and is heavily
dependent on keyboard and mouse: voice-based navigation provides only modest ergonomic
benefits. By contrast, many highly customized systems for radiology or pathology dictation
implement voice "macros", where the use of certain phrases - e.g., "normal report", will
automatically fill in a large number of default values and/or generate boilerplate, which will vary
with the type of the exam - e.g., a chest X-ray vs. a gastrointestinal contrast series for a radiology
system.
As an alternative to this navigation by hand, cascaded use of speech recognition and information
extraction has been studied as a way to fill out a handover form for clinical proofing and sign-
off. The results are encouraging, and the paper also releases the data, together with the related performance benchmarks and some processing software, to the research and development community for studying clinical documentation and language processing.
MILITARY
High-performance fighter aircraft
Substantial efforts have been devoted in the last decade to the test and evaluation of speech
recognition in fighter aircraft. Of particular note is the U.S. program in speech recognition for the
Advanced Fighter Technology Integration (AFTI)/F-16 aircraft (F-16 VISTA), and a program in
France installing speech recognition systems on Mirage aircraft, and also programs in the UK
dealing with a variety of aircraft platforms. In these programs, speech recognizers have been
operated successfully in fighter aircraft, with applications including: setting radio frequencies,
commanding an autopilot system, setting steer-point coordinates and weapons release
parameters, and controlling flight display.
Working with Swedish pilots flying in the JAS-39 Gripen cockpit, Englund (2004) found
recognition deteriorated with increasing G-loads. It was also concluded that adaptation greatly
improved the results in all cases and introducing models for breathing was shown to improve
recognition scores significantly. Contrary to what might be expected, no effects of the broken
English of the speakers were found. It was evident that spontaneous speech caused problems for
the recognizer, as could be expected. A restricted vocabulary, and above all, a proper syntax,
could thus be expected to improve recognition accuracy substantially.
The Eurofighter Typhoon currently in service with the UK RAF employs a speaker-dependent
system, i.e. it requires each pilot to create a template. The system is not used for any safety
critical or weapon critical tasks, such as weapon release or lowering of the undercarriage, but is
used for a wide range of other cockpit functions. Voice commands are confirmed by visual
and/or aural feedback. The system is seen as a major design feature in the reduction of
pilot workload, and even allows the pilot to assign targets to himself with two simple voice
commands or to any of his wingmen with only five commands.
Speaker-independent systems are also being developed and are in testing for the F-35 Lightning II (JSF) and the Alenia Aermacchi M-346 Master lead-in fighter trainer. These systems have
produced word accuracy in excess of 98%.
HELICOPTERS
The problems of achieving high recognition accuracy under stress and noise pertain strongly to
the helicopter environment as well as to the jet fighter environment. The acoustic noise problem
is actually more severe in the helicopter environment, not only because of the high noise levels
but also because the helicopter pilot, in general, does not wear a facemask, which would reduce
acoustic noise in the microphone. Substantial test and evaluation programs have been carried out
in the past decade in speech recognition systems applications in helicopters, notably by the U.S.
Army Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace
Establishment (RAE) in the UK. Work in France has included speech recognition in the Puma
helicopter. There has also been much useful work in Canada. Results have been encouraging,
and voice applications have included: control of communication radios, setting
of navigation systems, and control of an automated target handover system.
As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot
effectiveness. Encouraging results are reported for the AVRADA tests, although these represent
only a feasibility demonstration in a test environment. Much remains to be done both in speech
recognition and in overall speech technology in order to consistently achieve performance
improvements in operational settings.
PROBLEMS:
In recent years, single-channel speech enhancement has attracted a considerable amount of
research attention because of the growing challenges in many important real-world applications,
including mobile speech communication, hearing aids design and robust speech recognition. The
goal of speech enhancement is to improve the intelligibility and quality of a noisy speech signal
degraded in adverse conditions. However, the performance of speech enhancement in real
acoustic environments is not always satisfactory. Numerous speech enhancement methods were
developed over the past several decades. Spectral subtraction subtracts an estimate of the short-term noise spectrum to produce an estimated spectrum of the clean speech. Iterative Wiener filtering based on an all-pole speech model has also been presented.
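For concreteness, a minimal C# sketch of power spectral subtraction follows. The over-subtraction factor and spectral floor are illustrative values; the flooring step is one source of the "musical noise" discussed next.

using System;

class SpectralSubtraction
{
    // Power spectral subtraction: remove a noise power estimate from the
    // noisy power spectrum, flooring the result so power never goes negative
    // (this flooring contributes to the residual "musical noise").
    static double[] Subtract(double[] noisyPower, double[] noisePower,
                             double overSubtraction, double floor)
    {
        double[] cleanPower = new double[noisyPower.Length];
        for (int k = 0; k < noisyPower.Length; k++)
        {
            double p = noisyPower[k] - overSubtraction * noisePower[k];
            cleanPower[k] = Math.Max(p, floor * noisyPower[k]);
        }
        return cleanPower;
    }

    static void Main()
    {
        double[] noisy = { 4.0, 1.0, 9.0 };   // per-bin noisy power
        double[] noise = { 1.0, 1.5, 2.0 };   // noise power estimated in speech pauses
        double[] clean = Subtract(noisy, noise, 1.0, 0.002);
        foreach (double p in clean) Console.WriteLine(p);
    }
}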
A common problem usually encountered in these conventional methods is that the resulting
enhanced speech often suffers from an annoying artifact called “musical noise”. Another notable
work was the minimum mean-square error (MMSE) estimator introduced by Ephraim and Malah
[6]; their MMSE log-spectral amplitude estimator could result in much lower residual noise
without further affecting the speech quality. An optimally modified log-spectral amplitude (OM-LSA) speech estimator and a minima-controlled recursive averaging (MCRA) noise estimation approach have also been presented. Although these traditional MMSE-based methods are able to yield lower musical noise (e.g., [10], [11]), a trade-off between reducing speech distortion and residual noise needs to be made due to the sophisticated statistical properties of the interactions between speech and noise signals. Most of these unsupervised methods are based on either the additive nature of the background noise, or the statistical properties of the speech and noise signals.
However, they often fail to track non-stationary noise in real-world scenarios under unexpected
acoustic conditions. Considering the complex process of noise corruption, a nonlinear model,
like the neural networks, might be suitable for modeling the mapping relationship between the
noisy and clean speech signals. Early work proposed using shallow neural networks (SNNs) as nonlinear filters to predict the clean signal in the time or frequency domain. For example, an SNN with only one hidden layer of 160 neurons was used to estimate the instantaneous signal-to-noise ratios (SNRs) on the amplitude modulation spectrograms (AMS), and the noise could then be suppressed according to the estimated SNRs of the different channels. However, the SNR was estimated at a limited frequency resolution of 15 channels, which was not efficient for suppressing noise types with sharp spectral peaks. Furthermore, the small network size cannot fully learn the relationship between the noisy features and the target SNRs.
1.3 LITERATURE SURVEY:
CHAPTER 2
2.0 SYSTEM ANALYSIS
2.1 EXISTING SYSTEM:
The existing method uses neural networks (NNs) within conventional joint-density Gaussian mixture model (JDGMM) based spectral conversion, which performs stably and effectively. However, the speech generated by these methods suffers severe quality degradation due to the following two factors: 1) the inadequacy of the JDGMM in modeling the distribution of spectral features as well as the non-linear mapping relationship between the source and target speakers, and 2) spectral detail loss caused by the use of high-level spectral features such as mel-cepstra. Previously, we have
proposed to use the mixture of restricted Boltzmann machines (MoRBM) and the mixture of
Gaussian bidirectional associative memories (MoGBAM) to cope with these problems.
Previous methods use an NN to construct a global non-linear mapping relationship between the spectral envelopes of two speakers; the NN is generatively trained by cascading two RBMs, which model the distributions of the spectral envelopes of the source and target speakers respectively, using a Bernoulli BAM (BBAM). Therefore, the proposed training method takes advantage of the
strong modeling ability of RBMs in modeling the distribution of spectral envelopes and the
superiority of BAMs in deriving the conditional distributions for conversion. Careful
comparisons and analysis among the proposed method and some conventional methods are
presented in this paper. The subjective results show that the proposed method can significantly
improve the performance in terms of both similarity and naturalness compared to conventional
methods.
2.1.1 DISADVANTAGES:
Neural networks (NNs) with a single-layer architecture cannot fully exploit a large training set, which is needed to ensure a powerful modeling capability for estimating the complicated nonlinear mapping from observed noisy speech to the desired clean signals. Although acoustic context was found to improve the continuity of the speech separated from the background noises, the annoying musical artifact commonly observed in conventional speech enhancement algorithms remains.
A series of pilot experiments was conducted under multi-condition training with more than 100 hours of simulated speech data, resulting in a poor generalization capability in mismatched testing conditions. When compared with the logarithmic minimum mean square error approach, the NN-based algorithm tends to achieve significant improvements in terms of various objective quality measures. Furthermore, in a subjective preference evaluation with 10 listeners, 66.35% of the subjects were found to prefer NN-based enhanced speech to that obtained with other conventional low-level techniques.
2.2 PROPOSED SYSTEM:
We proposed a regression DNN based speech enhancement framework via training a deep and
wide neural network architecture using a large collection of heterogeneous training data with
four noise types. It was found that the annoying musical noise artifact could be greatly reduced
with the DNN-based algorithm and the enhanced speech also showed an improved speech
quality both in terms of objective and subjective measures DNN-based speech enhancement
framework to handle adverse conditions and non-stationary noise types in real-world situations.
In traditional speech enhancement techniques, the noise estimate is usually updated by averaging the noisy speech power spectrum using time- and frequency-dependent smoothing factors, which are adjusted based on the estimated speech presence probability in individual frequency bins. Such noise tracking capacity is limited for highly non-stationary noise cases, and the estimator tends to distort the speech component in mixed signals if it is tuned for better noise reduction. In this work, the
acoustic context information, including the full frequency band and context frame expanding, is
well utilized to obtain the enhanced speech with reduced discontinuity.
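A minimal C# sketch of this context expansion is given below: each input vector to the DNN stacks the current frame with its neighbouring frames. Here one frame on each side is used for brevity; the actual context width is a tunable parameter.

using System;

class ContextExpansion
{
    // Stack each frame with its +/- tau neighbours so the DNN input carries
    // acoustic context across the full frequency band.
    static double[] Expand(double[][] frames, int t, int tau)
    {
        int dim = frames[0].Length;
        double[] input = new double[(2 * tau + 1) * dim];
        for (int offset = -tau; offset <= tau; offset++)
        {
            // Clamp at utterance boundaries by repeating the edge frame.
            int idx = Math.Min(Math.Max(t + offset, 0), frames.Length - 1);
            Array.Copy(frames[idx], 0, input, (offset + tau) * dim, dim);
        }
        return input;
    }

    static void Main()
    {
        double[][] frames = { new double[] { 1, 1 },
                              new double[] { 2, 2 },
                              new double[] { 3, 3 } };
        double[] x = Expand(frames, 1, 1);    // frames 0, 1 and 2 stacked
        Console.WriteLine(x.Length);          // 6
    }
}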
Furthermore, to improve the generalization capability, we include more than 100 different noise types in designing the training set for the DNN, which proved to be quite effective in handling unseen noise types, especially non-stationary noise components. Three strategies are also
proposed to further improve the quality of enhanced speech and generalization capability of
DNNs. First, equalization between the global variance (GV) of the enhanced features and the
reference clean speech features is proposed to alleviate the over-smoothing issue in DNN-based
speech enhancement system. The second technique, called dropout, is a recently proposed
strategy for training neural networks on data sets where over-fitting may be a concern. While this
method was not designed for noise reduction, it was demonstrated to be useful for noise robust
speech recognition and we successfully apply it to a DNN as a regression model to produce a
network that has good generalization ability to variabilities in the input. Finally, noise aware
training (NAT) is adopted to improve performance.
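As an illustration of the first strategy, the sketch below equalizes the global variance of the enhanced log-power features against clean-speech statistics in C#. The single-dimension treatment and the example statistics are simplifying assumptions; the actual system estimates the clean-speech GV from the training corpus, per feature dimension.

using System;

class GvEqualization
{
    // Global variance (GV) equalization: rescale the enhanced log-power
    // features so their variance over the utterance matches clean-speech
    // statistics, counteracting the over-smoothing of the regression DNN.
    static void Equalize(double[] enhanced, double cleanMean, double cleanVariance)
    {
        double mean = 0.0, variance = 0.0;
        foreach (double v in enhanced) mean += v;
        mean /= enhanced.Length;
        foreach (double v in enhanced) variance += (v - mean) * (v - mean);
        variance /= enhanced.Length;

        double scale = Math.Sqrt(cleanVariance / (variance + 1e-10));
        for (int i = 0; i < enhanced.Length; i++)
            enhanced[i] = scale * (enhanced[i] - mean) + cleanMean; // stretch around the mean
    }

    static void Main()
    {
        double[] enhanced = { -0.2, 0.1, 0.0, 0.3 };   // over-smoothed DNN outputs
        Equalize(enhanced, 0.0, 1.0);                   // clean statistics from training data
        foreach (double v in enhanced) Console.WriteLine(v);
    }
}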
2.2.1 ADVANTAGES:
The baseline DNN training procedure is first adopted, and several techniques are then proposed to improve the baseline DNN system so that the quality of the enhanced speech in matched noise conditions can be maintained while the generalization capability to unseen noise is increased. We compared the proposed normalized clean log-power spectra with mask-based training targets and verified the different initialization schemes. The evaluations of the proposed strategies then demonstrated their effectiveness in improving the generalization capacity to unseen noises.
Strong suppression of highly non-stationary noise was also found in the overall performance comparisons on 15 unseen noises and on real-world noises: the proposed normalized clean log-power spectra target was better than IRM and FFT-MASK under all conditions in our experimental setup, while IRM and FFT-MASK achieved almost the same performance. It should be noted that normalizing the proposed clean log-power spectra to zero mean and unit variance is crucial, which differs from FFT-MAG with log compression followed by percent normalization.
Finally, by using more noise types and the three proposed techniques, the PESQ improvements of the proposed DNN approach over LogMMSE under unseen noise types in Table VIII are comparable to those under matched noise types. The STOI results, representing the intelligibility of the enhanced speech, are presented in Table IX. LogMMSE is slightly better than the noisy speech, with an average STOI improvement from 0.81 to 0.82. The DNN baseline trained with 100 hours achieved a 0.86 STOI score on average, and the proposed strategies could further improve the performance.
2.3 HARDWARE & SOFTWARE REQUIREMENTS:
2.3.1 HARDWARE REQUIREMENT:
• Processor - Pentium IV
• Speed - 1.1 GHz
• RAM - 256 MB (min)
• Hard Disk - 20 GB
• Floppy Drive - 1.44 MB
• Keyboard - Standard Windows Keyboard
• Mouse - Two or Three Button Mouse
• Monitor - SVGA
2.3.2 SOFTWARE REQUIREMENTS:
.NET
• Operating System : Windows XP or Windows 7
• Front End : Microsoft Visual Studio .NET 2008
• Script : C#
• Documentation : MS Office 2007
CHAPTER 3
3.0 SYSTEM DESIGN:
Data Flow Diagram / Use Case Diagram / Flow Diagram:
• The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on these data, and the output data generated by the system.
• The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components: the system processes, the data used by the processes, the external entities that interact with the system, and the information flows in the system.
• The DFD shows how information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.
• A DFD may be used to represent a system at any level of abstraction, and it may be partitioned into levels that represent increasing information flow and functional detail.
NOTATION:
SOURCE OR DESTINATION OF DATA:
External sources or destinations, which may be people or organizations or other entities
DATA SOURCE:
Here the data referenced by a process is stored and retrieved.
PROCESS:
People, procedures, or devices that produce data; the physical component is not identified.
DATA FLOW:
Data moves in a specific direction from an origin to a destination. The data flow is a “packet” of
data.
MODELING RULES:
There are several common modeling rules when creating DFDs:
1. All processes must have at least one data flow in and one data flow out.
2. All processes should modify the incoming data, producing new forms of outgoing data.
3. Each data store must be involved with at least one data flow.
4. Each external entity must be involved with at least one data flow.
5. A data flow must be attached to at least one process.
3.1 ARCHITECTURE DIAGRAM:
3.2 DATAFLOW DIAGRAM:
UML DIAGRAMS:
3.3 USE CASE DIAGRAM:
3.4 CLASS DIAGRAM:
3.5 SEQUENCE DIAGRAM:
3.6 ACTIVITY DIAGRAM:
CHAPTER 4
4.0 IMPLEMENTATION:
MINIMUM MEAN SQUARE ERROR (MMSE):
4.1 ALGORITHM:
SPEECH ENHANCEMENT ALGORITHMS (DNN):
4.2 MODULES:
AUDIO PREPROCESSING:
NOISE AWARE TRAINING:
NOISE REDUCTION DFT:
DEEP NEURAL NETWORKS:
SPEECH ENHANCEMENT:
4.3 MODULE DESCRIPTION:
AUDIO PREPROCESSING:
NOISE AWARE TRAINING:
NOISE REDUCTION DFT:
DEEP NEURAL NETWORKS:
SPEECH ENHANCEMENT:
CHAPTER 5
5.0 SYSTEM STUDY:
5.1 FEASIBILITY STUDY:
The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out, to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential.
The three key considerations involved in the feasibility analysis are:
• ECONOMIC FEASIBILITY
• TECHNICAL FEASIBILITY
• SOCIAL FEASIBILITY
5.1.1 ECONOMIC FEASIBILITY:
This study is carried out to check the economic impact that the system will have on the
organization. The amount of fund that the company can pour into the research and development
of the system is limited. The expenditures must be justified. Thus the developed system is well within the budget, which was achieved because most of the technologies used are freely available. Only the customized products had to be purchased.
5.1.2 TECHNICAL FEASIBILITY:
This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing this system.
5.1.3 SOCIAL FEASIBILITY:
This aspect of the study is to check the level of acceptance of the system by the user. This includes
the process of training the user to use the system efficiently. The user must not feel threatened by
the system, instead must accept it as a necessity. The level of acceptance by the users solely
depends on the methods that are employed to educate the user about the system and to make him
familiar with it. His level of confidence must be raised so that he is also able to make some
constructive criticism, which is welcomed, as he is the final user of the system.
5.2 SYSTEM TESTING:
Testing is a process of checking whether the developed system is working according to the
original objectives and requirements. It is a set of activities that can be planned in advance and
conducted systematically. Testing is vital to the success of the system. System testing makes a logical assumption that if all the parts of the system are correct, the goal will be successfully achieved. Inadequate testing, or no testing at all, leads to errors that may not appear until many months later. This creates two problems: the time lag between the cause and the appearance of the problem, and the effect of system errors on the files and records within the system. A small system error can conceivably explode into a much larger problem. Effective testing early in the process translates directly into long-term cost savings from a reduced number of errors. Another reason for system testing is its utility as a user-oriented vehicle before implementation. The best program is worthless if it does not produce the correct outputs.
5.2.1 UNIT TESTING:
A program represents the logical elements of a system. For a program to run satisfactorily, it must compile, process test data correctly, and tie in properly with other programs. Achieving an error-free program is the responsibility of the programmer. Program testing checks for two types of errors: syntax and logical. A syntax error is a program statement that violates one or more rules of the language in which it is written. An improperly defined field dimension or omitted keywords are common syntax errors. These errors are shown through error messages generated by the computer. For logic errors, the programmer must examine the output carefully.
5.2.2 FUNCTIONAL TESTING:
Functional testing of an application is used to prove that the application delivers correct results, using enough inputs to give an adequate level of confidence that it will work correctly for all sets of inputs. The functional testing will need to prove that the application works for each client type and that the personalization functions work correctly. When a program is tested, the actual output is compared with the expected output. When there is a discrepancy, the sequence of instructions must be traced to determine the problem. The process is facilitated by breaking the program into self-contained portions, each of which can be checked at certain key points. The idea is to compare program values against desk-calculated values to isolate the problems.
Description: Test for application window properties.
Expected result: All the properties of the windows are to be properly aligned and displayed.

Description: Test for mouse operations.
Expected result: All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.
5.2.3 NON-FUNCTIONAL TESTING:
Non-functional software testing encompasses a rich spectrum of testing strategies, describing the expected results for every test case. It uses symbolic analysis techniques. This testing is used to check that an application will work in the operational environment. Non-functional testing includes:
• Load testing
• Performance testing
• Usability testing
• Reliability testing
• Security testing
Description: Test for all modules.
Expected result: All peers should communicate in the group.

Description: Test for various peers in a distributed network framework as it displays all users available in the group.
Expected result: The result after execution should give the accurate result.
5.2.4 LOAD TESTING:
An important tool for implementing system tests is a load generator. A load generator is essential for testing quality requirements such as performance and stress. A load can be a real load; that is, the system can be put under test to real usage by having actual telephone users connected to it, who generate the test input data for the system test.
5.2.5 PERFORMANCE TESTING:
Performance tests are utilized in order to determine the widely defined performance of the
software system such as execution time associated with various parts of the code, response time
and device utilization. The intent of this testing is to identify weak points of the software system
and quantify its shortcomings.
Description: It is necessary to ascertain that the application behaves correctly under load when a 'Server busy' response is received.
Expected result: Should designate another active node as a server.
5.2.6 RELIABILITY TESTING:
The software reliability is the ability of a system or component to perform its required functions
under stated conditions for a specified period of time and it is being ensured in this testing.
Reliability can be expressed as the ability of the software to reveal defects under testing conditions, according to the specified requirements. It is the probability that a software system will operate without failure under given conditions for a given time interval, and it focuses on the behavior of the software element. It forms a part of software quality control.
Description: This is required to assure that an application performs adequately, having the capability to handle many peers, delivering its results in the expected time and using an acceptable level of resources; it is an aspect of operational management.
Expected result: Should handle large input values and produce accurate results in the expected time.
Description: This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in providing the application.
Expected result: In case of failure of the server, an alternate server should take over the job.
5.2.7 SECURITY TESTING:
Security testing evaluates system characteristics that relate to the availability, integrity and
confidentiality of the system data and services. Users/Clients should be encouraged to make sure
their security needs are very clearly known at requirements time, so that the security issues can
be addressed by the designers and testers.
5.2.8 WHITE BOX TESTING:
White box testing, sometimes called glass-box testing, is a test-case design method that uses the control structure of the procedural design to derive test cases. Using the white box testing method, the software engineer can derive test cases. White box testing focuses on the inner structure of the software to be tested.
Description: Checking that the user identification is authenticated.
Expected result: In case of failure it should not be connected in the framework.

Description: Check whether group keys in a tree are shared by all peers.
Expected result: The peers should know the group key in the same group.
5.2.9 BLACK BOX TESTING:
Black box testing, also called behavioral testing, focuses on the functional requirements of the software. That is, black box testing enables the software engineer to derive sets of input conditions that will fully exercise all functional requirements for a program. Black box testing is not an alternative to white box techniques. Rather, it is a complementary approach that is likely to uncover a different class of errors than white box methods. Black box testing attempts to find errors in the inputs, outputs, and principal functions of a software module. The starting point of black box testing is either a specification or code. The contents of the box are hidden, and the stimulated software should produce the desired results.
Description: Exercise all logical decisions on their true and false sides.
Expected result: All the logical decisions must be valid.

Description: Execute all loops at their boundaries and within their operational bounds.
Expected result: All the loops must be finite.

Description: Exercise internal data structures to ensure their validity.
Expected result: All the data structures must be valid.
All the above system testing strategies are carried out, as the development, documentation, and institutionalization of the proposed goals and related policies are essential.
Description: To check for incorrect or missing functions.
Expected result: All the functions must be valid.

Description: To check for interface errors.
Expected result: The entire interface must function normally.

Description: To check for errors in data structures or external database access.
Expected result: The database update and retrieval must be done correctly.

Description: To check for initialization and termination errors.
Expected result: All the functions and data structures must be initialized properly and terminated normally.
CHAPTER 6
6.0 SOFTWARE SPECIFICATION:
6.1 FEATURES OF .NET:
Microsoft .NET is a set of Microsoft software technologies for rapidly building and integrating
XML Web services, Microsoft Windows-based applications, and Web solutions. The .NET
Framework is a language-neutral platform for writing programs that can easily and securely
interoperate. There's no language barrier with .NET: there are numerous languages available to the developer, including Managed C++, C#, Visual Basic, and JScript.
The .NET framework provides the foundation for components to interact seamlessly, whether
locally or remotely on different platforms. It standardizes common data types and
communications protocols so that components created in different languages can easily
interoperate.
“.NET” is also the collective name given to various software components built upon the .NET
platform. These will be both products (Visual Studio.NET and Windows.NET Server, for
instance) and services (like Passport, .NET My Services, and so on).
6.2 THE .NET FRAMEWORK
The .NET Framework has two main parts:
1. The Common Language Runtime (CLR).
2. A hierarchical set of class libraries.
The CLR is described as the "execution engine" of .NET. It provides the environment within which programs run. Its most important features are:
• Conversion from a low-level assembler-style language, called Intermediate Language (IL), into code native to the platform being executed on.
• Memory management, notably including garbage collection.
• Checking and enforcing security restrictions on the running code.
• Loading and executing programs, with version control and other such features.
The following features of the .NET framework are also worth describing:
Managed Code
The code that targets .NET, and which contains certain extra Information - “metadata” - to
describe itself. Whilst both managed and unmanaged code can run in the runtime, only managed
code contains the information that allows the CLR to guarantee, for instance, safe execution and
interoperability.
Managed Data
With Managed Code comes Managed Data. The CLR provides memory allocation and deallocation facilities, and garbage collection. Some .NET languages use Managed Data by default, such as C#, Visual Basic.NET and JScript.NET, whereas others, namely C++, do not. Targeting the CLR
can, depending on the language you’re using, impose certain constraints on the features
available. As with managed and unmanaged code, one can have both managed and unmanaged
data in .NET applications - data that doesn’t get garbage collected but instead is looked after by
unmanaged code.
Common Type System
The CLR uses something called the Common Type System (CTS) to strictly enforce type-safety.
This ensures that all classes are compatible with each other, by describing types in a common
way. The CTS defines how types work within the runtime, which enables types in one language to
interoperate with types in another language, including cross-language exception handling. As
well as ensuring that types are only used in appropriate ways, the runtime also ensures that code
doesn’t attempt to access memory that hasn’t been allocated to it.
Common Language Specification
The CLR provides built-in support for language interoperability. To ensure that you can develop
managed code that can be fully used by developers using any programming language, a set of
language features and rules for using them called the Common Language Specification (CLS)
has been defined. Components that follow these rules and expose only CLS features are
considered CLS-compliant.
6.3 THE CLASS LIBRARY
.NET provides a single-rooted hierarchy of classes, containing over 7000 types. The root of the namespace is called System; this contains basic types like Byte, Double, Boolean, and String, as well as Object. All objects derive from System.Object. In addition to objects, there are value types.
Value types can be allocated on the stack, which can provide useful flexibility. There are also
efficient means of converting value types to object types if and when necessary.
The set of classes is pretty comprehensive, providing collections, file, screen, and network I/O,
threading, and so on, as well as XML and database connectivity.
The class library is subdivided into a number of sets (or namespaces), each providing distinct
areas of functionality, with dependencies between the namespaces kept to a minimum.
6.4 LANGUAGES SUPPORTED BY .NET
The multi-language capability of the .NET Framework and Visual Studio .NET enables
developers to use their existing programming skills to build all types of applications and XML
Web services. The .NET framework supports new versions of Microsoft’s old favorites Visual
Basic and C++ (as VB.NET and Managed C++), but there are also a number of new additions to
the family.
Visual Basic .NET has been updated to include many new and improved language features that
make it a powerful object-oriented programming language. These features include inheritance,
interfaces, and overloading, among others. Visual Basic also now supports structured exception
handling, custom attributes and also supports multi-threading.
Visual Basic .NET is also CLS compliant, which means that any CLS-compliant language can
use the classes, objects, and components you create in Visual Basic .NET.
Managed Extensions for C++ and attributed programming are just some of the enhancements
made to the C++ language. Managed Extensions simplify the task of migrating existing C++
applications to the new .NET Framework.
C# is Microsoft’s new language. It’s a C-style language that is essentially “C++ for Rapid
Application Development”. Unlike other languages, its specification is just the grammar of the
language. It has no standard library of its own, and instead has been designed with the intention
of using the .NET libraries as its own.
Microsoft Visual J# .NET provides the easiest transition for Java-language developers into the
world of XML Web Services and dramatically improves the interoperability of Java-language
programs with existing software written in a variety of other programming languages.
Active State has created Visual Perl and Visual Python, which enable .NET-aware applications
to be built in either Perl or Python. Both products can be integrated into the Visual Studio .NET
environment. Visual Perl includes support for Active State’s Perl Dev Kit.
Other languages for which .NET compilers are available include:
• FORTRAN
• COBOL
• Eiffel
Fig. 1: The .NET Framework stack, comprising ASP.NET and XML Web Services, Windows Forms, the Base Class Libraries, the Common Language Runtime, and the Operating System.
C#.NET is also compliant with the CLS (Common Language Specification) and supports structured exception handling. The CLS is a set of rules and constructs that are supported by the CLR (Common Language Runtime). The CLR is the runtime environment provided by the .NET Framework; it manages the execution of the code and also makes the development process easier by providing services.
C#.NET is a CLS-compliant language. Any objects, classes, or components created in C#.NET can be used in any other CLS-compliant language. In addition, we can use objects, classes, and components created in other CLS-compliant languages in C#.NET. The use of the CLS ensures complete interoperability among applications, regardless of the languages used to create them.
CONSTRUCTORS AND DESTRUCTORS:
Constructors are used to initialize objects, whereas destructors are used to destroy them. In other words, destructors are used to release the resources allocated to the object. In C#.NET the Finalize procedure is available; it is used to complete the tasks that must be performed when an object is destroyed. The Finalize procedure is called automatically when an object is destroyed, and it can be called only from the class it belongs to or from derived classes.
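A short C# example of a constructor and a destructor (the C# destructor syntax, ~ClassName, compiles down to a Finalize override) is given below; the class and its console output are purely illustrative.

using System;

class FileHolder
{
    public string Name;

    // Constructor: initializes the object.
    public FileHolder(string name)
    {
        Name = name;
        Console.WriteLine("Opened " + name);
    }

    // Destructor (finalizer): called by the garbage collector before the
    // memory is reclaimed; used to release resources held by the object.
    ~FileHolder()
    {
        Console.WriteLine("Releasing " + Name);
    }
}

class Program
{
    static void Main()
    {
        new FileHolder("report.doc");  // becomes unreachable immediately
        GC.Collect();                  // demonstration only: force a collection
        GC.WaitForPendingFinalizers(); // so the finalizer output is visible
    }
}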
GARBAGE COLLECTION
Garbage Collection is another new feature in C#.NET. The .NET Framework monitors allocated
resources, such as objects and variables. In addition, the .NET Framework automatically releases
memory for reuse by destroying objects that are no longer in use.
In C#.NET, the garbage collector checks for the objects that are not currently in use by
applications. When the garbage collector comes across an object that is marked for garbage
collection, it releases the memory occupied by the object.
OVERLOADING
Overloading is another feature in C#. Overloading enables us to define multiple procedures with
the same name, where each procedure has a different set of arguments. Besides using
overloading for procedures, we can use it for constructors and properties in a class.
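A short illustrative C# example of overloading follows: three procedures share the name Area, and the compiler selects among them by the number and types of the arguments.

using System;

class OverloadDemo
{
    // Three procedures share one name; each has a different set of arguments.
    static double Area(double radius) { return Math.PI * radius * radius; } // circle
    static double Area(double width, double height) { return width * height; } // rectangle
    static int Area(int side) { return side * side; } // square

    static void Main()
    {
        Console.WriteLine(Area(2.0));       // picks Area(double)
        Console.WriteLine(Area(3.0, 4.0));  // picks Area(double, double)
        Console.WriteLine(Area(5));         // picks Area(int)
    }
}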
MULTITHREADING:
C#.NET also supports multithreading. An application that supports multithreading can handle multiple tasks simultaneously. We can use multithreading to decrease the time taken by an application to respond to user interaction.
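The following short C# example (illustrative only) starts a worker thread for a long-running task while the main thread remains free:

using System;
using System.Threading;

class ThreadDemo
{
    static void Main()
    {
        // A worker thread handles a long-running task so that the main
        // thread stays free to respond to user interaction.
        Thread worker = new Thread(delegate()
        {
            for (int i = 0; i < 3; i++)
            {
                Console.WriteLine("worker step " + i);
                Thread.Sleep(100);
            }
        });
        worker.Start();

        Console.WriteLine("main thread is still responsive");
        worker.Join();   // wait for the worker before exiting
    }
}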
STRUCTURED EXCEPTION HANDLING
C#.NET supports structured exception handling, which enables us to detect and remove errors at runtime.
In C#.NET, we need to use Try…Catch…Finally statements to create exception handlers. Using
Try…Catch…Finally statements, we can create robust and effective exception handlers to
improve the performance of our application.
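A short illustrative example of a Try...Catch...Finally handler in C#:

using System;

class ExceptionDemo
{
    static void Main()
    {
        int[] values = { 1, 2, 3 };
        try
        {
            Console.WriteLine(values[5]);                 // out-of-range access
        }
        catch (IndexOutOfRangeException ex)
        {
            Console.WriteLine("Handled: " + ex.Message);  // detect the error at runtime
        }
        finally
        {
            Console.WriteLine("Cleanup runs whether or not an exception occurred.");
        }
    }
}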
6.5 THE .NET FRAMEWORK
The .NET Framework is a new computing platform that simplifies application development in
the highly distributed environment of the Internet.
OBJECTIVES OF .NET FRAMEWORK
1. To provide a consistent object-oriented programming environment, whether object code is stored and executed locally, executed locally but Internet-distributed, or executed remotely.
2. To provide a code-execution environment that minimizes software deployment conflicts and guarantees safe execution of code.
3. To eliminate performance problems.
There are different types of application, such as Windows-based applications and Web-based
applications.
6.6 FEATURES OF SQL-SERVER
The OLAP Services feature available in SQL Server version 7.0 is now called SQL Server 2000
Analysis Services. The term OLAP Services has been replaced with the term Analysis Services.
Analysis Services also includes a new data mining component. The Repository component
available in SQL Server version 7.0 is now called Microsoft SQL Server 2000 Meta Data
Services. References to the component now use the term Meta Data Services. The term
repository is used only in reference to the repository engine within Meta Data Services.
A SQL Server database consists of several types of objects. They are:
1. TABLE
2. QUERY
3. FORM
4. REPORT
5. MACRO
6.7 TABLE:
A database is a collection of data about a specific topic.
VIEWS OF TABLE:
We can work with a table in two views:
1. Design View
2. Datasheet View
Design View
To build or modify the structure of a table, we work in the table design view, where we can specify what kind of data each field will hold.
Datasheet View
To add, edit, or analyze the data itself, we work in the table's datasheet view mode.
QUERY:
A query is a question asked of the data. Access gathers the data that answers the question from
one or more tables. The data that makes up the answer is either a dynaset (if it can be edited) or a
snapshot (which cannot be edited). Each time we run a query, we get the latest information in the
dynaset. Access either displays the dynaset or snapshot for us to view, or performs an action on
it, such as deleting or updating records.
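In a C#.NET front end, such a question is typically posed to the database through ADO.NET. The sketch below assumes a hypothetical table (NoisySamples) and connection string, both of which would need to be adapted to the actual database:

using System;
using System.Data.SqlClient;

class Program
{
    static void Main()
    {
        // Placeholder connection string; adjust the server and database names.
        string connectionString = "Data Source=.;Initial Catalog=SpeechDb;Integrated Security=True";

        using (SqlConnection connection = new SqlConnection(connectionString))
        {
            connection.Open();

            // The query asks a question of the data: here, all rows of a hypothetical table.
            SqlCommand command = new SqlCommand("SELECT Id, FileName FROM NoisySamples", connection);

            using (SqlDataReader result = command.ExecuteReader())
            {
                // Each run of the query returns the latest data.
                while (result.Read())
                {
                    Console.WriteLine(result["Id"] + ": " + result["FileName"]);
                }
            }
        }
    }
}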
CHAPTER 7
7.0 APPENDIX
7.1 SAMPLE SCREEN SHOTS:
7.2 SAMPLE SOURCE CODE:
CHAPTER 8
8.1 CONCLUSION AND FUTURE WORK:
In this paper, a DNN-based framework for speech enhancement was proposed. Among the various
DNN configurations, a large training set proved crucial for learning the rich structure of the
mapping function between noisy and clean speech features. It was found that using more
acoustic context information improves system performance and makes the enhanced speech
less discontinuous. Moreover, multi-condition training with many kinds of noise types can
achieve good generalization to unseen noise environments, which also makes the proposed DNN
framework powerful enough to cope with non-stationary noises in real-world environments.
An over-smoothing problem in speech quality was found in the MMSE-optimized DNNs, and
the proposed post-processing technique, called GV equalization, was effective in brightening the
formant spectra of the enhanced speech signals. Two improved training techniques were further
adopted to reduce the residual noise and increase performance. Compared with the
LogMMSE method, significant improvements were achieved across different unseen noise
conditions. Another interesting observation was that the proposed DNN-based speech
enhancement system is quite effective at dealing with real-world noisy speech in different
languages and across recording conditions not observed during DNN training.
In future studies, we would increase speech diversity by first incorporating clean speech data
from a rich collection of materials covering more languages and speakers. Second, there are
many factors in designing the training set; we would utilize principles of experimental design
[54], [55] for multi-factor analysis, to reduce the amount of training data required while
maintaining the good generalization capability of the DNN model. Third, other features, such as
Gammatone filterbank power spectra [50] and multi-resolution cochleagram features [56], will
be adopted as in [50] to enrich the input information to the DNNs. Finally, a dynamic noise
adaptation scheme will also be investigated to improve the tracking of non-stationary noises.
CHAPTER 9
9.1 REFERENCES:
[1] L.-H. Chen, Z.-H. Ling, L.-J. Liu, and L.-R. Dai, “Voice conversion using deep neural
networks with layer-wise generative training,” IEEE/ACM Trans. Audio, Speech, Lang.
Process., vol. 22, no. 12, pp. 1859–1872, Dec. 2014.
[2] Z.-H. Ling, L. Deng, and D. Yu, “Modeling spectral envelopes using restricted Boltzmann
machines and deep belief networks for statistical parametric speech synthesis,” IEEE Trans.
Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2129–2139, Oct. 2013.
[3] B.-Y. Xia and C.-C. Bao, “Speech enhancement with weighted denoising Auto-Encoder,” in
Proc. Interspeech, 2013, pp. 3444–3448.
[23] X.-G. Lu, Y. Tsao, S. Matsuda, and C. Hori, “Speech enhancement based on deep denoising
Auto-Encoder,” in Proc. Interspeech, 2013, pp. 436–440.
[4] A. L. Maas, Q. V. Le, T. M. O’Neil, O. Vinyals, P. Nguyen, and A. Y. Ng, “Recurrent neural
networks for noise reduction in robust ASR,” in Proc. Interspeech, 2012, pp. 22–25.
[5] M. Wollmer, Z. Zhang, F. Weninger, B. Schuller, and G. Rigoll, “Feature enhancement by
bidirectional LSTM networks for conversational speech recognition in highly non-stationary
noise,” in Proc. ICASSP, 2013, pp. 6822–6826.
[6] H. Christensen, J. Barker, N. Ma, and P. D. Green, “The CHiME corpus: A resource and a
challenge for computational hearing in multisource environments,” in Proc. Interspeech, 2010,
pp. 1918–1921.
[7] Y. X. Wang and D. L. Wang, “Towards scaling up classification-based speech separation,”
IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 7, pp. 1381–1390, Jul. 2013.

More Related Content

What's hot

Speech Recognition in Artificail Inteligence
Speech Recognition in Artificail InteligenceSpeech Recognition in Artificail Inteligence
Speech Recognition in Artificail InteligenceIlhaan Marwat
 
Linear Predictive Coding
Linear Predictive CodingLinear Predictive Coding
Linear Predictive CodingSrishti Kakade
 
Speech recognition system seminar
Speech recognition system seminarSpeech recognition system seminar
Speech recognition system seminarDiptimaya Sarangi
 
Mel frequency cepstral coefficient (mfcc)
Mel frequency cepstral coefficient (mfcc)Mel frequency cepstral coefficient (mfcc)
Mel frequency cepstral coefficient (mfcc)BushraShaikh44
 
Speech recognition final presentation
Speech recognition final presentationSpeech recognition final presentation
Speech recognition final presentationhimanshubhatti
 
filters for noise in image processing
filters for noise in image processingfilters for noise in image processing
filters for noise in image processingSardar Alam
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognitionfathitarek
 
Image Restoration
Image RestorationImage Restoration
Image RestorationPoonam Seth
 
Implementation and comparison of Low pass filters in Frequency domain
Implementation and comparison of Low pass filters in Frequency domainImplementation and comparison of Low pass filters in Frequency domain
Implementation and comparison of Low pass filters in Frequency domainZara Tariq
 
Speaker recognition using MFCC
Speaker recognition using MFCCSpeaker recognition using MFCC
Speaker recognition using MFCCHira Shaukat
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By MatlabAnkit Gujrati
 
Automatic speech recognition system
Automatic speech recognition systemAutomatic speech recognition system
Automatic speech recognition systemAlok Tiwari
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition TechnologySrijanKumar18
 
Nyquist criterion for distortion less baseband binary channel
Nyquist criterion for distortion less baseband binary channelNyquist criterion for distortion less baseband binary channel
Nyquist criterion for distortion less baseband binary channelPriyangaKR1
 
Homomorphic speech processing
Homomorphic speech processingHomomorphic speech processing
Homomorphic speech processingsivakumar m
 

What's hot (20)

Speech Synthesis.pptx
Speech Synthesis.pptxSpeech Synthesis.pptx
Speech Synthesis.pptx
 
Speech Recognition in Artificail Inteligence
Speech Recognition in Artificail InteligenceSpeech Recognition in Artificail Inteligence
Speech Recognition in Artificail Inteligence
 
Linear Predictive Coding
Linear Predictive CodingLinear Predictive Coding
Linear Predictive Coding
 
Speech recognition system seminar
Speech recognition system seminarSpeech recognition system seminar
Speech recognition system seminar
 
Speech processing
Speech processingSpeech processing
Speech processing
 
Mel frequency cepstral coefficient (mfcc)
Mel frequency cepstral coefficient (mfcc)Mel frequency cepstral coefficient (mfcc)
Mel frequency cepstral coefficient (mfcc)
 
Speech recognition final presentation
Speech recognition final presentationSpeech recognition final presentation
Speech recognition final presentation
 
filters for noise in image processing
filters for noise in image processingfilters for noise in image processing
filters for noise in image processing
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Image Restoration
Image RestorationImage Restoration
Image Restoration
 
Speech encoding techniques
Speech encoding techniquesSpeech encoding techniques
Speech encoding techniques
 
Implementation and comparison of Low pass filters in Frequency domain
Implementation and comparison of Low pass filters in Frequency domainImplementation and comparison of Low pass filters in Frequency domain
Implementation and comparison of Low pass filters in Frequency domain
 
Speaker recognition using MFCC
Speaker recognition using MFCCSpeaker recognition using MFCC
Speaker recognition using MFCC
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
 
Automatic speech recognition system
Automatic speech recognition systemAutomatic speech recognition system
Automatic speech recognition system
 
Subband Coding
Subband CodingSubband Coding
Subband Coding
 
Equalization
EqualizationEqualization
Equalization
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Nyquist criterion for distortion less baseband binary channel
Nyquist criterion for distortion less baseband binary channelNyquist criterion for distortion less baseband binary channel
Nyquist criterion for distortion less baseband binary channel
 
Homomorphic speech processing
Homomorphic speech processingHomomorphic speech processing
Homomorphic speech processing
 

Similar to speech enhancement

AUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEYAUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEYIJCERT
 
Comparison and Analysis Of LDM and LMS for an Application of a Speech
Comparison and Analysis Of LDM and LMS for an Application of a SpeechComparison and Analysis Of LDM and LMS for an Application of a Speech
Comparison and Analysis Of LDM and LMS for an Application of a SpeechCSCJournals
 
M sc thesis_presentation_
M sc thesis_presentation_M sc thesis_presentation_
M sc thesis_presentation_Dia Abdulkerim
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...ijcsit
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...AIRCC Publishing Corporation
 
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...tsysglobalsolutions
 
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMESEFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMESkevig
 
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and PhonemesEffect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemeskevig
 
E0502 01 2327
E0502 01 2327E0502 01 2327
E0502 01 2327IJMER
 
Bayesian distance metric learning and its application in automatic speaker re...
Bayesian distance metric learning and its application in automatic speaker re...Bayesian distance metric learning and its application in automatic speaker re...
Bayesian distance metric learning and its application in automatic speaker re...IJECEIAES
 
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueA Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueCSCJournals
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONijma
 
NLP Techniques for Speech Recognition.docx
NLP Techniques for Speech Recognition.docxNLP Techniques for Speech Recognition.docx
NLP Techniques for Speech Recognition.docxKevinSims18
 
SMATalk: Standard Malay Text to Speech Talk System
SMATalk: Standard Malay Text to Speech Talk SystemSMATalk: Standard Malay Text to Speech Talk System
SMATalk: Standard Malay Text to Speech Talk SystemCSCJournals
 
AN EFFICIENT SPEECH RECOGNITION SYSTEM
AN EFFICIENT SPEECH RECOGNITION SYSTEMAN EFFICIENT SPEECH RECOGNITION SYSTEM
AN EFFICIENT SPEECH RECOGNITION SYSTEMcseij
 
Sentiment analysis by deep learning approaches
Sentiment analysis by deep learning approachesSentiment analysis by deep learning approaches
Sentiment analysis by deep learning approachesTELKOMNIKA JOURNAL
 

Similar to speech enhancement (20)

AUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEYAUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEY
 
Mjfg now
Mjfg nowMjfg now
Mjfg now
 
Comparison and Analysis Of LDM and LMS for an Application of a Speech
Comparison and Analysis Of LDM and LMS for an Application of a SpeechComparison and Analysis Of LDM and LMS for an Application of a Speech
Comparison and Analysis Of LDM and LMS for an Application of a Speech
 
M sc thesis_presentation_
M sc thesis_presentation_M sc thesis_presentation_
M sc thesis_presentation_
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
 
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
CURVELET BASED SPEECH RECOGNITION SYSTEM IN NOISY ENVIRONMENT: A STATISTICAL ...
 
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
 
Asr
AsrAsr
Asr
 
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMESEFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
EFFECT OF DYNAMIC TIME WARPING ON ALIGNMENT OF PHRASES AND PHONEMES
 
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and PhonemesEffect of Dynamic Time Warping on Alignment of Phrases and Phonemes
Effect of Dynamic Time Warping on Alignment of Phrases and Phonemes
 
E0502 01 2327
E0502 01 2327E0502 01 2327
E0502 01 2327
 
Bayesian distance metric learning and its application in automatic speaker re...
Bayesian distance metric learning and its application in automatic speaker re...Bayesian distance metric learning and its application in automatic speaker re...
Bayesian distance metric learning and its application in automatic speaker re...
 
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition TechniqueA Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
A Novel, Robust, Hierarchical, Text-Independent Speaker Recognition Technique
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
 
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITIONQUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
QUALITATIVE ANALYSIS OF PLP IN LSTM FOR BANGLA SPEECH RECOGNITION
 
NLP Techniques for Speech Recognition.docx
NLP Techniques for Speech Recognition.docxNLP Techniques for Speech Recognition.docx
NLP Techniques for Speech Recognition.docx
 
SMATalk: Standard Malay Text to Speech Talk System
SMATalk: Standard Malay Text to Speech Talk SystemSMATalk: Standard Malay Text to Speech Talk System
SMATalk: Standard Malay Text to Speech Talk System
 
AN EFFICIENT SPEECH RECOGNITION SYSTEM
AN EFFICIENT SPEECH RECOGNITION SYSTEMAN EFFICIENT SPEECH RECOGNITION SYSTEM
AN EFFICIENT SPEECH RECOGNITION SYSTEM
 
Sentiment analysis by deep learning approaches
Sentiment analysis by deep learning approachesSentiment analysis by deep learning approaches
Sentiment analysis by deep learning approaches
 

Recently uploaded

Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2RajaP95
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 

Recently uploaded (20)

Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 

speech enhancement

  • 1. A REGRESSION APPROACH TO SPEECH ENHANCEMENT BASED ON DEEP NEURAL NETWORKS CHAPTER 1 ABSTRACT: In contrast to the conventional minimum mean square error (MMSE)-based noise reduction techniques, we propose a supervised method to enhance speech by means of finding a mapping function between noisy and clean speech signals based on deep neural networks (DNNs). In order to be able to handle a wide range of additive noises in real-world situations, a large training set that encompasses many possible combinations of speech and noise types, is first designed. DNN architecture is then employed as a nonlinear regression function to ensure a powerful modeling capability. Several techniques have also been proposed to improve the DNN-based speech enhancement system, including global variance equalization to alleviate the over- smoothing problem of the regression model, and the dropout and noise-aware training strategies to further improve the generalization capability of DNNs to unseen noise conditions. Experimental results demonstrate that the proposed framework can achieve significant improvements in both objective and subjective measures over the conventional MMSE based technique. It is also interesting to observe that the proposed DNN approach can well suppress highly nonstationary noise, which is tough to handle in general. Furthermore, the resulting DNN model, trained with artificial synthesized data, is also effective in dealing with noisy speech data recorded in real-world scenarios without the generation of the annoying musical artifact commonly observed in conventional enhancement methods.
  • 2. 1.2 INTRODUCTION: SPEECH ENHANCEMENT: Speech enhancement aims to improve speech quality by using various algorithms. The objective of enhancement is improvement in intelligibility and/or overall perceptual quality of degraded speech signal using audio signal processing techniques. Enhancing of speech degraded by noise, or noise reduction, is the most important field of speech enhancement, and used for many applications such as mobile phones, VoIP, teleconferencing systems, speech recognition, and hearing aids SPEECH RECOGNITION: Speech recognition (SR) is the inter-disciplinary sub-field of computational linguistics which incorporates knowledge and research in the linguistics, computer science, and electrical engineering fields to develop methodologies and technologies that enables the recognition and translation of spoken language into text by computers and computerized devices such as those categorized as Smart Technologies and robotics. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or just "speech to text" (STT). Some SR systems use "training" (also called "enrollment") where an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that do not use training are called "speaker independent"[1] systems. Systems that use training are called "speaker dependent". Speech recognition applications include voice user interfaces such as voice dialing (e.g. "Call home"), call routing (e.g. "I would like to make a collect call"), domotic appliance control, search (e.g. find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g. a radiology report), speech-to-
  • 3. text processing (e.g., word processors or emails), and aircraft (usually termed Direct Voice Input). The term voice recognitionor speaker identification refers to identifying the speaker, rather than what they are saying. Recognizing the speaker can simplify the task of translating speech in systems that have been trained on a specific person's voice or it can be used to authenticate or verify the identity of a speaker as part of a security process. From the technology perspective, speech recognition has a long history with several waves of major innovations. Most recently, the field has benefited from advances in deep learning and big data. The advances are evidenced not only by the surge of academic papers published in the field, but more importantly by the world-wide industry adoption of a variety of deep learning methods in designing and deploying speech recognition systems. These speech industry players include Microsoft, Google, IBM, Baidu (China), Apple, Amazon, Nuance, IflyTek (China), many of which have publicized the core technology in their speech recognition systems being based on deep learning. MODELS, METHODS, AND ALGORITHMS: HIDDEN MARKOV MODELS Modern general-purpose speech recognition systems are based on Hidden Markov Models. These are statistical models that output a sequence of symbols or quantities. HMMs are used in speech recognition because a speech signal can be viewed as a piecewise stationary signal or a short-time stationary signal. In a short time-scale (e.g., 10 milliseconds), speech can be approximated as a stationary process. Speech can be thought of as a Markov model for many stochastic purposes. Another reason why HMMs are popular is because they can be trained automatically and are simple and computationally feasible to use. In speech recognition, the hidden Markov model
  • 4. would output a sequence of n-dimensional real-valued vectors (with n being a small integer, such as 10), outputting one of these every 10 milliseconds. The vectors would consist of cepstral coefficients, which are obtained by taking a Fourier transform of a short time window of speech and decorrelating the spectrum using a cosine transform, then taking the first (most significant) coefficients. The hidden Markov model will tend to have in each state a statistical distribution that is a mixture of diagonal covariance Gaussians, which will give a likelihood for each observed vector. Each word, or (for more general speech recognition systems), each phoneme, will have a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individual trained hidden Markov models for the separate words and phonemes. Described above are the core elements of the most common, HMM-based approach to speech recognition. Modern speech recognition systems use various combinations of a number of standard techniques in order to improve results over the basic approach described above. A typical large-vocabulary system would need context dependency for the phonemes (so phonemes with different left and right context have different realizations as HMM states); it would use cepstral normalization to normalize for different speaker and recording conditions; for further speaker normalization it might use vocal tract length normalization (VTLN) for male- female normalization and maximum likelihood linear regression (MLLR) for more general speaker adaptation. The features would have so-called delta and delta-delta coefficients to capture speech dynamics and in addition might use heteroscedastic linear discriminant analysis (HLDA); or might skip the delta and delta-delta coefficients and use splicing and an LDA-based projection followed perhaps by heteroscedastic linear discriminant analysis or a global semi-tied co variance transform (also known as maximum likelihood linear transform, or MLLT). Many systems use so-called discriminative training techniques that dispense with a purely statistical approach to HMM parameter estimation and instead optimize some classification-related measure of the training data. Examples are maximum mutual information (MMI), minimum classification error (MCE) and minimum phone error (MPE).
  • 5. Decoding of the speech (the term for what happens when the system is presented with a new utterance and must compute the most likely source sentence) would probably use the Viterbi algorithm to find the best path, and here there is a choice between dynamically creating a combination hidden Markov model, which includes both the acoustic and language model information, and combining it statically beforehand (the finite state transducer, or FST, approach). A possible improvement to decoding is to keep a set of good candidates instead of just keeping the best candidate, and to use a better scoring function (re scoring) to rate these good candidates so that we may pick the best one according to this refined score. The set of candidates can be kept either as a list (the N-best list approach) or as a subset of the models (a lattice). Re scoring is usually done by trying to minimize the Bayes risk (or an approximation thereof): Instead of taking the source sentence with maximal probability, we try to take the sentence that minimizes the expectancy of a given loss function with regards to all possible transcriptions (i.e., we take the sentence that minimizes the average distance to other possible sentences weighted by their estimated probability). The loss function is usually the Levenshtein distance, though it can be different distances for specific tasks; the set of possible transcriptions is, of course, pruned to maintain tractability. Efficient algorithms have been devised to re score lattices represented as weighted finite state transducers with edit distances represented themselves as a finite state transducer verifying certain assumptions. DYNAMIC TIME WARPING (DTW)-BASED SPEECH RECOGNITION Dynamic time warping is an approach that was historically used for speech recognition but has now largely been displaced by the more successful HMM-based approach. Dynamic time warping is an algorithm for measuring similarity between two sequences that may vary in time or speed. For instance, similarities in walking patterns would be detected, even if in one video the person was walking slowly and if in another he or she were walking more quickly,
  • 6. or even if there were accelerations and deceleration during the course of one observation. DTW has been applied to video, audio, and graphics – indeed, any data that can be turned into a linear representation can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. In general, it is a method that allows a computer to find an optimal match between two given sequences (e.g., time series) with certain restrictions. That is, the sequences are "warped" non-linearly to match each other. This sequence alignment method is often used in the context of hidden Markov models. NEURAL NETWORKS Neural networks emerged as an attractive acoustic modeling approach in ASR in the late 1980s. Since then, neural networks have been used in many aspects of speech recognition such as phoneme classification, isolated word recognition, and speaker adaptation. In contrast to HMMs, neural networks make no assumptions about feature statistical properties and have several qualities making them attractive recognition models for speech recognition. When used to estimate the probabilities of a speech feature segment, neural networks allow discriminative training in a natural and efficient manner. Few assumptions on the statistics of input features are made with neural networks. However, in spite of their effectiveness in classifying short-time units such as individual phones and isolated words, neural networks are rarely successful for continuous recognition tasks, largely because of their lack of ability to model temporal dependencies. However, recently Recurrent Neural Networks(RNN's) and Time Delay Neural Networks(TDNN's)[44] have been used which have been shown to be able to identify latent temporal dependencies and use this information to perform the task of speech recognition. This however enormously increases the computational cost involved and hence makes the process of speech recognition slower. A lot of research is still going on in this field to ensure that TDNN's and RNN's can be used in a more computationally affordable way to improve the Speech Recognition Accuracy immensely.
  • 7. Deep Neural Networks and Denoising Autoencoders[45] are also being experimented with to tackle this problem in an effective manner. Due to the inability of traditional Neural Networks to model temporal dependencies, an alternative approach is to use neural networks as a pre-processing e.g. feature transformation,
  • 8. DEEP NEURAL NETWORKS AND OTHER DEEP LEARNING MODELS A deep neural network (DNN) is an artificial neural network with multiple hidden layers of units between the input and output layers. Similar to shallow neural networks, DNNs can model complex non-linear relationships. DNN architectures generate compositional models, where extra layers enable composition of features from lower layers, giving a huge learning capacity and thus the potential of modeling complex patterns of speech data.[47] The DNN is the most popular type of deep learning architectures successfully used as an acoustic model for speech recognition since 2010. The success of DNNs in large vocabulary speech recognition occurred in 2010 by industrial researchers, in collaboration with academic researchers, where large output layers of the DNN based on context dependent HMM states constructed by decision trees were adopted. See comprehensive reviews of this development and of the state of the art as of October 2014 in the recent Springer book from Microsoft Research. See also the related background of automatic speech recognition and the impact of various machine learning paradigms including notably deep learning in a recent overview article. One fundamental principle of deep learning is to do away with hand-crafted feature engineering and to use raw features. This principle was first explored successfully in the architecture of deep autoencoder on the "raw" spectrogram or linear filter-bank features, showing its superiority over the Mel-Cepstral features which contain a few stages of fixed transformation from spectrograms. The true "raw" features of speech, waveforms, have more recently been shown to produce excellent larger-scale speech recognition results. Since the initial successful debut of DNNs for speech recognition around 2009-2011, there have been huge new progresses made. This progress (as well as future directions) has been summarized into the following eight major areas:
  • 9. 1. Scaling up/out and speedup DNN training and decoding; 2. Sequence discriminative training of DNNs; 3. Feature processing by deep models with solid understanding of the underlying mechanisms; 4. Adaptation of DNNs and of related deep models; 5. Multi-task and transfer learning by DNNs and related deep models; 6. Convolution neural networks and how to design them to best exploit domain knowledge of speech; 7. Recurrent neural network and its rich LSTM variants; 8. Other types of deep models including tensor-based models and integrated deep generative/discriminative models. Large-scale automatic speech recognition is the first and the most convincing successful case of deep learning in the recent history, embraced by both industry and academic across the board. Between 2010 and 2014, the two major conferences on signal processing and speech recognition, IEEE-ICASSP and Interspeech, have seen near exponential growth in the numbers of accepted papers in their respective annual conference papers on the topic of deep learning for speech recognition. More importantly, all major commercial speech recognition systems (e.g., Microsoft Cortana, Xbox, Skype Translator, Google Now, Apple Siri, Baidu and iFlyTek voice search, and a range of Nuance speech products, etc.) nowadays are based on deep learning methods. APPLICATIONS: IN-CAR SYSTEMS Typically a manual control input, for example by means of a finger control on the steering- wheel, enables the speech recognition system and this is signalled to the driver by an audio prompt. Following the audio prompt, the system has a "listening window" during which it may accept a speech input for recognition.
  • 10. Simple voice commands may be used to initiate phone calls, select radio stations or play music from a compatible smartphone, MP3 player or music-loaded flash drive. Voice recognition capabilities vary between car make and model. Some of the most recentcar models offer natural- language speech recognition in place of a fixed set of commands. Allowing the driver to use full sentences and common phrases. With such systems there is, therefore, no need for the user to memorize a set of fixed command words. HEALTH CARE Medical documentation In the health care sector, speech recognition can be implemented in front-end or back-end of the medical documentation process. Front-end speech recognition is where the provider dictates into a speech-recognition engine, the recognized words are displayed as they are spoken, and the dictator is responsible for editing and signing off on the document. Back-end or deferred speech recognition is where the provider dictates into a digital dictation system, the voice is routed through a speech-recognition machine and the recognized draft document is routed along with the original voice file to the editor, where the draft is edited and report finalized. Deferred speech recognition is widely used in the industry currently. One of the major issues relating to the use of speech recognition in healthcare is that the American Recovery and Reinvestment Act of 2009 (ARRA) provides for substantial financial benefits to physicians who utilize an EMR according to "Meaningful Use" standards. These standards require that a substantial amount of data be maintained by the EMR (now more commonly referred to as an Electronic Health Record or EHR). The use of speech recognition is more naturally suited to the generation of narrative text, as part of a radiology/pathology interpretation, progress note or discharge summary: the ergonomic gains of using speech recognition to enter structured discrete data (e.g., numeric values or codes from a list or a controlled vocabulary) are relatively minimal for people who are sighted and who can operate a keyboard and mouse.
  • 11. A more significant issue is that most EHRs have not been expressly tailored to take advantage of voice-recognition capabilities. A large part of the clinician's interaction with the EHR involves navigation through the user interface using menus, and tab/button clicks, and is heavily dependent on keyboard and mouse: voice-based navigation provides only modest ergonomic benefits. By contrast, many highly customized systems for radiology or pathology dictation implement voice "macros", where the use of certain phrases - e.g., "normal report", will automatically fill in a large number of default values and/or generate boilerplate, which will vary with the type of the exam - e.g., a chest X-ray vs. a gastrointestinal contrast series for a radiology system. As an alternative to this navigation by hand, cascaded use of speech recognition and information extraction has been studied as a way to fill out a handover form for clinical proofing and sign- off. The results are encouraging, and the paper also opens data, together with the related performance benchmarks and some processing software, to the research and development community for studying clinical documentation and language-processing. MILITARY High-performance fighter aircraft Substantial efforts have been devoted in the last decade to the test and evaluation of speech recognition in fighter aircraft. Of particular note is the U.S. program in speech recognition for the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft (F-16 VISTA), and a program in France installing speech recognition systems on Mirage aircraft, and also programs in the UK dealing with a variety of aircraft platforms. In these programs, speech recognizers have been operated successfully in fighter aircraft, with applications including: setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons release parameters, and controlling flight display.
  • 12. Working with Swedish pilots flying in the JAS-39 Gripen cockpit, Englund (2004) found recognition deteriorated with increasing G-loads. It was also concluded that adaptation greatly improved the results in all cases and introducing models for breathing was shown to improve recognition scores significantly. Contrary to what might be expected, no effects of the broken English of the speakers were found. It was evident that spontaneous speech caused problems for the recognizer, as could be expected. A restricted vocabulary, and above all, a proper syntax, could thus be expected to improve recognition accuracy substantially. The Eurofighter Typhoon currently in service with the UK RAF employs a speaker-dependent system, i.e. it requires each pilot to create a template. The system is not used for any safety critical or weapon critical tasks, such as weapon release or lowering of the undercarriage, but is used for a wide range of other cockpit functions. Voice commands are confirmed by visual and/or aural feedback. The system is seen as a major design feature in the reduction of pilot workload, and even allows the pilot to assign targets to himself with two simple voice commands or to any of his wingmen with only five commands. Speaker-independent systems are also being developed and are in testing for the F35 Lightning II (JSF) and the Alenia Aermacchi M-346 Master lead-in fighter trainer. These systems have produced word accuracy in excess of 98%. HELICOPTERS The problems of achieving high recognition accuracy under stress and noise pertain strongly to the helicopter environment as well as to the jet fighter environment. The acoustic noise problem is actually more severe in the helicopter environment, not only because of the high noise levels but also because the helicopter pilot, in general, does not wear a facemask, which would reduce acoustic noise in the microphone. Substantial test and evaluation programs have been carried out in the past decade in speech recognition systems applications in helicopters, notably by the U.S. Army Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace Establishment (RAE) in the UK. Work in France has included speech recognition in the Puma
  • 13. helicopter. There has also been much useful work in Canada. Results have been encouraging, and voice applications have included: control of communication radios, setting of navigation systems, and control of an automated target handover system. As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot effectiveness. Encouraging results are reported for the AVRADA tests, although these represent only a feasibility demonstration in a test environment. Much remains to be done both in speech recognition and in overall speech technology in order to consistently achieve performance improvements in operational settings. PROBLEMS: In recent years, single-channel speech enhancement has attracted a considerable amount of research attention because of the growing challenges in many important real-world applications, including mobile speech communication, hearing aids design and robust speech recognition. The goal of speech enhancement is to improve the intelligibility and quality of a noisy speech signal degraded in adverse conditions. However, the performance of speech enhancement in real acoustic environments is not always satisfactory. Numerous speech enhancement methods were developed over the past several decades. Spectral subtraction subtracts an estimate of the short- term noise spectrum to produce an estimated spectrum of the clean speech. In the iterative wiener filtering was presented using an all-pole model. A common problem usually encountered in these conventional methods is that the resulting enhanced speech often suffers from an annoying artifact called “musical noise”. Another notable work was the minimum mean-square error (MMSE) estimator introduced by Ephraim and Malah [6]; their MMSE log-spectral amplitude estimator could result in much lower residual noise without further affecting the speech quality. An optimally-modified log-spectral amplitude (OM- LSA) speech estimator and a minima controlled recursive averaging (MCRA) noise estimation approach were also presented in although these traditional MMSE-based methods are able to yield lower musical noise (e.g., [10], [11]), a trade-off in reducing speech distortion and residual
  • 14. noise needs to be made due to the sophisticated statistical properties of the interactions between speech and noise signals. Most of these unsupervised methods are based on either the additive nature of the background noise, or the statistical properties of the speech and noise signals. However they often fail to track non-stationary noise for real-world scenarios in unexpected acoustic conditions. Considering the complex process of noise corruption, a nonlinear model, like the neural networks, might be suitable for modeling the mapping relationship between the noisy and clean speech signals. Early work on using shallow neural networks (SNNs) as nonlinear filters to predict the clean signal in the time or frequency domain has been proposed the SNN with only one hidden layer using 160 neurons was proposed to estimate the instantaneous signal-to-noise ratios (SNRs) on the amplitude modulation spectrograms (AMS), and then the noise could be suppressed according to the estimated SNRs of different channels. However, the SNR was estimated in the limited frequency resolution with 15 channels and it was not efficient to suppress the noise type with sharp spectral peaks. Furthermore, the small network size can not fully learn the relationship between the noisy feature and the target SNRs.
  • 16. CHAPTER 2 2.0 SYSTEM ANALYSIS 2.1 EXISTING SYSTEM: Existing method using neural networks (NNs) in conventional joint density Gaussian mixture model (JDGMM) based spectral conversion methods perform stably and effectively. However, the speech generated by these methods suffer severe quality degradation due to the following two factors: 1) inadequacy of JDGMM in modeling the distribution of spectral features as well as the non-linear mapping relationship between the source and target speakers, 2) spectral detail loss caused by the use of high-level spectral features such as mel-cepstra. Previously, we have proposed to use the mixture of restricted Boltzmann machines (MoRBM) and the mixture of Gaussian bidirectional associative memories (MoGBAM) to cope with these problems. Previous methods to use a NN to construct a global non-linear mapping relationship between the spectral envelopes of two speakers generatively trained by cascading two RBMs, which model the distributions of spectral envelopes of source and target speakers respectively, using a Bernoulli BAM (BBAM). Therefore, the proposed training method takes the advantage of the strong modeling ability of RBMs in modeling the distribution of spectral envelopes and the superiority of BAMs in deriving the conditional distributions for conversion. Careful comparisons and analysis among the proposed method and some conventional methods are presented in this paper. The subjective results show that the proposed method can significantly improve the performance in terms of both similarity and naturalness compared to conventional methods.
  • 17. 2.1.1 DISADVANTAGES: Neural networks (NNs) with a single-layer architecture learning process, a large training set ensures a powerful modeling capability to estimate the complicated nonlinear mapping from observed noisy speech to desired clean signals. Acoustic context was found to improve the continuity of speech to be separated from the background noises with the annoying musical artifact commonly observed in conventional speech enhancement algorithms. A series of pilot experiments were conducted under multi-condition training with more than 100 hours of simulated speech data, resulting in a bad generalization capability even in mismatched testing conditions. When compared with the logarithmic minimum mean square error approach. NN-based algorithm tends to achieve significant improvements in terms of various objective quality measures. Furthermore, in a subjective preference evaluation with 10 listeners, 66.35% of the subjects were found to prefer NN-based enhanced speech to that obtained with other conventional low level technique.
  • 18. 2.2 PROPOSED SYSTEM: We proposed a regression DNN based speech enhancement framework via training a deep and wide neural network architecture using a large collection of heterogeneous training data with four noise types. It was found that the annoying musical noise artifact could be greatly reduced with the DNN-based algorithm and the enhanced speech also showed an improved speech quality both in terms of objective and subjective measures DNN-based speech enhancement framework to handle adverse conditions and non-stationary noise types in real-world situations. In traditional speech enhancement techniques, the noise estimate is usually updated by averaging the noisy speech power spectrum using time and frequency dependent smoothing factors, which are adjusted based on the estimated speech presence probability in individual frequency bins its noise tracking capacity is limited for highly non-stationary noise cases, and it tends to distort the speech component in mixed signals if it is tuned for better noise reduction. In this work, the acoustic context information, including the full frequency band and context frame expanding, is well utilized to obtain the enhanced speech with reduced discontinuity. Furthermore to improve the generalization capability we include more than 100 different noise types in designing the training set for DNN which proved to be quite effective in handling unseen noise types, especially non-stationary noise components. Three strategies are also proposed to further improve the quality of enhanced speech and generalization capability of DNNs. First, equalization between the global variance (GV) of the enhanced features and the reference clean speech features is proposed to alleviate the over-smoothing issue in DNN-based speech enhancement system. The second technique, called dropout, is a recently proposed strategy for training neural networks on data sets where over-fitting may be a concern. While this method was not designed for noise reduction, it was demonstrated to be useful for noise robust speech recognition and we successfully apply it to a DNN as a regression model to produce a network that has good generalization ability to variabilities in the input. Finally, noise aware training (NAT), first proposed in is adopted to improve performance.
2.2.1 ADVANTAGES:
 Starting from the baseline DNN training procedure, several techniques are proposed so that the quality of the enhanced speech in matched noise conditions is maintained while the generalization capability to unseen noise is increased.
 The proposed normalized clean log-power spectra target was compared with mask-based training targets, and different initialization schemes were verified. Evaluations of the proposed strategies demonstrated their effectiveness in improving generalization to unseen noises, and good suppression of highly non-stationary noise was also observed.
 In overall performance comparisons on 15 unseen noise types and on real-world noises, the normalized clean log-power spectra target was better than the ideal ratio mask (IRM) and FFT-MASK targets under all conditions in our experimental setup, with IRM and FFT-MASK obtaining almost the same performance. It should be noted that normalizing the clean log-power spectra to zero mean and unit variance is crucial, and differs from FFT-MAG with log compression followed by percent normalization.
 By using more noise types and the three proposed techniques, the PESQ improvements of the proposed DNN approach over LogMMSE under unseen noise types are comparable to those reported under matched noise types. STOI results, representing the intelligibility of the enhanced speech, show that LogMMSE is only slightly better than the noisy input, with an average STOI improvement from 0.81 to 0.82, while the DNN baseline trained with 100 hours reaches an average STOI of 0.86; the proposed strategies could further improve this performance.
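Since the zero-mean, unit-variance normalization of the target log-power spectra is called out above as crucial, the following is a minimal sketch of per-dimension mean-variance normalization. The jagged array layout and class name are assumptions; in practice the statistics would be estimated on the training set and inverted when reconstructing the enhanced speech.

```csharp
using System;

static class FeatureNormalizer
{
    // Estimate per-dimension mean and standard deviation on training data.
    public static void Fit(double[][] data, out double[] mean, out double[] std)
    {
        int n = data.Length, dims = data[0].Length;
        mean = new double[dims];
        std = new double[dims];
        foreach (var frame in data)
            for (int d = 0; d < dims; d++) mean[d] += frame[d];
        for (int d = 0; d < dims; d++) mean[d] /= n;
        foreach (var frame in data)
            for (int d = 0; d < dims; d++)
            {
                double diff = frame[d] - mean[d];
                std[d] += diff * diff;
            }
        for (int d = 0; d < dims; d++)
            std[d] = Math.Sqrt(std[d] / n) + 1e-10; // guard against zero variance
    }

    // Normalize in place: x <- (x - mean) / std, giving zero mean, unit variance.
    public static void Apply(double[][] data, double[] mean, double[] std)
    {
        foreach (var frame in data)
            for (int d = 0; d < frame.Length; d++)
                frame[d] = (frame[d] - mean[d]) / std[d];
    }
}
```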
2.3 HARDWARE & SOFTWARE REQUIREMENTS:
2.3.1 HARDWARE REQUIREMENTS:
 Processor - Pentium IV
 Speed - 1.1 GHz
 RAM - 256 MB (min)
 Hard Disk - 20 GB
 Floppy Drive - 1.44 MB
 Keyboard - Standard Windows keyboard
 Mouse - Two- or three-button mouse
 Monitor - SVGA
2.3.2 SOFTWARE REQUIREMENTS (.NET):
 Operating System : Windows XP or Windows 7
 Front End : Microsoft Visual Studio .NET 2008
 Coding Language : C#
 Documentation : MS-Office 2007
CHAPTER 3
3.0 SYSTEM DESIGN:
Data Flow Diagram / Use Case Diagram / Flow Diagram:
 The data flow diagram (DFD), also called a bubble chart, is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on these data, and the output data generated by the system.
 The DFD is one of the most important modeling tools. It is used to model the system components: the system processes, the data used by the processes, the external entities that interact with the system, and the information flows in the system.
 The DFD shows how information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations applied as data moves from input to output.
 A DFD may be used to represent a system at any level of abstraction and may be partitioned into levels that represent increasing information flow and functional detail.
NOTATION:
SOURCE OR DESTINATION OF DATA: External sources or destinations, which may be people, organizations, or other entities.
DATA STORE: Here the data referenced by a process is stored and retrieved.
PROCESS: People, procedures, or devices that transform data; the physical component is not identified.
DATA FLOW: Data moves in a specific direction from an origin to a destination. The data flow is a "packet" of data.
MODELING RULES: There are several common modeling rules when creating DFDs:
1. All processes must have at least one data flow in and one data flow out.
2. All processes should modify the incoming data, producing new forms of outgoing data.
3. Each data store must be involved with at least one data flow.
4. Each external entity must be involved with at least one data flow.
5. A data flow must be attached to at least one process.
3.1 DATA FLOW DIAGRAM:
UML DIAGRAMS:
3.2 USE CASE DIAGRAM:
3.3 CLASS DIAGRAM:
3.4 SEQUENCE DIAGRAM:
3.5 ACTIVITY DIAGRAM:
(The diagrams appear as figures in the original slides.)
CHAPTER 4
4.0 IMPLEMENTATION:
MINIMUM MEAN SQUARE ERROR (MMSE):
In the conventional approach, the clean speech spectrum is estimated by minimizing the mean square error between the estimated and the true (log-)spectral amplitudes; this LogMMSE estimator serves as the baseline against which the proposed DNN system is compared.
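The full MMSE derivation is lengthy; as a hedged stand-in, the sketch below implements a simplified Wiener-type spectral gain that captures the same idea of per-bin suppression driven by an estimated SNR. It is not the exact LogMMSE estimator used as the baseline; noisyPower and noisePower are assumed per-bin power estimates for one frame, with the noise estimate taken, for example, from leading silence frames.

```csharp
using System;

static class SpectralGain
{
    // Simplified Wiener-type gain per frequency bin (illustrative, not the
    // full LogMMSE estimator): xi = max(|Y|^2 / lambda_noise - 1, floor),
    // G = xi / (1 + xi). The gain multiplies the noisy magnitude spectrum.
    public static double[] SuppressFrame(double[] noisyPower, double[] noisePower)
    {
        var gain = new double[noisyPower.Length];
        for (int k = 0; k < noisyPower.Length; k++)
        {
            double snr = Math.Max(
                noisyPower[k] / Math.Max(noisePower[k], 1e-10) - 1.0, 0.01);
            gain[k] = snr / (1.0 + snr);
        }
        return gain;
    }
}
```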
4.2 MODULES:
 AUDIO PREPROCESSING
 NOISE-AWARE TRAINING
 NOISE REDUCTION DFT
 DEEP NEURAL NETWORKS
 SPEECH ENHANCEMENT
4.3 MODULE DESCRIPTION:
AUDIO PREPROCESSING: The noisy waveform is split into overlapping frames, windowed, and transformed with the DFT to obtain the log-power spectral features used by the rest of the system.
NOISE-AWARE TRAINING: An estimate of the noise in each utterance is supplied to the DNN together with the noisy features, so that the network is informed about the noise condition it must remove.
NOISE REDUCTION DFT: Noise suppression is performed in the DFT domain, where a clean log-power spectrum is estimated for each noisy frame.
DEEP NEURAL NETWORKS: A deep architecture is trained as a nonlinear regression function mapping noisy log-power spectra, with acoustic context, to the corresponding clean features.
SPEECH ENHANCEMENT: The estimated clean spectra are combined with the phase of the noisy signal and inverse-transformed to synthesize the enhanced waveform.
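To make the AUDIO PREPROCESSING and NOISE REDUCTION DFT steps concrete, here is a self-contained sketch that computes the log-power spectrum of one frame with a Hamming window and a direct DFT. A real implementation would use an FFT; the class name and frame handling are illustrative assumptions.

```csharp
using System;

static class FrontEnd
{
    // Log-power spectrum of one analysis frame: Hamming window followed by a
    // direct DFT (an FFT would be used in practice) and log compression.
    public static double[] LogPowerSpectrum(double[] frame)
    {
        int n = frame.Length;
        var windowed = new double[n];
        for (int t = 0; t < n; t++)
            windowed[t] = frame[t] * (0.54 - 0.46 * Math.Cos(2.0 * Math.PI * t / (n - 1)));

        var logPow = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++)
        {
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++)
            {
                double angle = -2.0 * Math.PI * k * t / n;
                re += windowed[t] * Math.Cos(angle);
                im += windowed[t] * Math.Sin(angle);
            }
            // Small constant avoids log(0) for silent bins.
            logPow[k] = Math.Log(re * re + im * im + 1e-10);
        }
        return logPow;
    }
}
```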
CHAPTER 5
5.0 SYSTEM STUDY:
5.1 FEASIBILITY STUDY:
The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential. Three key considerations involved in the feasibility analysis are:
 ECONOMICAL FEASIBILITY
 TECHNICAL FEASIBILITY
 SOCIAL FEASIBILITY
5.1.1 ECONOMICAL FEASIBILITY:
This study is carried out to check the economic impact the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited, and the expenditures must be justified. The developed system is well within the budget, which was achieved because most of the technologies used are freely available; only the customized products had to be purchased.
5.1.2 TECHNICAL FEASIBILITY:
This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical
resources, as this would lead to high demands being placed on the client. The developed system must have modest requirements, as only minimal or no changes are required for implementing it.
5.1.3 SOCIAL FEASIBILITY:
This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users solely depends on the methods employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to make constructive criticism, which is welcomed, as he is the final user of the system.
5.2 SYSTEM TESTING:
Testing is the process of checking whether the developed system works according to the original objectives and requirements. It is a set of activities that can be planned in advance and conducted systematically. Testing is vital to the success of the system. System testing makes the logical assumption that if all parts of the system are correct, the overall goal will be successfully achieved. Inadequate testing, or no testing at all, leads to errors that may not appear until many months later. This creates two problems: the time lag between the cause and the appearance of the problem, and the effect of system errors on the files and records within the system. A small system error can conceivably explode into a much larger problem. Effective testing early in the process translates directly into long-term cost savings from a reduced number of errors. Another reason for system testing is its utility as a user-oriented vehicle before implementation. The best program is worthless if it does not produce correct outputs.
5.2.1 UNIT TESTING:
A program represents the logical elements of a system. For a program to run satisfactorily, it must compile, process test data correctly, and tie in properly with other programs. Achieving an error-free program is the responsibility of the programmer. Program testing checks for two types of errors: syntax and logic. A syntax error is a program statement that violates one or more rules of the language in which it is written; an improperly defined field dimension or omitted keywords are common syntax errors, and these are flagged through error messages generated by the computer. For logic errors, the programmer must examine the output carefully.
UNIT TESTING:
Description: Test for application window properties.
Expected result: All the properties of the windows are to be properly aligned and displayed.
Description: Test for mouse operations.
Expected result: All the mouse operations like click, drag, etc. must perform the necessary operations without any exceptions.
5.2.2 FUNCTIONAL TESTING:
Functional testing of an application is used to prove that the application delivers correct results, using enough inputs to give an adequate level of confidence that it will work correctly for all sets of inputs. The functional testing will need to prove that the application works for each client type and that the personalization functions work correctly. When a program is tested, the actual output is compared with the expected output. When there is a discrepancy, the sequence of instructions must be traced to determine the problem. The process is facilitated by breaking the program into self-contained portions, each of which can be checked at certain key points. The idea is to compare program values against desk-calculated values to isolate the problems.
FUNCTIONAL TESTING:
Description: Test for all modules.
Expected result: All peers should communicate in the group.
Description: Test for the various peers in a distributed network framework, displaying all users available in the group.
Expected result: The result after execution should give the accurate result.
5.2.3 NON-FUNCTIONAL TESTING:
Non-functional software testing encompasses a rich spectrum of testing strategies, describing the expected results for every test case, and uses symbolic analysis techniques. This testing is used to check that an application will work in the operational environment. Non-functional testing includes:
 Load testing
 Performance testing
 Usability testing
 Reliability testing
 Security testing
5.2.4 LOAD TESTING:
An important tool for implementing system tests is a load generator, which is essential for testing quality requirements such as performance and stress. A load can be a real load; that is, the system can be put under test to real usage by having actual telephone users connected to it, who will generate test input data for the system test.
LOAD TESTING:
Description: It is necessary to ascertain that the application behaves correctly under loads when a 'Server busy' response is received.
Expected result: Should designate another active node as a server.
5.2.5 PERFORMANCE TESTING:
Performance tests are utilized to determine the widely defined performance of the software system, such as the execution time associated with various parts of the code, response time, and device utilization. The intent of this testing is to identify weak points of the software system and quantify its shortcomings.
PERFORMANCE TESTING:
Description: This is required to assure that the application performs adequately, having the capability to handle many peers, delivering its results in the expected time, and using an acceptable level of resources; it is an aspect of operational management.
Expected result: Should handle large input values and produce accurate results in the expected time.
5.2.6 RELIABILITY TESTING:
Software reliability is the ability of a system or component to perform its required functions under stated conditions for a specified period of time, and it is ensured in this testing. Reliability can be expressed as the ability of the software to reveal defects under testing conditions, according to the specified requirements. It is the probability that a software system will operate without failure under given conditions for a given time interval, and it focuses on the behavior of the software element. It forms part of the software quality control effort.
RELIABILITY TESTING:
Description: This is to check that the server is rugged and reliable and can handle the failure of any of the components involved in providing the application.
Expected result: In case of failure of the server, an alternate server should take over the job.
5.2.7 SECURITY TESTING:
Security testing evaluates system characteristics that relate to the availability, integrity, and confidentiality of the system data and services. Users/clients should be encouraged to make sure their security needs are clearly known at requirements time, so that the security issues can be addressed by the designers and testers.
SECURITY TESTING:
Description: Checking that the user identification is authenticated.
Expected result: In case of failure, it should not be connected in the framework.
Description: Check whether group keys in a tree are shared by all peers.
Expected result: The peers should know the group key in the same group.
5.2.8 WHITE BOX TESTING:
White box testing, sometimes called glass-box testing, is a test case design method that uses the control structure of the procedural design to derive test cases. Using the white box testing method, the software engineer can derive test cases. White box testing focuses on the inner structure of the software to be tested.
WHITE BOX TESTING:
Description: Exercise all logical decisions on their true and false sides.
Expected result: All the logical decisions must be valid.
Description: Execute all loops at their boundaries and within their operational bounds.
Expected result: All the loops must be finite.
Description: Exercise internal data structures to ensure their validity.
Expected result: All the data structures must be valid.
5.2.9 BLACK BOX TESTING:
Black box testing, also called behavioral testing, focuses on the functional requirements of the software. That is, black box testing enables the software engineer to derive sets of input conditions that will fully exercise all functional requirements for a program. Black box testing is not an alternative to white box techniques; rather, it is a complementary approach that is likely to uncover a different class of errors than white box methods. Black box testing attempts to find errors with a focus on the inputs, outputs, and principal functions of a software module. The starting point of black box testing is either a specification or code; the contents of the box are hidden, and the stimulated software should produce the desired results.
BLACK BOX TESTING:
Description: To check for incorrect or missing functions.
Expected result: All the functions must be valid.
Description: To check for interface errors.
Expected result: The entire interface must function normally.
Description: To check for errors in data structures or external database access.
Expected result: The database update and retrieval must be done correctly.
Description: To check for initialization and termination errors.
Expected result: All the functions and data structures must be initialized properly and terminated normally.
All the above system testing strategies are carried out, as the development, documentation, and institutionalization of the proposed goals and related policies are essential.
CHAPTER 6
6.0 SOFTWARE SPECIFICATION:
6.1 FEATURES OF .NET:
Microsoft .NET is a set of Microsoft software technologies for rapidly building and integrating XML Web services, Microsoft Windows-based applications, and Web solutions. The .NET Framework is a language-neutral platform for writing programs that can easily and securely interoperate. There is no language barrier with .NET: numerous languages are available to the developer, including Managed C++, C#, Visual Basic, and JScript. The .NET Framework provides the foundation for components to interact seamlessly, whether locally or remotely on different platforms. It standardizes common data types and communication protocols so that components created in different languages can easily interoperate.
".NET" is also the collective name given to various software components built upon the .NET platform. These are both products (Visual Studio .NET and Windows .NET Server, for instance) and services (like Passport, .NET My Services, and so on).
6.2 THE .NET FRAMEWORK:
The .NET Framework has two main parts:
1. The Common Language Runtime (CLR).
2. A hierarchical set of class libraries.
The CLR is described as the "execution engine" of .NET. It provides the environment within which programs run. Its most important features are:
 Conversion from a low-level, assembler-style language, called Intermediate Language (IL), into code native to the platform being executed on.
 Memory management, notably including garbage collection.
 Checking and enforcing security restrictions on the running code.
 Loading and executing programs, with version control and other such features.
The following features of the .NET Framework are also worth describing:
Managed Code
Managed code is code that targets .NET and contains certain extra information ("metadata") to describe itself. While both managed and unmanaged code can run in the runtime, only managed code contains the information that allows the CLR to guarantee, for instance, safe execution and interoperability.
Managed Data
With managed code comes managed data. The CLR provides memory allocation and deallocation facilities, and garbage collection. Some .NET languages use managed data by default, such as C#, Visual Basic .NET, and JScript .NET, whereas others, namely C++, do not. Targeting the CLR can, depending on the language you are using, impose certain constraints on the features available. As with managed and unmanaged code, one can have both managed and unmanaged data in .NET applications: data that does not get garbage collected but is instead looked after by unmanaged code.
Common Type System
The CLR uses the Common Type System (CTS) to strictly enforce type safety. This ensures that all classes are compatible with each other by describing types in a common way. The CTS defines how types work within the runtime, which enables types in one language to interoperate with types in another language, including cross-language exception handling. As well as ensuring that types are only used in appropriate ways, the runtime also ensures that code does not attempt to access memory that has not been allocated to it.
Common Language Specification
The CLR provides built-in support for language interoperability. To ensure that managed code can be fully used by developers using any programming language, a set of language features, and rules for using them, called the Common Language Specification (CLS) has been defined. Components that follow these rules and expose only CLS features are considered CLS-compliant.
6.3 THE CLASS LIBRARY:
.NET provides a single-rooted hierarchy of classes containing over 7000 types. The root of the namespace is called System; it contains basic types like Byte, Double, Boolean, and String, as well as Object. All objects derive from System.Object. Besides objects, there are value types, which can be allocated on the stack and thus provide useful flexibility; there are also efficient means of converting value types to object types if and when necessary.
The set of classes is comprehensive, providing collections, file, screen, and network I/O, threading, and so on, as well as XML and database connectivity. The class library is subdivided into a number of sets (or namespaces), each providing distinct areas of functionality, with dependencies between the namespaces kept to a minimum.
6.4 LANGUAGES SUPPORTED BY .NET:
The multi-language capability of the .NET Framework and Visual Studio .NET enables developers to use their existing programming skills to build all types of applications and XML Web services. The .NET Framework supports new versions of Microsoft's old favorites Visual Basic and C++ (as VB.NET and Managed C++), but there are also a number of new additions to the family.
Visual Basic .NET has been updated to include many new and improved language features that make it a powerful object-oriented programming language. These features include inheritance, interfaces, and overloading, among others. Visual Basic also now supports structured exception handling and custom attributes, and supports multi-threading.
Visual Basic .NET is also CLS-compliant, which means that any CLS-compliant language can use the classes, objects, and components you create in Visual Basic .NET.
Managed Extensions for C++ and attributed programming are just some of the enhancements made to the C++ language. Managed Extensions simplify the task of migrating existing C++ applications to the new .NET Framework.
C# is Microsoft's new language. It is a C-style language that is essentially "C++ for Rapid Application Development". Unlike other languages, its specification is just the grammar of the language; it has no standard library of its own, and has instead been designed with the intention of using the .NET libraries as its own.
Microsoft Visual J# .NET provides the easiest transition for Java-language developers into the world of XML Web services and dramatically improves the interoperability of Java-language programs with existing software written in a variety of other programming languages.
ActiveState has created Visual Perl and Visual Python, which enable .NET-aware applications to be built in either Perl or Python. Both products can be integrated into the Visual Studio .NET environment. Visual Perl includes support for ActiveState's Perl Dev Kit.
Other languages for which .NET compilers are available include:
 FORTRAN
 COBOL
 Eiffel
Fig. 1: The .NET Framework (layered, top to bottom: ASP.NET and XML Web Services; Windows Forms; Base Class Libraries; Common Language Runtime; Operating System).
C#.NET is also compliant with the CLS (Common Language Specification) and supports structured exception handling. The CLS is a set of rules and constructs that are supported by the CLR (Common Language Runtime). The CLR is the runtime environment provided by the .NET Framework; it manages the execution of the code and also makes the development process easier by providing services. C#.NET is a CLS-compliant language: any objects, classes, or components created in C#.NET can be used in any other CLS-compliant language, and we can use objects, classes, and components created in other CLS-compliant languages in C#.NET. The use of the CLS ensures complete interoperability among applications, regardless of the languages used to create them.
CONSTRUCTORS AND DESTRUCTORS:
Constructors are used to initialize objects, whereas destructors are used to destroy them. In other words, destructors are used to release the resources allocated to the object. In C#.NET, the Finalize method serves this purpose: it completes the tasks that must be performed when an object is destroyed, and it is called automatically when an object is destroyed. In addition, the Finalize method can be called only from the class it belongs to or from derived classes.
GARBAGE COLLECTION:
Garbage collection is another feature of C#.NET. The .NET Framework monitors allocated resources, such as objects and variables, and automatically releases memory for reuse by destroying objects that are no longer in use. The garbage collector checks for objects that are not currently in use by applications; when it comes across an object that is marked for garbage collection, it releases the memory occupied by the object.
OVERLOADING:
Overloading is another feature of C#. Overloading enables us to define multiple procedures with the same name, where each procedure has a different set of arguments. Besides using overloading for procedures, we can use it for constructors and properties in a class.
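The short C# example below ties these three features together; the FrameBuffer class and its members are purely illustrative.

```csharp
using System;

// Illustrative class showing constructors, overloading, and a finalizer.
class FrameBuffer
{
    private readonly double[] samples;

    // Constructor: initializes the object when it is created.
    public FrameBuffer(int size)
    {
        samples = new double[size];
    }

    // Overloaded constructor: same name, different argument list.
    public FrameBuffer(double[] data)
    {
        samples = (double[])data.Clone();
    }

    // Overloaded methods: energy of the whole frame, or of a sub-range.
    public double Energy()
    {
        return Energy(0, samples.Length);
    }

    public double Energy(int from, int to)
    {
        double sum = 0.0;
        for (int i = from; i < to; i++) sum += samples[i] * samples[i];
        return sum;
    }

    // Finalizer (destructor syntax in C#): invoked by the garbage collector
    // before the memory occupied by the object is reclaimed.
    ~FrameBuffer()
    {
        // Release unmanaged resources here, if any were held.
    }
}
```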
MULTITHREADING:
C#.NET also supports multithreading. An application that supports multithreading can handle multiple tasks simultaneously; we can use multithreading to decrease the time taken by an application to respond to user interaction.
STRUCTURED EXCEPTION HANDLING:
C#.NET supports structured exception handling, which enables us to detect and handle errors at run time. In C#.NET, we use try…catch…finally statements to create exception handlers; with them, we can create robust and effective exception handlers to improve the reliability of our applications.
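A small illustrative program combining both features: a worker thread runs the processing, and a try…catch…finally block handles a run-time error. All names are hypothetical.

```csharp
using System;
using System.Threading;

class ExceptionDemo
{
    static void Main()
    {
        // Multithreading: run the processing on a separate thread so the
        // main thread could stay responsive to the user.
        Thread worker = new Thread(Process);
        worker.Start();
        worker.Join();
    }

    static void Process()
    {
        try
        {
            // Code that may fail at run time goes in the try block.
            int[] data = new int[4];
            Console.WriteLine(data[10]); // throws IndexOutOfRangeException
        }
        catch (IndexOutOfRangeException ex)
        {
            // The catch block handles the specific error type.
            Console.WriteLine("Handled: " + ex.Message);
        }
        finally
        {
            // The finally block always executes, for cleanup.
            Console.WriteLine("Cleanup complete.");
        }
    }
}
```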
6.5 THE .NET FRAMEWORK:
The .NET Framework is a new computing platform that simplifies application development in the highly distributed environment of the Internet.
OBJECTIVES OF THE .NET FRAMEWORK:
1. To provide a consistent object-oriented programming environment whether object code is stored and executed locally, executed locally but Internet-distributed, or executed remotely.
2. To provide a code-execution environment that minimizes software deployment conflicts and guarantees safe execution of code.
3. To eliminate the performance problems of scripted or interpreted environments.
There are different types of applications, such as Windows-based applications and Web-based applications.
6.6 FEATURES OF SQL SERVER:
The OLAP Services feature available in SQL Server version 7.0 is now called SQL Server 2000 Analysis Services. The term OLAP Services has been replaced with the term Analysis Services. Analysis Services also includes a new data mining component. The Repository component available in SQL Server version 7.0 is now called Microsoft SQL Server 2000 Meta Data Services. References to the component now use the term Meta Data Services; the term repository is used only in reference to the repository engine within Meta Data Services.
A database consists of the following types of objects:
1. TABLE
2. QUERY
3. FORM
4. REPORT
5. MACRO
6.7 TABLE:
A database is a collection of data about a specific topic.
VIEWS OF A TABLE:
We can work with a table in two views:
1. Design View
2. Datasheet View
Design View: To build or modify the structure of a table, we work in the table design view, where we can specify what kind of data the table will hold.
Datasheet View: To add, edit, or analyze the data itself, we work in the table's datasheet view.
QUERY:
A query is a question asked of the data. The system gathers the data that answers the question from one or more tables. The data that makes up the answer is either a dynaset (if you can edit it) or a snapshot (which cannot be edited). Each time we run a query, we get the latest information in the dynaset. The system either displays the dynaset or snapshot for us to view, or performs an action on it, such as deleting or updating.
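For completeness, the following is a minimal ADO.NET sketch of running such a query from C#. The connection string, table, and column names are placeholders, not part of the actual project.

```csharp
using System;
using System.Data.SqlClient;

// Illustrative ADO.NET query against a SQL Server table.
class QueryDemo
{
    static void Main()
    {
        string connStr = "Data Source=.;Initial Catalog=SpeechDb;Integrated Security=True";
        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(
            "SELECT FileName, Snr FROM Recordings WHERE Snr > @minSnr", conn))
        {
            // Parameterized queries avoid SQL injection and re-parse costs.
            cmd.Parameters.AddWithValue("@minSnr", 5.0);
            conn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                // Each row of the result set is read much like a query "dynaset".
                while (reader.Read())
                    Console.WriteLine("{0}: {1} dB", reader.GetString(0), reader.GetDouble(1));
            }
        }
    }
}
```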
CHAPTER 7
7.0 APPENDIX:
7.1 SAMPLE SCREENSHOTS:
(Screenshots appear as figures in the original document.)
7.2 SAMPLE SOURCE CODE:
(Source code listing omitted in the original document.)
CHAPTER 8
8.1 CONCLUSION AND FUTURE WORK:
In this paper, a DNN-based framework for speech enhancement is proposed. Among the various DNN configurations, a large training set is crucial to learning the rich structure of the mapping function between noisy and clean speech features. It was found that the application of more acoustic context information improves system performance and makes the enhanced speech less discontinuous. Moreover, multi-condition training with many kinds of noise can achieve good generalization to unseen noise environments; the proposed DNN framework is therefore also able to cope with non-stationary noises in real-world environments. An over-smoothing problem in speech quality was found in the MMSE-optimized DNNs, and the proposed post-processing technique, GV equalization, was effective in brightening the formant spectra of the enhanced speech signals. Two improved training techniques were further adopted to reduce the residual noise and increase performance. Compared with the LogMMSE method, significant improvements were achieved across different unseen noise conditions. Another interesting observation was that the proposed DNN-based speech enhancement system is quite effective for dealing with real-world noisy speech in different languages and across different recording conditions not observed during DNN training.
In future studies, we would increase speech diversity by first incorporating clean speech data from a rich collection of materials covering more languages and speakers. Second, there are many factors in designing the training set; we would utilize principles of experimental design [54], [55] for multi-factor analysis to alleviate the requirement of a huge amount of training data while maintaining good generalization capability of the DNN model. Third, other features, such as Gammatone filterbank power spectra [50] and multi-resolution cochleagram features [56], will be adopted as in [50] to enrich the input information to the DNNs. Finally, a dynamic noise adaptation scheme will also be investigated for the purpose of improving the tracking of non-stationary noises.
CHAPTER 9
9.1 REFERENCES:
[1] L.-H. Chen, Z.-H. Ling, L.-J. Liu, and L.-R. Dai, "Voice conversion using deep neural networks with layer-wise generative training," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 12, pp. 1859–1872, Dec. 2014.
[2] Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2129–2139, Oct. 2013.
[3] B.-Y. Xia and C.-C. Bao, "Speech enhancement with weighted denoising auto-encoder," in Proc. Interspeech, 2013, pp. 3444–3448.
[4] X.-G. Lu, Y. Tsao, S. Matsuda, and C. Hori, "Speech enhancement based on deep denoising auto-encoder," in Proc. Interspeech, 2013, pp. 436–440.
[5] A. L. Maas, Q. V. Le, T. M. O'Neil, O. Vinyals, P. Nguyen, and A. Y. Ng, "Recurrent neural networks for noise reduction in robust ASR," in Proc. Interspeech, 2012, pp. 22–25.
[6] M. Wollmer, Z. Zhang, F. Weninger, B. Schuller, and G. Rigoll, "Feature enhancement by bidirectional LSTM networks for conversational speech recognition in highly non-stationary noise," in Proc. ICASSP, 2013, pp. 6822–6826.
[7] H. Christensen, J. Barker, N. Ma, and P. D. Green, "The CHiME corpus: A resource and a challenge for computational hearing in multisource environments," in Proc. Interspeech, 2010, pp. 1918–1921.
[8] Y. X. Wang and D. L. Wang, "Towards scaling up classification-based speech separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 7, pp. 1381–1390, Jul. 2013.