SlideShare a Scribd company logo
1 of 7
Voice Morphing
Zainab M.I. Sayyed#1
, Sameera M.A. Khan#2
, Shafaque M.Islam#3
#
Zainab-Sameera Department of Information Technology
#
Shafaque Department of Computer Engineering
M.H.Saboo Siddik Polytechnic, India
1
sheroskills@hotmail.com
2
khan_sameera85@yahoo.com
3
shafaque.mislam@gmail.com
ABSTRACT
Voice morphing means the transition of one speech signal into another. The new
morphed signal will have the same information content as the two input speech
signals but a different pitch, which is determined by the morphing algorithm. To do
this, each signal's information has to be converted into another representation, which
enables the pitch and spectral envelope to be encoded on orthogonal axes. Individual
components of the speech signal are then matched and the signal’s amplitudes are
then interpolated to produce a new speech signal. This new signal's representation
then has to be converted back to an acoustic waveform. This project vividly describes
the representations of the signals required to affect the morph and also the techniques
required to match the signal components, interpolate the amplitudes and invert the
new signal’s representation back to an acoustic waveform.
1. INTRODUCTION
Voice morphing means the transition of one speech signal into another. Like
image morphing, speech morphing aims to preserve the shared characteristics of the
starting and final signals, while generating a smooth transition between them. Speech
morphing is analogous to image morphing. In image morphing the in-between images
all show one face smoothly changing its shape and texture until it turns into the target
face. It is this feature that a speech morph should possess. One speech signal should
smoothly change into another, keeping the shared characteristics of the starting and
ending signals but smoothly changing the other properties. The major properties of
concern as far as a speech signal is concerned are its pitch and envelope information.
These two reside in a convolved form in a speech signal. Hence some efficient
method for extracting each of these is necessary. We have adopted an uncomplicated
approach namely cepstral analysis to do the same. Pitch and formant information in
each signal is extracted using the cepstral approach. Necessary processing to obtain
the morphed speech signal include methods like Cross fading of envelope
information, Dynamic Time Warping to match the major signal features (pitch) and
Signal Re-estimation to convert the morphed speech signal back into the acoustic
waveform.
2. AN INTROSPECTION OF THE MORPHING PROCESS
Speech morphing can be achieved by transforming the signals representation
from the acoustic waveform obtained by sampling of the analog signal, with which
many people are familiar with, to another representation. To prepare the signal for the
transformation, it is split into a number of 'frames' - sections of the waveform. The
transformation is then applied to each frame of the signal. This provides another way
of viewing the signal information. The new representation (said to be in the frequency
domain) describes the average energy present at each frequency band.
Further analysis enables two pieces of information to be obtained: pitch
information and the overall envelope of the sound. A key element in the morphing is
the manipulation of the pitch information. If two signals with different pitches were
simply cross-faded it is highly likely that two separate sounds will be heard. This
occurs because the signal will have two distinct pitches causing the auditory system to
perceive two different objects. A successful morph must exhibit a smoothly changing
pitch throughout. The pitch information of each sound is compared to provide the best
match between the two signals' pitches. To do this match, the signals are stretched and
compressed so that important sections of each signal match in time. The interpolation
of the two sounds can then be performed which creates the intermediate sounds in the
morph. The final stage is then to convert the frames back into a normal waveform.
However, after the morphing has been performed, the legacy of the earlier analysis
becomes apparent. The conversion of the sound to a representation in which the pitch
and spectral envelope can be separated loses some information. Therefore, this
information has to be re-estimated for the morphed sound. This process obtains an
acoustic waveform, which can then be stored or listened to.
Fig1. Schematic block diagram of the speech morphing process
3. MORPHING PROCESS: A COMPREHENSIVE ANALYSIS
The algorithm to be used is shown in the simplified block diagram given below. The
algorithm contains a number of fundamental signal processing methods including
sampling, the discrete Fourier transform and its inverse, cepstral analysis. However
the main processes can be categorized as follows.
I. Preprocessing or representation conversion: This involves processes like signal
acquisition in discrete form and windowing.
II. Cepstral analysis or Pitch and Envelope analysis: This process will extract
the pitch and formant information in the speech signal.
III. Morphing which includes Warping and interpolation.
IV. Signal re-estimation.
Fig2. Block diagram of the simplified speech morphing algorithm
Representation
Conversion
Cepstral
Analysis
Morphing
Signal
Estimation
Representation
Conversion Cepstral
Analysis
Speech Signal 1
Pitch
Envelope
Speech Signal 2 Pitch
Envelope
Morph
4. EXAMPLE OF MORPHING
Both signals will have a number of 'time-varying properties'. To create an effective
morph, it is necessary to match one or more of these properties of each signal to those
of the other signal in some way. The property of concern is the pitch of the signal
although other properties such as the amplitude could be used - and will have a
number of features. It is almost certain that matching features do not occur at exactly
the same point in each signal. Therefore, the feature must be moved to some point in
between the position in the first sound and the second sound. In other words, to
smoothly morph the pitch information, the pitch present in each signals needs to be
matched and then the amplitude at each frequency cross-faded. To perform the pitch
matching, a pitch contour for the entire signal is required. This is obtained by using
the pitch peak location in each cepstral pitch slice.
Consider the simple case of two signals, each with two features occurring in different
positions as shown in the figure below.
Fig3. The match path between two signals with differently located features
The match path shows the amount of movement (or warping) required in order
aligning corresponding features in time. Such a match path is obtained by Dynamic
Time Warping (DTW).
The Dynamic Time Warping (DTW) Algorithm
The basic DTW algorithm is symmetrical - in other words, every frame in signals
must be used. The constraints placed upon the matching process are:
• Matching paths cannot go backwards in time;
• Every frame in each signal must be used in a matching path;
• Local distance scores are combined by adding to give a global distance.
4 RESULTS AND EVALUATION
Figures 4, 5, 6 and 7 show some results of our morphing on speech from male-to-
male, female-to-female, female-to-male, and male-to-female pairs of speakers.
Fig 4. Source, Target and Morphed sentence waveform for a male-to-male speaker
speech transformation of the utterance .Don’t ask me to carry.
Fig5. Source, Target and Morphed sentence waveform for a female-to-female speaker
speech transformation of the utterance .Don ‘t ask me to carry.
Fig 6: Source, Target and Morphed sentence waveform for a female-to-male speaker
speech transformation of the utterance .She had your dark suit.
Fig7. Source, Target and Morphed sentence waveform for a male-to-female speaker
speech transformation of the utterance .She had your dark suit.
CONCLUSION
Voice morphing means the transition of one speech signal into another. Like
image morphing, speech morphing aims to preserve the shared characteristics of the
starting and final signals, while generating a smooth transition between them. One
speech signal should smoothly change into another, keeping the shared characteristics
of the starting and ending signals but smoothly changing the other properties.
Pitch and formant information in each signal is extracted using the cepstral
approach. Necessary processing to obtain the morphed speech signal include methods
like Cross fading of envelope information, Dynamic Time Warping to match the
major signal features (pitch) and Signal Re-estimation to convert the morphed speech
signal back into the acoustic waveform.
ACKNOWLEGEMENT
The National Conference conducted by Navjeevan Polytechnic, gave this opportunity
to share our study in field of Voice Morphing. As Voice Morphing is not a topic or
subject we can consider it as a branch or course of engineering like Computers, IT,
Mechanical etc.
First we would like to express our gratitude to all committee members who put their
precious time for arrangement of this conference. We received constant
encouragement from our colleagues and friends which made our paper much easier
and be completed in time.
REFERENCES
[1] Drioli C., Radial basis function networks forconversion of sound speech spectra,
EURASIP Journal on Applied Signal Processing, Vol. 2001, No.1, 2001, pp. 36-40.
[2] Valbret H. , Voice transformation using PSOLA technique , Speech
Communication, Vol .11, No 2-3, 1992, pp. 175-187.
[3] Arslan L., Speaker transformation algorithm using segmental codebooks, Speech
Communication, No. 28, 1999, pp. 211-226.
[4] Arslan L. and Talkin D, Voice Conversion by Codebook Mapping of Line
Spectral Frequencies and Excitation Spectrum, Proceedings of Eurospeech, 1997, pp.
1347-1350.
[5] Stylianou Y., Cappe O. and Moulines E., Statistical Methods for Voice Quality
Transformation, Proceedings of Eurospeech, 1995, pp. 447-450.

More Related Content

What's hot

Digital speech processing lecture1
Digital speech processing lecture1Digital speech processing lecture1
Digital speech processing lecture1Samiul Parag
 
A seminar report on speech recognition technology
A seminar report on speech recognition technologyA seminar report on speech recognition technology
A seminar report on speech recognition technologySrijanKumar18
 
Speech signal processing lizy
Speech signal processing lizySpeech signal processing lizy
Speech signal processing lizyLizy Abraham
 
Silentsound documentation
Silentsound documentationSilentsound documentation
Silentsound documentationRaj Niranjan
 
Silent sound technology final report
Silent sound technology final reportSilent sound technology final report
Silent sound technology final reportLohit Dalal
 
Software defined radio....
Software defined radio....Software defined radio....
Software defined radio....Bise Mond
 
Abstract Silent Sound Technology
Abstract   Silent Sound TechnologyAbstract   Silent Sound Technology
Abstract Silent Sound Technologyvishnu murthy
 
Silent Sound Technology
Silent Sound TechnologySilent Sound Technology
Silent Sound TechnologyHafiz Sanni
 
Chapter 6 - Multimedia Over Ip
Chapter 6 - Multimedia Over IpChapter 6 - Multimedia Over Ip
Chapter 6 - Multimedia Over IpPratik Pradhan
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By MatlabAnkit Gujrati
 
pulse position modulation ppm | Communication Systems
pulse position modulation ppm | Communication Systemspulse position modulation ppm | Communication Systems
pulse position modulation ppm | Communication SystemsLearn By Watch
 
Silent sound technologyrevathippt
Silent sound technologyrevathipptSilent sound technologyrevathippt
Silent sound technologyrevathipptrevathiyadavb
 
Sniffer for the mobile phones
Sniffer for the mobile phonesSniffer for the mobile phones
Sniffer for the mobile phonesUpender Upr
 
Silent sound technology NEW
Silent sound technology NEW Silent sound technology NEW
Silent sound technology NEW Neha Tyagi
 
silent sound new by RAJ NIRANJAN
silent sound new by RAJ NIRANJANsilent sound new by RAJ NIRANJAN
silent sound new by RAJ NIRANJANRaj Niranjan
 
silent sound technology
silent sound technologysilent sound technology
silent sound technologykamesh0007
 
Silent sound technology
Silent sound technologySilent sound technology
Silent sound technologynixytl
 

What's hot (20)

Voice morphing-
Voice morphing-Voice morphing-
Voice morphing-
 
Digital speech processing lecture1
Digital speech processing lecture1Digital speech processing lecture1
Digital speech processing lecture1
 
A seminar report on speech recognition technology
A seminar report on speech recognition technologyA seminar report on speech recognition technology
A seminar report on speech recognition technology
 
Speech signal processing lizy
Speech signal processing lizySpeech signal processing lizy
Speech signal processing lizy
 
Silentsound documentation
Silentsound documentationSilentsound documentation
Silentsound documentation
 
Silent Sound Technology
Silent Sound TechnologySilent Sound Technology
Silent Sound Technology
 
Silent sound technology final report
Silent sound technology final reportSilent sound technology final report
Silent sound technology final report
 
Software defined radio....
Software defined radio....Software defined radio....
Software defined radio....
 
Abstract Silent Sound Technology
Abstract   Silent Sound TechnologyAbstract   Silent Sound Technology
Abstract Silent Sound Technology
 
Silent Sound Technology
Silent Sound TechnologySilent Sound Technology
Silent Sound Technology
 
Chapter 6 - Multimedia Over Ip
Chapter 6 - Multimedia Over IpChapter 6 - Multimedia Over Ip
Chapter 6 - Multimedia Over Ip
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
 
pulse position modulation ppm | Communication Systems
pulse position modulation ppm | Communication Systemspulse position modulation ppm | Communication Systems
pulse position modulation ppm | Communication Systems
 
Silent sound technologyrevathippt
Silent sound technologyrevathipptSilent sound technologyrevathippt
Silent sound technologyrevathippt
 
Sniffer for the mobile phones
Sniffer for the mobile phonesSniffer for the mobile phones
Sniffer for the mobile phones
 
Voicexml ppt
Voicexml pptVoicexml ppt
Voicexml ppt
 
Silent sound technology NEW
Silent sound technology NEW Silent sound technology NEW
Silent sound technology NEW
 
silent sound new by RAJ NIRANJAN
silent sound new by RAJ NIRANJANsilent sound new by RAJ NIRANJAN
silent sound new by RAJ NIRANJAN
 
silent sound technology
silent sound technologysilent sound technology
silent sound technology
 
Silent sound technology
Silent sound technologySilent sound technology
Silent sound technology
 

Viewers also liked (20)

All In One Olathe - revised 10-24-11
All In One Olathe - revised 10-24-11All In One Olathe - revised 10-24-11
All In One Olathe - revised 10-24-11
 
March 3 2004 for the ai cie
March 3 2004 for the ai cieMarch 3 2004 for the ai cie
March 3 2004 for the ai cie
 
Voice Morphing System for People Suffering from Laryngectomy
Voice Morphing System for People Suffering from LaryngectomyVoice Morphing System for People Suffering from Laryngectomy
Voice Morphing System for People Suffering from Laryngectomy
 
Airborn internet
Airborn internetAirborn internet
Airborn internet
 
Broadband
Broadband Broadband
Broadband
 
REPORT ON :- Airborne internet
REPORT ON :- Airborne internetREPORT ON :- Airborne internet
REPORT ON :- Airborne internet
 
The airborne internet final my
The airborne internet final myThe airborne internet final my
The airborne internet final my
 
FINAL REVIEW
FINAL REVIEWFINAL REVIEW
FINAL REVIEW
 
Airborne internet
Airborne internetAirborne internet
Airborne internet
 
Rain Technology
Rain TechnologyRain Technology
Rain Technology
 
Single electron transistor
Single electron transistorSingle electron transistor
Single electron transistor
 
Touch screen sensor
Touch screen sensorTouch screen sensor
Touch screen sensor
 
Polymer memory
Polymer memoryPolymer memory
Polymer memory
 
Paper battery
Paper batteryPaper battery
Paper battery
 
Presentation on Paper battery
Presentation on Paper battery Presentation on Paper battery
Presentation on Paper battery
 
Virtual Reality
Virtual RealityVirtual Reality
Virtual Reality
 
Virtual Reality
Virtual RealityVirtual Reality
Virtual Reality
 
Touchscreen PPT
Touchscreen PPTTouchscreen PPT
Touchscreen PPT
 
Paper battery
Paper batteryPaper battery
Paper battery
 
Chip morphing
Chip morphingChip morphing
Chip morphing
 

Similar to Voice Morphing

IRJET- Pitch Detection Algorithms in Time Domain
IRJET- Pitch Detection Algorithms in Time DomainIRJET- Pitch Detection Algorithms in Time Domain
IRJET- Pitch Detection Algorithms in Time DomainIRJET Journal
 
Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognitionphyuhsan
 
A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...
A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...
A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...CSCJournals
 
Identification of frequency domain using quantum based optimization neural ne...
Identification of frequency domain using quantum based optimization neural ne...Identification of frequency domain using quantum based optimization neural ne...
Identification of frequency domain using quantum based optimization neural ne...eSAT Publishing House
 
Line Spectral Pairs Based Voice Conversion using Radial Basis Function
Line Spectral Pairs Based Voice Conversion using Radial Basis FunctionLine Spectral Pairs Based Voice Conversion using Radial Basis Function
Line Spectral Pairs Based Voice Conversion using Radial Basis FunctionIDES Editor
 
44 i9 advanced-speaker-recognition
44 i9 advanced-speaker-recognition44 i9 advanced-speaker-recognition
44 i9 advanced-speaker-recognitionsunnysyed
 
Compression Using Wavelet Transform
Compression Using Wavelet TransformCompression Using Wavelet Transform
Compression Using Wavelet TransformCSCJournals
 
Enhanced modulation spectral subtraction for IOVT speech recognition application
Enhanced modulation spectral subtraction for IOVT speech recognition applicationEnhanced modulation spectral subtraction for IOVT speech recognition application
Enhanced modulation spectral subtraction for IOVT speech recognition applicationIRJET Journal
 
Speech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderSpeech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderIJTET Journal
 
Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approachijsrd.com
 
An Effective Approach for Chinese Speech Recognition on Small Size of Vocabulary
An Effective Approach for Chinese Speech Recognition on Small Size of VocabularyAn Effective Approach for Chinese Speech Recognition on Small Size of Vocabulary
An Effective Approach for Chinese Speech Recognition on Small Size of Vocabularysipij
 
An Adaptive Approach to Switching Coded Modulation in OFDM System Under AWGN ...
An Adaptive Approach to Switching Coded Modulation in OFDM System Under AWGN ...An Adaptive Approach to Switching Coded Modulation in OFDM System Under AWGN ...
An Adaptive Approach to Switching Coded Modulation in OFDM System Under AWGN ...ijsrd.com
 
Artificial Intelligent Algorithm for the Analysis, Quality Speech & Different...
Artificial Intelligent Algorithm for the Analysis, Quality Speech & Different...Artificial Intelligent Algorithm for the Analysis, Quality Speech & Different...
Artificial Intelligent Algorithm for the Analysis, Quality Speech & Different...IRJET Journal
 

Similar to Voice Morphing (20)

Animal Voice Morphing System
Animal Voice Morphing SystemAnimal Voice Morphing System
Animal Voice Morphing System
 
Bz33462466
Bz33462466Bz33462466
Bz33462466
 
Bz33462466
Bz33462466Bz33462466
Bz33462466
 
IRJET- Pitch Detection Algorithms in Time Domain
IRJET- Pitch Detection Algorithms in Time DomainIRJET- Pitch Detection Algorithms in Time Domain
IRJET- Pitch Detection Algorithms in Time Domain
 
Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognition
 
A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...
A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...
A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...
 
Identification of frequency domain using quantum based optimization neural ne...
Identification of frequency domain using quantum based optimization neural ne...Identification of frequency domain using quantum based optimization neural ne...
Identification of frequency domain using quantum based optimization neural ne...
 
Line Spectral Pairs Based Voice Conversion using Radial Basis Function
Line Spectral Pairs Based Voice Conversion using Radial Basis FunctionLine Spectral Pairs Based Voice Conversion using Radial Basis Function
Line Spectral Pairs Based Voice Conversion using Radial Basis Function
 
G010424248
G010424248G010424248
G010424248
 
44 i9 advanced-speaker-recognition
44 i9 advanced-speaker-recognition44 i9 advanced-speaker-recognition
44 i9 advanced-speaker-recognition
 
H0814247
H0814247H0814247
H0814247
 
Compression Using Wavelet Transform
Compression Using Wavelet TransformCompression Using Wavelet Transform
Compression Using Wavelet Transform
 
Enhanced modulation spectral subtraction for IOVT speech recognition application
Enhanced modulation spectral subtraction for IOVT speech recognition applicationEnhanced modulation spectral subtraction for IOVT speech recognition application
Enhanced modulation spectral subtraction for IOVT speech recognition application
 
Speech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderSpeech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using Vocoder
 
Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approach
 
An Effective Approach for Chinese Speech Recognition on Small Size of Vocabulary
An Effective Approach for Chinese Speech Recognition on Small Size of VocabularyAn Effective Approach for Chinese Speech Recognition on Small Size of Vocabulary
An Effective Approach for Chinese Speech Recognition on Small Size of Vocabulary
 
50120140501002
5012014050100250120140501002
50120140501002
 
An Adaptive Approach to Switching Coded Modulation in OFDM System Under AWGN ...
An Adaptive Approach to Switching Coded Modulation in OFDM System Under AWGN ...An Adaptive Approach to Switching Coded Modulation in OFDM System Under AWGN ...
An Adaptive Approach to Switching Coded Modulation in OFDM System Under AWGN ...
 
An Introduction To Speech Recognition
An Introduction To Speech RecognitionAn Introduction To Speech Recognition
An Introduction To Speech Recognition
 
Artificial Intelligent Algorithm for the Analysis, Quality Speech & Different...
Artificial Intelligent Algorithm for the Analysis, Quality Speech & Different...Artificial Intelligent Algorithm for the Analysis, Quality Speech & Different...
Artificial Intelligent Algorithm for the Analysis, Quality Speech & Different...
 

Voice Morphing

  • 1. Voice Morphing Zainab M.I. Sayyed#1 , Sameera M.A. Khan#2 , Shafaque M.Islam#3 # Zainab-Sameera Department of Information Technology # Shafaque Department of Computer Engineering M.H.Saboo Siddik Polytechnic, India 1 sheroskills@hotmail.com 2 khan_sameera85@yahoo.com 3 shafaque.mislam@gmail.com ABSTRACT Voice morphing means the transition of one speech signal into another. The new morphed signal will have the same information content as the two input speech signals but a different pitch, which is determined by the morphing algorithm. To do this, each signal's information has to be converted into another representation, which enables the pitch and spectral envelope to be encoded on orthogonal axes. Individual components of the speech signal are then matched and the signal’s amplitudes are then interpolated to produce a new speech signal. This new signal's representation then has to be converted back to an acoustic waveform. This project vividly describes the representations of the signals required to affect the morph and also the techniques required to match the signal components, interpolate the amplitudes and invert the new signal’s representation back to an acoustic waveform. 1. INTRODUCTION Voice morphing means the transition of one speech signal into another. Like image morphing, speech morphing aims to preserve the shared characteristics of the starting and final signals, while generating a smooth transition between them. Speech morphing is analogous to image morphing. In image morphing the in-between images all show one face smoothly changing its shape and texture until it turns into the target face. It is this feature that a speech morph should possess. One speech signal should smoothly change into another, keeping the shared characteristics of the starting and ending signals but smoothly changing the other properties. The major properties of concern as far as a speech signal is concerned are its pitch and envelope information. These two reside in a convolved form in a speech signal. Hence some efficient method for extracting each of these is necessary. We have adopted an uncomplicated approach namely cepstral analysis to do the same. Pitch and formant information in each signal is extracted using the cepstral approach. Necessary processing to obtain the morphed speech signal include methods like Cross fading of envelope information, Dynamic Time Warping to match the major signal features (pitch) and Signal Re-estimation to convert the morphed speech signal back into the acoustic waveform.
  • 2. 2. AN INTROSPECTION OF THE MORPHING PROCESS Speech morphing can be achieved by transforming the signals representation from the acoustic waveform obtained by sampling of the analog signal, with which many people are familiar with, to another representation. To prepare the signal for the transformation, it is split into a number of 'frames' - sections of the waveform. The transformation is then applied to each frame of the signal. This provides another way of viewing the signal information. The new representation (said to be in the frequency domain) describes the average energy present at each frequency band. Further analysis enables two pieces of information to be obtained: pitch information and the overall envelope of the sound. A key element in the morphing is the manipulation of the pitch information. If two signals with different pitches were simply cross-faded it is highly likely that two separate sounds will be heard. This occurs because the signal will have two distinct pitches causing the auditory system to perceive two different objects. A successful morph must exhibit a smoothly changing pitch throughout. The pitch information of each sound is compared to provide the best match between the two signals' pitches. To do this match, the signals are stretched and compressed so that important sections of each signal match in time. The interpolation of the two sounds can then be performed which creates the intermediate sounds in the morph. The final stage is then to convert the frames back into a normal waveform. However, after the morphing has been performed, the legacy of the earlier analysis becomes apparent. The conversion of the sound to a representation in which the pitch and spectral envelope can be separated loses some information. Therefore, this information has to be re-estimated for the morphed sound. This process obtains an acoustic waveform, which can then be stored or listened to. Fig1. Schematic block diagram of the speech morphing process
  • 3. 3. MORPHING PROCESS: A COMPREHENSIVE ANALYSIS The algorithm to be used is shown in the simplified block diagram given below. The algorithm contains a number of fundamental signal processing methods including sampling, the discrete Fourier transform and its inverse, cepstral analysis. However the main processes can be categorized as follows. I. Preprocessing or representation conversion: This involves processes like signal acquisition in discrete form and windowing. II. Cepstral analysis or Pitch and Envelope analysis: This process will extract the pitch and formant information in the speech signal. III. Morphing which includes Warping and interpolation. IV. Signal re-estimation. Fig2. Block diagram of the simplified speech morphing algorithm Representation Conversion Cepstral Analysis Morphing Signal Estimation Representation Conversion Cepstral Analysis Speech Signal 1 Pitch Envelope Speech Signal 2 Pitch Envelope Morph
  • 4. 4. EXAMPLE OF MORPHING Both signals will have a number of 'time-varying properties'. To create an effective morph, it is necessary to match one or more of these properties of each signal to those of the other signal in some way. The property of concern is the pitch of the signal although other properties such as the amplitude could be used - and will have a number of features. It is almost certain that matching features do not occur at exactly the same point in each signal. Therefore, the feature must be moved to some point in between the position in the first sound and the second sound. In other words, to smoothly morph the pitch information, the pitch present in each signals needs to be matched and then the amplitude at each frequency cross-faded. To perform the pitch matching, a pitch contour for the entire signal is required. This is obtained by using the pitch peak location in each cepstral pitch slice. Consider the simple case of two signals, each with two features occurring in different positions as shown in the figure below. Fig3. The match path between two signals with differently located features The match path shows the amount of movement (or warping) required in order aligning corresponding features in time. Such a match path is obtained by Dynamic Time Warping (DTW). The Dynamic Time Warping (DTW) Algorithm The basic DTW algorithm is symmetrical - in other words, every frame in signals must be used. The constraints placed upon the matching process are: • Matching paths cannot go backwards in time; • Every frame in each signal must be used in a matching path; • Local distance scores are combined by adding to give a global distance.
  • 5. 4 RESULTS AND EVALUATION Figures 4, 5, 6 and 7 show some results of our morphing on speech from male-to- male, female-to-female, female-to-male, and male-to-female pairs of speakers. Fig 4. Source, Target and Morphed sentence waveform for a male-to-male speaker speech transformation of the utterance .Don’t ask me to carry. Fig5. Source, Target and Morphed sentence waveform for a female-to-female speaker speech transformation of the utterance .Don ‘t ask me to carry.
  • 6. Fig 6: Source, Target and Morphed sentence waveform for a female-to-male speaker speech transformation of the utterance .She had your dark suit. Fig7. Source, Target and Morphed sentence waveform for a male-to-female speaker speech transformation of the utterance .She had your dark suit.
  • 7. CONCLUSION Voice morphing means the transition of one speech signal into another. Like image morphing, speech morphing aims to preserve the shared characteristics of the starting and final signals, while generating a smooth transition between them. One speech signal should smoothly change into another, keeping the shared characteristics of the starting and ending signals but smoothly changing the other properties. Pitch and formant information in each signal is extracted using the cepstral approach. Necessary processing to obtain the morphed speech signal include methods like Cross fading of envelope information, Dynamic Time Warping to match the major signal features (pitch) and Signal Re-estimation to convert the morphed speech signal back into the acoustic waveform. ACKNOWLEGEMENT The National Conference conducted by Navjeevan Polytechnic, gave this opportunity to share our study in field of Voice Morphing. As Voice Morphing is not a topic or subject we can consider it as a branch or course of engineering like Computers, IT, Mechanical etc. First we would like to express our gratitude to all committee members who put their precious time for arrangement of this conference. We received constant encouragement from our colleagues and friends which made our paper much easier and be completed in time. REFERENCES [1] Drioli C., Radial basis function networks forconversion of sound speech spectra, EURASIP Journal on Applied Signal Processing, Vol. 2001, No.1, 2001, pp. 36-40. [2] Valbret H. , Voice transformation using PSOLA technique , Speech Communication, Vol .11, No 2-3, 1992, pp. 175-187. [3] Arslan L., Speaker transformation algorithm using segmental codebooks, Speech Communication, No. 28, 1999, pp. 211-226. [4] Arslan L. and Talkin D, Voice Conversion by Codebook Mapping of Line Spectral Frequencies and Excitation Spectrum, Proceedings of Eurospeech, 1997, pp. 1347-1350. [5] Stylianou Y., Cappe O. and Moulines E., Statistical Methods for Voice Quality Transformation, Proceedings of Eurospeech, 1995, pp. 447-450.