HOME ASSIGNMENT
SUBMITTED BY:
MANVI PRIYA BE/10007/14
CHIRAG JAIN BE/10038/14
ANKITA SINGH BE/10136/14
MEC2163 - SPEECH PROCESSING AND RECOGNITION
EFFECT OF
WATERMARKING IN
SPEECH SIGNAL
INTRODUCTION
• Watermarking is the technique of hiding additional data (such as
watermark bits, a logo or a text message) in a host signal (image,
video, audio, speech or text) so that the added information is
imperceptible. The embedded information should be extractable and
must resist various intentional and unintentional attacks. The
digital speech watermarking process is depicted in Fig.
TYPES OF DIGITAL SPEECH
WATERMARKING
• There are two main types of digital speech watermarking in terms of
robustness:
1. Robust digital speech watermarking, in which the embedded
information must resist channel attacks.
2. Fragile digital speech watermarking, in which the embedded
information must be destroyed if any attack or transformation
takes place, as with paper watermarks in bank notes.
• In terms of what the extraction module needs, there are three main
categories:
1. Blind speech watermarking, which does not need any extra
information such as the original signal, logo or watermark bits.
2. Semi-blind speech watermarking, which may need extra
information in the extraction phase, such as access to the published
watermarked signal (the original signal immediately after the
watermark is added).
3. Non-blind speech watermarking, which needs the original signal
and the watermarked signal.
APPLICATIONS OF DIGITAL
SPEECH WATERMARKING
• Several applications of digital speech watermarking are known:
1. Copy control: Cryptographic algorithms are very slow, and a cracker may use software
such as DeCSS or reverse engineering techniques to recover a valid key. However, a
watermark can be embedded in the content so that a recording device easily detects
the watermark bits and refuses to copy it.
2. Device control: This is the broader category, and copy control is one of its applications.
For example, Digimarc’s MediaBridge interacts with a TV program by using action toys.
Advertisements can be skipped automatically by turning functions on and off.
3. Owner identification: According to American laws, when the owner’s rights are misused,
the system can restrict the owner’s material, even if the copyright is not considered.
This application helps to protect the holder’s rights without considering the
copyright in the distributed copies.
4. Proof of ownership: Creating a central repository for every copyright is too costly when
textual copyright is needed. Watermarking can be used as an alternative means of
proving ownership.
For authentication, a fragile watermark is embedded in the original data. If an impostor
manipulates the content, the watermark is altered. As a consequence, the media is not
accepted as genuine.
Another watermarking application is fingerprinting, which enables the holder to detect and
investigate the source of the authorized version and so restrict unauthorized users. Other
applications of watermarking are broadcast monitoring, copy prevention, access control and
transaction tracking.
WATERMARK DESIGN
• Each audio signal is watermarked with a unique codeword.
• Our watermarking scheme is based on a repeated application
of a basic watermarking operation on processed versions of
the audio signal.
• The basic method uses three steps to watermark an audio
segment as shown in Fig.
• The complete watermarking scheme is shown in Fig. Below we
provide a detailed explanation of the basic watermarking step
and the complete watermarking technique.
METHODS OF DIGITAL
WATERMARKING
Different methods are used for digital speech watermarking. The figure presents
an overview of these methods.
AUDITORY MASKING
• Auditory masking in general is defined by the American
standards agency as ‘the process by which the threshold of
audibility for one sound is raised by the presence of another
sound’ and ‘the amount by which the threshold of audibility of
sound is raised by the presence of another sound’.
TEMPORAL MASKING
• Temporal masking consists of pre-masking and post-masking.
In post-masking, a weaker sound (the maskee) becomes
inaudible for about 50 to 200 ms after a stronger masker. In
pre-masking, the weaker signal becomes inaudible for about
5 to 20 ms before the stronger masker. The pre-masking effect
is much harder to detect than the post-masking effect.
Temporal masking can be detected in the time domain.
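The two masking windows can be illustrated with a small Python sketch. It is only an illustration: the function name is hypothetical, the window lengths are taken as the upper ends of the ranges quoted above (20 ms and 200 ms), and the "simultaneous" label for a zero offset is an added assumption rather than part of the slide.

```python
def temporal_masking_kind(offset_ms, pre_window_ms=20.0, post_window_ms=200.0):
    """Classify a weak sound by its onset relative to a strong masker.

    offset_ms < 0 means the weak sound starts before the masker,
    offset_ms > 0 means it starts after the masker.
    The window lengths are assumed upper bounds, not measured values.
    """
    if offset_ms == 0:
        return "simultaneous"  # assumption: not covered by the slide
    if -pre_window_ms <= offset_ms < 0:
        return "pre-masking"
    if 0 < offset_ms <= post_window_ms:
        return "post-masking"
    return "none"
```

A real psychoacoustic model would also depend on the level difference between masker and maskee, which this sketch ignores.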
WATERMARK EMBEDDING:
• The watermark, which is a noise-like sequence, is shaped by applying
the masking effect in the frequency and temporal domains:
1. The speech signal is segmented into blocks of a predefined size.
2. The power spectrum of each block is calculated by FFT or DWT.
3. The frequency masking of the block is computed.
4. The masking weights are applied to shape the watermark bits
(noise-like sequence).
5. The inverse transform (inverse FFT or inverse DWT) is computed.
6. The temporal masking for shaping the noise-like sequence is
calculated.
7. The temporal and frequency masking are used to embed the
watermark into the speech signal.
The process is shown in Fig.
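The embedding steps for one block can be sketched in Python with NumPy. This is a rough illustration under stated assumptions, not the scheme from the slides: the function name `embed_block`, the `strength` parameter, the 8 kHz sample rate and the test tone are hypothetical, the "frequency masking" is replaced by a fixed fraction of the host magnitude spectrum, and the "temporal masking" by a smoothed amplitude envelope. A real system would use a psychoacoustic model for both.

```python
import numpy as np

def embed_block(block, wm_bits, strength=0.05, seed=0):
    """Add a spectrally and temporally shaped noise-like watermark to one block."""
    block = np.asarray(block, dtype=float)
    n = len(block)
    rng = np.random.default_rng(seed)
    # Noise-like carrier: random +/-1 chips multiplied by the bits (tiled to block length).
    chips = rng.choice([-1.0, 1.0], size=n)
    bits = np.resize(np.where(np.asarray(wm_bits) > 0, 1.0, -1.0), n)
    w = chips * bits
    # Spectrum of the host block (FFT variant of step 2).
    S = np.fft.rfft(block)
    # Placeholder frequency mask (assumption): a fixed fraction of the host magnitude.
    mask = strength * np.abs(S)
    # Shape the watermark spectrum so each bin has the mask's magnitude.
    W = np.fft.rfft(w)
    W_shaped = mask * W / (np.abs(W) + 1e-12)
    # Back to the time domain.
    w_time = np.fft.irfft(W_shaped, n=n)
    # Placeholder temporal mask (assumption): smoothed host envelope scaled to [0, 1].
    k = min(16, n)
    env = np.convolve(np.abs(block), np.ones(k) / k, mode="same")
    env = env / (env.max() + 1e-12)
    # Embed the shaped watermark into the block.
    return block + env * w_time

# Hypothetical usage: a 200 Hz tone sampled at an assumed 8 kHz.
fs = 8000
t = np.arange(512) / fs
host = np.sin(2 * np.pi * 200 * t)
marked = embed_block(host, [1, 0, 1, 1])
```

Because every frequency bin of the watermark is scaled to a fraction `strength` of the host magnitude, the watermark energy stays well below the host energy in this sketch.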
Using the temporal masking model guarantees that the watermark
cannot be heard. Frequency-domain masking alone may not be
enough, for example when a fixed Fourier transform window does not
provide suitable time resolution. In some cases, when the FFT is
applied to the watermark, it can spread over the whole block. When the
signal energy is concentrated in a subinterval shorter than the block
under analysis, the watermark is masked only inside that subinterval.
This situation causes distortion.
WATERMARK EXTRACTION
• The watermark bits must be detectable even if the speech signal has undergone
various signal processing attacks. Although the watermark behaves as noise for the
speech, an attacker may still attempt to destroy it blindly. Suppose N is the
number of received speech samples and the extraction algorithm knows the proper
location in the received speech signal; the samples may or may not contain
watermark bits. It can be assumed that r(i)=s(i)+d(i), 0≤i≤N−1, where d(i) is a
contaminant that is either noise alone or watermark plus noise. The watermark
bits are extracted by hypothesis testing as in Eq. (1):
H0: d(i)=n(i) (no watermark)
H1: d(i)=w′(i)+n(i) (watermark present)
where n(i) is noise and w′(i) is the modified watermark.
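The decision between "noise alone" and "watermark plus noise" can be illustrated with a simple correlation detector in Python. This is a sketch under assumptions, not necessarily the detector used in the cited work: it assumes the original speech s(i) and the watermark w(i) are available at the detector, and the function name, the 0.5 threshold and the synthetic signals are hypothetical.

```python
import random

def detect_watermark(r, s, w, threshold=0.5):
    """Return (similarity, decision) for the received samples r.

    The contaminant d(i) = r(i) - s(i) is correlated with the watermark
    and normalized by the watermark energy, so the similarity is near 1
    when the watermark is present and near 0 when it is not.
    """
    d = [ri - si for ri, si in zip(r, s)]
    num = sum(di * wi for di, wi in zip(d, w))
    den = sum(wi * wi for wi in w)
    sim = num / den
    return sim, sim > threshold

# Hypothetical synthetic data: Gaussian "speech", +/-0.1 watermark, weak noise.
random.seed(0)
N = 1000
s = [random.gauss(0.0, 1.0) for _ in range(N)]
w = [random.choice((-0.1, 0.1)) for _ in range(N)]
noise = [random.gauss(0.0, 0.01) for _ in range(N)]
marked = [si + wi + ni for si, wi, ni in zip(s, w, noise)]
clean = [si + ni for si, ni in zip(s, noise)]

sim_marked, present_marked = detect_watermark(marked, s, w)
sim_clean, present_clean = detect_watermark(clean, s, w)
```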
• In another paper (Swanson et al. 1998), a similar measure is used to
evaluate the robustness of the algorithm by comparing the original
watermark w(i) with the extracted watermark w′(i), as in the following Eq.
• The resulting measure is compared to a threshold to evaluate the system’s
robustness. In some cases, however, the extraction system may not
find the exact location of the watermark bits, so that r(i)=s(i+τ)+d(i),
0≤i≤N−1, where all the parameters are as before and τ is the delay
corresponding to a time shift in samples.
• In this case, to evaluate robustness, a generalized likelihood
ratio test (Swanson et al. 1998) is used to determine whether the
received speech contains the watermark, as in the following
Eq. The ratio is again compared to a threshold; a higher ratio
means that the watermark is present. The generalized likelihood ratio
test is also performed if the speech signal is suspected to have
undergone a time scaling attack.
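A generalized likelihood ratio test with an unknown delay τ can be sketched as follows. The sketch assumes white Gaussian noise of equal variance under both hypotheses, in which case the log-likelihood ratio is proportional to the difference between the best residual energy without the watermark and the best residual energy with it, each minimized over the candidate delays. This is an illustration of the idea, not necessarily the exact test in Swanson et al. (1998); the function name, the zero threshold and the synthetic data are hypothetical.

```python
import random

def glrt_statistic(r, s, w, max_delay):
    """Return (statistic, best_delay) for r(i) = s(i + tau) + d(i).

    statistic = min_tau ||r - s_tau||^2 - min_tau ||r - s_tau - w||^2.
    A large positive value favours "watermark present".
    Requires len(s) >= len(w) + max_delay.
    """
    n = len(w)

    def residual(tau, with_wm):
        return sum(
            (r[i] - s[i + tau] - (w[i] if with_wm else 0.0)) ** 2
            for i in range(n)
        )

    e0 = min(residual(t, False) for t in range(max_delay + 1))
    e1, tau1 = min((residual(t, True), t) for t in range(max_delay + 1))
    return e0 - e1, tau1

# Hypothetical synthetic data with a true delay of 3 samples.
random.seed(1)
N, MAX_DELAY, TRUE_TAU = 500, 5, 3
s = [random.gauss(0.0, 1.0) for _ in range(N + MAX_DELAY)]
w = [random.choice((-0.1, 0.1)) for _ in range(N)]
noise = [random.gauss(0.0, 0.01) for _ in range(N)]
marked = [s[i + TRUE_TAU] + w[i] + noise[i] for i in range(N)]
clean = [s[i + TRUE_TAU] + noise[i] for i in range(N)]

stat_marked, tau_marked = glrt_statistic(marked, s, w, MAX_DELAY)
stat_clean, _ = glrt_statistic(clean, s, w, MAX_DELAY)
```

Comparing the statistic against zero is the simplest possible threshold; in practice the threshold would be chosen from the desired false-alarm rate.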
PERCEPTUAL DISTANCE BETWEEN
WATERMARKED AND ORIGINAL
SPEECH
• There are many methods for calculating the perceptual
distance; the most common is the Lp-norm. Increasing p gives
high-energy regions more weight in the measurement.
Applying the L1 norm is shown in the following Eq.
where c2 is an additional calibration constant that improves the sensitivity of
this model. Equation (5) can lead to an analytical expression for the masking
threshold, as seen in Eq. (6). Most speech watermarking based on quantization
noise assumes that X and ε are uncorrelated, E(Xε)=0. Equation (6)
is shown as follows:
Another assumption relates to the masking situation in which only negligible
errors corrupt the clean speech signal.
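The Lp-norm distance mentioned at the start of this slide can be sketched in a few lines of Python. The function below uses the textbook definition applied directly to two equal-length feature vectors; the function name and the choice of which features to compare (samples, band energies and so on) are assumptions, since the slide's own equation is not reproduced here.

```python
def lp_distance(x, y, p=1.0):
    """Lp-norm distance (sum |x_i - y_i|^p)^(1/p) between two sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
```

For example, four small differences of size 1 give a larger L1 distance than one difference of size 2, but with p = 4 the single large difference dominates, which illustrates how a larger p weights large deviations more heavily.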
QUESTIONS:
1. What is the use of dynamic time warping?
2. What are the merits and demerits of silence part of the speech signal?
3. Consider an HMM representation of a coin tossing experiment. Assume a
three-state model corresponding to three different coins with
probabilities
           State 1   State 2   State 3
P(heads)   0.50      0.25      0.25
P(tails)   0.50      0.75      0.75
and with all state transition probabilities equal to 1/3 (assume initial state
probabilities of 1/3).
Sequence O = {HHHHTHTTTT}
What state sequence is most likely? What is the probability of the
observation sequence and this most likely state sequence?
QUESTIONS (contd.)
4. Consider the observation sequence O’ = {HTTHTHHTTH}.
How would your answer to the previous question change?
5. Differentiate between LPC and LPCC. How is LPCC superior
for speech recognition?
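As a starting point for questions 3 and 4, the most likely state sequence of an HMM can be found with the Viterbi algorithm. The Python sketch below encodes the three-coin model from question 3; the function and variable names are hypothetical, and ties between equally likely states are broken in favour of the lower state number, which is an arbitrary choice.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most likely state path, joint probability of path and obs)."""
    # delta[s]: probability of the best path that ends in state s.
    delta = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for s in states:
            best = max(states, key=lambda q: prev[q] * trans_p[q][s])
            ptr[s] = best
            delta[s] = prev[best] * trans_p[best][s] * emit_p[s][o]
        back.append(ptr)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

# Model from question 3: uniform initial and transition probabilities.
STATES = [1, 2, 3]
START = {s: 1.0 / 3.0 for s in STATES}
TRANS = {q: {s: 1.0 / 3.0 for s in STATES} for q in STATES}
EMIT = {
    1: {"H": 0.50, "T": 0.50},
    2: {"H": 0.25, "T": 0.75},
    3: {"H": 0.25, "T": 0.75},
}

path, prob = viterbi("HHHHTHTTTT", STATES, START, TRANS, EMIT)
```

Because all transitions are equally likely, the best state at each step depends only on that step's observation, and states 2 and 3 are interchangeable, so the most likely sequence is not unique.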
  • 1. HOME ASSIGNMENT SUBMITTED BY: MANVI PRIYA BE/10007/14 CHIRAG JAIN BE/10038/14 ANKITA SINGH BE/10136/14 MEC2163 - SPEECH PROCESSING AND RECOGNITION
  • 3. INTRODUCTION • Watermarking is the technique and art of hiding additional data (such as watermarked bits, logo and text message) in the host signal which includes image, video, audio, speech, text, without any perceptibility of the existence of additional information. The additional information which is embedded in the host signal should be extractable and must resist various intentional and unintentional attacks. Digital speech watermarking process is depicted in Fig.
  • 4. TYPES OF DIGITAL SPEECH WATERMARKING • There are two main types of digital speech watermarking in terms of robustness: 1. Robust digital speech watermarking in which embedded and additional information must resist channel attacks. 2. Fragile digital speech watermarking in which additional information must be destroyed if any attack or transformation takes place like for paper watermarks in bank notes. • In terms of source and extraction module for digital speech watermarking are found three main categories: 1. Blind speech watermarking which does not need any extra information such as original signal, logo or watermarked bits. 2. Semi-blind speech watermarking which may need extra information for the extraction phase like access to the published watermarked signal that is the original signal after just adding the watermark. 3. Non-blind speech watermarking which needs the original signal and the watermarked signal.
  • 5. APPLICATIONS OF DIGITAL SPEECH WATERMARKING • Several applications of digital speech watermarking are known: 1. Copy control: Cryptographic algorithms are very slow, and a cracker may use software (e.g. DeCSS) or reverse-engineering techniques to obtain a valid key. A watermark, however, can be embedded in the content so that a recording device that detects the watermark bits refuses to copy it. 2. Device control is the broader category, and copy control is one of its applications. For example, Digimarc’s MediaBridge interacts with a TV program by using action toys. Advertisements can be skipped automatically by turning functions on and off. 3. Owner identification: According to American laws, when the owner’s right is misused, the system can restrict the owner’s material, even if the copyright is not considered. This application helps to protect the holder’s rights in the distributed copies without relying on the copyright notice. 4. Proof of ownership: Creating a central repository for every copyright is too costly when a textual copyright notice is needed. Watermarking can be used as an alternative way of proving ownership. For authentication, a fragile watermark is embedded in the original data. If an impostor manipulates the content, the watermark is altered and, as a consequence, the media is not accepted as genuine. Another application is fingerprinting, which enables the holder to detect and investigate the source of a copy and so restrict unauthorized users. Other applications of watermarking are broadcast monitoring, copy prevention, access control and transaction tracking.
  • 6. WATERMARK DESIGN • Each audio signal is watermarked with a unique codeword. • Our watermarking scheme is based on repeated application of a basic watermarking operation to processed versions of the audio signal. • The basic method uses three steps to watermark an audio segment, as shown in Fig. • The complete watermarking scheme is shown in Fig. Below we provide a detailed explanation of the basic watermarking step and the complete watermarking technique.
  • 7.
  • 8. METHODS OF DIGITAL WATERMARKING Different methods are used for digital speech watermarking. Figure presents an overview of these methods.
  • 9. AUDITORY MASKING • Auditory masking in general is defined by the American standards agency as ‘the process by which the threshold of audibility for one sound is raised by the presence of another sound’ and ‘the amount by which the threshold of audibility of sound is raised by the presence of another sound’.
  • 10. TEMPORAL MASKING • Temporal masking consists of pre-masking and post-masking. In post-masking, a weaker maskee remains inaudible for about 50 to 200 ms after a stronger masker. In pre-masking, the weaker signal becomes inaudible about 5 to 20 ms before the stronger masker begins. The pre-masking effect is much harder to detect than the post-masking effect. Temporal masking is analysed in the time domain.
  • 11. WATERMARK EMBEDDING: • The watermark, which is a noise-like sequence, is shaped by applying the masking effect in the frequency and temporal domains. 1. First, the speech signal is segmented into blocks of a predefined size. 2. Second, the power spectrum of each block is calculated by FFT or DWT. 3. Third, the frequency masking of the block is computed. 4. Fourth, the masking weights are applied to shape the watermark bits (the noise-like sequence). 5. Fifth, the inverse transform (inverse FFT or inverse DWT) is computed. 6. Sixth, the temporal masking for shaping the noise-like sequence is calculated. 7. Seventh, the watermark, shaped in both the temporal and frequency domains, is embedded into the speech signal. The process is shown in Fig.
  • 12. Fig. Using the temporal masking model helps ensure that the watermark cannot be heard. Frequency-domain masking alone may not be enough, for example when a fixed Fourier-transform window does not provide suitable time resolution. In some cases, when the FFT is applied to the watermark, the watermark can spread over the whole block. When the signal's energy is concentrated in an interval shorter than the block under analysis, the watermark is masked only inside that subinterval, and this situation causes distortion.
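The embedding steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the "frequency masking threshold" is replaced by a crude stand-in (a fixed fraction `strength` of the host magnitude in each bin), the "temporal masking" by a local peak envelope of the host, and the transform is a naive DFT rather than an optimized FFT or a DWT. The names `dft`, `idft`, `embed_block`, `strength` and `reach` are hypothetical and do not come from the slides.

```python
import cmath
import math
import random

def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for short blocks)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT matching dft() above."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def embed_block(block, noise, strength=0.1, reach=4):
    """Shape a noise-like watermark by the host block and add it to the block."""
    host_spec = dft(block)    # step 2: spectrum of the block
    noise_spec = dft(noise)
    # ASSUMPTION (step 3): per-bin masking level = strength * host magnitude.
    weights = [strength * abs(h) for h in host_spec]
    # Step 4: keep the phase of the noise, impose the masking level as magnitude.
    shaped_spec = [(w / abs(w)) * m if abs(w) > 1e-12 else 0j
                   for w, m in zip(noise_spec, weights)]
    # Step 5: back to the time domain (imaginary part is rounding error only).
    shaped = [v.real for v in idft(shaped_spec)]
    # ASSUMPTION (step 6): temporal masking = local peak of |host| relative to
    # the block peak, so quiet stretches of the block receive less watermark.
    n = len(block)
    peak = max(abs(s) for s in block) or 1.0
    envelope = [max(abs(block[j]) for j in range(max(0, t - reach), min(n, t + reach + 1))) / peak
                for t in range(n)]
    # Step 7: add the shaped watermark to the host block.
    return [s + e * w for s, e, w in zip(block, envelope, shaped)]

rng = random.Random(0)
block = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
noise = [rng.choice((-1.0, 1.0)) for _ in range(64)]
marked = embed_block(block, noise)
```

With these stand-ins the watermark energy is bounded by `strength**2` times the energy of the host block, which is the property a real psychoacoustic model would enforce far more carefully, bin by bin and sample by sample.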
  • 13. WATERMARK EXTRACTION • The watermark bits must be detectable even if the speech signal has undergone various signal processing attacks. Although the watermark is treated as noise in the speech, an attacker may still attempt to destroy it blindly. Suppose N samples r(i) of the received speech are available and the extraction algorithm knows their proper location in the signal; the samples may or may not contain watermark bits. It can be assumed that r(i)=s(i)+d(i), 0≤i≤N−1, where s(i) is the original speech and d(i) is a contaminant that is either noise alone or watermark plus noise. The watermark bits are extracted by hypothesis testing as in the following Eq. (1), that is, by deciding between d(i)=n(i) (no watermark) and d(i)=w′(i)+n(i) (watermark present), where n(i) is noise and w′(i) is the modified watermark.
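The hypothesis test described above can be illustrated with a small sketch. It assumes a non-blind setting in which the host s(i) is available, so that d(i) = r(i) − s(i) can be formed directly, and it uses a normalized correlation between d(i) and the known watermark as the test statistic. The function names, the threshold of 0.5 and the synthetic signals are illustrative assumptions and are not taken from the slides.

```python
import math
import random

def normalized_correlation(a, b):
    """Correlation of two equal-length sequences, scaled to [-1, 1]."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0

def watermark_present(received, host, watermark, threshold=0.5):
    """Decide between 'noise only' and 'watermark plus noise'.

    ASSUMPTION: the host is known (non-blind detection) and the threshold
    is an arbitrary illustrative value.
    """
    d = [r - s for r, s in zip(received, host)]  # contaminant d(i)
    return normalized_correlation(watermark, d) >= threshold

rng = random.Random(42)
N = 1000
host = [rng.gauss(0.0, 1.0) for _ in range(N)]
watermark = [0.1 * rng.choice((-1.0, 1.0)) for _ in range(N)]
noise = [rng.gauss(0.0, 0.02) for _ in range(N)]

marked = [s + w + n for s, w, n in zip(host, watermark, noise)]  # watermark + noise
clean = [s + n for s, n in zip(host, noise)]                     # noise only

print(watermark_present(marked, host, watermark))
print(watermark_present(clean, host, watermark))
```

In this synthetic setup the first call should report the watermark as present and the second as absent, because the noise alone is almost uncorrelated with the watermark sequence.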
  • 14. • In another paper (Swanson et al. 1998) a similar measurement is used to evaluate the robustness of the algorithm by computing the similarity between the original watermark w(i) and the extracted watermark w′(i), as in the following Eq. • The similarity is compared to a threshold to evaluate the system’s robustness. In some cases, however, the extraction system may not find the exact location of the watermark bits, so that r(i)=s(i+τ)+d(i), 0≤i≤N−1, where all the parameters are as before and τ is the delay corresponding to a time shift of the samples. • In this case, to evaluate robustness, a generalized likelihood ratio test (Swanson et al. 1998) is used to determine whether or not the received speech contains the watermark, as in the following Eq. The result is again compared to a threshold; a higher ratio means that the watermark is present. The generalized likelihood ratio test is also performed if the speech signal is suspected of having undergone a time-scaling attack.
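When the delay τ is unknown, one simple approach in the spirit of a generalized likelihood ratio test is to evaluate the detection statistic for every candidate delay and keep the maximum. The sketch below does this with a normalized correlation and a known host. It is an illustration under those assumptions, not the exact test of Swanson et al. (1998), and the names `best_delay` and `max_delay` are hypothetical.

```python
import math
import random

def normalized_correlation(a, b):
    """Correlation of two equal-length sequences, scaled to [-1, 1]."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0

def best_delay(received, host, watermark, max_delay):
    """Return (tau, score) maximising the statistic over 0 <= tau <= max_delay.

    ASSUMPTION: host has at least len(received) + max_delay samples.
    """
    n = len(received)
    best = (0, -1.0)
    for tau in range(max_delay + 1):
        # Candidate contaminant d(i) = r(i) - s(i + tau) for this delay.
        d = [received[i] - host[i + tau] for i in range(n)]
        score = normalized_correlation(watermark, d)
        if score > best[1]:
            best = (tau, score)
    return best

rng = random.Random(7)
N, TRUE_TAU = 500, 3
host = [rng.gauss(0.0, 1.0) for _ in range(N + 10)]
watermark = [0.1 * rng.choice((-1.0, 1.0)) for _ in range(N)]
received = [host[i + TRUE_TAU] + watermark[i] for i in range(N)]

tau, score = best_delay(received, host, watermark, max_delay=10)
print(tau, round(score, 3))
```

At the correct delay the host cancels and only the watermark remains, so the score is close to 1; at wrong delays the residual host dominates and the score stays low. The maximum is then compared to a threshold exactly as in the aligned case.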
  • 15. Perceptual distance between watermarked and original speech • There are many methods available for calculating the perceptual distance; the most common is the Lp norm. Increasing p gives more weight to high-energy regions for better measurement. Applying the L1 norm is shown in the following Eq., where c2 is the additional calibration constant to improve the sensitivity of this model. Equation (5) can lead to an analytical expression for the masking threshold, as seen in Eq. (6). Most quantization-noise-based speech watermarking schemes assume that X and ε are uncorrelated, i.e. E(Xε)=0. Equation (6) is shown as follows:
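As a small illustration of the Lp norm mentioned above, the sketch below computes the plain Lp distance between two equal-length sequences. It deliberately omits the calibration constant c2 and any perceptual weighting, whose exact form is not given here; the function name `lp_distance` and the sample values are illustrative assumptions.

```python
def lp_distance(x, y, p=1):
    """Plain Lp distance between two equal-length sequences (no calibration)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

original = [1.0, 2.0, 3.0]
watermarked = [1.0, 4.0, 0.0]

l1 = lp_distance(original, watermarked, p=1)  # |0| + |2| + |3| = 5.0
l2 = lp_distance(original, watermarked, p=2)  # sqrt(0 + 4 + 9)
l8 = lp_distance(original, watermarked, p=8)  # approaches the largest difference, 3
```

As p grows, the result is increasingly dominated by the largest differences, which is the sense in which larger p gives more weight to the regions where the two signals differ most.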
  • 16. Another assumption concerns the masking situation in which only negligible errors corrupt the clean speech signal.
  • 17. QUESTIONS: 1. What is the use of dynamic time warping? 2. What are the merits and demerits of the silence part of the speech signal? 3. Consider an HMM representation of a coin-tossing experiment. Assume a three-state model corresponding to three different coins, with P(heads) = 0.50, 0.25, 0.25 and P(tails) = 0.50, 0.75, 0.75 for State 1, State 2 and State 3 respectively, with all state transition probabilities equal to 1/3 (assume initial state probabilities of 1/3). Given the sequence O = {HHHHTHTTTT}, what state sequence is most likely? What is the joint probability of the observation sequence and this most likely state sequence?
  • 18. QUESTIONS (contd.) 4. Consider the observation sequence O’ = {HTTHTHHTTH}. How would your answer to the previous question change? 5. Differentiate between LPC and LPCC. How is LPCC superior for speech recognition?
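Questions 3 and 4 can be checked numerically with the Viterbi algorithm. The sketch below uses exact fractions and the parameters stated in question 3; the function and variable names are illustrative assumptions. Because States 2 and 3 have identical parameters, they are tied wherever a tail is observed, and this implementation simply reports the lower-numbered state.

```python
from fractions import Fraction

def viterbi(obs, init, trans, emit):
    """Return (most likely state path, joint probability of path and obs)."""
    n_states = len(init)
    # delta[s]: probability of the best path so far that ends in state s.
    delta = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    back = []  # back[t-1][s]: best predecessor of state s at time t
    for o in obs[1:]:
        prev = delta
        pointers, delta = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda q: prev[q] * trans[q][s])
            pointers.append(best)
            delta.append(prev[best] * trans[best][s] * emit[s][o])
        back.append(pointers)
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]

THIRD = Fraction(1, 3)
INIT = [THIRD] * 3
TRANS = [[THIRD] * 3 for _ in range(3)]
EMIT = [
    {"H": Fraction(1, 2), "T": Fraction(1, 2)},  # State 1
    {"H": Fraction(1, 4), "T": Fraction(3, 4)},  # State 2
    {"H": Fraction(1, 4), "T": Fraction(3, 4)},  # State 3
]

path, prob = viterbi("HHHHTHTTTT", INIT, TRANS, EMIT)
print([s + 1 for s in path], prob)    # [1, 1, 1, 1, 2, 1, 2, 2, 2, 2] 1/7962624

path2, prob2 = viterbi("HTTHTHHTTH", INIT, TRANS, EMIT)
print([s + 1 for s in path2], prob2)  # [1, 2, 2, 1, 2, 1, 1, 2, 2, 1] 1/7962624
```

Since all transitions are equally likely, the best state at each step depends only on the current symbol: State 1 for heads, State 2 or 3 for tails. Both sequences contain five heads and five tails, so the joint probability is (1/3)^10 · (1/2)^5 · (3/4)^5 = 1/7962624 in both cases; only the state sequence changes.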