This PowerPoint presentation contains 45 slides. The first part gives a brief introduction to speech recognition (SR) systems: their applications, the biological architecture of human speech recognition versus machine architecture, the recognition process, a flow summary of that process, and the main approaches to SR systems. The middle part describes the evolution of SR systems through the decades, and the last part describes the machine learning approach in SR — how neural networks enhance the efficiency of an SR system.
Linear Predictive Coding (LPC) is one of the most powerful speech analysis techniques, and one of the most useful methods for encoding good quality speech at a low bit rate. It provides extremely accurate estimates of speech parameters, and is relatively efficient for computation.
The task of speaker identification is to determine the identity of a speaker by machine on the basis of his/her voice; no identity is claimed by the user. To recognize a voice, the voice must be familiar — for machines as well as for human beings.
GitHub Link:https://github.com/TrilokiDA/Speaker-Identification-from-Voice
This is a presentation on speech recognition systems (automated speech recognition). I hope it will be helpful for anyone looking for a presentation on this technology.
2. • Speech processing is the study of speech signals and their processing methods. The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing applied to speech signals. Aspects of speech processing include the acquisition, manipulation, storage, transfer and output of speech signals.
• Speech processing can generally be divided into:
• 1. Recognition (discussed here).
• 2. Synthesis (not discussed here).
3. Disciplines related to speech processing
• 1. Signal processing: the process of extracting information from speech in an efficient manner.
• 2. Physics: the science of understanding the relationship between the speech signal and physiological mechanisms.
• 3. Pattern recognition: the set of algorithms used to create patterns and match data to them according to the degree of likeness.
• 4. Computer science: making efficient algorithms to implement the methods of a speech recognition system in hardware or software.
• 5. Linguistics: the relationship between sounds and words in a language, the meaning of those words, and the overall meaning of sentences.
8. Pre-processing
• We can treat (pre-process) the speech signal, after it has been received as an analog signal, in three general ways:
• 1. In the time domain (speech wave)
• 2. In the frequency domain (spectral envelope)
• 3. As a combination of both (spectrogram)
(figure: energy vs. frequency, with marks at 1 kHz and 2 kHz)
9. Time domain
Speech is captured by a microphone and sampled periodically (e.g. at 16 kHz) by an analogue-to-digital converter (ADC).
Each converted sample is 16-bit data.
If the sampling rate is too low, aliasing occurs — by the Nyquist theorem, the sampling rate must be at least twice the highest frequency in the signal.
10. • A sound is sampled at 22 kHz with 16-bit resolution. How many bytes are needed to store the sound wave for 10 seconds?
• Answer: one second has 22K samples, so for 10 seconds: 22K samples × 2 bytes × 10 seconds = 440K bytes.
• *Note: 2 bytes are used because 16 bits = 2 bytes.
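The arithmetic above can be checked with a short script (a sketch; "22K" is taken as exactly 22,000 samples per second, as in the slide):

```python
# Storage needed for 10 s of audio sampled at 22 kHz with 16-bit resolution
sample_rate = 22_000        # samples per second ("22K" in the slide)
bytes_per_sample = 16 // 8  # 16 bits = 2 bytes
duration_s = 10

total_bytes = sample_rate * bytes_per_sample * duration_s
print(total_bytes)  # 440000 bytes, i.e. "440K bytes"
```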
11. Time framing
• Since our ear cannot respond to very fast changes in speech content, we normally cut the speech data into frames before analysis (similar to watching fast-changing still pictures to perceive motion).
• Frame size is 10–30 ms.
• Frames can be overlapped; normally the overlapping region ranges from 0 to 75% of the frame size.
12. Time framing: continued…
For a 22 kHz / 16-bit sampled speech wave, the frame size is 15 ms and the frame overlap is 40% of the frame size. Draw the frame block diagram.
Answer:
Number of samples in one frame: N = 15 ms / (1/22k) = 330.
Overlapping samples = 132, so frame shift m = N − 132 = 198.
Overlapping time x = 132 × (1/22k) = 6 ms; time in one frame = 330 × (1/22k) = 15 ms.
(figure: windows i = 1 and i = 2, each of length N samples, shifted by m samples along the time axis and overlapping by x)
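The frame-blocking numbers in this example can be reproduced with a small sketch (the `frame_starts` helper is illustrative, not from the slides):

```python
# Frame blocking for the slide's example: 22 kHz sampling, 15 ms frames,
# 40% overlap between adjacent frames.
sample_rate = 22_000
frame_ms = 15

N = sample_rate * frame_ms // 1000   # samples per frame
overlap = int(0.40 * N)              # overlapping samples
m = N - overlap                      # frame shift in samples

def frame_starts(num_samples):
    """Start index of every full frame in a signal of num_samples samples."""
    return list(range(0, num_samples - N + 1, m))

print(N, overlap, m)  # 330 132 198
```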
13. The frequency domain
• Use the DFT or FFT to transform the wave from the time domain to the frequency domain (i.e. to the spectral envelope).
• Input (time domain): S_0, S_1, S_2, …, S_(N−1) — N real samples in total.
• X_m = FT{S_n} = Σ_(n=0..N−1) S_n · e^(−j·2πnm/N), for m = 0, 1, 2, …, N/2,
  where e^(jθ) = cos(θ) + j·sin(θ) and j = √(−1).
• Output (frequency domain): X_0, X_1, …, X_(N/2) — since the input is real, only N/2 + 1 of the complex outputs are distinct.
• |X_m| = (real² + imaginary²)^0.5
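A minimal sketch of this transform, assuming NumPy (the 1 kHz test tone and frame size are illustrative; `rfft` returns exactly the N/2 + 1 distinct complex values for a real input):

```python
import numpy as np

# DFT of one frame: N real samples in, N/2 + 1 distinct complex values out.
N = 330                                        # frame size from the earlier example
n = np.arange(N)
frame = np.sin(2 * np.pi * 1000 * n / 22_000)  # 1 kHz tone sampled at 22 kHz

X = np.fft.rfft(frame)                         # X_m for m = 0..N/2
magnitude = np.abs(X)                          # |X_m| = sqrt(real^2 + imag^2)

peak_bin = int(np.argmax(magnitude))
peak_hz = peak_bin * 22_000 / N                # DFT bin -> frequency in Hz
print(peak_bin, round(peak_hz))                # the peak sits at the tone's frequency
```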
15. The spectrogram: to see the spectral envelope as time moves forward
Spectrogram: the white bands are the formants, which represent the high-energy frequency contents of the speech signal.
18. (A) Filtering
• Ways to find the spectral envelope:
• Filter banks: uniform
• Filter banks can also be non-uniform
• LPC and cepstral LPC parameters
(figure: spectral envelope energy approximated by the outputs of filter 1, filter 2 and filter 3)
19. Speech recognition idea using 4 linear filters, each with a bandwidth of 2.5 kHz
• Two sounds give two spectral envelopes SE_ar and SE_ei, e.g. spectral envelope (SE) "ar" and spectral envelope "ei".
(figure: spectrum A, the envelope SE_ar = "ar", and spectrum B, the envelope SE_ei = "ei", each plotted as energy vs. frequency from 0 to 10 kHz and passed through filters 1–4, giving filter outputs v1, v2, v3, v4 and w1, w2, w3, w4 respectively)
20. Difference between two sounds (or spectral envelopes SE, SE′)
• Difference between two sounds, e.g.
• SE_ar = {v1, v2, v3, v4} = "ar",
• SE_ei = {w1, w2, w3, w4} = "ei"
• A simple measure of the difference is
• Dist = sqrt(|v1−w1|² + |v2−w2|² + |v3−w3|² + |v4−w4|²)
• where |x| = magnitude of x
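This distance measure can be sketched directly (the filter-output numbers below are illustrative, not from the slides):

```python
import math

# Illustrative filter-bank outputs for the two spectral envelopes
SE_ar = [5.0, 2.0, 1.0, 0.5]   # v1..v4
SE_ei = [4.0, 4.5, 0.8, 0.3]   # w1..w4

def dist(v, w):
    """Euclidean distance between two filter-output vectors."""
    return math.sqrt(sum((vi - wi) ** 2 for vi, wi in zip(v, w)))

print(dist(SE_ar, SE_ei))
print(dist(SE_ar, SE_ar))  # identical sounds give zero distance
```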
21. (B) Linear Predictive Coding (LPC)
• The concept is to find a set of parameters a1, a2, a3, a4, …, ap to represent the same waveform (typical values: p = 8–13).
• For example, each time frame of y = 512 samples (S0, S1, S2, …, S_(N−1) = S_511), i.e. 512 integer numbers (16 bits each), is represented by one set of p = 8 floating-point numbers (data compressed).
• The waveform can be reconstructed from these LPC codes.
(figure: an input waveform cut into 30 ms time frames y, y+1, y+2, each producing its own LPC set a1..a8, a′1..a′8, a″1..a″8)
23. Example
• A speech waveform S has the values s0, s1, …, s8 = [1, 3, 2, 1, 4, 1, 2, 4, 3]. The frame size is 4.
• Find the auto-correlation parameters r0, r1, r2 for the first frame.
• If we use LPC order 2 for our feature-extraction system, find the LPC coefficients a1, a2.
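A worked sketch of this exercise, assuming NumPy and the standard short-term autocorrelation r(m) = Σ s(n)·s(n+m) over the frame, with the order-2 LPC coefficients obtained from the Yule-Walker normal equations:

```python
import numpy as np

# First frame of the slide's waveform S (frame size 4)
s = np.array([1.0, 3.0, 2.0, 1.0])
N = len(s)

# Short-term autocorrelation r(m) = sum_{n=0}^{N-1-m} s[n] * s[n+m]
r = np.array([np.dot(s[:N - m], s[m:]) for m in range(3)])
print(r)  # [15. 11.  5.]

# LPC order p = 2: solve the normal equations  [r0 r1; r1 r0] [a1 a2]^T = [r1 r2]^T
R = np.array([[r[0], r[1]],
              [r[1], r[0]]])
a = np.linalg.solve(R, r[1:3])
print(a)  # a1 = 55/52 ≈ 1.058, a2 = -23/52 ≈ -0.442
```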
25. (C) Cepstrum
A word coined by reversing the first four letters of "spectrum": cepstrum.
It is the spectrum of a spectrum of a signal.
26. Glottis and cepstrum
• Speech wave (S) = Excitation (E) · Filter (H)
• The glottal excitation (E) from the vocal cords (glottis) passes through the vocal tract filter (H) to produce the output speech (S).
• Voiced sound therefore has a strong glottal-excitation frequency content; in the cepstrum we can easily identify and remove the glottal excitation.
(figure: source-filter block diagram — glottal excitation E from the vocal cords → vocal tract filter H → output S)
27. Cepstral analysis
• Signal s = convolution (*) of glottal excitation e and vocal tract filter h:
• s(n) = e(n) * h(n), where n is the time index
• After the Fourier transform FT: FT{s(n)} = FT{e(n) * h(n)}
• Convolution (*) becomes multiplication (·): n (time) → w (frequency),
• S(w) = E(w) · H(w)
• Find the magnitude of the spectrum:
• |S(w)| = |E(w)| · |H(w)|
• log10|S(w)| = log10{|E(w)|} + log10{|H(w)|}
Ref: http://iitg.vlab.co.in/?sub=59&brch=164&sim=615&cnt=1
28. Cepstrum
• C(n) = IDFT[log10|S(w)|] = IDFT[log10{|E(w)|} + log10{|H(w)|}]
• In C(n), you can see E(n) and H(n) at two different positions (quefrency ranges).
• Application: useful for analysing (i) the glottal excitation and (ii) the vocal tract filter.
(block diagram: s(n) → windowing → x(n) → DFT → X(w) → log|X(w)| → IDFT → C(n); n = time index, w = frequency, IDFT = inverse discrete Fourier transform)
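The pipeline above can be sketched as follows, assuming NumPy (the 200 Hz "voiced" test frame is illustrative; a small epsilon keeps the log finite):

```python
import numpy as np

# Real cepstrum of one windowed frame: C(n) = IDFT[ log10 |DFT(x)| ]
fs = 22_050
n = np.arange(512)
frame = np.sin(2 * np.pi * 200 * n / fs)      # toy "voiced" frame at 200 Hz
windowed = frame * np.hamming(512)            # windowing step

spectrum = np.fft.fft(windowed)               # DFT
log_mag = np.log10(np.abs(spectrum) + 1e-12)  # log|X(w)|; epsilon avoids log(0)
cepstrum = np.fft.ifft(log_mag).real          # IDFT -> C(n), n = quefrency index

print(cepstrum[:3])
```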
29. (figure: the cepstral analysis steps — s(n), the time-domain signal; x(n) = windowed s(n), which suppresses the two sides of the frame; |X(w)|; log|X(w)|; and C(n) = IDFT(log|X(w)|), which gives the cepstrum, with the vocal tract cepstrum at low quefrency and the glottal excitation cepstrum at high quefrency)
30. Liftering (to remove the glottal excitation)
• Low-time liftering: magnify (or inspect) the low-quefrency region to find the vocal tract filter cepstrum — this part is used for speech recognition.
• High-time liftering: magnify (or inspect) the high-quefrency region to find the glottal excitation cepstrum — this part is useless for speech recognition and is removed.
• Frequency = Fs / quefrency, where Fs = sampling frequency (e.g. 22050 Hz).
• The cut-off between the two regions is found by experiment.
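Low-time liftering can be sketched as below, assuming NumPy (the cut-off of 30 quefrency bins is an illustrative value; in practice it is found by experiment, as the slide says):

```python
import numpy as np

def smooth_envelope(frame, cutoff=30):
    """Low-time lifter: keep only low-quefrency cepstral values, then return
    to the log-spectrum to obtain a smooth spectral envelope."""
    log_mag = np.log10(np.abs(np.fft.fft(frame)) + 1e-12)
    c = np.fft.ifft(log_mag).real          # cepstrum C(n)
    liftered = np.zeros_like(c)
    liftered[:cutoff] = c[:cutoff]         # vocal-tract (low-quefrency) part
    liftered[-cutoff + 1:] = c[-cutoff + 1:]  # its symmetric counterpart
    return np.fft.fft(liftered).real       # smoothed log10 spectrum

rng = np.random.default_rng(0)
frame = rng.standard_normal(512)           # illustrative noisy frame
env = smooth_envelope(frame)
print(env.shape)
```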
31. Reasons for liftering
• Why do we need this?
• Answer: to remove the ripples of the spectrum caused by the glottal excitation (vocal cord vibration).
• There are too many ripples in the spectrum caused by the glottal excitation, but we are more interested in the spectral envelope for recognition and reproduction.
(figure: input speech signal x → Fourier transform → spectrum of x, showing the ripples; cepstrum of speech)
http://isdl.ee.washington.edu/people/stevenschimmel/sphsc503/files/notes10.pdf
33. Speech Recognition
• Speech recognition (SR) is the translation of spoken words into text. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or simply "speech to text" (STT).
34. Speech recognition procedure
We will now combine all the methods covered so far to connect the dots and clarify the recognition system. Note that only step 4 is the recognition process proper; the other steps relate to the parts covered before.
Steps:
1. End-point detection
2. (a) Frame blocking and (b) windowing
3. Feature extraction — find cepstral coefficients via LPC:
(a) auto-correlation analysis,
(b) LPC analysis,
(c) conversion of LPC to cepstral coefficients
4. Distortion measure calculation
35. Step 1: Get one frame and execute end-point detection
• Determine the start and end points of the speech sound.
• This is not always easy, since the energy at the start of the utterance is low.
• The end points are determined by energy and zero-crossing rate.
(figure: recorded signal s(n) vs. n, with the detected end-points marked; in our example the speech is about 1 second long)
36. Step 2(a): Frame blocking
• Choose the frame size (N samples), with adjacent frames separated by m samples.
• E.g. for a 16 kHz sampling signal, a 10 ms window has N = 160 samples, with frame shift m = 40 samples.
(figure: windows l = 1 and l = 2, each of length N, separated by m samples along the time axis)
37. Step 2(b): Windowing
• To smooth out the discontinuities at the beginning and end of each frame.
• Hamming or Hanning windows can be used.
• Hamming window:
  S~(n) = S(n) · W(n), where W(n) = 0.54 − 0.46 · cos(2πn / (N−1)), for 0 ≤ n ≤ N−1
• Tutorial: write a program segment to pass a speech frame, stored in an array int s[1000], through the Hamming window.
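A sketch of the tutorial exercise (the slide's `int s[1000]` becomes a Python list of 1000 integers; a C version would be analogous):

```python
import math

N = 1000  # frame length, matching the slide's int s[1000]

def hamming_window(s):
    """Return S~(n) = S(n) * (0.54 - 0.46*cos(2*pi*n/(L-1))) for 0 <= n <= L-1."""
    L = len(s)
    return [s[n] * (0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)))
            for n in range(L)]

s = [1000] * N                  # a dummy frame of constant samples
windowed = hamming_window(s)
print(windowed[0], windowed[-1])  # ~80 at both ends (0.54 - 0.46 = 0.08)
```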
39. Step 3.1: Auto-correlation analysis
• The auto-correlation of every frame (l = 1, 2, …) of the windowed signal is calculated.
• If the required output is a p-th order LPC, the auto-correlation for the l-th frame is:
  r_l(m) = Σ_(n=0..N−1−m) S~_l(n) · S~_l(n+m), for m = 0, 1, …, p
41. Step 3.3: LPC-to-cepstral-coefficients conversion
• Cepstral coefficients are more accurate in describing the characteristics of the speech signal.
• Normally, cepstral coefficients of order 1 ≤ m ≤ p are enough to describe the speech signal.
• Calculate c1, c2, c3, …, cp from the LPC coefficients a1, a2, a3, …, ap:
  c_0 = r_0
  c_m = a_m + Σ_(k=1..m−1) (k/m) · c_k · a_(m−k), for 1 ≤ m ≤ p
  c_m = Σ_(k=m−p..m−1) (k/m) · c_k · a_(m−k), for m > p (if needed)
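The recursion for 1 ≤ m ≤ p can be sketched directly (a minimal sketch of the slide's formula; the input values reuse the order-2 LPC example from slide 23, and `lpc_to_cepstrum` is an illustrative helper name):

```python
# LPC-to-cepstral conversion for 1 <= m <= p:
#   c0 = r0
#   c_m = a_m + sum_{k=1}^{m-1} (k/m) * c_k * a_{m-k}
def lpc_to_cepstrum(a, r0):
    p = len(a)          # a[0] holds a_1, ..., a[p-1] holds a_p
    c = [r0]            # c[0] = c0
    for m in range(1, p + 1):
        cm = a[m - 1] + sum((k / m) * c[k] * a[m - k - 1] for k in range(1, m))
        c.append(cm)
    return c

# LPC coefficients from the earlier order-2 example (slide 23)
a1, a2 = 55 / 52, -23 / 52
c = lpc_to_cepstrum([a1, a2], r0=15.0)
print(c)  # c1 = a1; c2 = a2 + (1/2) * c1 * a1
```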
42. Step 4: Matching method — dynamic programming (DP)
• Correlation is a simple method for pattern matching, BUT:
• The most difficult problem in speech recognition is time alignment. No two speech sounds are exactly the same, even when produced by the same person.
• So we align the speech features by an elastic matching method: DP.
43. (B) Dynamic programming algorithm
• Step 1: calculate the distortion matrix dist(i, j).
• Step 2: calculate the accumulated matrix D(i, j) by using
  D(i, j) = dist(i, j) + min{ D(i−1, j), D(i−1, j−1), D(i, j−1) }
(diagram: cell D(i, j) is reached from its neighbours D(i−1, j), D(i−1, j−1) and D(i, j−1))
44. To find the optimal path in the accumulated matrix (and the minimum accumulated distortion/distance)
• Starting from the top row and right-most column, find the lowest cost D(i, j)_t: here it is found to be the cell at (i, j) = (3, 5), with D(3, 5) = 7, in the top row. (This cost is called the "minimum accumulated distance", or "minimum accumulated distortion".)
• From the lowest-cost position p(i, j)_t, find the next position (i, j)_(t−1) = argmin_(i,j){ D(i−1, j), D(i−1, j−1), D(i, j−1) }.
• E.g. p(i, j)_(t−1) = argmin_(i,j){11, 5, 12} = 5 is selected.
• Repeat the above until the path reaches the left-most column or the lowest row.
• Note: argmin_(i,j){cell1, cell2, cell3} means the position (i, j) of the cell with the lowest value is selected.
45. Optimal path
• The path should run from any element in the top row or right-most column to any element in the bottom row or left-most column.
• The reason is that noise may corrupt elements at the beginning or the end of the input sequence.
• In actual processing, however, the path should be constrained to stay near the 45-degree diagonal (from bottom-left to top-right); see the attached diagram — the path cannot pass through the restricted regions. The user can set these regions manually. This is a way to prohibit unrecognizable matches. See the next page.
47. Example for DP
• The cepstrum codes of the speech sounds 'YES' and 'NO' and of an unknown 'input' are shown. Is the 'input' = 'YES' or 'NO'?
  YES':  2  4  6  9  3  4  5  8  1
  NO':   7  6  2  4  7  6 10  4  5
  Input: 3  5  5  8  4  2  3  7  2
• distortion dist = (x − x')²
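A simplified sketch of this example (it accumulates the cost to the final cell (9, 9) only, which is where the answer slide finds the lowest cost; the full algorithm would also scan the rest of the top row and right-most column):

```python
# DP match of the slide's example:
#   dist(i, j) = (template[i] - input[j])**2
#   D(i, j) = dist(i, j) + min(D(i-1, j), D(i-1, j-1), D(i, j-1))
def accumulated_distance(template, inp):
    n, m = len(template), len(inp)
    D = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = (template[i] - inp[j]) ** 2
            if i == 0 and j == 0:
                D[i][j] = d
            elif i == 0:                      # first row: accumulate leftwards
                D[i][j] = d + D[i][j - 1]
            elif j == 0:                      # first column: accumulate downwards
                D[i][j] = d + D[i - 1][j]
            else:
                D[i][j] = d + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])
    return D[n - 1][m - 1]

yes = [2, 4, 6, 9, 3, 4, 5, 8, 1]
no  = [7, 6, 2, 4, 7, 6, 10, 4, 5]
inp = [3, 5, 5, 8, 4, 2, 3, 7, 2]

d_yes = accumulated_distance(yes, inp)   # 13, matching the answer slide
d_no  = accumulated_distance(no, inp)
print(d_yes, d_no)                       # the input is closer to 'YES'
```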
48. • Answer:
• Starting from the top row and right-most column, find the lowest cost D(i, j)_t: it is found to be the cell at (i, j) = (9, 9), with D(9, 9) = 13.
• From the lowest-cost position (i, j)_t, find the next position (i, j)_(t−1) = argmin_(i,j){ D(i−1, j), D(i−1, j−1), D(i, j−1) }. E.g. position (i, j)_(t−1) = argmin_(i,j){48, 12, 47} = (9−1, 9−1) = (8, 8), which contains "12", is selected.
• Repeat the above until the path reaches the left-most column or the lowest row.
• Note: argmin_(i,j){cell1, cell2, cell3} means the position (i, j) of the cell with the lowest value is selected.