7. Creating Noise Profile
BUILD NOISE PROFILE
Update the computed Noise Profile
AVERAGE OVER TIME
$\frac{1}{N}\,[\text{Sum of FFT Coefficient} > 10\ \text{Frames}]$
FOURIER TRANSFORM
FFT of 32ms Audio Samples
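Taken in processing order (FFT of 32 ms frames, average over time, build the profile), the steps on this slide could be sketched as follows. This is a minimal illustration, not the presenters' implementation: the function names, the naive DFT used in place of a real FFT, the non-overlapping framing, and the `min_frames=11` reading of "> 10 Frames" are all assumptions.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitude of every DFT coefficient of one frame (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n)
    ]


def split_frames(samples, frame_len):
    """Non-overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def build_noise_profile(noise_samples, sample_rate, frame_ms=32, min_frames=11):
    """Average the per-bin magnitudes over all frames of a noise-only recording.

    min_frames is a hypothetical parameter: it encodes the assumption that
    the profile is averaged over more than 10 frames.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    frames = split_frames(noise_samples, frame_len)
    if len(frames) < min_frames:
        raise ValueError("not enough noise frames to build a profile")
    spectra = [dft_magnitudes(f) for f in frames]
    # 1/N times the sum of each coefficient's magnitude over the N frames.
    return [sum(s[k] for s in spectra) / len(spectra) for k in range(frame_len)]
```

Calling `build_noise_profile` again with a fresh noise-only recording would be one way to update the computed profile.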
8. Spectral Subtraction
INVERSE FOURIER TRANSFORM
Rebuild the Signal
SUBTRACT NOISE PROFILE (STATIC AND MUSICAL)
Over-Subtraction and Short Segment Removal
FOURIER TRANSFORM OF SIGNAL
FFT of 32ms Audio Samples
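In processing order (transform the frame, subtract the noise profile, rebuild the signal), spectral subtraction with over-subtraction might look like the sketch below. The over-subtraction factor of 2.0, the clamping of negative magnitudes to zero, and the reuse of the noisy phase are assumptions; short segment removal for musical noise is not shown.

```python
import cmath
import math


def dft(frame):
    """Naive DFT of one frame (stand-in for an FFT)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n)]


def inverse_dft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(c * cmath.exp(2j * math.pi * k * t / n)
                 for k, c in enumerate(spectrum)) / n).real
            for t in range(n)]


def spectral_subtract(frame, noise_profile, over_subtraction=2.0):
    """Subtract a scaled noise magnitude from each bin and rebuild the frame.

    over_subtraction is a hypothetical tuning parameter, not a value
    taken from the slides.
    """
    cleaned = []
    for coeff, noise_mag in zip(dft(frame), noise_profile):
        # Clamp at zero so a magnitude never becomes negative.
        mag = max(abs(coeff) - over_subtraction * noise_mag, 0.0)
        # Keep the phase of the noisy signal.
        cleaned.append(cmath.rect(mag, cmath.phase(coeff)))
    return inverse_dft(cleaned)
```

The `noise_profile` argument is expected to hold one magnitude per DFT bin of the frame.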
12. Voice Activity Detection Process I
CALCULATE THE TRIGGER
$t_w = \mu + \alpha\,\delta_w$
COMPUTE MEAN AND VARIANCE
SAMPLE
10 Frame Sampling
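A minimal sketch of the trigger calculation, assuming the first 10 frames contain only background and that the trigger is their mean plus a weighted variance. The function name, the use of the population variance, and passing the weight `alpha` as a free parameter are assumptions.

```python
from statistics import mean, pvariance


def compute_trigger(measures, alpha, n_frames=10):
    """Trigger = mean + alpha * variance of the first n_frames measure values.

    `measures` holds one classification-measure value per frame.
    """
    head = list(measures[:n_frames])
    if len(head) < n_frames:
        raise ValueError("need at least n_frames values to compute the trigger")
    mu = mean(head)
    delta = pvariance(head, mu)  # population variance (an assumption)
    return mu + alpha * delta
```

For example, ten values with mean 2.0 and variance 1.0 give a trigger of 2.5 when `alpha` is 0.5.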
13. Voice Activity Detection Process II
CLASSIFY
If greater than trigger then voice
COMPUTE CLASSIFICATION MEASURE
READ THE SAMPLE
Read the frame
$W_{s1}(m) = P_{s1}(m)\,\bigl(1 - Z_{s1}(m)\bigr)\,S_c$
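The read, measure, and classify steps could be sketched as below. The sketch assumes the classification measure is the frame's short-term power multiplied by one minus its zero-crossing rate and by a scale factor; that reading, the default scale of 1000, and all function names are assumptions.

```python
def short_term_power(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (frame needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


def classification_measure(frame, scale=1000.0):
    """Power weighted by (1 - zero-crossing rate) and a scale factor.

    The scale default is hypothetical.
    """
    return short_term_power(frame) * (1.0 - zero_crossing_rate(frame)) * scale


def is_voice(frame, trigger, scale=1000.0):
    """Classify the frame as voice when its measure exceeds the trigger."""
    return classification_measure(frame, scale) > trigger
```

Under this assumption, a loud frame with few zero crossings scores high, while a frame that changes sign at every sample scores zero.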
16. Feature Extraction Process I
APPLY MEL FILTERBANK
Multiply Filterbank(20-40) by Periodogram Estimate
CALCULATE PERIODOGRAM ESTIMATE
$P_i(k) = \frac{1}{N}\,\lvert S_i(k)\rvert^2$
FRAMING
Divide Audio into Sections of 20ms-40ms
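In processing order (framing, periodogram estimate, mel filterbank), the first half of feature extraction might be sketched as follows. The 25 ms default frame length, the naive DFT, the 2595/700 mel formula, the triangular filter shape, and every function name are assumptions; `n_fft` is assumed to be even.

```python
import cmath
import math


def frame_signal(samples, sample_rate, frame_ms=25):
    """Non-overlapping frames; a trailing partial frame is dropped."""
    n = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


def periodogram(frame):
    """|S(k)|^2 / N for bins 0..N/2 of one frame (naive DFT)."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        s_k = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(abs(s_k) ** 2 / n)
    return power


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [min(n_bins - 1, math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * n_bins
        for k in range(lo, mid):   # rising edge
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # peak and falling edge
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def filterbank_energies(power, bank):
    """One energy per filter: filter weights multiplied by the periodogram."""
    return [sum(w * p for w, p in zip(row, power)) for row in bank]
```

With very few FFT bins and many filters, some triangles can collapse to all zeros, so the FFT size should be large relative to the 20 to 40 filters mentioned on the slide.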
17. Feature Extraction Process II
KEEP REQUIRED COEFFICIENTS
Keep Required Number of Coefficients
DISCRETE COSINE TRANSFORM OF ENERGIES
Take DCT of Coefficients of Above Step
SCALING
Take Logarithm of Filterbank Energies
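In processing order (logarithm, DCT, keep the required coefficients), the second half of feature extraction could be sketched like this. The unnormalised DCT-II, the default of 13 kept coefficients, the small floor applied before the logarithm, and the function name are assumptions.

```python
import math


def mfcc_from_energies(energies, n_keep=13, floor=1e-10):
    """Log filterbank energies, then DCT-II, then keep the first n_keep values.

    n_keep and floor are hypothetical defaults; the slides only say to keep
    the required number of coefficients.
    """
    # Floor the energies so an empty filter does not produce log(0).
    log_e = [math.log(max(e, floor)) for e in energies]
    m = len(log_e)
    dct = [
        sum(v * math.cos(math.pi * k * (j + 0.5) / m) for j, v in enumerate(log_e))
        for k in range(m)
    ]
    return dct[:n_keep]
```

The input is one frame's filterbank energies, so the function would be called once per frame.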
20. Language Model Training
UPDATE DICTIONARY
Add to existing Dictionary
CREATE LANGUAGE MODELS
Derive NGram Models
FETCH RAW DATA
Currently From News Websites
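In processing order (fetch raw text, derive N-gram models, update the dictionary), language model training might be sketched as follows. The sketch stops at bigrams and uses lowercase whitespace tokenisation; those choices and the function names are assumptions, and fetching text from news websites is left out.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenisation (a simplifying assumption)."""
    return text.lower().split()


def train_bigram_model(sentences):
    """Count unigrams and bigrams over an iterable of sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = tokenize(sentence)
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def update_dictionary(dictionary, unigrams):
    """Add every word seen in training to the existing dictionary (a set)."""
    dictionary.update(unigrams)
    return dictionary
```

The two counters hold everything needed to estimate unigram and bigram probabilities later.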
21. Language Model Based Classification
SELECT BEST
$\hat{P}(w_n \mid w_{n-1}) = \lambda_1 P(w_n \mid w_{n-1}) + \lambda_2 P(w_n)$
GET POSSIBLE CANDIDATES
From Acoustic Model
READ PREVIOUS WORD
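In processing order (read the previous word, get candidates from the acoustic model, select the best), the selection step could be sketched as a weighted mix of bigram and unigram probabilities. The weights 0.7 and 0.3, the maximum-likelihood count ratios, and the function names are assumptions; the counts are expected as `collections.Counter` objects.

```python
from collections import Counter


def interpolated_prob(word, prev, unigrams, bigrams, lam1=0.7, lam2=0.3):
    """lam1 * P(word | prev) + lam2 * P(word), estimated from raw counts.

    lam1 and lam2 are hypothetical weights.
    """
    total = sum(unigrams.values())
    p_uni = unigrams[word] / total if total else 0.0
    p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
    return lam1 * p_bi + lam2 * p_uni


def select_best(prev, candidates, unigrams, bigrams, lam1=0.7, lam2=0.3):
    """Pick the candidate word with the highest interpolated probability."""
    return max(
        candidates,
        key=lambda w: interpolated_prob(w, prev, unigrams, bigrams, lam1, lam2),
    )
```

The unigram term keeps a candidate's score above zero even when it was never seen after the previous word.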
24. Training the Acoustic Model
TRAIN USING BAUM WELCH ALGORITHM
SELECT HMM MODEL
READ MFCC COEFFICIENTS AND WORD
25. Using the Acoustic Model
OUTPUT WORD CORRESPONDING TO MODEL
SELECT MODEL WITH MAXIMUM PROBABILITY
FIND LOG PROBABILITY OF WORD FOR EACH MODEL
READ MFCC COEFFICIENTS OF WORD
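In processing order (read the features, score them against each model, select the maximum, output the word), recognition could be sketched with the forward algorithm in the log domain. The sketch assumes one discrete-emission HMM per word and that the MFCC vectors have already been quantised to integer symbols; both are simplifying assumptions, as are the function names.

```python
import math


def safe_log(p):
    return math.log(p) if p > 0 else -math.inf


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, start, trans, emit):
    """Forward algorithm: log P(obs | model) for a discrete HMM.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits symbol o
    """
    n = len(start)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][o])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def recognise(obs, models):
    """Return the word whose model gives the observations the highest log probability.

    `models` maps each word to its (start, trans, emit) triple.
    """
    return max(models, key=lambda word: log_likelihood(obs, *models[word]))
```

Working in the log domain avoids numerical underflow on long observation sequences.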
27. Trained vs. Untrained Input
• 3 Speakers
• 5×10 Words Each
• 5 Testing Sets Each
[Chart: Accuracy of System Using Trained and Untrained Input (accuracy, %)]
Trained Set: 86.67
Untrained Set: 66.67
28. Noise Reduced vs. Not Noise Reduced
• 3 Speakers
• 5×10 Words Each
• Untrained Input Files for Testing
• 5 Testing Sets Each
[Chart: Accuracy of System, Effect of Noise Reduction (accuracy, %)]
Noise Not Reduced: 46.67
Noise Reduced: 66.67
29. Gender Based Results
• 7 Speakers
• 3 Females, 4 Males
• Animal Names as Test
• Untrained Input Files for Testing
[Chart: Gender Based Result (accuracy, %)]
Training Set                      Male   Female   Male + Female
Female Voice Training             36     66       51
Male Voice Training               64     44       54
Female and Male Voice Training    59     56       58