Nepali Speech Recognition

Aavaas Gajurel

Final year project presentation

Software

Supervisor
Dr. Basanta Joshi
Aavaas Gajurel (068/BCT/501)
Anup Pokhrel (068/BCT/505)
Manish K. Sharma (068/BCT/523)

System Block Diagram - Training
Noise Reduction
Split Module (VAD Based)
Training Set
MFCC Features
Train HMM

System Block Diagram - Recognition
Audio Input
Noise Reduction
Split Module (VAD Based)
MFCC Computation
HMM Audio Classifier
Language Model
Output

Figure panels: Before Noise Removal; Spectral Subtraction output; After Spectral Subtraction; After Musical Noise Removal

Voice Activity Detection Process I
CALCULATE THE TRIGGER
$t_w = \mu + \alpha \delta_w$
COMPUTE MEAN AND VARIANCE
SAMPLE
10 Frame Sampling
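
The trigger computation on this slide can be sketched in Python as follows. This is a minimal illustration, not the project's code. It assumes the steps are read from SAMPLE up to CALCULATE THE TRIGGER, that the 10 sampled frames are the first ones in the recording, that each frame has already been reduced to a single measure value, and that the delta term is the variance named on the slide. The function name `compute_trigger` and the value of `alpha` are hypothetical.

```python
from statistics import mean, pvariance


def compute_trigger(measures, alpha, n_frames=10):
    """Return mu + alpha * delta computed over the first n_frames measures.

    measures: one classification measure per frame (assumed precomputed).
    alpha:    weighting of the variance term (hypothetical value).
    """
    head = measures[:n_frames]
    if len(head) < n_frames:
        raise ValueError("need at least %d frames to build the trigger" % n_frames)
    mu = mean(head)          # mean of the sampled frames
    delta = pvariance(head)  # population variance of the sampled frames
    return mu + alpha * delta


# Ten leading frames, assumed to contain background only.
background = [1.0] * 5 + [3.0] * 5
trigger = compute_trigger(background, alpha=0.5)
```

Frames whose measure exceeds `trigger` would then be treated as voice.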

Voice Activity Detection Process II
CLASSIFY
If greater than trigger then voice
COMPUTE CLASSIFICATION MEASURE
$W_{s1}(m) = P_{s1}(m)\,(1 - Z_{s1}(m))\,S_c$
READ THE SAMPLE
Read the frame
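
A small Python sketch of the classification measure and the voice decision is given here. It assumes P is the mean power of frame m, Z is the zero-crossing rate normalised to the range 0 to 1, and S_c is a constant scale factor. The slide does not define these terms, so the definitions, the function names, and the default `scale=1000.0` are all assumptions.

```python
def frame_power(frame):
    """Mean squared amplitude of the frame (assumed meaning of P)."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign (assumed meaning of Z)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classification_measure(frame, scale=1000.0):
    """W = P * (1 - Z) * S_c for one frame."""
    return frame_power(frame) * (1.0 - zero_crossing_rate(frame)) * scale


def classify_frames(frames, trigger, scale=1000.0):
    """Mark a frame as voice when its measure is greater than the trigger."""
    return [classification_measure(f, scale) > trigger for f in frames]


loud_steady = [0.5] * 8          # high power, no sign changes
quiet_noisy = [0.01, -0.01] * 4  # low power, sign changes at every sample
labels = classify_frames([loud_steady, quiet_noisy], trigger=1.0)
```

High power pushes the measure up and frequent sign changes pull it down, so low-level hiss scores far below steady voiced sound.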

Feature Extraction Process I
APPLY MEL FILTERBANK
Multiply Filterbank(20-40) by Periodogram Estimate
CALCULATE PERIODOGRAM ESTIMATE
$P_i(k) = \frac{1}{N} |S_i(k)|^2$
FRAMING
Divide Audio into Sections of 20ms-40ms
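
The three steps on this slide (framing, periodogram estimate, mel filterbank) can be sketched in plain Python as below. This is an illustration under stated assumptions rather than the project's implementation: it uses a direct DFT instead of an FFT, omits windowing, and uses the common 2595 * log10(1 + f / 700) mel formula, none of which the slide specifies. All function names are hypothetical.

```python
import cmath
import math


def frame_signal(signal, frame_len, step):
    """Split a list of samples into frames of frame_len samples, step apart."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, step)]


def periodogram(frame):
    """P_i(k) = |S_i(k)|^2 / N for the non-negative frequency bins."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        s_k = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        power.append(abs(s_k) ** 2 / n)
    return power


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):    # rising edge
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):   # falling edge
            weights[k] = (right - k) / (right - center)
        bank.append(weights)
    return bank


def filterbank_energies(power, bank):
    """Multiply each filter by the periodogram and sum over the bins."""
    return [sum(p * w for p, w in zip(power, weights)) for weights in bank]


# A 64-sample frame holding a pure tone in bin 4 (toy sizes for brevity).
frame = [math.cos(2 * math.pi * 4 * t / 64) for t in range(64)]
power = periodogram(frame)
energies = filterbank_energies(power, mel_filterbank(20, 64, 8000))
```

At a 16 kHz sampling rate (an assumed figure), the 20ms-40ms frames named on the slide would be 320 to 640 samples long. The toy frame above is much shorter so the direct DFT stays fast.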

Feature Extraction Process II
KEEP REQUIRED COEFFICIENTS
Keep Required Number of Coefficients
DISCRETE COSINE TRANSFORM OF ENERGIES
Take DCT of Coefficients of Above Step
SCALING
Take Logarithm of Filterbank Energies
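
The second half of the feature extraction (logarithm, DCT, truncation) might look like the sketch below. It assumes the filterbank energies are already available as a list, uses an unnormalised DCT-II, and keeps 13 coefficients. The slide gives neither the DCT variant nor the number of coefficients kept, so both are assumptions, and the function names are hypothetical.

```python
import math


def log_energies(energies, floor=1e-10):
    """Natural log of each filterbank energy, floored to avoid log(0)."""
    return [math.log(max(e, floor)) for e in energies]


def dct_ii(values):
    """Unnormalised DCT-II of a list of real values."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
            for k in range(n)]


def mfcc_from_energies(energies, n_keep=13):
    """Log-compress, decorrelate with the DCT, and keep the first n_keep terms."""
    return dct_ii(log_energies(energies))[:n_keep]


# 26 identical energies: only the zeroth coefficient should be non-zero.
flat = [math.e] * 26
coeffs = mfcc_from_energies(flat)
```

A flat spectrum carries no shape information, so every coefficient except the zeroth comes out at (numerically) zero, which is a quick sanity check on the transform.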

Language Model Training
UPDATE DICTIONARY
Add to existing Dictionary
CREATE LANGUAGE MODELS
Derive N-gram Models
FETCH RAW DATA
Currently From News Websites
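
A minimal sketch of deriving N-gram counts and updating the dictionary from raw text is shown here. It assumes whitespace tokenisation and stops at bigrams. The romanised sample sentences are made up for illustration and stand in for text fetched from news websites. The function names are hypothetical.

```python
from collections import Counter


def train_ngrams(sentences):
    """Count unigrams and bigrams over whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def update_dictionary(dictionary, unigrams):
    """Add every newly seen word to the existing dictionary (a set)."""
    dictionary.update(unigrams.keys())
    return dictionary


raw_text = ["ma ghar janchu", "ma bhat khanchu"]  # illustrative sentences
unigrams, bigrams = train_ngrams(raw_text)
dictionary = update_dictionary({"namaste"}, unigrams)
```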

Language Model Based Classification
SELECT BEST
$\hat{P}(W_n \mid W_{n-1}) = \lambda_1 P(W_n \mid W_{n-1}) + \lambda_2 P(W_n)$
GET POSSIBLE CANDIDATES
From Acoustic Model
READ PREVIOUS WORD
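
The interpolated score used to select the best candidate can be sketched as follows. The counts are plain dictionaries so the example stands alone. The weights `lam1=0.7` and `lam2=0.3` are illustrative values, not taken from the slides, and the bigram probability is estimated as count(previous, word) / count(previous), which is an assumption about how the counts are normalised.

```python
def interpolated_prob(word, prev, unigrams, bigrams, lam1=0.7, lam2=0.3):
    """lam1 * P(word | prev) + lam2 * P(word), estimated from raw counts."""
    total = sum(unigrams.values())
    p_uni = unigrams.get(word, 0) / total if total else 0.0
    prev_count = unigrams.get(prev, 0)
    p_bi = bigrams.get((prev, word), 0) / prev_count if prev_count else 0.0
    return lam1 * p_bi + lam2 * p_uni


def select_best(candidates, prev, unigrams, bigrams):
    """Pick the candidate from the acoustic model with the highest score."""
    return max(candidates,
               key=lambda w: interpolated_prob(w, prev, unigrams, bigrams))


# Illustrative counts (hypothetical, romanised words).
uni = {"ma": 2, "ghar": 1, "bhat": 1, "janchu": 1, "khanchu": 1}
bi = {("ma", "ghar"): 1, ("ma", "bhat"): 1,
      ("ghar", "janchu"): 1, ("bhat", "khanchu"): 1}
best = select_best(["janchu", "khanchu"], "ghar", uni, bi)
```

The unigram term keeps a candidate's score above zero even when its bigram with the previous word was never seen in training.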

Training the Acoustic Model
TRAIN USING BAUM-WELCH ALGORITHM
SELECT HMM MODEL
READ MFCC COEFFICIENTS AND WORD
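
One Baum-Welch re-estimation step for a small discrete HMM is sketched below. It assumes the MFCC vectors have already been quantised into integer symbols and that a single observation sequence is used. The slides do not say how the project's HMMs represent observations, so this is only an illustration of the algorithm, with hypothetical function names and toy parameters.

```python
def forward(obs, pi, A, B):
    """alpha[t][i]: probability of obs[0..t] ending in state i."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(obs, A, B):
    """beta[t][i]: probability of obs[t+1..] given state i at time t."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


def baum_welch_step(obs, pi, A, B):
    """One EM update; also returns the likelihood under the old parameters."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    likelihood = sum(alpha[-1])
    # gamma[t][i]: posterior of being in state i at time t
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: posterior of the transition i -> j between t and t + 1
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, likelihood


# Toy two-state, two-symbol model and one quantised observation sequence.
obs = [0, 1, 1, 0, 1, 1, 1, 0]
pi0 = [0.6, 0.4]
A0 = [[0.7, 0.3], [0.4, 0.6]]
B0 = [[0.5, 0.5], [0.1, 0.9]]
pi1, A1, B1, lik0 = baum_welch_step(obs, pi0, A0, B0)
```

Repeating the step until the likelihood stops improving gives one trained model per word. Long sequences would need scaled or log-domain arithmetic, which this sketch leaves out.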

Using the Acoustic Model
OUTPUT WORD CORRESPONDING TO MODEL
SELECT MODEL WITH MAXIMUM PROBABILITY
FIND LOG PROBABILITY OF WORD FOR EACH MODEL
READ MFCC COEFFICIENTS OF WORD
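
The recognition steps (score the word against every model, take the maximum, output the matching word) can be sketched as below. It assumes discrete-symbol HMMs and a scaled forward pass for the log probability. The two word models are made-up toy parameters, labelled with animal names only for illustration.

```python
import math


def log_likelihood(obs, model):
    """Log probability of obs under a discrete HMM, via the scaled forward pass."""
    pi, A, B = model
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for o in obs[1:]:
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    final = sum(alpha)
    return log_prob + math.log(final) if final > 0.0 else float("-inf")


def recognise(obs, models):
    """Return the word whose model gives the highest log probability."""
    return max(models, key=lambda word: log_likelihood(obs, models[word]))


# Hypothetical two-state models: "kukur" favours symbol 0, "biralo" symbol 1.
models = {
    "kukur": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.8, 0.2]]),
    "biralo": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.2, 0.8]]),
}
word = recognise([0, 0, 1, 0], models)
```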

Trained vs. Untrained Input
• 3 Speakers
• 5X10 Words Each
• 5 Testing Sets Each
Accuracy of System Using Trained and Untrained Input
Trained Set: 86.67%
Untrained Set: 66.67%

Noise Reduced vs. Not Noise Reduced
• 3 Speakers
• 5X10 Words Each
• Untrained Input Files for Testing
• 5 Testing Sets Each
Accuracy of System: Effect of Noise Reduction
Noise Not Reduced: 46.67%
Noise Reduced: 66.67%

Gender Based Results
• 7 Speakers
• 3 Females, 4 Males
• Animal Names as Test
• Untrained Input Files for Testing
Gender Based Result
Training set                     Male   Female   Male + Female
Female Voice Training            36     66       51
Male Voice Training              64     44       54
Female and Male Voice Training   59     56       58

Limitations
Limited Vocabulary
User Specific Noise Profiles
Static MFCC Coefficients Only
Training Data Storage Absent
Non-Continuous Recognition

Recommendations
Using Dynamic Coefficients
Continuous HMM Model
Extensive Training
Better Phonemic Modeling
Dynamic Noise Modeling

Usage Scenario I
Easy Nepali Input
Automated Telecom Assistance
Speech Controlled Interface
Automated Transcribing

Usage Scenario II
Military Sector for Automated Wire Tapping
Public Guidance System
Automated User Support (banks, corporate houses, etc.)

Nepali Speech Recognition

2. System Overview

5. SYSTEM DESIGN METHODOLOGY

6. NOISE REDUCTION

7. Creating Noise Profile. BUILD NOISE PROFILE: Update the computed Noise Profile. AVERAGE OVER TIME: $\frac{1}{N}$ [Sum of FFT Component > 10 frames]. FOURIER TRANSFORM: FFT of 32ms Audio Samples.

8. Spectral Subtraction. INVERSE FOURIER TRANSFORM: Rebuild the Signal. SUBTRACT NOISE PROFILE (STATIC AND MUSICAL): Over Subtraction, Short Segment Removal. FOURIER TRANSFORM OF SIGNAL: FFT of 32ms Audio Samples.
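
The noise profile and spectral subtraction steps of slides 7 and 8 can be sketched in plain Python as below. This is a toy illustration under several assumptions: a direct DFT stands in for the FFT, frames are 8 samples long rather than 32 ms, the profile is the average magnitude spectrum of noise-only frames, and over-subtraction is a single factor `over` with a spectral floor `floor`. The short segment removal used against musical noise is not implemented. All names and parameter values are hypothetical.

```python
import cmath
import math


def dft(frame):
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)) for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [(sum(s * cmath.exp(2j * math.pi * k * t / n)
                 for k, s in enumerate(spectrum)) / n).real for t in range(n)]


def noise_profile(noise_frames):
    """Average magnitude spectrum over the given noise-only frames."""
    spectra = [[abs(c) for c in dft(f)] for f in noise_frames]
    return [sum(column) / len(spectra) for column in zip(*spectra)]


def spectral_subtract(frame, profile, over=2.0, floor=0.01):
    """Subtract over * profile from each magnitude, keep the phase, rebuild."""
    cleaned = []
    for c, noise_mag in zip(dft(frame), profile):
        mag = abs(c)
        new_mag = max(mag - over * noise_mag, floor * mag)
        cleaned.append(cmath.rect(new_mag, cmath.phase(c)))
    return idft(cleaned)


# Toy data: the "noise" is a constant offset and the signal is a tone plus it.
n = 8
tone = [math.cos(2 * math.pi * 2 * t / n) for t in range(n)]
noisy = [x + 0.1 for x in tone]
profile = noise_profile([[0.1] * n for _ in range(10)])
restored = spectral_subtract(noisy, profile, over=1.0, floor=0.0)
```

Here the offset occupies only the zero-frequency bin, so subtracting the profile recovers the tone almost exactly. Real background noise spreads over many bins, which is why an over-subtraction factor and a floor are commonly used.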

10. VOICE ACTIVITY DETECTION

11. Voice Activity Detection

14. Feature Extraction

15. Audio Feature Extraction

18. Language Model

19. Using Language Model

22. ACOUSTIC MODEL

23. HMM Based Classification

26. RESULTS

30. LIMITATIONS AND RECOMMENDATIONS

33. USAGE SCENARIO

36. Thank You !

Editor's Notes

15 sec
15 sec (classification to recognition)
45 sec (add train hmm)
40 sec
40 sec
40 sec
40 sec
40 sec
3 min
1 min
1 min
