SlideShare a Scribd company logo
1 of 4
Download to read offline
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 10 Issue: 05 | May 2023 www.irjet.net p-ISSN: 2395-0072
© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 1782
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE
FOR DEAF PEOPLE
1Assistant professor, Dept. Of Computer Science Engineering, VNRVJIET college, Telangana, India
2,3,4,5 Student, Dept. Of Computer Science Engineering, VNRVJIET college, Telangana, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Internet has taken the place of traditional
communication channels. The use of online communication
has resulted in a rapid rise in audio and visual information.
Although it has been beneficial for the majority of people,
those with special needs, such as the deaf, have few resources
at their disposal. A speech-to-text Conversion programme is
written.The audio input is converted to text using speech
recognition technology. Algorithms for Natural Language
Processing are used to extract root words and segment words
and translate to different languages.
Our goal is to create a multilingual speech-to-text conversion
system employing Hugging Face for hearing-impairedpeople.
The technology will make it possible for deaf individuals to
instantly translate spokenlanguageintotext, assistingthemin
a variety of tasks including listening to lectures, participating
in meetings, or even having conversations with others.
Hugging Face's[11] cutting-edge neural network models and
natural language processing algorithms will improve the
suggested system's precision and effectiveness. Thesystemwill
also support many languages, enabling a widerangeofpeople
throughout the world able to utilise it. Overall, the suggested
technology would enable seamless communication and
engagement in many contexts, greatly enhancing the quality
of life for those with hearing impairments.
Key Words: transformers, whisper, Machine Learning,
Automatic Speech recognition, web page, tokenizer,
pipeline.
1.INTRODUCTION
With the developmentofmachinelearningand deeplearning
algorithms, automated voice recognitionhasbecomea major
study area. Using pre-trained language models, like Hugging
Face, for fine-tuning is one such method.
Hugging Face, a well-known NLP library, offers pre-trained
models for a variety of NLP tasks, including text
categorization,question-answering,andlanguageproduction
[11]. Access to pre-trained Transformer architecture-based
speech recognition models is also made available by the
library.
In order to increase the model's precision and performance
on a certain speech recognition task, fine-tuning Hugging
Face models [11] for voice recognition entails training the
pre-trained models on a particular dataset. When trained on
domain-specific datasets, this method has been proven to
greatly increase the accuracy of voice recognition systems.
In this study, our efforts with honing Hugging Face models
for voice recognition were discussed. Outlined the training
dataset, the fine-tuning procedure, and the assessment
measures employed to gauge the model's effectiveness.Also
included front end to record voice frommicrophoneinorder
to test the model’s efficiency and accuracy. The front-end
enables easy use of the model and provides good user-
experience.
Our findings show the value of fine-tuning pre-trained
language models like Hugging Face[11] for speech
recognition, and thought this strategy has a great deal of
promise for raising the precision and efficiency of speech
recognition systems across a variety of domains.
2.LITERATURE SURVEY
Applications that target rescue weretheonesthatwere most
prevalent when some comparable ones enteredtheresearch
sector. They are:
In [1], The model proposed in this paper uses standardinput
speech to text Conversion engine to take the input speech.
With the use of cutting-edge artificial intelligence methods
like Deepnets and Q learning, the search space is reduced
using the HMM (Hidden Markov) model, and the optimised
results are then used in real time. The accuracy of the
suggested phonetic model is 90%.
In [2], The Proposed approach deals with recognition of two
different languages – kannada and English. A Deep learning
voice recognition model is used in conjunction with a word
prediction model. When evaluated on a multilingual dataset,
the accuracy is 85%, and when the user tests it in real time,
the accuracy is 71%. Cosine similarity model was employed.
When predictions were made using the average similarityof
each class, a 59% average accuracy for the cosine similarity
model was attained.
In [3] This paper presents a complete speech-to-text
conversion systemforBangla languageusingDeepRecurrent
Neural Networks. Possible optimization such as Broken
Language Format has been proposed which is based on
Dr.N.V Sailaja, Billakanti Sushma, Aredla Likitha Reddy, Charitha Parachuri, Chandra Akash
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 10 Issue: 05 | May 2023 www.irjet.net p-ISSN: 2395-0072
© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 1783
properties of the Bangla Language for reducing the training
time of the network. It was trained with collected data and
which yielded over 95% accuracy incaseoftrainingdata and
50% accuracy in case of testing data.
In [4] The open-sourced Sphinx 4 framework, which is built
in Java and offers the necessary procedural coding tools to
construct an acoustic model for a unique language like
Bengali, is required for the proposed model. To edit the
recorded data, we utilised Audacity, a free digital audio
workstation (DAW). Used an audio dataset of recorded
speech from 10 different speakers, including both male and
female speakers, and produced their own unique transcript
files to test the performance. Experimental findings show
that the suggested model had an average accuracy of 71.7%
for the dataset they analysed.
In [5] It is an encoder-decoder model based on
Transformers, commonly known as a sequence-to-sequence
model. A series of audio spectrogram properties are
translated into a series of text tokens using this approach.
First, the feature extractor transforms the raw audio input
into a log-Mel spectrogram.Thespectrogramisthen encoded
using the Transformer encoder to create a series of encoder
hidden states. Finally, the decoder predicts text tokens
autoregressively based on both the hidden states of the
encoder and the preceding tokens.
In [6] This model used Recurrent Neural Networks to
convert Speech to text. This model includes prosodic
modelling method. Prosodic modelling is done at the audio
decoding process post-processing stage with the goal of
identifying word-boundary cues to help with language
decoding. To comprehend the input prosodic information
and the output text-boundary, the RNN Model has three
layers. After the training of RNN Model then it is used to
generate the word -boundary cues to solve the problem of
word-boundary ambiguity. Two Schemas were proposed.
First Schema directly takes RNN outputs anddirectlyaddsto
the word-sequence hypothesis. Schema 2 is extension of
Schema 1 by adding Finite State Machines for setting path
constraints.TheCharacteraccuracywere73.6%,74.6%using
the schemas 1 and 2.
3.METHODOLOGY
The suggested system uses a Transformer-based encoder-
decoder as its foundation. The encoder receives a log-Mel
spectrogram as input. Cross-attention techniques provide
the decoder with the final concealed states of the encoder.
Text tokens are conditionally predicted by the decoder
autoregressively based on hidden states from the encoder
and previously predicted tokens.
Fig. 1. Architecture diagram
3.1 Data Collection
A number of crowd-sourced dataset called CommonVoicein
which people record Wikipedia textina varietyoflanguages.
Version 11 of the Common Voice dataset is used in this
instance. As for our chosen language, Hindi, an Indian
language used in the northern, central, eastern, and western
parts of India, serve as the basis forfurtherdevelopingof our
model.
3.2 Feature Extraction and Tokenizing
Speech is represented asa temporallyvarying1-dimensional
array. The signal's amplitude at every particular time step is
represented by the value of the array at that instant. Speech
includes an endless variety of amplitude values since it is
continuous. Computer hardware that relies on finite arrays
will have issues. As a result, discretized our voice signal by
taking samples of its values at predetermined timeintervals.
The sampling rate, which is often expressed in samples per
second or Hertz (Hz), is the interval with which sounds are
sampled.
Although storing more information per second is necessary,
sampling with a greater sample rate produces a better
approximation of the continuous speech input.
The feature extractor carries out two tasks. A series of audio
samples are originally padded or truncated so that each one
has a 30-second input time. By adding zeros to the end ofthe
sequence, samples that are less than 30 seconds are
stretched to 30 seconds (zeroes in an audio transmission
signify a silent or no sound). No need to use attention mask
as every batch is padded or truncated to maximum length in
input. Tokenizer has been pre-trained using the
transcriptions for 96 pre-training languages [11]. Load the
pre-trained check point and train it on the data set and fine
tune it.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 10 Issue: 05 | May 2023 www.irjet.net p-ISSN: 2395-0072
© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 1784
3.3 Training and Evaluation
Define a data collator: This programme takes the data that
has already been processedandcreatesPyTorchtensorsthat
are required for the model.
Evaluation Metrics: Used word error rate (WER) measure to
assess the model.
Load pre-trained checkpoint: Load the checkpoint that has
already been trained and properly set it up for training.
Define the training arguments: The Hugging Trainer[11]will
use them to build the training schedule.
3.4 Web Application
For the project's purposes, a web application is created that
provides complete control over the fine-tuned model. The
web application serves as a user interface for the model and
is written using Gradio.
An interface page with controller choices makes up the
application. On the page, optionslikerecording,clearing,and
submitting are provided. The end-user can manually enter
voice from microphone with the help of these control
choices.
The backend support of the application is fine-tuned model.
The View portion is a straightforward front-end component
with only one page. The computer and model must be linked
to a Wi-Fi network in order for the web application to work.
No extra setup is required. First clicking on record button
enables microphone and records the voice that is spoken.
Stop button is provided in order to stop the recording. After
recording, submit and clear buttons are provided. When
submit is clicked the model transcribes it and displays the
text result in the text box provided. You can use Flag button
to store that recording.
3.5 Results
The procedure goes as follows. The machine learning model
is directly connected to the microphone. When an audio is
recorded through the microphone the machine learning
model processes the audio and displays the text
corresponding to the recorded audio on the screen. The
recognized speech is then segmented into 30 seconds and if
it is less than that then 0’s appended at the end. These
segments then are converted into Log-Mel Spectrum inside
the model and through several layers it divides speech into
tokens and prints the resulted text. While transcribing the
added 0’s will be deleted. The WER of the model is 32.42%
which is very less.
Fig. 2 Testing the model
Fig. 3. Application demo
3. CONCLUSIONS
Our work helps the deaf people in understanding others
speech by converting it to textbyusinghuggingface[11].The
motivation for this kind of study and experiments is the fact
that whisper [12] is a simple ASR system for its perfect
understandability in human face-to-face communication.
Speech Recognition for Hindi Language and Conversion into
Hindi Text is done. The implemented model is trained on a
dataset of 3500 audios and achieved a word error rate of
32.42%. It is shown that whisper signals, can give high
scores in word recognition for speech. For Open AI whisper
small the WER is 87.30%[12] and now our work fine-tuned
the model and reduced the Word Error rate to 32.42%.
REFERENCES
[1] Gulbakshee Dharmale, Dipti D. Patil and V. M. Thakare,
“Implementation of EfficientSpeechRecognitionSystem
on Mobile Device for Hindi and English Language”
International Journal ofAdvancedComputerScienceand
Applications(IJACSA), 10(2), 2019.
[2] Multilingual Speech to Text using Deep Learning based
on MFCC Features P Deepak Reddy, Chirag Rudresh and
Adithya A S, PES University, India.
[3] M. T. Tausif, S. Chowdhury, M. S. Hawlader, M.
Hasanuzzaman and H. Heickal, "Deep Learning Based
Bangla Speech-to-Text Conversion," 2018 5th
International Conference on Computational Science/
Intelligence and Applied Informatics (CSII), Yonago,
Japan, 2018, pp. 49-54, doi: 10.1109/CSII.2018.00016.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 10 Issue: 05 | May 2023 www.irjet.net p-ISSN: 2395-0072
© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 1785
[4] A. U. Nasib, H. Kabir, R. Ahmed and J. Uddin, "A Real
Time Speech to Text Conversion Technique for Bengali
Language," 2018 International Conference on Computer,
Communication, Chemical, Material and Electronic
Engineering (IC4ME2), Rajshahi, Bangladesh, 2018, pp.
1-4, doi: 10.1109/IC4ME2.2018.8465680.
[5] https://arxiv.org/abs/2212.04356
[6] Wern-Jun Wang, Yuan-Fu Liao, Sin-Horng Chen,
RNN-based prosodic modeling for mandarinspeechand
its application to speech-to-text conversion,Speech
Communication,Volume 36, Issues3–4,2002,Pages247-
265,ISSN 0167-6393,https://doi.org/10.1016/S0167-
6393(01)00006-1.
[7] Software Engineering - A Practitioner’sApproach,Roger
S. Pressman, McGraw-Hill
[8] Machine Learning, Tom M. Mitchell, McGraw-Hill
International Edition, 6th Edition, 2001
[9] https://huggingface.co/datasets/mozilla-
foundation/common_voice_11_0
[10] https://huggingface.co/blog/fine-tune-whisper
[11] https://huggingface.co/docs
[12] https://openai.com/research/whisper

More Related Content

Similar to MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE

ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIO...
ANALYZING ARCHITECTURES FOR NEURAL  MACHINE TRANSLATION USING LOW  COMPUTATIO...ANALYZING ARCHITECTURES FOR NEURAL  MACHINE TRANSLATION USING LOW  COMPUTATIO...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIO...kevig
 
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...kevig
 
A study on the techniques for speech to speech translation
A study on the techniques for speech to speech translationA study on the techniques for speech to speech translation
A study on the techniques for speech to speech translationIRJET Journal
 
Identification of frequency domain using quantum based optimization neural ne...
Identification of frequency domain using quantum based optimization neural ne...Identification of frequency domain using quantum based optimization neural ne...
Identification of frequency domain using quantum based optimization neural ne...eSAT Publishing House
 
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...
IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...IRJET Journal
 
IRJET - Audio Emotion Analysis
IRJET - Audio Emotion AnalysisIRJET - Audio Emotion Analysis
IRJET - Audio Emotion AnalysisIRJET Journal
 
IRJET - Threat Prediction using Speech Analysis
IRJET - Threat Prediction using Speech AnalysisIRJET - Threat Prediction using Speech Analysis
IRJET - Threat Prediction using Speech AnalysisIRJET Journal
 
Speech To Speech Translation
Speech To Speech TranslationSpeech To Speech Translation
Speech To Speech TranslationIRJET Journal
 
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...kevig
 
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...kevig
 
Performance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi languagePerformance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi languageiosrjce
 
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...tsysglobalsolutions
 
Modeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationModeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationTELKOMNIKA JOURNAL
 
IRJET- Voice based Billing System
IRJET-  	  Voice based Billing SystemIRJET-  	  Voice based Billing System
IRJET- Voice based Billing SystemIRJET Journal
 
IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv...
IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv...IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv...
IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv...IRJET Journal
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationeSAT Journals
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationeSAT Publishing House
 

Similar to MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE (20)

ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIO...
ANALYZING ARCHITECTURES FOR NEURAL  MACHINE TRANSLATION USING LOW  COMPUTATIO...ANALYZING ARCHITECTURES FOR NEURAL  MACHINE TRANSLATION USING LOW  COMPUTATIO...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIO...
 
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA...
 
A study on the techniques for speech to speech translation
A study on the techniques for speech to speech translationA study on the techniques for speech to speech translation
A study on the techniques for speech to speech translation
 
Identification of frequency domain using quantum based optimization neural ne...
Identification of frequency domain using quantum based optimization neural ne...Identification of frequency domain using quantum based optimization neural ne...
Identification of frequency domain using quantum based optimization neural ne...
 
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...
IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...IRJET-  	  A Review on Audible Sound Analysis based on State Clustering throu...
IRJET- A Review on Audible Sound Analysis based on State Clustering throu...
 
IRJET- Vocal Code
IRJET- Vocal CodeIRJET- Vocal Code
IRJET- Vocal Code
 
IRJET - Audio Emotion Analysis
IRJET - Audio Emotion AnalysisIRJET - Audio Emotion Analysis
IRJET - Audio Emotion Analysis
 
IRJET - Threat Prediction using Speech Analysis
IRJET - Threat Prediction using Speech AnalysisIRJET - Threat Prediction using Speech Analysis
IRJET - Threat Prediction using Speech Analysis
 
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
 
Speech To Speech Translation
Speech To Speech TranslationSpeech To Speech Translation
Speech To Speech Translation
 
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...
STREAMING PUNCTUATION: A NOVEL PUNCTUATION TECHNIQUE LEVERAGING BIDIRECTIONAL...
 
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional...
 
Performance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi languagePerformance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi language
 
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
Ieee transactions on 2018 TOPICS with Abstract in audio, speech, and language...
 
Modeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector QuantizationModeling Text Independent Speaker Identification with Vector Quantization
Modeling Text Independent Speaker Identification with Vector Quantization
 
IRJET- Voice based Billing System
IRJET-  	  Voice based Billing SystemIRJET-  	  Voice based Billing System
IRJET- Voice based Billing System
 
H42045359
H42045359H42045359
H42045359
 
IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv...
IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv...IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv...
IRJET - Automatic Lip Reading: Classification of Words and Phrases using Conv...
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
 

More from IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTUREIRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsIRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASIRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProIRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemIRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesIRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web applicationIRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignIRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...IRJET Journal
 

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .Satyam Kumar
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 

Recently uploaded (20)

Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Churning of Butter, Factors affecting .
Churning of Butter, Factors affecting  .Churning of Butter, Factors affecting  .
Churning of Butter, Factors affecting .
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 

MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 05 | May 2023 www.irjet.net p-ISSN: 2395-0072 © 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 1782 MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE 1Assistant professor, Dept. Of Computer Science Engineering, VNRVJIET college, Telangana, India 2,3,4,5 Student, Dept. Of Computer Science Engineering, VNRVJIET college, Telangana, India ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - Internet has taken the place of traditional communication channels. The use of online communication has resulted in a rapid rise in audio and visual information. Although it has been beneficial for the majority of people, those with special needs, such as the deaf, have few resources at their disposal. A speech-to-text Conversion programme is written.The audio input is converted to text using speech recognition technology. Algorithms for Natural Language Processing are used to extract root words and segment words and translate to different languages. Our goal is to create a multilingual speech-to-text conversion system employing Hugging Face for hearing-impairedpeople. The technology will make it possible for deaf individuals to instantly translate spokenlanguageintotext, assistingthemin a variety of tasks including listening to lectures, participating in meetings, or even having conversations with others. Hugging Face's[11] cutting-edge neural network models and natural language processing algorithms will improve the suggested system's precision and effectiveness. Thesystemwill also support many languages, enabling a widerangeofpeople throughout the world able to utilise it. Overall, the suggested technology would enable seamless communication and engagement in many contexts, greatly enhancing the quality of life for those with hearing impairments. Key Words: transformers, whisper, Machine Learning, Automatic Speech recognition, web page, tokenizer, pipeline. 1.INTRODUCTION With the developmentofmachinelearningand deeplearning algorithms, automated voice recognitionhasbecomea major study area. Using pre-trained language models, like Hugging Face, for fine-tuning is one such method. Hugging Face, a well-known NLP library, offers pre-trained models for a variety of NLP tasks, including text categorization,question-answering,andlanguageproduction [11]. Access to pre-trained Transformer architecture-based speech recognition models is also made available by the library. In order to increase the model's precision and performance on a certain speech recognition task, fine-tuning Hugging Face models [11] for voice recognition entails training the pre-trained models on a particular dataset. When trained on domain-specific datasets, this method has been proven to greatly increase the accuracy of voice recognition systems. In this study, our efforts with honing Hugging Face models for voice recognition were discussed. Outlined the training dataset, the fine-tuning procedure, and the assessment measures employed to gauge the model's effectiveness.Also included front end to record voice frommicrophoneinorder to test the model’s efficiency and accuracy. The front-end enables easy use of the model and provides good user- experience. Our findings show the value of fine-tuning pre-trained language models like Hugging Face[11] for speech recognition, and thought this strategy has a great deal of promise for raising the precision and efficiency of speech recognition systems across a variety of domains. 2.LITERATURE SURVEY Applications that target rescue weretheonesthatwere most prevalent when some comparable ones enteredtheresearch sector. They are: In [1], The model proposed in this paper uses standardinput speech to text Conversion engine to take the input speech. With the use of cutting-edge artificial intelligence methods like Deepnets and Q learning, the search space is reduced using the HMM (Hidden Markov) model, and the optimised results are then used in real time. The accuracy of the suggested phonetic model is 90%. In [2], The Proposed approach deals with recognition of two different languages – kannada and English. A Deep learning voice recognition model is used in conjunction with a word prediction model. When evaluated on a multilingual dataset, the accuracy is 85%, and when the user tests it in real time, the accuracy is 71%. Cosine similarity model was employed. When predictions were made using the average similarityof each class, a 59% average accuracy for the cosine similarity model was attained. In [3] This paper presents a complete speech-to-text conversion systemforBangla languageusingDeepRecurrent Neural Networks. Possible optimization such as Broken Language Format has been proposed which is based on Dr.N.V Sailaja, Billakanti Sushma, Aredla Likitha Reddy, Charitha Parachuri, Chandra Akash
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 05 | May 2023 www.irjet.net p-ISSN: 2395-0072 © 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 1783 properties of the Bangla Language for reducing the training time of the network. It was trained with collected data and which yielded over 95% accuracy incaseoftrainingdata and 50% accuracy in case of testing data. In [4] The open-sourced Sphinx 4 framework, which is built in Java and offers the necessary procedural coding tools to construct an acoustic model for a unique language like Bengali, is required for the proposed model. To edit the recorded data, we utilised Audacity, a free digital audio workstation (DAW). Used an audio dataset of recorded speech from 10 different speakers, including both male and female speakers, and produced their own unique transcript files to test the performance. Experimental findings show that the suggested model had an average accuracy of 71.7% for the dataset they analysed. In [5] It is an encoder-decoder model based on Transformers, commonly known as a sequence-to-sequence model. A series of audio spectrogram properties are translated into a series of text tokens using this approach. First, the feature extractor transforms the raw audio input into a log-Mel spectrogram.Thespectrogramisthen encoded using the Transformer encoder to create a series of encoder hidden states. Finally, the decoder predicts text tokens autoregressively based on both the hidden states of the encoder and the preceding tokens. In [6] This model used Recurrent Neural Networks to convert Speech to text. This model includes prosodic modelling method. Prosodic modelling is done at the audio decoding process post-processing stage with the goal of identifying word-boundary cues to help with language decoding. To comprehend the input prosodic information and the output text-boundary, the RNN Model has three layers. After the training of RNN Model then it is used to generate the word -boundary cues to solve the problem of word-boundary ambiguity. Two Schemas were proposed. First Schema directly takes RNN outputs anddirectlyaddsto the word-sequence hypothesis. Schema 2 is extension of Schema 1 by adding Finite State Machines for setting path constraints.TheCharacteraccuracywere73.6%,74.6%using the schemas 1 and 2. 3.METHODOLOGY The suggested system uses a Transformer-based encoder- decoder as its foundation. The encoder receives a log-Mel spectrogram as input. Cross-attention techniques provide the decoder with the final concealed states of the encoder. Text tokens are conditionally predicted by the decoder autoregressively based on hidden states from the encoder and previously predicted tokens. Fig. 1. Architecture diagram 3.1 Data Collection A number of crowd-sourced dataset called CommonVoicein which people record Wikipedia textina varietyoflanguages. Version 11 of the Common Voice dataset is used in this instance. As for our chosen language, Hindi, an Indian language used in the northern, central, eastern, and western parts of India, serve as the basis forfurtherdevelopingof our model. 3.2 Feature Extraction and Tokenizing Speech is represented asa temporallyvarying1-dimensional array. The signal's amplitude at every particular time step is represented by the value of the array at that instant. Speech includes an endless variety of amplitude values since it is continuous. Computer hardware that relies on finite arrays will have issues. As a result, discretized our voice signal by taking samples of its values at predetermined timeintervals. The sampling rate, which is often expressed in samples per second or Hertz (Hz), is the interval with which sounds are sampled. Although storing more information per second is necessary, sampling with a greater sample rate produces a better approximation of the continuous speech input. The feature extractor carries out two tasks. A series of audio samples are originally padded or truncated so that each one has a 30-second input time. By adding zeros to the end ofthe sequence, samples that are less than 30 seconds are stretched to 30 seconds (zeroes in an audio transmission signify a silent or no sound). No need to use attention mask as every batch is padded or truncated to maximum length in input. Tokenizer has been pre-trained using the transcriptions for 96 pre-training languages [11]. Load the pre-trained check point and train it on the data set and fine tune it.
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 05 | May 2023 www.irjet.net p-ISSN: 2395-0072 © 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 1784 3.3 Training and Evaluation Define a data collator: This programme takes the data that has already been processedandcreatesPyTorchtensorsthat are required for the model. Evaluation Metrics: Used word error rate (WER) measure to assess the model. Load pre-trained checkpoint: Load the checkpoint that has already been trained and properly set it up for training. Define the training arguments: The Hugging Trainer[11]will use them to build the training schedule. 3.4 Web Application For the project's purposes, a web application is created that provides complete control over the fine-tuned model. The web application serves as a user interface for the model and is written using Gradio. An interface page with controller choices makes up the application. On the page, optionslikerecording,clearing,and submitting are provided. The end-user can manually enter voice from microphone with the help of these control choices. The backend support of the application is fine-tuned model. The View portion is a straightforward front-end component with only one page. The computer and model must be linked to a Wi-Fi network in order for the web application to work. No extra setup is required. First clicking on record button enables microphone and records the voice that is spoken. Stop button is provided in order to stop the recording. After recording, submit and clear buttons are provided. When submit is clicked the model transcribes it and displays the text result in the text box provided. You can use Flag button to store that recording. 3.5 Results The procedure goes as follows. The machine learning model is directly connected to the microphone. When an audio is recorded through the microphone the machine learning model processes the audio and displays the text corresponding to the recorded audio on the screen. The recognized speech is then segmented into 30 seconds and if it is less than that then 0’s appended at the end. These segments then are converted into Log-Mel Spectrum inside the model and through several layers it divides speech into tokens and prints the resulted text. While transcribing the added 0’s will be deleted. The WER of the model is 32.42% which is very less. Fig. 2 Testing the model Fig. 3. Application demo 3. CONCLUSIONS Our work helps the deaf people in understanding others speech by converting it to textbyusinghuggingface[11].The motivation for this kind of study and experiments is the fact that whisper [12] is a simple ASR system for its perfect understandability in human face-to-face communication. Speech Recognition for Hindi Language and Conversion into Hindi Text is done. The implemented model is trained on a dataset of 3500 audios and achieved a word error rate of 32.42%. It is shown that whisper signals, can give high scores in word recognition for speech. For Open AI whisper small the WER is 87.30%[12] and now our work fine-tuned the model and reduced the Word Error rate to 32.42%. REFERENCES [1] Gulbakshee Dharmale, Dipti D. Patil and V. M. Thakare, “Implementation of EfficientSpeechRecognitionSystem on Mobile Device for Hindi and English Language” International Journal ofAdvancedComputerScienceand Applications(IJACSA), 10(2), 2019. [2] Multilingual Speech to Text using Deep Learning based on MFCC Features P Deepak Reddy, Chirag Rudresh and Adithya A S, PES University, India. [3] M. T. Tausif, S. Chowdhury, M. S. Hawlader, M. Hasanuzzaman and H. Heickal, "Deep Learning Based Bangla Speech-to-Text Conversion," 2018 5th International Conference on Computational Science/ Intelligence and Applied Informatics (CSII), Yonago, Japan, 2018, pp. 49-54, doi: 10.1109/CSII.2018.00016.
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 05 | May 2023 www.irjet.net p-ISSN: 2395-0072 © 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 1785 [4] A. U. Nasib, H. Kabir, R. Ahmed and J. Uddin, "A Real Time Speech to Text Conversion Technique for Bengali Language," 2018 International Conference on Computer, Communication, Chemical, Material and Electronic Engineering (IC4ME2), Rajshahi, Bangladesh, 2018, pp. 1-4, doi: 10.1109/IC4ME2.2018.8465680. [5] https://arxiv.org/abs/2212.04356 [6] Wern-Jun Wang, Yuan-Fu Liao, Sin-Horng Chen, RNN-based prosodic modeling for mandarinspeechand its application to speech-to-text conversion,Speech Communication,Volume 36, Issues3–4,2002,Pages247- 265,ISSN 0167-6393,https://doi.org/10.1016/S0167- 6393(01)00006-1. [7] Software Engineering - A Practitioner’sApproach,Roger S. Pressman, McGraw-Hill [8] Machine Learning, Tom M. Mitchell, McGraw-Hill International Edition, 6th Edition, 2001 [9] https://huggingface.co/datasets/mozilla- foundation/common_voice_11_0 [10] https://huggingface.co/blog/fine-tune-whisper [11] https://huggingface.co/docs [12] https://openai.com/research/whisper