Lip Reading
1. Zeroth Review - 2021
M.Vignesh
221003105
IV-CSE A
221003105@sastra.ac.in
S.Mahadevan
221003057
IV-CSE A
221003057@sastra.ac.in
Guided By
Ms. Bhavani R
APII/CSE/SRC/SASTRA
2. PROJECT OVERVIEW
• Visual speech recognition
• Extract lip features
• Train a neural network model on the lip-movement sequences
• Transcribe lip movements into text and evaluate the output
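The steps above can be sketched as a minimal end-to-end pipeline. Everything here is an illustrative stand-in: a real system would take MIRACL-VC1 frames, locate the mouth region with a face-landmark detector, and replace `predict_word` with the trained neural network.

```python
# Minimal sketch of the lip-reading pipeline:
# video frames -> lip features -> model -> word.
# All function bodies are illustrative stand-ins, not the real system.

def extract_lip_features(frame):
    """Stand-in for landmark-based mouth features: mean intensity
    of the lower half of a grayscale frame (where the mouth sits)."""
    half = len(frame) // 2
    values = [px for row in frame[half:] for px in row]
    return sum(values) / len(values)

def predict_word(feature_sequence, vocabulary):
    """Stand-in for the trained neural network: buckets the mean
    feature value over the sequence into a vocabulary index."""
    avg = sum(feature_sequence) / len(feature_sequence)
    return vocabulary[int(avg) % len(vocabulary)]

def transcribe(frames, vocabulary):
    features = [extract_lip_features(f) for f in frames]
    return predict_word(features, vocabulary)

# Tiny synthetic "video": 3 frames of 4x4 grayscale pixels.
frames = [[[10] * 4 for _ in range(4)] for _ in range(3)]
vocab = ["begin", "choose", "connection", "navigation", "next",
         "previous", "start", "stop", "hello", "web"]
print(transcribe(frames, vocab))  # prints "begin"
```

The ten words in `vocab` follow the MIRACL-VC1 word set; only the pipeline shape (frames in, one word label out) carries over to the real model.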
3. INTRODUCTION
• Many recognition systems identify words from audio features.
• Lip-reading systems are an emerging technology.
• They aim to recognize words from visual features alone, without audio.
• Words are classified and recognized from viseme movements.
5. OBJECTIVE
• Extract textual or speech data from facial features
• Train a neural network to process viseme sequences
• Develop a speaker-independent system
• Recognize and classify ten different words
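A speaker-independent system is usually evaluated by holding out entire speakers, so no test speaker ever appears in training. A minimal sketch of such a split (the speaker IDs and sample tuples are illustrative; MIRACL-VC1 groups its clips by speaker, which is what makes this split possible):

```python
# Split (speaker, word) samples so held-out speakers never appear in
# training -- the essence of a speaker-independent evaluation.

def speaker_independent_split(samples, test_speakers):
    train = [s for s in samples if s[0] not in test_speakers]
    test = [s for s in samples if s[0] in test_speakers]
    return train, test

# Illustrative samples: (speaker_id, word) pairs.
samples = [
    ("F01", "begin"), ("F01", "stop"),
    ("F02", "begin"), ("F02", "stop"),
    ("M01", "begin"), ("M01", "stop"),
]
train, test = speaker_independent_split(samples, test_speakers={"M01"})
print(len(train), len(test))  # prints "4 2"

# No speaker appears on both sides of the split.
assert not {s for s, _ in train} & {s for s, _ in test}
```

Contrast this with a random per-clip split, where the same speaker's clips land in both sets and the reported accuracy overstates how well the model generalizes to unseen speakers.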
7. EXISTING VS PROPOSED SYSTEM
Existing System:
• Uses the BBC LRS2 dataset
• Recognizes and classifies ASCII characters, then decodes them into words
• Complex processing; reaches good accuracy only after 2000 epochs
Proposed System:
• Uses the MIRACL-VC1 dataset
• Recognizes and classifies ten different words
• Simpler; reaches good accuracy after 200 epochs
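Models such as the 3D-CNN cited in the survey process a stack of mouth frames jointly in time and space. The core operation can be sketched in pure Python as one valid-mode 3D convolution over a (time, height, width) volume; the shapes and kernel here are illustrative, not the actual network configuration:

```python
# One valid-mode 3D convolution over a (time, height, width) volume,
# the core operation a 3D-CNN applies to a stack of viseme frames.

def conv3d(volume, kernel):
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    t, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for i in range(T - t + 1):          # slide over time
        plane = []
        for j in range(H - h + 1):      # slide over height
            row = []
            for k in range(W - w + 1):  # slide over width
                acc = 0
                for di in range(t):
                    for dj in range(h):
                        for dk in range(w):
                            acc += volume[i+di][j+dj][k+dk] * kernel[di][dj][dk]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out

# 3 frames of 3x3 "mouth" pixels, all ones; a 2x2x2 kernel of ones.
volume = [[[1] * 3 for _ in range(3)] for _ in range(3)]
kernel = [[[1] * 2 for _ in range(2)] for _ in range(2)]
result = conv3d(volume, kernel)
print(result)  # [[[8, 8], [8, 8]], [[8, 8], [8, 8]]]
```

Because the kernel spans neighbouring frames as well as neighbouring pixels, each output value mixes information across time, which is what lets the network pick up lip *motion* rather than single-frame lip shape.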
8. LITERATURE SURVEY
1. "Lip-Reading Driven Deep Learning Approach for Speech Enhancement," Ahsan Adeel, Mandar Gogate, Amir Hussain, and William M. Whitmer. IEEE Transactions, 2019.
   Methodology: LSTM-driven audio-visual mapping.
   Merits: increased accuracy; autonomous speech enhancement.
   Demerits: poor performance on real-time speech.
2. "An Audio-Visual Corpus for Multimodal Automatic Speech Recognition," Andrzej Czyzewski, Bozena Kostek, Piotr Bratoszewski, Jozef Kotus, and Marcin Szykulski. Springer, 2017.
   Methodology: Active Appearance Model (AAM) and Hidden Markov Models (HMM).
   Merits: recognition in street noise.
   Demerits: babble noise dramatically worsens speech-recognition accuracy.
3. "Extraction of Visual Features for Lipreading," Iain Matthews, Timothy F. Cootes, J. Andrew Bangham, Stephen Cox, and Richard Harvey. IEEE Transactions, 2017.
   Methodology: Active Shape Model (ASM) and Point Distribution Model (PDM).
   Merits: accuracy improves when a noisy audio signal is augmented with visual information.
   Demerits: poor performance in babble noise.
4. "Audio-Visual Speech Recognition Using Deep Learning," Kuniaki Noda, Yuki Yamaguchi, Kazuhiro Nakadai, Hiroshi G. Okuno, and Tetsuya Ogata. Springer, 2014.
   Methodology: Hidden Markov Model (HMM).
   Merits: increased performance.
   Demerits: sensitive to reverberation, illumination, and facial orientation.
5. "Speaker-Independent Speech Recognition Using Visual Features," Pooventhiran G. and Sandeep A. IEEE, 2020.
   Methodology: 3D-CNN model.
   Merits: improved accuracy.
   Demerits: complex.
10. REFERENCES
[1] A. Thanda and S. M. Venkatesan, "Audio visual speech recognition using deep recurrent neural networks," in IAPR Workshop on Multimodal Pattern Recognition of Social Signals in Human-Computer Interaction. Springer, 2016.
[2] E. Petajan, B. Bischoff, D. Bodoff, and N. M. Brooke, "An improved automatic lipreading system to enhance speech recognition," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1988.
[3] A. Torfi, S. M. Iranmanesh, N. Nasrabadi, and J. Dawson, "3D convolutional neural networks for cross audio-visual matching recognition," IEEE Access, vol. 5, pp. 22081–22091, 2017.
[4] N. Alothmany, R. Boston, C. Li, S. Shaiman, and J. Durrant, "Classification of visemes using visual cues," in Proceedings ELMAR-2010. IEEE, 2010.
[5] I. Almajai, S. Cox, R. Harvey, and Y. Lan, "Improved speaker independent lip reading using speaker adaptive training and deep neural networks," in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).