Zeroth Review - 2021
M.Vignesh
221003105
IV-CSE A
221003105@sastra.ac.in
S.Mahadevan
221003057
IV-CSE A
221003057@sastra.ac.in
Guided By
Ms. Bhavani R
APII/CSE/SRC/SASTRA
PROJECT OVERVIEW
• Visual speech recognition (VSR): recognizing speech from video alone
• Extract lip features from the video frames (sketched after this list)
• Train a neural network model on the lip-movement sequences
• Transcribe lip movements into text and evaluate the output
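A hedged sketch of the lip-feature extraction step is given below. It crops a fixed-size mouth patch using dlib's 68-point facial landmarks (indices 48-67 cover the mouth); the function name, patch size, and landmark-model path are illustrative assumptions, not the project's exact code.

# Minimal lip-ROI extraction sketch, assuming dlib's 68-point landmark
# model file (shape_predictor_68_face_landmarks.dat) is available locally.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_lip_roi(frame, size=64):
    """Crop and resize a grayscale patch around mouth landmarks 48-67."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None  # no face found in this frame
    shape = predictor(gray, faces[0])
    pts = np.array([(shape.part(i).x, shape.part(i).y)
                    for i in range(48, 68)], dtype=np.int32)
    x, y, w, h = cv2.boundingRect(pts)
    return cv2.resize(gray[y:y + h, x:x + w], (size, size))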
INTRODUCTION
• Many recognition systems recognize words from audio features.
• Lip reading is a developing technology.
• It aims to recognize words from visual features alone, without audio.
• Words are classified and recognized from viseme movements (an illustrative viseme grouping follows this list).
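For illustration, a viseme groups phonemes that look alike on the lips, so several spoken sounds map to one visual class. The grouping below is a hedged example only; viseme inventories vary from study to study.

# Illustrative phoneme-to-viseme grouping; real inventories differ by study.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",   # lips pressed together
    "f": "labiodental", "v": "labiodental",              # lower lip to teeth
    "w": "rounded", "uw": "rounded", "ow": "rounded",    # rounded lips
}

def to_visemes(phonemes):
    """Map a phoneme sequence to the viseme classes visible on the lips."""
    return [PHONEME_TO_VISEME.get(p, "other") for p in phonemes]

print(to_visemes(["b", "uw", "m"]))  # ['bilabial', 'rounded', 'bilabial']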
PROBLEM STATEMENT
• Noisy environments degrade audio-based recognition
• Variation in speech speed
• Speaker accents
• Pronunciation differences
• Variation in facial features across speakers
OBJECTIVE
• Extract textual or speech data from facial features
• Train a neural network to process viseme sequences
• Develop a speaker-independent system
• Recognize and classify ten different words
S/W – H/W REQUIREMENTS
SOFTWARE REQUIREMENTS:
• Anaconda (Python distribution)
• System: 64-bit OS, x64 processor
HARDWARE REQUIREMENTS:
• 4 GB RAM
• Dedicated GPU, for faster training (a quick availability check follows)
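A quick way to confirm the GPU is visible to the training framework, assuming TensorFlow is the framework installed through Anaconda (PyTorch users would call torch.cuda.is_available() instead):

# Check whether TensorFlow can see a GPU; 0 means training runs on CPU.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs available: {len(gpus)}")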
EXISTING VS PROPOSED SYSTEM
Existing System:
• Uses the BBC LRS2 dataset
• Recognizes and classifies individual ASCII characters, then decodes them into words
• Complex processing; reaches good accuracy only after about 2000 epochs
Proposed System:
• Uses the MIRACL-VC1 dataset
• Recognizes and classifies ten different words (a loading sketch follows this list)
• Simpler processing; reaches good accuracy after about 200 epochs
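The sketch below shows one way to load a single word utterance from MIRACL-VC1, assuming the dataset's usual on-disk layout <speaker>/words/<word_id>/<instance>/color_*.jpg; the helper name and arguments are illustrative.

# Hedged loader for one MIRACL-VC1 word instance as a (T, H, W, 3) array.
import glob
import cv2
import numpy as np

def load_instance(root, speaker, word, instance):
    """Read the color frames of one utterance, sorted in frame order."""
    pattern = f"{root}/{speaker}/words/{word:02d}/{instance:02d}/color_*.jpg"
    frames = [cv2.imread(p) for p in sorted(glob.glob(pattern))]
    return np.stack(frames)

clip = load_instance("MIRACL-VC1", "F01", word=1, instance=1)
print(clip.shape)  # (num_frames, height, width, 3)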
LITERATURE SURVEY

1. “Lip-Reading Driven Deep Learning Approach for Speech Enhancement,” Ahsan Adeel, Mandar Gogate, Amir Hussain, and William M. Whitmer. IEEE Transactions, 2019.
   Methodology: LSTM-driven audio-visual mapping.
   Merits: increased accuracy; autonomous speech enhancement. Demerits: poor performance on real-time speech.

2. “An audio-visual corpus for multimodal automatic speech recognition,” Andrzej Czyzewski, Bozena Kostek, Piotr Bratoszewski, Jozef Kotus, and Marcin Szykulski. Springer, 2017.
   Methodology: Active Appearance Model (AAM) and Hidden Markov Models (HMM).
   Merits: recognition works in street noise. Demerits: babble noise dramatically worsens recognition accuracy.

3. “Extraction of Visual Features for Lipreading,” Iain Matthews, Timothy F. Cootes, J. Andrew Bangham, Stephen Cox, and Richard Harvey. IEEE Transactions, 2017.
   Methodology: Active Shape Model (ASM) and Point Distribution Model (PDM).
   Merits: accuracy improves when a noisy audio signal is augmented with visual information. Demerits: poor performance in babble noise.

4. “Audio-visual speech recognition using deep learning,” Kuniaki Noda, Yuki Yamaguchi, Kazuhiro Nakadai, Hiroshi G. Okuno, and Tetsuya Ogata. Springer, 2014.
   Methodology: Hidden Markov Model (HMM).
   Merits: increased performance. Demerits: sensitive to reverberation, illumination, and facial orientation.

5. “Speaker-Independent Speech Recognition using Visual Features,” Pooventhiran G. and Sandeep A. IEEE, 2020.
   Methodology: 3D-CNN model.
   Merits: improved accuracy. Demerits: complex model.
PROPOSED ARCHITECTURE
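A hedged sketch of the kind of classifier the proposed architecture implies: a 3D-CNN over fixed-length clips of lip patches, ending in a ten-way softmax for the ten target words. The clip length (22 frames), patch size, and all layer widths are illustrative assumptions, not the exact network from the diagram.

# Illustrative 3D-CNN word classifier over lip-ROI clips (Keras).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(frames=22, size=64, num_words=10):
    return models.Sequential([
        layers.Input(shape=(frames, size, size, 1)),   # grayscale clip
        layers.Conv3D(32, (3, 3, 3), activation="relu"),
        layers.MaxPooling3D((1, 2, 2)),                # pool spatially only
        layers.Conv3D(64, (3, 3, 3), activation="relu"),
        layers.MaxPooling3D((1, 2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_words, activation="softmax"),
    ])

model = build_model()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()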
REFERENCES
[1] A. Thanda and S. M. Venkatesan, “Audio visual speech recognition using deep recurrent neural networks,” in IAPR Workshop on Multimodal Pattern Recognition of Social Signals in Human-Computer Interaction. Springer, 2016.
[2] E. Petajan, B. Bischoff, D. Bodoff, and N. M. Brooke, “An improved automatic lipreading system to enhance speech recognition,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1988.
[3] A. Torfi, S. M. Iranmanesh, N. Nasrabadi, and J. Dawson, “3D convolutional neural networks for cross audio-visual matching recognition,” IEEE Access, vol. 5, pp. 22081–22091, 2017.
[4] N. Alothmany, R. Boston, C. Li, S. Shaiman, and J. Durrant, “Classification of visemes using visual cues,” in Proceedings ELMAR-2010. IEEE, 2010.
[5] I. Almajai, S. Cox, R. Harvey, and Y. Lan, “Improved speaker independent lip reading using speaker adaptive training and deep neural networks,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
THANK YOU
