Lipreading from video using deep learning

The document discusses a lipreading model that predicts words from video input. The model combines a spatiotemporal front-end with ResNet, a temporal convolutional back-end, and a bidirectional LSTM back-end. It provides statistics on the model's top and worst predictions on a test set of 50 examples for each of 500 labels. Proposed future work includes improving the preprocessing step and using the concept of sentences.

Lipreading from video
Vlad Katsman | Omer Cahana

Creating common language
Viseme: any of several speech sounds that look the same for lip reading, such as K and G, or "pet" and "bell".
Phoneme: one of the units of sound that distinguish one word from another in a particular language, such as "thumb" and "dumb".
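
The distinction between the two terms can be sketched in a few lines of Python. This is a toy illustration only: the phoneme symbols, the viseme group names, and the example words "cap", "gap", and "tap" are assumptions chosen for the sketch, not taken from the slides. It shows that two words with different phonemes (K versus G) can collapse to the same viseme sequence.

```python
# Toy phoneme-to-viseme grouping. The group names and the phoneme
# symbols are assumptions for illustration, not a standard inventory.
PHONEME_TO_VISEME = {
    "k": "velar", "g": "velar",          # K and G look alike on the lips
    "p": "bilabial", "b": "bilabial",
    "t": "alveolar", "d": "alveolar",
    "ae": "open_vowel",
}

def to_visemes(phonemes):
    """Map a phoneme sequence to the viseme sequence a lip reader sees."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)

# Hypothetical simplified transcriptions of three words.
cap = ["k", "ae", "p"]
gap = ["g", "ae", "p"]
tap = ["t", "ae", "p"]

looks_the_same = to_visemes(cap) == to_visemes(gap)    # different phonemes, same visemes
looks_different = to_visemes(cap) != to_visemes(tap)   # different visemes
print(looks_the_same, looks_different)
```

Words that share a viseme sequence cannot be separated from lip shape alone, which is one reason context matters in lipreading.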

Model
Spatiotemporal front-end: 3D-conv, maxPooling3d, ResNet50
Temporal convolutional back-end: conv, maxPooling, conv, averagePooling, linear
Bidirectional LSTM back-end: linear, lstm, lstm, linear, concatenate, lstm, lstm

A Spatiotemporal front-end + ResNet
Input (1,29,112,112)
a. conv3D, max pooling 3D
b. ResNet 50
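
To see how the 3D convolution and 3D max pooling change the input size before ResNet, the shape arithmetic can be sketched as follows. The slides give only the input shape; reading (1,29,112,112) as (channels, frames, height, width) and all kernel, stride, and padding values below are assumptions for illustration, not settings confirmed by the source.

```python
def out_size(size, kernel, stride, padding):
    """Standard output-length formula for a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def layer_shape(shape, kernel, stride, padding):
    """Apply the formula independently to (frames, height, width)."""
    return tuple(
        out_size(s, k, st, p)
        for s, k, st, p in zip(shape, kernel, stride, padding)
    )

# Input (1, 29, 112, 112): assumed to mean 1 channel, 29 frames, 112x112 pixels.
frames_h_w = (29, 112, 112)

# Assumed hyperparameters (not given on the slides).
after_conv = layer_shape(frames_h_w, kernel=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3))
after_pool = layer_shape(after_conv, kernel=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1))

print(after_conv)  # (29, 56, 56)
print(after_pool)  # (29, 28, 28)
```

With these assumed settings the number of frames stays at 29 while the spatial size shrinks, so a 2D network such as ResNet could then be applied to each frame's feature map.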

B Temporal convolutional back-end
Layers: Conv-1D, max pooling, Conv-1D, average pooling, linear
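
A minimal sketch of the temporal operations named on this slide, in plain Python on a single feature channel. The order (convolution, max pooling, convolution, average pooling) follows the model overview; the kernel weights, pooling size, and input signal are assumptions for illustration, and the final linear layer is omitted.

```python
def conv1d(seq, kernel, stride=1):
    """Valid 1D convolution (no padding) along the time axis."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, seq[i:i + k]))
        for i in range(0, len(seq) - k + 1, stride)
    ]

def max_pool1d(seq, size=2):
    """Non-overlapping max pooling along the time axis."""
    return [max(seq[i:i + size]) for i in range(0, len(seq) - size + 1, size)]

def average_pool(seq):
    """Collapse the whole time axis into one value."""
    return sum(seq) / len(seq)

# One made-up feature value per frame for a 29-frame clip.
signal = [float(t) for t in range(29)]
kernel = [0.25, 0.5, 0.25]  # assumed smoothing kernel

step1 = conv1d(signal, kernel)   # 27 time steps
step2 = max_pool1d(step1)        # 13 time steps
step3 = conv1d(step2, kernel)    # 11 time steps
summary = average_pool(step3)    # a single value for the whole clip

print(len(step1), len(step2), len(step3))
```

The average pooling step is what turns a variable number of time steps into one fixed-size summary that a linear classifier can consume.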

C Bidirectional LSTM back-end
Layers: Linear layer(256), four lstm(256) units, Linear layer(500)
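
The idea behind a bidirectional recurrent layer, reading the sequence in both directions and concatenating the two outputs per time step, can be sketched as below. To keep the example short, a simple tanh recurrent cell stands in for the LSTM; the cell, its weights, and the input values are assumptions for illustration and are not the model from the slides.

```python
import math

def run_cell(inputs, weight_in=0.5, weight_state=0.5):
    """Simplified recurrent cell (a stand-in for an LSTM, not an LSTM)."""
    state, outputs = 0.0, []
    for x in inputs:
        state = math.tanh(weight_in * x + weight_state * state)
        outputs.append(state)
    return outputs

def bidirectional(inputs):
    """Run the cell forward and backward, then pair outputs per time step."""
    forward = run_cell(inputs)
    # Reverse the input, run the cell, and reverse again to realign time steps.
    backward = run_cell(inputs[::-1])[::-1]
    return [(f, b) for f, b in zip(forward, backward)]

frames = [0.1, 0.4, -0.2, 0.3]  # made-up per-frame features
features = bidirectional(frames)
print(len(features), len(features[0]))
```

Each time step ends up with two values: one that summarizes the frames before it and one that summarizes the frames after it.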

D Results
TOP 1 PREDICTION    TOP 5 PREDICTION
Afternoon           Afternoon, Africa, Difficult, Levels, Evidence
Access              Access, Cases, Editor, Anything, Asking
Between             Between, Doing, Story, Three, During
Everything          Everything, Think, Evidence, Living, Giving
Small               Small, Support, Tomorrow, Report, Football
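
Top-1 and top-5 predictions are read off the classifier's output scores by sorting them. A minimal sketch follows; the score values are invented for illustration (only the word labels come from the slide), and the softmax step is an assumption about how scores are turned into probabilities.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities (numerically stable form)."""
    peak = max(scores.values())
    exps = {label: math.exp(value - peak) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def top_k(scores, k):
    """Return the k labels with the highest probability, best first."""
    probs = softmax(scores)
    return sorted(probs, key=probs.get, reverse=True)[:k]

# Invented scores for one clip; the labels are words from the results slide.
scores = {
    "Afternoon": 4.1, "Africa": 3.2, "Difficult": 2.7,
    "Levels": 2.0, "Evidence": 1.5, "Football": -0.3,
}

top1 = top_k(scores, 1)
top5 = top_k(scores, 5)
print(top1)
print(top5)
```

A prediction counts as top-5 correct when the true word appears anywhere in the five highest-ranked labels, so top-5 accuracy is never lower than top-1 accuracy.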

D1 Statistics (based on test set, 50 examples x 500 labels): Top predictions
[Chart; axis from 0 to 100]

D2 Statistics (based on test set, 50 examples x 500 labels): Worst predictions
[Chart; axis from 0 to 60]
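
One way such top and worst lists could be produced is to compute accuracy per label on the test set and sort the labels. The sketch below assumes the charts show per-label accuracy in percent, which the slides do not state; the (true, predicted) pairs are invented for illustration.

```python
from collections import defaultdict

def per_label_accuracy(pairs):
    """Percentage of correct predictions for each true label."""
    correct, total = defaultdict(int), defaultdict(int)
    for true_label, predicted in pairs:
        total[true_label] += 1
        correct[true_label] += int(true_label == predicted)
    return {label: 100.0 * correct[label] / total[label] for label in total}

def rank_labels(accuracy, n, best=True):
    """Return the n best (or worst) labels by accuracy."""
    return sorted(accuracy, key=accuracy.get, reverse=best)[:n]

# Invented (true label, predicted label) pairs; a real test set would
# hold 50 examples for each of the 500 labels.
pairs = [
    ("Afternoon", "Afternoon"), ("Afternoon", "Afternoon"),
    ("Small", "Small"), ("Small", "Support"),
    ("Access", "Cases"), ("Access", "Editor"),
]

accuracy = per_label_accuracy(pairs)
print(rank_labels(accuracy, 1, best=True))   # best-recognized label
print(rank_labels(accuracy, 1, best=False))  # worst-recognized label
```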

Future work
● Preprocessing step improvements
● Use the concept of the sentence
