Hearing by seeing: Can improving the visibility of the speaker's lips make you hear better?

SKERG Seminar Tuesday, April 5, 2016 at 12:00pm

  1. Hearing by seeing: Can improving the visibility of the speaker's lips make you hear better?
     Najwa Alghamdi, MSc
  2. Bio
     • Lecturer in the Information Technology Department, CCIS, KSU.
     • SKERG member
     • PhD candidate at the University of Sheffield.
     • Member of the Computer Graphics and Virtual Reality and the Speech and Hearing research groups
     • Supervised by: Dr. Steve Maddock, Prof. Guy Brown and Dr. Jon Barker.
     • My research investigates methods for enhancing visual speech intelligibility* to support hard-of-hearing people (cochlear implant (CI) users in particular).
     • Alghamdi, Najwa / Maddock, Steve / Brown, Guy J. / Barker, Jon (2015): "Investigating the impact of artificial enhancement of lip visibility on the intelligibility of spectrally-distorted speech", In FAAVSP-2015, 93-98.
     * Speech intelligibility is a measure of how comprehensible speech is in given conditions
     28/04/2016 © King Saud University - The University of Sheffield
  3. Introduction
     • Cochlear implants help profoundly deaf people
       • to become more aware of everyday sounds
       • to understand speech better when combined with lip-reading
     • The sound waveform is separated by band-pass filters into different frequency components
     • Users initially describe the sound as “mechanical” and “synthetic”
     [Figure labels: Real, Synthesized; Cochlear Implant (CI)]
  4. Introduction: Training after implantation
     • Auditory training consists of formal listening activities whose goal is to optimize speech perception.
     • Auditory training helps the CI user make use of the new ‘hearing’
     • Typically, training uses audio-only speech stimuli
     • Recent studies suggest that using visual speech stimuli in the training may maximize the benefit from the training (Bernstein et al., 2013)
     [Figure labels: Audio-only, Audiovisual]
  5. Introduction: Enhancing training videos
     • Lander and Capek (2013) found that increasing and decreasing lip visibility by applying lipstick and concealer improved speechreading performance for words and sentences compared to natural, unadorned lips
     • Our idea is to artificially colour a speaker’s lips in a video sequence to improve lip visibility
     [Figure labels: Natural lip, with lipstick, with concealer]
  6. Aim of Research
     • Investigate whether or not artificially enhancing the appearance of a speaker’s lips:
       • Supports lip-reading, thus improving the intelligibility of visual speech
       • Improves auditory training
     • Preliminary step: study non-native, normal-hearing listeners using a cochlear implant simulation. Why?
     • Both CI users and non-native listeners deal with internal adverse conditions when listening to CI-processed speech:
       • Limited linguistic knowledge in non-native listeners (Bent et al., 2009)
       • Damaged inner ear in a CI user
     • Non-native listeners may help predict the performance of CI users
  7. Enhancement Method
     1. Automatic tracking using Faceware Analyzer* (output: landmarks XML file)
     2. XML parser
     3. Smoothing landmarks using piecewise bicubic Bézier curves
     4. Colour & luminance blending
     5. Lip contour smoothing using average filter
     *http://facewaretech.com/products/software/analyzer/
  8. Enhancement Method
     [Figure labels: Original, Simulated]
  9. Method: Subjects
     • 46 non-native, Saudi listeners from King Saud University, Riyadh, Saudi Arabia
     • Minimum IELTS score = 5.5
     • Subjects are split into groups

     Group                      Size  Pre-test  Training  Post-test
     A (Audio-only)             13    A         A         A
     V (Audiovisual)            19    A         V         A
     E (Enhanced audiovisual)   14    A         E         A
  10. Method: Stimuli
      • We used the Grid corpus*
      • Example: ‘bin blue at L 8 please’
      • Audio and video (facial) recordings of 1000 sentences from each of 34 talkers (18 male, 16 female)
      • We used audio and video recordings made by a single talker

      Sentence structure:
      command       bin, lay, place, set
      colour        blue, green, red, white
      preposition   at, by, in, with
      letter        25 letters (no ‘W’)
      digit         10 digits
      adverb        again, now, please, soon

      *http://spandh.dcs.shef.ac.uk/gridcorpus/
  11. Method: Stimuli
      • The Grid videos are processed to produce the different stimuli:
        • A: Grid audio stimuli of a single speaker are spectrally distorted (vocoded) to simulate CI-processed speech (Tabri et al., 2011)
        • V: The audio tracks of Grid videos are replaced with the spectrally distorted Audio-only stimuli
        • E: The speaker's lips in the Audiovisual stimuli are automatically tracked and artificially coloured
      • The subjects need to identify the colour, the letter and the digit keyword of a Grid stimulus in all training and testing sessions
  12. Results: three sets
      1. The impact of using E speech in auditory training
         • training gain = post-test − pre-test
      2. A comparison of the intelligibility of A, V and E speech
         • Training scores can be used to provide a subjective intelligibility assessment
      3. Letter confusion matrices from post-test
         • Understand the possible sources of confusion when identifying letters during the audio-only post-test
  13. 1. The Impact of Using Enhanced Audiovisual Speech in Auditory Training

                              A     V     E     ANOVA      Post-hoc
      Pre-test mean scores    14%   14%   13%
      Post-test mean scores   46%   54%   71%   p = 0.04   p = 0.037
      Training gain           32%   40%   58%   p = 0.01   p = 0.009
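The training gains reported on slide 13 follow from the definition on slide 12 (training gain = post-test − pre-test). A minimal Python check using the mean scores from the table; the variable names are illustrative and not taken from the slides:

```python
# Mean scores (percent) as reported on slide 13.
pre_test = {"A": 14, "V": 14, "E": 13}
post_test = {"A": 46, "V": 54, "E": 71}

# training gain = post-test - pre-test, computed per group
training_gain = {group: post_test[group] - pre_test[group] for group in pre_test}

for group, gain in training_gain.items():
    print(f"Group {group}: training gain = {gain} percentage points")
```

The computed values match the "Training gain" row of the table (32, 40 and 58 percentage points).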
  14. 2. A comparison of the intelligibility of A, V and E speech

      \[
      \text{Speech intelligibility of } X \text{ speech}
        = \frac{\sum_{k=1}^{3} \text{Identification score by } X \text{ subjects in Session } k}{60} \times 100,
      \qquad X \in \{A, V, E\}
      \]

      • ANOVA: p = 0.008
      • Post-hoc: p = 0.006, between A & E
  15. 3. Letter Confusion Matrices from post-test results
      • Letter identification was the most challenging task
      • 25 letters to choose from [no ‘W’]
      • Due to the vocoding process, some letters sound similar: (P,B), (G,T), (M,N) and vowels
      [Figure: letter confusion matrix; axes: letter presented in the test, user’s response]
      Clusters:
      1. Diphthongs
      2. Contains plosive sounds
      3. Contains nasal sounds
      4. Contains fricative sounds
      5. Contains a lateral approximant sound
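As a rough illustration of how a letter confusion matrix of this kind can be tallied, the sketch below counts (presented, response) pairs over the 25 Grid letters. The trial data are hypothetical and the function names are assumptions; the slides do not describe the authors' analysis code.

```python
from collections import Counter

# The 25 Grid letters: the alphabet without 'W'.
LETTERS = [c for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if c != "W"]

def confusion_matrix(trials):
    """Return matrix[presented][response] = count for (presented, response) pairs."""
    counts = Counter(trials)
    return {p: {r: counts[(p, r)] for r in LETTERS} for p in LETTERS}

def percent_correct(matrix):
    """Percentage of trials where the response equals the presented letter."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[letter][letter] for letter in LETTERS)
    return 100.0 * correct / total if total else 0.0

# Hypothetical trials: (letter presented, user's response)
example_trials = [("P", "B"), ("P", "P"), ("M", "N"), ("A", "A")]
example_matrix = confusion_matrix(example_trials)
```

Off-diagonal cells such as `example_matrix["P"]["B"]` hold the confusions, which is where pairs like (P,B) or (M,N) would show up.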
  16. 3. Letter Confusion Matrices from post-test results
      • The analysis of the letter confusion matrices for the audio-only post-test shows that E subjects were better at letter and diphthong identification:

                                E     V     A
      Letter identification     75%   65%   55%
      Vowel identification      81%   66%   52%

      [Figure: confusion matrices for groups E, V and A]
  17. 3. Letter Confusion Matrices from post-test results
      • The visual signal might impede learning the discrimination of visually similar sounds such as P & B.
      [Figure: confusion matrices for groups E, V and A]
  18. Conclusions
      • Audio-only post-training tests suggest that the enhanced visual signal improves the training gain of participants
      • Intelligibility of spectrally-distorted speech is improved when a corresponding enhanced visual signal is introduced
      • Next steps: expand the study; run a similar experiment on a group of CI and hearing aid users
  19. Current Experiment in SKERG
      • Evaluation study of a new enhancement method that exaggerates the speaking style of the speaker in the video.
      [Figure labels: Normal, Exaggerated, Exaggerated with lipstick]
  20. Amalghamdi1@sheffield.ac.uk
      www.najwa-alghamdi.net
      This research has been supported by the Saudi Ministry of Education, King Saud University and Faceware Technologies Inc.
  21. References
      • T. Bent, A. Buchwald, and D. B. Pisoni, “Perceptual adaptation and intelligibility of multiple talkers for two types of degraded speech,” The Journal of the Acoustical Society of America, vol. 126, no. 5, pp. 2660–2669, 2009.
      • L. E. Bernstein, E. T. Auer Jr, S. P. Eberhardt, and J. Jiang, “Auditory perceptual learning for speech perception can be enhanced by audiovisual training,” Frontiers in Neuroscience, vol. 7, 2013.
      • M. F. Dorman and P. C. Loizou, “The identification of consonants and vowels by cochlear implant patients using a 6-channel continuous interleaved sampling processor and by normal-hearing subjects using simulations of processors with two to nine channels,” Ear and Hearing, vol. 19, no. 2, pp. 162–166, 1998.
      • K. Lander and C. Capek, “Investigating the impact of lip visibility and talking style on speechreading performance,” Speech Communication, vol. 55, no. 5, pp. 600–605, 2013.
      • D. Tabri, K. M. S. A. Chacra, and T. Pring, “Speech perception in noise by monolingual, bilingual and trilingual listeners,” International Journal of Language & Communication Disorders, vol. 46, no. 4, pp. 411–422, 2011.
  22. [Figure: letter confusion matrix; axes: letter presented in the test, user’s response]
  23. Colouring the lips
      • Smoothing the lip contours [curve equation not preserved in the transcript]
      • The curve is defined by its control points, each weighted by a Bernstein polynomial [definition not preserved in the transcript]
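The equations on the lip-colouring slide did not survive in the transcript. For orientation, the standard Bernstein form of a degree-n Bézier curve is B(t) = Σ_{i=0..n} P_i · b_{i,n}(t), with b_{i,n}(t) = C(n, i) · t^i · (1 − t)^(n − i) and control points P_i. The sketch below evaluates that textbook form in Python; the symbols, the function names and the choice of a single cubic segment are assumptions, not the author's notation or implementation.

```python
from math import comb

def bernstein(i, n, t):
    """Bernstein basis polynomial b_{i,n}(t) = C(n, i) * t^i * (1 - t)^(n - i)."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)

def bezier_point(control_points, t):
    """Evaluate a Bezier curve at t in [0, 1] from a list of (x, y) control points."""
    n = len(control_points) - 1
    x = sum(bernstein(i, n, t) * px for i, (px, _) in enumerate(control_points))
    y = sum(bernstein(i, n, t) * py for i, (_, py) in enumerate(control_points))
    return (x, y)

# Hypothetical control points for one cubic segment of a lip contour.
segment = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
smoothed = [bezier_point(segment, k / 10) for k in range(11)]
```

A piecewise contour would chain several such segments, sharing end points between neighbours so that the curve stays continuous.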
  24. Luminance Blending
      • Luminance blending was also used to improve colour blending under different lighting conditions
      • This was done by blending in luma/chroma (Y′CbCr) space and then converting the results to RGB space [conversion equations not preserved in the transcript]
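The conversion equations from the luminance-blending slide are missing from the transcript. The sketch below shows one plausible way to recolour a pixel in Y′CbCr space while keeping its luma, so that the lighting of the original frame is preserved. It assumes the full-range BT.601 (JFIF) conversion; the blending rule, the `alpha` parameter and all function names are hypothetical, since the slides do not state which variant the author used.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF) RGB -> Y'CbCr, all values on a 0..255 scale."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr (full-range BT.601)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b

def recolour_pixel(pixel_rgb, lip_rgb, alpha=0.6):
    """Blend the pixel's chroma toward the target lip colour, keeping its luma.

    alpha = 0 leaves the pixel unchanged; alpha = 1 takes the target chroma fully.
    """
    y, cb, cr = rgb_to_ycbcr(*pixel_rgb)
    _, target_cb, target_cr = rgb_to_ycbcr(*lip_rgb)
    cb = (1 - alpha) * cb + alpha * target_cb
    cr = (1 - alpha) * cr + alpha * target_cr
    # Round and clip back to valid 8-bit RGB values.
    return tuple(min(255, max(0, round(v))) for v in ycbcr_to_rgb(y, cb, cr))
```

Because only the chroma channels are blended, shadows and highlights on the lips carry over from the original frame, which is the usual motivation for working in a luma/chroma space.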
  25. CI simulation
      • The GRID audio was spectrally distorted using an eight-channel sine-wave vocoder (AngelSim*).
      • Normal-hearing listeners perform comparably to CI users when the simulation uses 8 channels (Dorman et al., 1998)
      • The noise fluctuations of a noise vocoder are not present in a real CI (Bent et al., 2009), so we used the sine-wave vocoder.
      • Vocoding procedure:
        1. The signal is divided into 8 channels by band-pass filters spanning 200 to 7,000 Hz (slope = 24 dB/octave);
        2. Each channel is then low-pass filtered at 160 Hz (slope = 24 dB/octave) to obtain its envelope;
        3. The envelope of each channel modulates a sine wave that replaces the original signal in that channel
      *http://www.tigerspeech.com
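The three vocoding steps above can be sketched in plain Python as follows. This is a simplified illustration, not AngelSim's implementation: it assumes logarithmically spaced band edges, sine carriers at each band's geometric centre, and second-order (biquad) filters whose slopes are shallower than the 24 dB/octave quoted on the slide. All function names are hypothetical.

```python
import math

def biquad_coeffs(kind, f0, fs, q):
    """Second-order low-pass or band-pass coefficients (Audio EQ Cookbook form)."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    if kind == "lowpass":
        b = [(1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2]
    elif kind == "bandpass":
        b = [alpha, 0.0, -alpha]
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    a0 = 1 + alpha
    return [v / a0 for v in b], [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]

def biquad_filter(x, b, a):
    """Apply one biquad section to a list of samples (direct form I)."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1, y2, y1 = x1, xn, y1, yn
        y.append(yn)
    return y

def sine_vocoder(signal, fs, n_channels=8, f_lo=200.0, f_hi=7000.0, env_cutoff=160.0):
    """Sine-wave vocoder sketch: band-pass, extract envelope, modulate a sine."""
    # Assumed spacing: band edges equally spaced on a log-frequency axis.
    edges = [f_lo * (f_hi / f_lo) ** (i / n_channels) for i in range(n_channels + 1)]
    lp_b, lp_a = biquad_coeffs("lowpass", env_cutoff, fs, 1 / math.sqrt(2))
    out = [0.0] * len(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        fc = math.sqrt(lo * hi)  # geometric centre of the band
        bp_b, bp_a = biquad_coeffs("bandpass", fc, fs, fc / (hi - lo))
        band = biquad_filter(signal, bp_b, bp_a)                 # step 1
        env = biquad_filter([abs(v) for v in band], lp_b, lp_a)  # step 2
        for n, e in enumerate(env):                              # step 3
            out[n] += max(e, 0.0) * math.sin(2 * math.pi * fc * n / fs)
    return out

# Usage: vocode 50 ms of a 1 kHz tone sampled at 16 kHz.
fs = 16000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(800)]
vocoded = sine_vocoder(tone, fs)
```

The sampling rate must exceed twice the highest carrier frequency (about 6.4 kHz with these settings), so 16 kHz is sufficient here.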
  26. [Figure: healthy cochlea]
  27. [Figure: cochlear implant; neurosensory hearing-loss conditions]
