W4A 2012-Federico-Furini_AutomaticCaptioning

  1. W4A 2012, Lyon, April 17, 2012. Enhancing Learning Accessibility through Fully Automatic Captioning. Maria Federico (Servizio Accoglienza Studenti Disabili, Università di Modena e Reggio Emilia) and Marco Furini (Dipartimento di Comunicazione ed Economia, Università di Modena e Reggio Emilia).
  2. The traditional learning scenario. Traditional solutions: sign interpreters, stenographers, student note takers, respeaking. Video and audio reach both classroom and remote students: disabled students (hearing-impaired, dyslexic, motion-impaired) as well as able-bodied students.
  3. An accessible learning scenario. Our system takes the video and audio streams, performs automatic speech transcription, and delivers video, audio, and a textual transcript to classroom and remote students, both disabled (hearing-impaired, dyslexic, motion-impaired) and able-bodied.
  4. System Architecture. An architecture for the automatic production of video lesson captions, based on: automatic speech recognition (ASR) technologies, and a novel caption alignment mechanism that (a) introduces unique audio markups into the audio stream before transcription by an ASR and (b) transforms the plain transcript produced by the ASR into a timecoded transcript.
  5. Markup Insertion. Identification of silence periods (i.e., when the speaker does not speak) and periodic insertion of a unique markup into those silence periods. It is important to find reasonable values for the silence length and for the minimum distance between two consecutive markups, so that no words are truncated in the transcript and enough timing information is preserved.
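The markup-insertion rule described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the amplitude threshold and the parameter values (`SILENCE_LEN`, `MIN_MARKUP_DISTANCE`) are assumptions chosen for the toy signal below.

```python
# Illustrative sketch of the Markup Insertion step: scan an audio signal for
# silence periods and record the times where a unique audio markup would be
# inserted. Thresholds and parameter values are assumptions, not the paper's.

SAMPLE_RATE = 16000          # samples per second
SILENCE_THRESHOLD = 0.01     # amplitude below which a sample counts as silence
SILENCE_LEN = 0.5            # seconds of continuous silence required
MIN_MARKUP_DISTANCE = 1.5    # minimum seconds between two consecutive markups

def markup_positions(samples, rate=SAMPLE_RATE):
    """Return the times (in seconds) where a markup would be inserted."""
    positions = []
    run_start = None                       # start index of the current silence run
    last_markup = -MIN_MARKUP_DISTANCE     # allow a markup at time 0
    for i, s in enumerate(samples):
        if abs(s) < SILENCE_THRESHOLD:
            if run_start is None:
                run_start = i
            run_len = (i - run_start + 1) / rate
            t = run_start / rate
            # insert only if the silence is long enough and far enough
            # from the previous markup (one markup per silence run)
            if run_len >= SILENCE_LEN and t - last_markup >= MIN_MARKUP_DISTANCE:
                positions.append(t)
                last_markup = t
        else:
            run_start = None
    return positions

# Toy signal: 1 s of speech followed by 1 s of silence, repeated three times.
signal = ([0.5] * SAMPLE_RATE + [0.0] * SAMPLE_RATE) * 3
print(markup_positions(signal))  # → [1.0, 3.0, 5.0]
```

Raising `SILENCE_LEN` or `MIN_MARKUP_DISTANCE` yields fewer markups, which is exactly the accuracy/caption-length trade-off discussed in slides 10 and 11.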
  6. Speech2text. Transcription of the audio stream, coupled with the unique markups, into plain text (including the textual form of each markup). Any existing automatic speech recognition technology can be used. The system prototype uses Dragon NaturallySpeaking because of its support for the Italian language, its ability to perform speech-to-text transcription from a digital audio file, its easy availability, and its high accuracy (99% for dictation).
  7. Caption Alignment. The plain transcript produced by Speech2text is combined with the timing information about where the markups were inserted by the Markup Insertion Module, yielding a transcript with timestamps.
  8. Caption Alignment. Existing solutions: (1) alignment of a manual transcript with the video; (2) running the ASR twice, which requires a high-computation environment. Our solution is automatic (based on audio analysis), efficient (the ASR runs only once), and technology transparent (any ASR can be used).
  9. Experimental study. Several Computer Science and Linguistics professors of the Communication Sciences degree of the University of Modena and Reggio Emilia, teaching in front of a live audience. Goals: to tune the parameters used to locate the positions where audio markups are inserted, to find the most appropriate hardware (microphone) and software (ASR) products for the recording scenario, and to investigate the transcription accuracy.
  10. Transcription accuracy (plot of accuracy vs. minimum markup distance, in seconds). The higher the values of silence length and minimum markup distance, the better the accuracy; however, these parameters also affect the length of the produced captions.
  11. Caption length. Desktop threshold = 375 characters (ARIAL font family, 16 pt). The higher the values of silence length and minimum markup distance, the longer the captions.
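The 375-character desktop threshold on this slide implies that overly long captions must be broken up. A minimal sketch of one possible policy (greedy splitting at word boundaries; the policy itself is an assumption, not the paper's):

```python
# Sketch of enforcing the caption-length constraint from the slide: a caption
# longer than the display threshold (375 characters on a desktop layout, per
# the slide) is split greedily at word boundaries.

DESKTOP_THRESHOLD = 375  # max characters per caption (ARIAL, 16 pt, per slide)

def split_caption(text, limit=DESKTOP_THRESHOLD):
    """Greedily split a caption into chunks no longer than `limit` characters."""
    words, chunks, current = text.split(), [], ""
    for w in words:
        candidate = (current + " " + w).strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = w
        # note: a single word longer than `limit` would still overflow;
        # real lecture captions rarely hit this edge case
    if current:
        chunks.append(current)
    return chunks

caption = ("word " * 100).strip()   # 499-character caption
parts = split_caption(caption)
print([len(p) for p in parts])      # every chunk fits within the threshold
```

With larger silence-length and markup-distance values, more text accumulates between markups, so more captions would need this kind of splitting: the trade-off the slide points out.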
  12. System Prototype.
  13. Conclusions. The system is automatic, efficient, and technology transparent: it takes the video and audio streams and produces a textual transcript for classroom and remote students, both disabled (hearing-impaired, dyslexic, motion-impaired) and able-bodied.
  14. Contacts. Supported by Servizio Accoglienza Studenti Disabili, University of Modena and Reggio Emilia. Further information: Maria Federico, Ph.D., maria.federico@unimore.it

Editor's Notes

  • Ideas after the presentation: TODO: add a slide comparing our approach with TV subtitling done via respeaking. In the first slide we could stress that sign interpreters and stenographers are paid third parties, as are respeakers, and that this is also the approach followed for television subtitles, for example. Then, in the second slide (the one on the SENTO scenario), say that we do it automatically, with no third parties involved, which is why this is an economical and efficient solution.
  • For years, universities have faced a number of challenges in making classroom lectures accessible to students who are deaf or hard of hearing. Traditional methods—sign interpreters, stenographers or student note takers—are often costly, difficult to procure or inconsistent. http://italy.nuance.com/naturallyspeaking/products/dns_livesub.html
  • The project aims at the design and development of a platform for the automatic captioning of audiovisual teaching material and for the management and delivery of multimedia material to disabled students.
  • We analyze the audio; for instance, we identify silences.