W4A 2012-Federico-Furini_AutomaticCaptioning

  • Ideas after the presentation. TODO: add a slide comparing our approach with TV subtitling done via respeaking. In the first slide we could stress that sign interpreters and stenographers are third parties who get paid, and so are respeakers; this is also the approach followed for television subtitles, for example. Then in the second slide (the one on the SENTO scenario) say that we do it automatically, with no third parties involved, which is why ours is an economical and efficient solution.
  • For years, universities have faced a number of challenges in making classroom lectures accessible to students who are deaf or hard of hearing. Traditional methods—sign interpreters, stenographers or student note takers—are often costly, difficult to procure or inconsistent. http://italy.nuance.com/naturallyspeaking/products/dns_livesub.html
  • The goal of the project is the design and development of a platform for the automatic captioning of audiovisual teaching material, and for the management and delivery of multimedia material for students with disabilities.
  • We analyze the audio; for instance, we identify silences.

    1. W4A 2012, Lyon, April 17, 2012. Enhancing Learning Accessibility through Fully Automatic Captioning. Maria Federico (Servizio Accoglienza Studenti Disabili, Università di Modena e Reggio Emilia) and Marco Furini (Dipartimento di Comunicazione ed Economia, Università di Modena e Reggio Emilia).
    2. The traditional learning scenario. Traditional solutions: sign interpreters, stenographers, student note takers, respeaking. Video and audio of the lesson reach students in the classroom and remote students. Disabled students: hearing-impaired, dyslexic, motion-impaired. Able-bodied students.
    3. An accessible learning scenario. Video and audio of the lesson enter OUR SYSTEM, which performs automatic speech transcription and delivers video, audio, and a textual transcript to students in the classroom and remote students, both disabled (hearing-impaired, dyslexic, motion-impaired) and able-bodied.
    4. System Architecture. Architecture for the automatic production of video lesson captions, based on: automatic speech recognition (ASR) technologies; a novel caption alignment mechanism that introduces unique audio markups into the audio stream before transcription by an ASR, and transforms the plain transcript produced by the ASR into a timecoded transcript.
    5. Markup Insertion. Identification of silence periods (i.e., when the speaker does not speak) and periodic insertion of a unique markup into those silence periods. It is important to find reasonable values for the silence length and for the minimum distance between two consecutive markups, in order to have no truncated words in the transcript and enough timing information.
    6. Speech2text. Transcription of the audio stream, coupled with the unique markups, into plain text (including the textual form of each markup). Any existing automatic speech recognition technology can be used. In the system prototype we used Dragon NaturallySpeaking, because of its support for the Italian language, the availability of speech-to-text transcription from a digital audio file, easy access to the product, and high accuracy (99% for dictation).
    7. Caption Alignment. Inputs: the plain transcript produced by Speech2text, and the timing information about where markups have been inserted by the Markup Insertion Module. Output: a transcript with timestamps.
    8. Caption Alignment. Existing solutions: (1) alignment of a manual transcript with the video; (2) running the ASR twice. Both require a high-computation environment. Our solution is automatic (based on audio analysis), efficient (the ASR runs just once), and technology transparent (any ASR can be used).
    9. Experimental study. Different Computer Science and Linguistics professors of the Communication Sciences degree of the University of Modena and Reggio Emilia, teaching in front of a live audience. Goals: to tune the parameters used to locate the positions where to insert audio markups; to find the most appropriate hardware (microphone) and software (ASR) products to build the recording scenario; to investigate the transcription accuracy.
    10. Transcription accuracy (chart: accuracy vs. minimum markup distance, in seconds). The higher the values of silence length and minimum markup distance, the better the accuracy; however, these parameters affect the length of the produced captions.
    11. Caption length (desktop threshold = 375 characters, Arial font family, 16 pt). The higher the values of silence length and minimum markup distance, the longer the captions.
    12. System Prototype (screenshot).
    13. Conclusions. Our system is automatic, efficient, and technology transparent. Video and audio of the lesson enter the system, which produces a textual transcript for students in the classroom and remote students, both disabled (hearing-impaired, dyslexic, motion-impaired) and able-bodied.
    14. Contacts. Supported by Servizio Accoglienza Studenti Disabili, University of Modena and Reggio Emilia. Further information: Maria Federico, Ph.D., maria.federico@unimore.it
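The markup-insertion step (slide 5) can be sketched as follows. This is a minimal illustration, assuming a frame-energy silence detector; the slides do not specify how silences are detected, and the function names, frame length, and energy threshold below are hypothetical.

```python
# Hypothetical sketch of the Markup Insertion step: find silence
# periods in the audio and pick markup positions inside them,
# honoring a minimum silence length and a minimum distance between
# consecutive markups (the two parameters tuned in the study).

def find_silences(samples, frame_len=160, threshold=0.01):
    """Return (start, end) sample indices of runs of silent frames."""
    silences, start = [], None
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy < threshold:
            if start is None:
                start = i           # a silence run begins here
        else:
            if start is not None:
                silences.append((start, i))
                start = None
    if start is not None:           # audio ends inside a silence
        silences.append((start, len(samples)))
    return silences

def markup_positions(silences, min_silence, min_distance):
    """Pick one markup position per eligible silence, keeping at
    least min_distance samples between consecutive markups."""
    positions, last = [], None
    for start, end in silences:
        if end - start < min_silence:
            continue  # too short: a markup here could truncate words
        if last is not None and start - last < min_distance:
            continue  # too close to the previous markup
        positions.append(start)
        last = start
    return positions
```

A markup (a short, unique audio pattern) would then be mixed into the stream at each returned position before the audio is handed to the ASR.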
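The caption-alignment step (slides 7 and 8) can be sketched as follows, assuming the ASR emits a known textual token (here `MARKUP`) for each audio markup; the token name and the (start, end, text) caption format are illustrative, not the prototype's actual ones.

```python
# Hypothetical sketch of the Caption Alignment step: split the plain
# ASR transcript at the textual form of each markup and pair the
# resulting segments with the known markup insertion times, yielding
# a timecoded transcript.

def align_captions(transcript, markup_token, markup_times, total_duration):
    """Return a list of (start_sec, end_sec, text) captions."""
    segments = transcript.split(markup_token)
    # boundaries: lesson start, one timestamp per markup, lesson end
    boundaries = [0.0] + list(markup_times) + [total_duration]
    captions = []
    for text, start, end in zip(segments, boundaries, boundaries[1:]):
        text = text.strip()
        if text:  # skip empty segments (e.g., two adjacent markups)
            captions.append((start, end, text))
    return captions
```

Because the times come from the audio analysis itself, the ASR runs only once and no manual transcript is needed, which is the efficiency argument of slide 8.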
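Caption-length control (slide 11) could work along these lines: a greedy word-boundary split against the 375-character desktop threshold. The splitting strategy is an assumption for illustration; the slides only report the threshold.

```python
# Hypothetical sketch of caption-length control: greedily pack words
# into chunks of at most max_chars characters, never breaking a word
# (the desktop threshold on slide 11 is 375 characters).

def split_caption(text, max_chars=375):
    """Split text into chunks of at most max_chars characters."""
    words, chunks, current = text.split(), [], ""
    for word in words:
        candidate = (current + " " + word).strip()
        if len(candidate) <= max_chars:
            current = candidate          # word still fits
        else:
            if current:
                chunks.append(current)   # flush the full chunk
            current = word               # start a new chunk
    if current:
        chunks.append(current)
    return chunks
```

A caption produced by the alignment step that exceeds the threshold would be split this way into several consecutive captions.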
