W4A 2012-Federico-Furini_AutomaticCaptioning

W4A 2012                                                Lyon, April 17 2012


           Enhancing Learning
         Accessibility through Fully
          Automatic Captioning
         Maria Federico                                Marco Furini
 Servizio Accoglienza Studenti Disabili   Dipartimento di Comunicazione ed Economia
 Università di Modena e Reggio Emilia         Università di Modena e Reggio Emilia
The traditional learning scenario
   Traditional solutions:
       - Sign interpreters
       - Stenographers
       - Student note takers
       - Respeaking
   [Diagram: lecture video and audio delivered to classroom and remote students]
   Able-bodied students
   Disabled students:
       • Hearing-impaired
       • Dyslexic
       • Motion impaired
An accessible learning scenario
   [Diagram: lecture video and audio enter OUR SYSTEM (automatic speech transcription), which delivers video, audio and a textual transcript to classroom and remote students]
   Able-bodied students
   Disabled students:
       • Hearing-impaired
       • Dyslexic
       • Motion impaired
System Architecture




   Architecture for the automatic production of video lesson captions
    based on
     Automatic speech recognition (ASR) technologies

     A novel caption alignment mechanism that:

         Introduces unique audio markups into the audio stream before
          transcription by an ASR
         Transforms the plain transcript produced by the ASR into a
          timecoded transcript
Markup Insertion




   Identification of silence periods (i.e., when the speaker
    does not speak)
   Insertion of a unique markup periodically in silence periods

   It is important to find reasonable values for silence length and
    minimum distance between two consecutive markups in
    order to have no truncated words in transcript and enough
    timing information
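
A minimal sketch of the two steps above, assuming the audio is available as a list of amplitude samples and that "silence" means amplitude below a fixed threshold. The function names, the threshold rule and the use of the silence midpoint as insertion point are illustrative assumptions, not the prototype's implementation.

```python
from typing import List, Sequence, Tuple


def find_silences(samples: Sequence[float], rate: int,
                  threshold: float, min_silence: float) -> List[Tuple[float, float]]:
    """Return (start_sec, end_sec) for every run of quiet samples
    that lasts at least min_silence seconds."""
    silences = []
    run_start = None  # index where the current quiet run began
    for i, sample in enumerate(samples):
        if abs(sample) < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) / rate >= min_silence:
                silences.append((run_start / rate, i / rate))
            run_start = None
    # A quiet run may extend to the end of the recording.
    if run_start is not None and (len(samples) - run_start) / rate >= min_silence:
        silences.append((run_start / rate, len(samples) / rate))
    return silences


def choose_markup_times(silences: Sequence[Tuple[float, float]],
                        min_distance: float) -> List[float]:
    """Pick the midpoint of each silence, skipping any that falls closer
    than min_distance seconds to the previously chosen point."""
    times: List[float] = []
    for start, end in silences:
        midpoint = (start + end) / 2
        if not times or midpoint - times[-1] >= min_distance:
            times.append(midpoint)
    return times


if __name__ == "__main__":
    # Toy signal at 10 samples per second: speech, pause, speech, short pause, speech, pause.
    signal = [1.0] * 10 + [0.0] * 5 + [1.0] * 10 + [0.0] * 3 + [1.0] * 10 + [0.0] * 6
    pauses = find_silences(signal, rate=10, threshold=0.1, min_silence=0.4)
    print(choose_markup_times(pauses, min_distance=2.0))
```

In this sketch, raising `min_silence` or `min_distance` yields fewer insertion points, which gives less timing information and longer captions.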
Speech2text
   Transcription of the audio stream coupled with unique
    markup into plain text (including the textual form of the
    markup)

   Any existing automatic speech recognition technology can
    be used
   In the system prototype we used Dragon NaturallySpeaking
       Support for Italian language
       Availability of speech-to-text transcription from digital audio file
       Easy access to product
       High accuracy (99% for dictation)
Caption Alignment
   Inputs:
       Plain transcript produced by Speech2text
       Timing information about where markups have been inserted
        by the Markup Insertion Module
   Output:
       Transcript with timestamps
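
A minimal sketch of this alignment step, assuming the ASR renders every markup as the same word in the plain transcript. The token `MARKUP`, the function name and the tuple layout are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def align_captions(transcript: str, markup_times: Sequence[float],
                   total_duration: float,
                   token: str = "MARKUP") -> List[Tuple[float, float, str]]:
    """Split the plain transcript at each markup token and attach the
    insertion times recorded by the markup insertion step.

    Returns a list of (start_sec, end_sec, caption_text) tuples.
    """
    segments = [segment.strip() for segment in transcript.split(token)]
    # Segment i runs from boundary i to boundary i + 1.
    bounds = [0.0] + list(markup_times) + [total_duration]
    if len(segments) != len(bounds) - 1:
        raise ValueError("markups in transcript do not match recorded timings")
    return [(bounds[i], bounds[i + 1], text)
            for i, text in enumerate(segments) if text]


if __name__ == "__main__":
    plain = "good morning everyone MARKUP today we talk about captions MARKUP thank you"
    for start, end, text in align_captions(plain, [2.5, 7.0], total_duration=10.0):
        print(f"{start:6.2f} -> {end:6.2f}  {text}")
```

The count check matters in this sketch: if the ASR drops or misrecognizes a markup, the segments no longer line up with the recorded times, so the function raises an error rather than emitting shifted timestamps.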
Caption Alignment
   Existing solutions:
       1. Alignment of manual transcript with video

       2. ASR runs twice (needs a high computational environment)



   Our solution:
       Automatic: based on audio analysis
       Efficient: ASR runs just one time
       Technology transparent: any ASR can be used
Experimental study
   Different Computer Science and Linguistics Professors
    of the Communication Sciences degree of the University
    of Modena and Reggio Emilia teaching in front of a live
    audience
   To tune the parameters used to locate the positions
    at which audio markups are inserted
   To find the most appropriate hardware (microphone) and
    software (ASR) products to build the recording scenario
   To investigate the transcription accuracy
Transcription accuracy
   [Figure: transcription accuracy versus minimum markup distance (sec)]
The higher the values of silence length and minimum markup distance are,
the better the accuracy is, but these parameters affect the length of the produced
captions
Caption length
          Desktop threshold = 375 char, ARIAL font family, 16 pt




The higher the values of silence length and minimum markup distance are,
the longer the captions are
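
A minimal sketch of how a parameter setting could be checked against the desktop threshold above, assuming captions are plain strings and length is counted in characters. The function name and the idea of reporting a fraction are illustrative assumptions.

```python
from typing import Sequence

DESKTOP_THRESHOLD = 375  # characters, as stated on the slide


def oversized_fraction(captions: Sequence[str],
                       threshold: int = DESKTOP_THRESHOLD) -> float:
    """Return the share of captions longer than the display threshold."""
    if not captions:
        return 0.0
    oversized = sum(1 for caption in captions if len(caption) > threshold)
    return oversized / len(captions)


if __name__ == "__main__":
    demo = ["a" * 100, "b" * 375, "c" * 376, "d" * 500]
    print(oversized_fraction(demo))
```

Running such a check for several values of silence length and minimum markup distance is one way to see the trade-off between accuracy and caption length.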
System Prototype




1024x80
Conclusions
   Automatic
   Efficient
   Technology transparent
   [Diagram: OUR SYSTEM delivers video, audio and a textual transcript to classroom and remote students]
   Able-bodied students
   Disabled students:
       • Hearing-impaired
       • Dyslexic
       • Motion impaired
Contacts

   Supported by
    Servizio Accoglienza Studenti Disabili
    University of Modena and Reggio Emilia

   Further information:
    Maria Federico, Ph.D.
    maria.federico@unimore.it


Editor's Notes

  1. Ideas after the presentation: TODO: a slide comparing this with TV programme subtitling done by respeaking. In the first slide we could stress that sign interpreters and stenographers are third parties who are paid, and so are respeakers. This is also the approach followed for television subtitles, for example. Then in the second slide (the one on the SENTO scenario) say that we do this automatically, with no third parties involved, which is why it is an economical and efficient solution.
  2. For years, universities have faced a number of challenges in making classroom lectures accessible to students who are deaf or hard of hearing. Traditional methods—sign interpreters, stenographers or student note takers—are often costly, difficult to procure or inconsistent. http://italy.nuance.com/naturallyspeaking/products/dns_livesub.html
  3. The project aims to design and develop a platform for the automatic captioning of audiovisual teaching material and for managing multimedia material and making it available to disabled students.
  4. We analyze the audio; for instance, we identify silences.
  5. The project aims to design and develop a platform for the automatic captioning of audiovisual teaching material and for managing multimedia material and making it available to disabled students.