Rec: All Lecture Capture Workshop
11 December 2013
Carlos Turró
Universitat Politècnica de València
EC FP7 ICT project #287755
Motivation
• Video lecture repositories and MOOCs
• Thousands of hours of video lectures available
• Hundreds of hours of video lectures recorded every week

• Most video lectures only available in their original language
• No subtitles

12 Nov 2013

Motivation
• Transcriptions and translations are needed
  • Accessibility for people with disabilities
  • Accessibility for speakers of different languages
  • Search and analysis functions
  • Automated topic finding
  • …

• How do we get there?

The transLectures approach
1. Automatic Speech Recognition (ASR) and Machine Translation (MT)
   • Adaptation: taking advantage of the characteristics of video lecture repositories
   • High-quality automatic transcriptions and translations

2. Interactive postediting: intelligent interaction for reduced effort

Goals
• Development of an engine for adaptation & intelligent interaction
• Implementation
• Case studies: Videolectures.NET & Polimedia
• Real-life evaluation
• Integration into Opencast Matterhorn
http://opencast.org/matterhorn/
The transLectures partners
No.  Name                                   Country
1    Universitat Politècnica de València    Spain
2    Xerox SAS                              France
3    Institut Jožef Stefan                  Slovenia
3+   Knowledge for All Foundation           UK
4    RWTH Aachen University                 Germany
5    EML – European Media Laboratory        Germany
6    DDS – Deluxe Digital Studios           UK

Project duration: 36 months. We are now in month 25 (M25).
Statistical transcription (and translation)
[Diagram labels: Sound, Acoustic Model, Language Model, ASR Engine]
Statistical transcription (and translation)
[Diagram labels: Manually transcribed voice, Modeling Engine, Acoustic Model, Language Model]
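The acoustic model and the language model named above are usually combined through the standard statistical decision rule for speech recognition. As a sketch only (the symbols x for the acoustic input and w for a candidate word sequence are notation introduced here, not taken from the slides):

```latex
\hat{w} = \operatorname*{arg\,max}_{w} \; p(x \mid w)\, P(w)
```

Here p(x | w) is the score given by the acoustic model and P(w) the score given by the language model. The ASR engine searches for the word sequence that maximizes their product.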
Architecture of transLectures
[Diagram labels: Lecture, Slides, Extra content, Language Model, Transcription, Translation, Result, Intelligent interaction]
Languages
• Transcription (ASR)
  • EN
  • SL
  • ES

• Translation (MT)
  • EN>SL , SL>EN
  • EN>ES , ES>EN
  • EN>FR
  • EN>DE

Case study: VideoLectures.NET

15000 lectures
Case study: Polimedia

10000 Learning Objects
Demo
http://translectures.videolectures.net
http://polimedia.upv.es/catalogo
http://translectures.eu/player/
Scientific evaluations
• Transcription results
  • WER: Word Error Rate (%), lower is better
  • Goal: WER < 20%
  • EN, SL, ES
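To make the WER metric concrete, here is a minimal sketch of how it is commonly computed: the word-level edit distance between a reference transcription and the automatic one, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. The project's own scoring tools are not described in the slides.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on mat"
    print(f"WER = {word_error_rate(ref, hyp):.1f}%")
```

With one missing word out of six reference words, this example falls just under the 20% goal stated above.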

Scientific evaluations
• Translation results
  • BLEU (higher is better)
  • Goal: BLEU > 30
  • EN>SL , SL>EN
  • EN>ES , ES>EN
  • EN>FR
  • EN>DE
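For readers unfamiliar with BLEU, the following is a simplified, single-sentence sketch of the idea: clipped n-gram precision for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. Real evaluations compute BLEU over a whole test set, often with several references and with smoothing. The function names and the whitespace tokenization here are illustrative assumptions, not the project's evaluation setup.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU on a 0-100 scale, single reference."""
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Each hypothesis n-gram is credited at most as often as it occurs
        # in the reference ("clipped" counts).
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(matches / total))

    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100.0 * bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    print(bleu(ref, ref))
    print(bleu(ref, "the cat sat on the"))
```

A perfect match scores 100. The truncated second hypothesis has perfect n-gram precision but is reduced by the brevity penalty.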

Y2 results and comparison

Massive adaptation
• Characteristics of video lectures
  • Just one person
  • Known speaker
  • Clear talking
  • No interruptions
  • Focused on a topic
  • Slides

Massive adaptation
• Known speaker and topic
• Slides
• Related documents
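The slides do not say how slides and related documents are used for adaptation. One common technique is to interpolate a general language model with a small model estimated from lecture-specific text. The sketch below illustrates that idea with toy unigram models. The function names, the interpolation weight `lam`, and the choice of unigrams are all assumptions made for illustration.

```python
from collections import Counter


def unigram_model(text: str) -> dict:
    """Estimate word probabilities from raw text by relative frequency."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(general: dict, lecture: dict, lam: float = 0.5) -> dict:
    """Linear interpolation: P(w) = lam * P_lecture(w) + (1 - lam) * P_general(w)."""
    vocab = set(general) | set(lecture)
    return {
        w: lam * lecture.get(w, 0.0) + (1 - lam) * general.get(w, 0.0)
        for w in vocab
    }


if __name__ == "__main__":
    general = unigram_model("the the model is good")
    slides = unigram_model("the acoustic model")  # hypothetical slide text
    adapted = interpolate(general, slides, lam=0.5)
    # Words that appear on the slides gain probability mass.
    print(adapted["acoustic"], general.get("acoustic", 0.0))
```

The interpolated model still sums to one, and words taken from the slides become more likely than under the general model alone.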

Intelligent interaction
• Postediting automatic transcriptions/translations
• The user invests the least possible effort
• The system learns the most from it

• Confidence measures
• Fast constrained search
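As a rough illustration of how confidence measures can reduce user effort, the sketch below picks the words the recognizer is least sure about, so a reviewer checks only those. The selection rule, the function name `words_to_review`, and the sample confidence scores are assumptions for illustration. The slides do not describe the actual interaction strategy.

```python
from typing import List, Tuple


def words_to_review(words: List[Tuple[str, float]], budget: int) -> List[int]:
    """Return positions of the `budget` least-confident words, in reading order."""
    # Rank word positions from lowest to highest confidence.
    ranked = sorted(range(len(words)), key=lambda i: words[i][1])
    # Keep the least-confident ones and present them in transcript order.
    return sorted(ranked[:max(budget, 0)])


if __name__ == "__main__":
    # Hypothetical recognizer output: (word, confidence in [0, 1]).
    transcript = [
        ("the", 0.98), ("acustic", 0.41), ("model", 0.93),
        ("is", 0.99), ("trainned", 0.35),
    ]
    for pos in words_to_review(transcript, budget=2):
        print(pos, transcript[pos][0])
```

With a budget of two words, only the two low-confidence positions are shown to the user, and the corrections can then be fed back to the system.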

Implementation and integration
• Videolectures.NET
• Polimedia
• Opencast Matterhorn

The tL player
Online HTML5 video player with editing capabilities.
The user interface has three different editing layouts and full keyboard support.
User interaction statistics are analyzed to improve the user experience and to develop a user model.
tL player
Manual upload of lectures
transLectures: tools available
• The transLectures-UPV Toolkit (TLK) for ASR
• www.translectures.eu/tlk

• RWTH Aachen: rASR, Jane (MT)
• http://www-i6.informatik.rwth-aachen.de/web/Software/

Note that you need an acoustic model and a language model

transLectures: tools at M30
• The tL player (& editor)
• tL Opencast Matterhorn module
• Cloud service for testing
• Coming soon at M30 (www.translectures.eu)

More info at the OCWC conference (Ljubljana) in April 2014
Next steps for transLectures
• Keep improving ASR and MT results
• Keep improving tL open source tools (TLK, tL player)
• External user evaluations (VL.NET and Polimedia)
• External trials: implementation in other universities

Next EU project: EMMA
• MOOC-related project
• transLectures will work on adding 7 new transcription systems (English, Italian, Spanish, French, Dutch, Portuguese and Estonian)
• … and 8 translation systems (from Italian, Spanish, French, Dutch, Portuguese and Estonian into English; and from English into Italian and Spanish)
• Beginning in 2014
www.translectures.eu

Thanks!
My mail (Carlos Turró): turro@cc.upv.es
Project coordinator: Alfons Juan-Ciscar, ajuan@dsic.upv.es

EC FP7 ICT Programme – Project Number 287755

Subtitling & translation of weblectures by Carlos Turró Ribalta