Intern presentation
Presentation Transcript

  • Presentation on Internship Work: Speech and Eye Tracking Enabled Computer Assisted Translation (SEECAT), Copenhagen Business School. By: Himanshu Bansal
  • BACKGROUND Michael Carl, Associate Professor, CBS; Srinivas Bangalore, Principal Member, AT&T Research Labs
  • BACKGROUND Why SEECAT?
  • BACKGROUND We need translation: to convey our thoughts to a foreign-language speaker; to understand foreign-language text and speech; as training data for automated systems; to prepare high-quality manuscripts of the same text in different languages
  • BACKGROUND ProZ, Tomedes, Verbalizeit, gengo, Straker Translations
  • BACKGROUND Translog – Manual Translation
  • BACKGROUND CASMACAT – Computer Assisted Translation
  • BACKGROUND SEECAT as an extension of CASMACAT. The translator reads a source text on a computer screen and speaks out the translation in the target language, a process called sight translation. This sight translation process is supported by an Automatic Speech Recognition (ASR) system and a Machine Translation (MT) system, which transcribe the spoken speech signal into the target text and assist the translator with partial translation proposals, predictions and completions on the computer monitor. An eye-tracking device follows the translator's gaze path on the screen, detects where he or she faces translation problems and triggers reactive assistance.
  • STRUCTURE of INTERNSHIP 21 May - 7 Jun: lectures and hands-on sessions (CBS); 8 Jun - 28 Jun: divided into teams, worked at a summer house (Nykobing, Falster); 29 Jun - 21 July: integration (CBS). Excursions planned for every weekend
  • GAZE TEAM Himanshu, Kritika and Rucha. Part 1: Word-Fixation Remapping. Part 2: Mutual disambiguation between gaze and speech
  • Word-Fixation Remapping Word-fixation mapping is useful for cognitive/linguistic research, usability studies and, most importantly, for providing interactivity in the system
  • Word-Fixation Remapping Issues: identification of the fixations in a stream of gaze samples; mapping the fixations to words/characters (dealing with variable error); evaluation scheme for the fixation mapping
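The first two issues above can be illustrated with a minimal sketch. The transcript does not preserve the team's actual approach (those slides were images), so this uses a generic dispersion-threshold fixation detector and a nearest-word lookup as stand-ins. The names `detect_fixations` and `map_to_word`, the sample layout `(t_ms, x, y)`, the word-box layout and all threshold values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    start_ms: int
    end_ms: int
    x: float  # centroid of the samples in the fixation
    y: float


def _dispersion(window):
    # Horizontal extent plus vertical extent of the samples in the window.
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, max_dispersion=35.0, min_duration_ms=100):
    """Dispersion-threshold detection over (t_ms, x, y) samples sorted by time.

    Thresholds are illustrative assumptions, not values from the slides.
    """
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Grow the window until it spans the minimum fixation duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break
        if _dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while the samples stay close together.
            while j + 1 < n and _dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(s[1] for s in window) / len(window)
            cy = sum(s[2] for s in window) / len(window)
            fixations.append(Fixation(window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1
    return fixations


def map_to_word(fix, word_boxes, max_distance=40.0):
    """Return the word whose box (word, left, top, right, bottom) is nearest
    to the fixation centroid, or None if every box is farther than max_distance."""
    best_word, best_dist = None, None
    for word, left, top, right, bottom in word_boxes:
        dx = max(left - fix.x, 0.0, fix.x - right)
        dy = max(top - fix.y, 0.0, fix.y - bottom)
        dist = (dx * dx + dy * dy) ** 0.5
        if best_dist is None or dist < best_dist:
            best_word, best_dist = word, dist
    if best_dist is not None and best_dist <= max_distance:
        return best_word
    return None
```

The distance cutoff in `map_to_word` is one simple way to tolerate variable error: a fixation that lands slightly outside a word box is still assigned to it, while a fixation far from any text is left unmapped.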
  • Word-Fixation Remapping Our Approach
  • Word-Fixation Remapping Our Approach -> Evaluation
    ● Input:
      – Manually annotated fixation-to-word mapping (Gold Standard)
      – Machine-computed fixation-to-word mapping
    ● Output:
      – The average character/word error.
    ● Method:
      – Compute the overlaps in the gaze fixation durations in the manual and machine annotations.
      – For the overlapping fixations, compute the absolute differences in the cursor positions of the two mappings.
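The evaluation method above can be sketched as follows. This is only one reading of the slide: the tuple layout `(start_ms, end_ms, cursor_pos)`, the function name `average_cursor_error`, and the choice of an unweighted mean over all overlapping pairs are assumptions, since the transcript does not say how overlaps were aggregated.

```python
def average_cursor_error(gold, machine):
    """Mean absolute cursor-position difference over time-overlapping fixations.

    gold, machine: lists of (start_ms, end_ms, cursor_pos), where cursor_pos is
    the character (or word) index a fixation was mapped to.
    Returns None when no gold fixation overlaps any machine fixation.
    """
    total = 0
    pairs = 0
    for g_start, g_end, g_pos in gold:
        for m_start, m_end, m_pos in machine:
            # Two fixations overlap if their time intervals intersect.
            overlap_ms = min(g_end, m_end) - max(g_start, m_start)
            if overlap_ms > 0:
                total += abs(g_pos - m_pos)
                pairs += 1
    return total / pairs if pairs else None
```

A duration-weighted mean (multiplying each difference by `overlap_ms`) would be an equally plausible variant when long fixations should count for more.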
  • Mutual disambiguation between gaze and speech Motivation
  • Mutual disambiguation between gaze and speech Motivation Ambiguity in gaze: variable error - Midas touch; system errors - eye tracker, algorithm, calibration. Ambiguity in ASR: domain of training data; accent of speaker; morphology of language; speaking style - co-articulation
  • Mutual disambiguation between gaze and speech Motivation Consider a simple example:
    ● User reads the text “where is the bat”
    ● ASR can help map gaze points to
    ● Gaze can help disambiguate ASR output
    Example text: Where is the bat. There it is, behind the door. I can't find it! Where is it? Look properly! It's right there.
    ASR hypotheses: Here is the mat / Here is the bat / where is the bat / where is a pat
    Possible words being gazed: where, it, there, the, is
    Intersection (of the ASR hypotheses and the possible gazed words)
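The intersection idea in this example can be sketched in a few lines. The hypotheses and gazed words are the ones shown on the slide (reading the garbled "theis" as "the" and "is" is an assumption). The function names and the scoring rule, counting how many hypothesis words were also gazed at, are illustrative and not the team's method.

```python
def tokenize(sentence):
    # Lowercase whitespace tokenization; enough for this toy example.
    return sentence.lower().split()


def rank_hypotheses_by_gaze(hypotheses, gazed_words):
    """Order ASR hypotheses by how many of their words appear among the gazed words.

    Ties keep their original (ASR) order because Python's sort is stable.
    """
    gazed = {w.lower() for w in gazed_words}
    scored = [(sum(1 for w in tokenize(h) if w in gazed), h) for h in hypotheses]
    scored.sort(key=lambda pair: -pair[0])
    return [h for _, h in scored]


def gazed_words_supported_by_asr(hypothesis, gazed_words):
    """The other direction: keep only gazed words that the chosen hypothesis contains."""
    return {w.lower() for w in gazed_words} & set(tokenize(hypothesis))


asr_hypotheses = [
    "here is the mat",
    "here is the bat",
    "where is the bat",
    "where is a pat",
]
possible_gazed = ["where", "it", "there", "the", "is"]

ranking = rank_hypotheses_by_gaze(asr_hypotheses, possible_gazed)
confirmed = gazed_words_supported_by_asr(ranking[0], possible_gazed)
```

With these inputs, gaze promotes "where is the bat" to the top of the list, and the speech side in turn discards "it" and "there" from the candidate gazed words.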
  • Mutual disambiguation between gaze and speech Inspiration from literature research. Meyer et al. studied eye movements in an object-naming task and showed that people consistently fixated objects prior to naming them. Griffin showed that when multiple objects were named in a single utterance, speech about one object was produced while the next object was fixated and lexically processed.
  • Mutual disambiguation between gaze and speech Experiments -> Reading Task • 5 participants read English Text • Eye Gaze and Speech Recorded • 6 sets of 11 sentences • 5 sets in domain and 1 out of domain
  • Mutual disambiguation between gaze and speech Experiments -> Translation Task • 4 participants translated English text • 4 sets of 10 very simple sentences each • Target languages: Hindi, Spanish, Danish, Italian • Eye gaze on source-language words and speech in the target languages recorded
  • Mutual disambiguation between gaze and speech Approach ASR word lattice Reference sentence: Leaving next day in the morning
  • Mutual disambiguation between gaze and speech Approach Gaze word lattice
  • Mutual disambiguation between gaze and speech Approach Gaze bag of words
  • Mutual disambiguation between gaze and speech Approach Composed word-lattice Reference sentence: Leaving next day in the morning
  • Mutual disambiguation between gaze and speech System • Performed experiments on Translog • Speech hypotheses were provided by the AT&T Watson server • Transformed these hypotheses into word-lattice format using Python • Generated bags of words from x, y coordinates using our Part 1 algorithm, implemented in C# and Python • For translation tasks, the gaze bags of words were transformed into target-language bags of words using lexicons (one more level of ambiguity) • Composed these lattices using OpenFST
  • Mutual disambiguation between gaze and speech System -> experiment with algo • Weights of gaze words: should they be considered or not • Weights of ASR words: should they be considered or not. Then used a Latin square -> • Unweighted ASR with Weighted Gaze Bag-of-words (WGUA) • Unweighted ASR with Unweighted Gaze Bag-of-words (UGUA) • Weighted ASR with Weighted Gaze Bag-of-words (WGWA) • Weighted ASR with Unweighted Gaze Bag-of-words (UGWA)
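The four configurations can be pictured with a small pure-Python stand-in for the lattice composition. This is not the OpenFST pipeline from the slides; it approximates composition by dropping any n-best hypothesis that contains a word outside the gaze bag, then summing costs (lower is better). The function name, the cost values and the alternative hypotheses are made up for illustration; only the reference sentence "leaving next day in the morning" comes from the slides.

```python
# Configuration name -> (use ASR weights, use gaze weights), following the
# naming on the slide: U/W = unweighted/weighted, G = gaze, A = ASR.
CONFIGS = {
    "UGUA": (False, False),
    "WGUA": (False, True),
    "UGWA": (True, False),
    "WGWA": (True, True),
}


def compose_nbest_with_bag(nbest, gaze_bag, use_asr_weights, use_gaze_weights):
    """Filter and rescore n-best hypotheses against a gaze bag of words.

    nbest: list of (sentence, asr_cost); gaze_bag: dict word -> gaze_cost.
    Returns surviving (cost, sentence) pairs, cheapest first.
    """
    survivors = []
    for sentence, asr_cost in nbest:
        words = sentence.lower().split()
        # A path survives only if the gaze bag accepts every word on it.
        if not all(w in gaze_bag for w in words):
            continue
        cost = asr_cost if use_asr_weights else 0.0
        if use_gaze_weights:
            cost += sum(gaze_bag[w] for w in words)
        survivors.append((cost, sentence))
    survivors.sort(key=lambda pair: pair[0])  # stable: ties keep ASR order
    return survivors


# Hypothetical inputs for illustration only.
nbest = [
    ("leaving next day in the mourning", 1.0),
    ("leaving next day in the morning", 1.5),
    ("living next day in the morning", 2.0),
]
gaze_bag = {"leaving": 0.2, "living": 0.1, "next": 0.1, "day": 0.1,
            "in": 0.3, "the": 0.3, "morning": 0.1}

best_by_config = {
    name: compose_nbest_with_bag(nbest, gaze_bag, use_asr, use_gaze)[0][1]
    for name, (use_asr, use_gaze) in CONFIGS.items()
}
```

In this toy setup the gaze bag removes the "mourning" hypothesis in every configuration, and the remaining choice depends on which weights are switched on, which is the kind of difference the four-way comparison is designed to expose.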
  • Mutual disambiguation between gaze and speech Evaluation
    ASR: SCLITE was used to get the word accuracies of the n-best hypotheses with respect to the reference sentence.
    Eye Gaze:
      Precision = |Wg ∩ Wr| / |Wg|
      Recall = |Wg ∩ Wr| / |Wr|
      F-Measure = harmonic mean of precision and recall
      Sentence Recognition Error (SRI; 1 or 0)
    Wg = unique words in the gazed words; Wr = unique words in the reference sentence
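The gaze metrics defined on this slide translate directly into set operations. The sketch below follows the slide's definitions of Wg and Wr; the function name `gaze_scores`, the lowercase whitespace tokenization and the zero-division handling are assumptions.

```python
def gaze_scores(gazed_words, reference_sentence):
    """Precision, recall and F-measure of the gazed word set against the reference.

    Wg = unique gazed words, Wr = unique words in the reference sentence.
    """
    wg = {w.lower() for w in gazed_words}
    wr = set(reference_sentence.lower().split())
    common = wg & wr
    precision = len(common) / len(wg) if wg else 0.0
    recall = len(common) / len(wr) if wr else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

For example, with gazed words `["where", "it", "there", "the", "is"]` and the reference "where is the bat", three of the five gazed words are in the reference (precision 3/5) and three of the four reference words were gazed at (recall 3/4).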
  • Mutual disambiguation between gaze and speech Research Design
    Reading Task. Independent variables: domain of test data, weights of ASR, weights of eye gaze. Dependent variables: gaze F-measure, gaze SRI, ASR word accuracy.
    Translation Task. Independent variables: target language, weights of ASR, weights of eye gaze. Dependent variables: gaze F-measure, gaze SRI, ASR word accuracy.
  • Mutual disambiguation between gaze and speech Reading - Paired T-test
    Gaze rows (identical in all four tables):
                        In domain      Out of domain     All
      Gaze_Precision    8.63075E-07    no improvement    9.2475E-07
      Gaze_Recall       0.048194288    no improvement    0.048220416
      Gaze_F-Measure    1.68557E-06    no improvement    1.7924E-06
    ASR_WrdAccr row, one per table:
                        In domain      Out of domain     All
      Table 1           0.040033133    0.86206786        0.067110316
      Table 2           0.007594247    0.86206786        0.017268861
      Table 3           0.040033133    0.86206786        0.067110316
      Table 4           0.040033133    0.363217468       0.099456245
    Table labels as listed on the slide: WGUA, UGWA, UGUA, WGWA
  • Mutual disambiguation between gaze and speech Reading– Absolute % improvements
  • Mutual disambiguation between gaze and speech Translation - Paired T-test
    Gaze rows (identical in all four tables):
                        En-Hi          En-Sp          En-It          En-Da
      Gaze_Precision    9.22715E-10    9.69933E-10    8.11911E-10    0.000916591
      Gaze_Recall       3.15662E-06    0.000781894    0.016622281    1.32874E-10
      Gaze_F-Measure    1.19047E-09    1.42E-10       9.78254E-11    0.000278612
    ASR_Word_Accuracy row, one per table (three values each in the transcript):
      Table 1           0.001722134    0.002676057    0.108137263
      Table 2           0.702333466    0.323474945    0.108137263
      Table 3           0.003101235    0.011298332    0.209938743
      Table 4           0.045589916    0.181222117    0.108137263
    Table labels as listed on the slide: UGUA, WGUA, WGWA, UGWA
  • Mutual disambiguation between gaze and speech Translation – Absolute % improvements
  • Mutual disambiguation between gaze and speech Conclusions Reading Task • Significant improvement in both Gaze F-measure and ASR accuracy after integration • Gaze recall fell significantly • SRI also improved • Improvement in the in-domain task was much larger than in the out-of-domain task • Of the four experiments, UGWA was observed to be best
  • Mutual disambiguation between gaze and speech Conclusions Translation Task • Significant improvement only in Gaze F-measure, for all languages • ASR accuracy improved, but not significantly • For Hindi and Danish, SRI decreased a lot • Again UGWA was observed to be best (for 3 languages)
  • Mutual disambiguation between gaze and speech Overview flowchart
    Input from gaze: got x, y from already logged files; fixation-word remapping algo; EVALUATION: fixation duration intersection between machine and manual (3 manual and 1 machine)
    Static text reading experiment: eye gaze data captured (Translog); audio recorded at sentence level
    Gaze: word lattices; bag of words; EVALUATION: comparison with BoW of reference sentence (precision and recall)
    Speech: 10-best hypotheses from Watson server; EVALUATION: compared 1st best with reference text (edit distance) using SCLITE; word lattices
    Eye gaze disambiguation and ASR disambiguation, with weighted and unweighted ASR lattices: improved BoW (EVAL), improved hypothesis (EVAL)
    Majority based also; sentence identification
  • LEARNING Academic: worked with Tobii T-60; experiment design; Python; LaTeX; Moses; Translog; PuTTY; Cygwin; Audacity; OpenFST; got an idea of MT and ASR. Personal: communication skills; project management (morning reporting, presentations, weekly targets and checks); kayaking; two-string kite; a bit of cooking
  • Some photos
  • Thanks Hoping the monkeys to be friends forever