CUNI at MediaEval 2013
Similar Segments in Social Speech Task
Petra Galuščáková and Pavel Pecina
Published in: Data & Analytics

  1. CUNI at MediaEval 2013
     Similar Segments in Social Speech Task
     Petra Galuščáková and Pavel Pecina
     galuscakova@ufal.mff.cuni.cz
     Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague
     MediaEval, 18. 10. 2013
  2. Our approach
     ● The queries are created from the human transcripts of the query segments
       ● All words (from human transcripts) lying within the boundaries of the query segment in both tracks
     ● The recordings are segmented into overlapping passages
     ● Passages are indexed using the Terrier IR Platform
       ● Predefined settings, stopword removal, Porter stemmer, pruning of retrieved overlapping segments
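The query construction and the pruning step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Word` type, both function names, and the greedy pruning rule (keep a retrieved segment only if it does not overlap a higher-ranked one already kept) are assumptions, since the slide does not say how pruning was done.

```python
from typing import List, NamedTuple, Tuple


class Word(NamedTuple):
    """One transcript word (hypothetical structure for this sketch)."""
    text: str
    start: float  # word start time in seconds
    track: str    # which speaker track the word comes from


def build_query(words: List[Word], seg_start: float, seg_end: float) -> str:
    """Join all words from both tracks that start inside the query segment."""
    inside = [w for w in words if seg_start <= w.start < seg_end]
    inside.sort(key=lambda w: w.start)  # interleave the two tracks by time
    return " ".join(w.text for w in inside)


def prune_overlapping(ranked: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Assumed pruning rule: walk the ranked list from best to worst and
    drop any segment that overlaps one that was already kept."""
    kept: List[Tuple[float, float]] = []
    for start, end in ranked:
        if all(end <= k_start or start >= k_end for k_start, k_end in kept):
            kept.append((start, end))
    return kept
```

For example, pruning the ranked list `[(0, 50), (25, 75), (50, 100)]` keeps the first and third segments, because the second overlaps the top-ranked one.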
  3. Segmentation
     ● Regular
       ● Segments of 50 seconds, shifted by 25 seconds
     ● Machine Learning
       ● Decision about segment boundaries (for each word in the transcript)
       ● Classification trees
       ● Two types: segment boundaries identification and segment beginning identification
       ● Model trained and tuned on the human transcripts
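The regular segmentation above amounts to a sliding window over the recording. A minimal sketch, assuming times in seconds; the function name is hypothetical, and truncating the final windows at the end of the recording is an assumption, as the slide does not say how the tail is handled.

```python
from typing import List, Tuple


def regular_segments(duration: float,
                     length: float = 50.0,
                     shift: float = 25.0) -> List[Tuple[float, float]]:
    """Cut a recording into overlapping windows of `length` seconds,
    each starting `shift` seconds after the previous one."""
    segments = []
    start = 0.0
    while start < duration:
        # Assumption: windows running past the end are truncated.
        segments.append((start, min(start + length, duration)))
        start += shift
    return segments
```

With the slide's settings, consecutive segments share half of their content, so any point in the recording (apart from the first 25 seconds) is covered by two segments.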
  4. ML Segmentation
     ● Identify segment boundaries
       ● Each word in the transcripts belongs to a single segment
       ● Detect whether the word is followed by a segment boundary
       ● The segment begins where the previous one ends → Segments do not overlap
       ● We need high precision of the decision
     ● Detect beginnings of segments
       ● 50 seconds long
       ● Segments can overlap
       ● We need high recall of the decision
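The two ways of turning per-word decisions into segments can be sketched as below. The classifier itself is left out: the boolean lists stand in for its output, and the function names and the `(start, end)` word representation are assumptions made for this illustration.

```python
from typing import List, Tuple

WordTime = Tuple[float, float]  # (start, end) of one word, in seconds


def segments_from_boundaries(word_times: List[WordTime],
                             boundary_after: List[bool]) -> List[Tuple[float, float]]:
    """Boundary identification: every word belongs to exactly one segment,
    and each segment begins where the previous one ends (no overlap)."""
    if not word_times:
        return []
    segments = []
    seg_start = word_times[0][0]
    for i, (_, w_end) in enumerate(word_times):
        is_last = i == len(word_times) - 1
        if boundary_after[i] or is_last:
            segments.append((seg_start, w_end))
            seg_start = w_end  # next segment starts at this boundary
    return segments


def segments_from_beginnings(word_times: List[WordTime],
                             is_beginning: List[bool],
                             length: float = 50.0) -> List[Tuple[float, float]]:
    """Beginning identification: each detected beginning opens a segment of
    fixed length, so the resulting segments may overlap."""
    return [(start, start + length)
            for (start, _), flag in zip(word_times, is_beginning) if flag]
```

The contrast explains the precision and recall requirements on the slide: a spurious boundary in the first scheme splits a segment in two, whereas a spurious beginning in the second scheme only adds one more overlapping candidate.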
  5. Features
     ● Cue words and cue tags (unigrams, bigrams and trigrams)
       ● Appear frequently at the segment boundary or are informative for the segment boundary
       ● Defined for segment beginning and for segment end
       ● e.g. for beginning: I, actually, exactly, … and for end: right?, there, so, …
     ● Letter cases
     ● Length of the silence before the word
     ● Division given in transcripts
     ● The output of the TextTiling algorithm
       ● Employs lexical cohesion
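A few of the features above can be sketched for a single word as follows. This is a partial illustration only: it covers unigram cue words, letter case, and the preceding silence, and omits cue tags, bigrams and trigrams, the transcript division, and TextTiling. The cue sets contain just the examples given on the slide, and the feature names and the `(text, start, end)` word representation are assumptions.

```python
from typing import Dict, List, Tuple, Union

TimedWord = Tuple[str, float, float]  # (text, start, end), times in seconds

# Only the example cue words listed on the slide; the full lists are not given.
CUE_BEGIN = {"i", "actually", "exactly"}
CUE_END = {"right?", "there", "so"}


def word_features(words: List[TimedWord], i: int) -> Dict[str, Union[bool, float]]:
    """Build a small feature dictionary for the i-th word of a transcript."""
    text, start, _ = words[i]
    # Silence is measured from the end of the previous word; the first
    # word is assumed to have no preceding silence.
    prev_end = words[i - 1][2] if i > 0 else start
    return {
        "cue_begin": text.lower() in CUE_BEGIN,
        "cue_end": text.lower() in CUE_END,
        "capitalized": text[:1].isupper(),
        "silence_before": max(0.0, start - prev_end),
    }
```

Dictionaries of this kind could then be fed to a classification tree, one example per word, with the label saying whether the word is followed by a boundary or starts a segment.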
  6. Results

     Tab1. Human Transcripts
     Segmentation  Segmentation  Normal.  Normal.  F-measure
     Beginnings    Ends          SUR      Recall
     REG           REG           0.57     0.78     0.58
     ML            REG           0.65     0.90     0.67
     ML            ML            0.59     0.80     0.61

     Tab2. ASR Transcripts
     Segmentation  Segmentation  Normal.  Normal.  F-measure
     Beginnings    Ends          SUR      Recall
     REG           REG           0.87     1.19     0.90
     ML            REG           0.70     1.00     0.72
     ML            ML            0.65     0.90     0.67
  7. Conclusion
  8. Conclusion
     ● Overall best result is achieved using regular segmentation on the ASR transcripts
       ● Probably caused by approximated word times in human transcripts
     ● On the human transcripts, the ML-based segmentation outperforms the regular segmentation
     ● On the ASR transcripts, the regular segmentation wins
     ● ML-based segmentation searching for segment beginnings outperforms ML-based segmentation searching for entire segments
  9. Thank you
     This research is supported by the Charles University Grant Agency (GA UK n. 920913) and the Czech Science Foundation (grant n. P103/12/G084).
