This document summarizes an approach for identifying similar segments in social speech using machine learning segmentation techniques. It discusses:
1) Creating queries from human transcripts and indexing recordings using an IR platform after preprocessing.
2) Segmenting recordings regularly into overlapping passages or using machine learning classification trees trained on human transcripts to identify segment boundaries.
3) Features and models used for the machine learning segmentation of beginnings and ends of segments.
4) Evaluation results showing regular segmentation on ASR transcripts achieved the overall best performance.
1. CUNI at MediaEval 2013
Similar Segments in Social Speech Task
Petra Galuščáková and Pavel Pecina
galuscakova@ufal.mff.cuni.cz
Institute of Formal and Applied Linguistics
Faculty of Mathematics and Physics
Charles University in Prague
MediaEval, 18. 10. 2013
2. Our approach
● The queries are created from the human transcripts of the query segments
  ● All words (from the human transcripts) lying within the boundaries of the query segment in both tracks
● The recordings are segmented into overlapping passages
● Passages are indexed using the Terrier IR Platform
  ● Predefined settings, stopword removal, Porter stemmer, pruning of retrieved overlapping segments
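
The query construction and the pruning of overlapping results could look roughly like the following Python sketch. The `Word` and `Hit` records, the function names, and the pruning rule (keep the best-scoring hit and drop lower-scoring hits that overlap it in the same recording) are assumptions made for illustration; the slides do not specify them, and the Terrier indexing step is not shown.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """Hypothetical transcript word: text, start time in seconds, track name."""
    text: str
    start: float
    track: str


class Hit(NamedTuple):
    """Hypothetical retrieval result: a scored passage of one recording."""
    recording: str
    start: float
    end: float
    score: float


def build_query(words, seg_start, seg_end):
    """Join all words, from every track, that start inside the query segment."""
    inside = [w for w in words if seg_start <= w.start < seg_end]
    inside.sort(key=lambda w: w.start)
    return " ".join(w.text for w in inside)


def prune_overlaps(hits):
    """Assumed pruning rule: visit hits from best to worst score and keep
    a hit only if it overlaps no already-kept hit of the same recording."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        overlaps = any(
            k.recording == hit.recording and hit.start < k.end and k.start < hit.end
            for k in kept
        )
        if not overlaps:
            kept.append(hit)
    return kept
```

Sorting by score first means the surviving passage in each overlapping group is always the highest-ranked one; other pruning rules (for example merging overlapping passages) would be equally compatible with the slide text.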
3. Segmentation
● Regular
  ● Segments 50 seconds long, shifted by 25 seconds
● Machine Learning
  ● Decision about a segment boundary made for each word in the transcript
  ● Classification trees
  ● Two types: identification of segment boundaries and identification of segment beginnings
  ● Model trained and tuned on the human transcripts
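
The regular segmentation can be illustrated with a short Python sketch. The 50-second length and 25-second shift come from the slide; the function name and the handling of the end of the recording (the last window is clipped to the recording duration) are assumptions.

```python
def regular_segments(duration, length=50.0, shift=25.0):
    """Return (start, end) pairs of overlapping windows covering a recording.

    Windows are `length` seconds long and start every `shift` seconds.
    The final window is clipped to `duration` (an assumed convention).
    """
    if duration <= 0:
        return []
    segments = []
    start = 0.0
    while True:
        end = min(start + length, duration)
        segments.append((start, end))
        if end >= duration:
            break  # the recording is fully covered
        start += shift
    return segments
```

With a shift of half the window length, every point of the recording other than its first and last 25 seconds falls into two passages.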
4. ML Segmentation
● Identify segment boundaries
  ● Each word in the transcripts belongs to a single segment
  ● Detect whether the word is followed by a segment boundary
  ● Each segment begins where the previous one ends
    → Segments do not overlap
  ● The decision needs high precision
● Detect beginnings of segments
  ● Each segment is 50 seconds long
  ● Segments can overlap
  ● The decision needs high recall
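
The two ways of turning per-word decisions into segments can be sketched in Python as follows. The classifier itself is not shown: the boolean lists stand in for its output, and the `(text, start_time)` word representation and the function names are assumptions for illustration.

```python
def segments_from_boundaries(words, boundary_after):
    """Boundary variant: a True flag closes the current segment after that
    word, so every word lands in exactly one non-overlapping segment."""
    segments, current = [], []
    for word, is_boundary in zip(words, boundary_after):
        current.append(word)
        if is_boundary:
            segments.append(current)
            current = []
    if current:  # words after the last detected boundary
        segments.append(current)
    return segments


def segments_from_beginnings(words, is_beginning, length=50.0):
    """Beginning variant: every flagged word opens a segment that spans
    `length` seconds from that word's start time; segments may overlap."""
    segments = []
    for (_, start), flag in zip(words, is_beginning):
        if flag:
            end = start + length
            segments.append([w for w in words if start <= w[1] < end])
    return segments
```

In the first variant a false positive splits a segment in two, which is consistent with the need for high precision; in the second variant a missed beginning means the passage is never created, which is consistent with the need for high recall.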
5. Features
● Cue words and cue tags (unigrams, bigrams and trigrams)
  ● Appear frequently at the segment boundary or are informative for the segment boundary
  ● Defined for segment beginning and for segment end
  ● e.g. for beginning: I, actually, exactly, … and for end: right?, there, so, …
● Letter cases
● Length of the silence before the word
● Division given in transcripts
● The output of the TextTiling algorithm
  ● Employs lexical cohesion
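
A few of these features can be computed per word as in the Python sketch below. It is a simplification under stated assumptions: only unigram cue words are covered (the cue lists are the examples from the slide, not the full lists), words are assumed to be `(text, start, end)` tuples, and cue tags, the transcript division, and the TextTiling output are left out.

```python
# Example cue words taken from the slide; the real lists are longer.
BEGIN_CUES = {"i", "actually", "exactly"}
END_CUES = {"right?", "there", "so"}


def word_features(words, index):
    """Build a feature dictionary for the word at `index`.

    `words` is a list of (text, start, end) tuples with times in seconds
    (an assumed representation).
    """
    text, start, _ = words[index]
    # Silence is measured from the end of the previous word; the first
    # word of a transcript is given zero silence.
    prev_end = words[index - 1][2] if index > 0 else start
    return {
        "begin_cue": text.lower() in BEGIN_CUES,
        "end_cue": text.lower() in END_CUES,
        "capitalized": text[:1].isupper(),
        "silence_before": max(0.0, start - prev_end),
    }
```

Dictionaries of this kind could then be turned into vectors and passed to any classification-tree implementation.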
8. Conclusion
● Overall best result is achieved using regular segmentation on the ASR transcripts
  ● Probably caused by approximated word times in human transcripts
● On the human transcripts, the ML-based segmentation outperforms the regular segmentation
● On the ASR transcripts, the regular segmentation wins
● ML-based segmentation searching for segment beginnings outperforms ML-based segmentation searching for entire segments
9. Thank you
This research is supported by the Charles University Grant Agency
(GA UK n. 920913) and the Czech Science Foundation (grant n. P103/12/G084).