Semantic video classification based on subtitles and domain terminologies
1. SEMANTIC VIDEO CLASSIFICATION
BASED ON SUBTITLES AND
DOMAIN TERMINOLOGIES
(Chinese title: “Semantic video clustering based on
subtitles and domain terminology”)
FROM: KAMC ’07, 1ST INTERNATIONAL WORKSHOP ON
KNOWLEDGE ACQUISITION FROM MULTIMEDIA
CONTENT
AUTHORS: POLYXENI KATSIOULI, VASSILEIOS TSETSOS,
STATHES HADJIEFTHYMIADES
Presenter: 蘇鼎文  Advisor: 林熙禎
8. Guided Learning
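The paper “Video-sharing educational tool applied to the
teaching in renewable energy subjects” showed
experimentally that a video learning system can help
students improve their learning ability and motivation
But having experts add the videos manually is time-consuming and cannot be automated
Could YouTube's massive video library be used to help?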
16. Abstract
An unsupervised approach to classify video
content by analyzing the corresponding subtitles
Based on the WordNet and WordNet domains
Apply natural language processing techniques on
video subtitles
18. semantic information from
multimedia content
multimedia databases gain
more and more popularity
a critical and challenging topic
explore efficient ways to index
their content based on its
features and semantics
19. Subtitles
carry information through natural
language sentences
may not be able to detect all video semantics, but
have several benefits:
more lightweight process than video and audio
processing
high-level semantics are more closely related to
human language
21. Semantic Video Indexing and
Summarization Using Subtitles
partitions the script in segments
represents each one as a term frequency inverse
document frequency (TF-IDF) vector
video retrieval and summarization are described
through the application of machine learning
techniques
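The TF-IDF representation mentioned above can be sketched as follows. This is a minimal illustration with plain TF × IDF weighting, not the cited work's implementation; the function name `tfidf_vectors` and the toy segments are assumptions made for this example.

```python
import math
from collections import Counter

def tfidf_vectors(segments):
    """Return one {term: weight} dict per script segment (plain TF * IDF)."""
    tokenized = [segment.lower().split() for segment in segments]
    n_segments = len(tokenized)
    # Document frequency: in how many segments each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_segments / df[term])
            for term, count in tf.items()
        })
    return vectors

# Toy script segments (hypothetical, not from the paper).
segments = [
    "the lion hunts the zebra",
    "the river floods the valley",
    "the lion sleeps",
]
vectors = tfidf_vectors(segments)
```

A term that occurs in every segment (here "the") gets weight 0, while a term confined to one segment ("zebra") outweighs one shared by two ("lion").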
22. MUMIS project
use of natural language processing techniques for
indexing and searching multimedia content
extraction based on an XML-encoded ontology is applied to
textual sources of different types and in different
languages separately
combines the annotations extracted from such
sources into one integrated, formal description of
their content
23. Semantic principal video shot
classification via mixture Gaussian
a framework for semantic classification of
educational surgery videos, two phases:
1. video content characterization via principal video
shots
2. video classification through a mixture Gaussian
model
24. Content-based Video Classification
Using Support Vector Machines
based on low-level features such as color, shape
and motion
use a Support Vector Machine (SVM) classifier
to classify videos into one of the following class
labels: “cartoons”, “commercials”, “cricket”,
“football” and “tennis”
25. Text Classification
Decision trees are one of the most important and
successful machine learning techniques
leaves represent classifications
branches correspond to the combinations of
attributes that lead to those classifications
In this paper, we compare the proposed method
for classification with a decision tree classifier
27. WordNet
a large dictionary (or lexical database)
English nouns, verbs, adjectives and adverbs
are grouped into sets of synonyms called “synsets”
A synset contains a group of synonymous words or
collocations
28. WordNet vs. traditional dictionaries
Traditional dictionaries are arranged alphabetically
WordNet is arranged semantically
Ex. (three synsets that contain the word “base”):
noun synset {base, alkali}
noun synset {basis, base, foundation, fundament,
groundwork, cornerstone}
verb synset {establish, base, ground, found}.
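The semantic arrangement can be pictured as a lookup from a word to every synset that contains it. The sketch below hard-codes only the three synsets listed above; the `SYNSETS` structure and the `senses_of` helper are assumptions made for illustration, not WordNet's actual API.

```python
# The three synsets from the example above, stored as (part of speech, members).
SYNSETS = [
    ("noun", {"base", "alkali"}),
    ("noun", {"basis", "base", "foundation", "fundament",
              "groundwork", "cornerstone"}),
    ("verb", {"establish", "base", "ground", "found"}),
]

def senses_of(word):
    """Return every synset that contains `word`, one entry per sense."""
    return [(pos, sorted(members)) for pos, members in SYNSETS if word in members]

base_senses = senses_of("base")      # "base" belongs to three synsets
alkali_senses = senses_of("alkali")  # "alkali" belongs to only one
```

An alphabetical dictionary would give "base" a single entry; here the same word shows up once per meaning, next to its synonyms.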
29. semantic relations
Most synsets are connected to other synsets
through a number of semantic relations
noun synsets are related through hypernymy
(generalization), hyponymy (specialization),
holonymy (whole of), and meronymy (part of)
relations
31. WordNet domains
augmenting WordNet with domain labels
approximately 200 domain labels enhance
WordNet synsets
If none of the domain labels is adequate for a
specific synset, the label Factotum is assigned to
it (almost 35% of synsets)
32. Example
Fig. 1. Some senses of the word "plant" with their
corresponding domains
35. Step 1: Text Preprocessing
subtitles are segmented into sentences
a part-of-speech (POS) tagger is applied to the words of each sentence
stop words are removed as they carry no
semantics and do not contribute to the
understanding of the main text concepts
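A rough sketch of this preprocessing step, assuming a regex-based sentence splitter and a tiny hand-written stop list (both hypothetical). POS tagging is left out because it needs a trained tagger.

```python
import re

# Tiny illustrative stop list; a real system would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "on", "in", "and", "is", "are", "to", "he", "it"}

def preprocess(subtitle_text):
    """Split subtitles into sentences, then drop stop words from each one."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", subtitle_text) if s.strip()]
    return [
        [word for word in re.findall(r"[a-z]+", sentence.lower())
         if word not in STOP_WORDS]
        for sentence in sentences
    ]

result = preprocess("He sat on the bank of the river. The river is wide!")
# result == [["sat", "bank", "river"], ["river", "wide"]]
```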
36. Step 2: Keyword Extraction
identify and select only the most important and
relevant subtitle words for further classifying the
video
implemented the TextRank algorithm
The number of keywords extracted is based on
the size of the text
37. TextRank
completely unsupervised graph-based ranking
model
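used for keyword extraction or text summarization
works on a voting principle: each word casts a vote for its neighbors,
and the weight of a vote depends on how many votes the voter itself has received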
derived from Google’s PageRank algorithm
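The voting idea can be sketched as a PageRank-style iteration over a word co-occurrence graph. This is a simplified illustration, not the authors' implementation: the window size, damping factor, iteration count and fixed `top_k` are all assumed values (the slides only say the number of keywords depends on the text size).

```python
from collections import defaultdict

def textrank_keywords(words, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words by iterated voting over an undirected co-occurrence graph."""
    # Link every pair of distinct words that occur within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != word:
                neighbors[word].add(words[j])
                neighbors[words[j]].add(word)
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each neighbor splits its current score evenly among its own neighbors.
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy candidate words after preprocessing (hypothetical).
candidate_words = ["lion", "hunts", "zebra", "lion", "sleeps", "lion", "roars", "savanna"]
keywords = textrank_keywords(candidate_words, top_k=2)
```

In this toy input "lion" co-occurs with every other word, so it collects the most votes and ranks first.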
38. Step 3: Word Sense
Disambiguation
Most words in natural language are characterized
by polysemy
Ex:
BANK
a financial institution
a riverbank
a slope
41. WSD algorithm
adaptation of Lesk’s algorithm for WSD
Lesk’s algorithm:
based on glosses found in traditional
dictionaries
assigns the sense whose gloss shares the
largest number of words with the glosses of
the other words in the context
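A minimal sketch of the gloss-overlap idea, assuming made-up one-line glosses rather than real dictionary entries. The sense names and the `simplified_lesk` helper are hypothetical, and stop words are not filtered, so function words also count toward the overlap.

```python
def simplified_lesk(target_senses, context_glosses):
    """Pick the sense whose gloss shares the most words with the context glosses."""
    context_words = set()
    for gloss in context_glosses:
        context_words.update(gloss.lower().split())

    def overlap(sense):
        return len(set(target_senses[sense].lower().split()) & context_words)

    return max(target_senses, key=overlap)

# Toy glosses (invented for this example).
bank_senses = {
    "bank_financial": "an institution that accepts deposits and lends money",
    "bank_river": "sloping land beside a body of water such as a stream",
}
context_glosses = [
    "a large natural stream of water",  # toy gloss for "river"
    "be seated on a surface",           # toy gloss for "sit"
]
sense = simplified_lesk(bank_senses, context_glosses)  # "bank_river"
```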
42. Extended Lesk’s algorithm
using WordNet to include the related words’
glosses
through semantic relations, e.g. hyponymy and hypernymy
related words are easier to find among the hypernyms or hyponyms
45. Example
“he sat on the bank of the river”
Lesk’s algorithm: context words sit, river
Extended version: also uses related words such as stream, watercourse, lounge, sprawl
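The benefit of adding related words' glosses can be sketched with toy data. Everything here is an assumption made for illustration: the glosses are invented, and the `RELATED` table simply pairs the context words with the related words named in the example.

```python
# Invented one-line glosses; real WordNet glosses are longer.
GLOSSES = {
    "bank#finance": "institution holding money deposits",
    "bank#river": "sloping ground along edge",
    "river": "large natural flow",
    "sit": "rest with body supported",
    "stream": "water moving along ground channel",
    "watercourse": "channel for flowing water",
    "lounge": "sit comfortably relaxed",
    "sprawl": "sit with limbs spread",
}
# Related words reached through semantic relations (assumed links).
RELATED = {"river": ["stream", "watercourse"], "sit": ["lounge", "sprawl"]}

def gloss_words(word, extend):
    """Words of a gloss, optionally widened with the related words' glosses."""
    words = set(GLOSSES[word].split())
    if extend:
        for related_word in RELATED.get(word, []):
            words |= set(GLOSSES[related_word].split())
    return words

def pick_sense(senses, context, extend):
    context_words = set()
    for word in context:
        context_words |= gloss_words(word, extend)
    return max(senses, key=lambda s: len(gloss_words(s, extend) & context_words))

senses = ["bank#finance", "bank#river"]
plain = pick_sense(senses, ["sit", "river"], extend=False)
extended = pick_sense(senses, ["sit", "river"], extend=True)
```

With these toy glosses the plain version finds no overlap for either sense and falls back to the first one listed, while the extended version matches "ground" and "along" through the gloss of "stream" and picks the river sense.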
46. Step 4: WordNet Domains
Extraction
derive the domains which these synsets
correspond to
calculate the occurrence score of each domain
label and sort them in decreasing order.
extract the WordNet domains with the highest
occurrence score
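A sketch of the domain ranking, assuming the occurrence score is a plain count of how often each domain label appears among the keywords' synsets (the slides do not define the score further). The synset names and the `SYNSET_DOMAINS` mapping are hypothetical stand-ins for WordNet Domains.

```python
from collections import Counter

# Hypothetical synset -> domain labels mapping.
SYNSET_DOMAINS = {
    "lion.n.01": ["animals"],
    "zebra.n.01": ["animals"],
    "hunt.v.01": ["animals", "sport"],
    "river.n.01": ["geography"],
}

def ranked_domains(synsets):
    """Count domain labels and sort them in decreasing order of occurrence."""
    counts = Counter(
        domain for synset in synsets for domain in SYNSET_DOMAINS.get(synset, [])
    )
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

w_v = ranked_domains(["lion.n.01", "zebra.n.01", "hunt.v.01", "river.n.01"])
# w_v == [("animals", 3), ("geography", 1), ("sport", 1)]
```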
49. Illustration
keyword → synset → Domain X
keyword → synset → Domain X
keyword → synset → Domain Y
keyword → synset → Domain Z
50. Illustration
keyword → synset → Domain X
keyword → synset → Domain X
keyword → synset → Domain Y
keyword → synset → Domain Z
Wv: Dx, Dy, Dz
51. Step 5: Definition of
correspondences between category
labels and WordNet domains
choose the most appropriate class label
First, we looked up in WordNet the senses related
to each category label
obtained the WordNet domains that correspond to
the senses of each category
calculated for each category the occurrence score
of each of the derived domains
58. Equation(1)
Let C be the set of all the category labels
and D the set of all the WordNet domains that
correspond to each category label:
$D = \bigcup_{c \in C} \{ D'_c \}$
65. Equation(2)
check which category c ∈ C satisfies equation (2):
$D'_c[0] = W_v[0]$
if it does, video v is classified under category c
if there is more than one candidate, compare the second
elements, and so on
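The matching rule can be sketched as below, assuming each category's domains and the video's domains are lists already sorted by decreasing occurrence score. The category lists are invented, and returning all remaining candidates when a tie cannot be broken is an assumption; the slides do not say how that case is handled.

```python
def classify(w_v, category_domains):
    """Return the categories whose ranked domains match the video's ranked domains.

    w_v: the video's domain labels, highest score first.
    category_domains: {category: that category's domain labels, highest first}.
    """
    # Equation (2): the top domain of the category equals the top domain of the video.
    candidates = [
        category for category, domains in category_domains.items()
        if domains and w_v and domains[0] == w_v[0]
    ]
    position = 1
    # With more than one candidate, compare the second elements, and so on.
    while len(candidates) > 1 and position < len(w_v):
        narrowed = [
            category for category in candidates
            if position < len(category_domains[category])
            and category_domains[category][position] == w_v[position]
        ]
        if not narrowed:
            break
        candidates = narrowed
        position += 1
    return candidates

# Hypothetical category -> ranked domains.
category_domains = {
    "Animals": ["animals", "biology"],
    "Geography": ["geography", "geology"],
    "Nature": ["animals", "geography"],
}
label = classify(["animals", "geography", "sport"], category_domains)  # ["Nature"]
```

An empty result means no category satisfies equation (2) for that video.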
78. Experiment on documentary
36 documentaries, with general documentary types
as category labels:
Geography, History, Animals, Politics…
documentaries are easier to classify because they are
usually restricted to a specific domain
and contain narrative
80. Evaluation
Classification accuracy is the proportion of
the classifier’s category assignments that
agree with the user’s assignments
Recall and F-measure were also used
to evaluate the classification results for
each individual category
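These measures can be computed as in the sketch below, using the standard definitions of accuracy, recall and F-measure (the harmonic mean of precision and recall). The labels are toy values, not the paper's data.

```python
def evaluate(true_labels, predicted_labels, category):
    """Return (overall accuracy, recall, F-measure) for one category."""
    pairs = list(zip(true_labels, predicted_labels))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    tp = sum(t == category and p == category for t, p in pairs)
    fn = sum(t == category and p != category for t, p in pairs)
    fp = sum(t != category and p == category for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return accuracy, recall, f_measure

# Toy user-assigned and classifier-assigned labels.
user = ["History", "History", "Animals", "Geography"]
predicted = ["History", "Animals", "Animals", "Geography"]
accuracy, recall, f_measure = evaluate(user, predicted, "History")
# accuracy = 0.75, recall = 0.5, f_measure = 2/3
```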
82. comparison
results were compared with those obtained from
the J4.8 decision tree classifier of the WEKA tool
the results are very promising: the method achieved
an accuracy of 69.4%
a gap relative to J4.8 is expected, since the proposed
method is unsupervised
83. POLYSEMA Platform
the work has been carried out in the context of the
POLYSEMA project
which develops an end-to-end platform for interactive TV
services by exploiting the metadata of the
broadcast transmission
84. POLYSEMA Platform
the present work is part of the activity “Development
of semantics extraction techniques for
automatic annotation of audiovisual content”
Three kinds of techniques are currently investigated:
video summarization
domain ontology learning
video classification
86. Look back
an innovative method for unsupervised
classification of video content
applying natural language processing techniques
on their subtitles
promising experimental results using
documentaries, especially given the fact that no
training phase is required.
87. Improvement
video segments & subtitle segments
compare with other text classification algorithms
(mainly unsupervised approaches)
define more knowledge domains that are closer to
movie classification
keyword extraction algorithm