Semantic video classification based on subtitles and domain terminologies

SEMANTIC VIDEO CLASSIFICATION
BASED ON SUBTITLES AND
DOMAIN TERMINOLOGIES
(Chinese title: "Semantic video clustering based on subtitles and domain terminologies")
FROM: KAMC '07, 1ST INTERNATIONAL WORKSHOP ON
KNOWLEDGE ACQUISITION FROM MULTIMEDIA
CONTENT
AUTHORS: POLYXENI KATSIOULI, VASSILEIOS TSETSOS,
STATHES HADJIEFTHYMIADES
Presenter: 蘇鼎文
Advisor: Prof. 林熙禎

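A New Education Revolution
When students in China can take courses from well-known
American universities at home without spending a cent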

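MOOCs: A New Education Revolution
Coursera, a free online education service, already has 7 million registered
students, more than the combined number of university students in the UK and France.

One third of Coursera users come from developing
economies.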

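What is MOOC
Massive Open Online Course: a large-scale, free, open
online course

Rooted in the educational philosophy of open educational resources

Focuses on making e-learning easier for students to access and
more sustainable to operate

Resources can be accessed freely

No limit on the number of students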

Advantages of MOOCs
Online learning requires only an internet connection

Free sharing, free criticism, and free browsing

Flexible courses

Free!!

Challenges of MOOCs
Learners can easily become confused or lose direction

Requires a self-managed learning
attitude

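Guided Learning
The paper "Video-sharing educational tool applied to the
teaching in renewable energy subjects" shows experimentally
that a video learning system can help students improve their learning
ability and motivation

But the videos are added manually by experts, which is time-consuming and cannot be automated

Could YouTube's massive video library be used to help?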

Methods for automatic video classification

Text MetaData
Title, Description, Tags

Entity Extraction from
consistent text

A/V Features
Audio and Video signal
classification

ideal for games

Less ideal for general
content

Video Context
Entities from context

Comments

Web embeds

User engagement

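Problem
For educational videos on YouTube,
the text metadata
usually contains too little content

Video and audio processing is
harder and more costly

Is there another text-based
approach that offers a better
solution?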

Abstract
An unsupervised approach to classify video
content by analyzing the corresponding subtitles

Based on the WordNet and WordNet domains

Apply natural language processing techniques on
video subtitles

semantic information from
multimedia content
multimedia databases gain
more and more popularity

a critical and challenging topic

explore efficient ways to index
their content based on its
features and semantics

Subtitles
carry information through natural  
language sentences

may not be able to detect all video semantics, but
have several benefits:

more lightweight process than video and audio
processing

high-level semantics are more closely related to
human language

Semantic Video Indexing and
Summarization Using Subtitles
partitions the script in segments

represents each one as a term frequency inverse
document frequency (TF-IDF) vector

video retrieval and summarization are described
through the application of machine learning
techniques

MUMIS project
uses natural language processing techniques for
indexing and searching multimedia content

extraction based on an XML-encoded ontology is applied separately to
textual sources of different types and in different
languages

combines the annotations extracted from such
sources into one integrated, formal description of
their content

Semantic principal video shot
classification via mixture Gaussian
a framework for semantic classification of
educational surgery videos, two phases:

1. video content characterization via principal video
shots

2. video classification through a mixture Gaussian
model

Content-based Video Classification
Using Support Vector Machines
based on low-level features such as color, shape
and motion

uses a Support Vector Machine (SVM) classifier

classifies each video under one of the following class
labels: “cartoons”, “commercials”, “cricket”,
“football” and “tennis”

Text Classification
Decision trees are one of the most important and
successful machine learning techniques

leaves represent classifications

branches correspond to the combinations of
attributes that lead to those classifications

In this paper, we compare the proposed method
for classification with a decision tree classifier

WordNet
a large dictionary (or lexical database)

English nouns, verbs, adjectives and adverbs
are grouped into sets called “synsets”
A synset contains a group of synonymous words or
collocations

WordNet vs. traditional dictionaries
Traditional dictionaries are arranged alphabetically

WordNet is arranged semantically

EX:

noun synset {base, alkali}

noun synset {basis, base, foundation, fundament,
groundwork, cornerstone}

verb synset {establish, base, ground, found}.
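
A minimal sketch of how a semantically arranged lexicon can be queried, using only the three synsets listed on this slide. The synset ids (`n1`, `n2`, `v1`) and the helper names are hypothetical and are not WordNet identifiers or part of any WordNet API.

```python
from collections import defaultdict

# Toy synsets copied from the slide; the ids are made up for this sketch.
SYNSETS = {
    "n1": ("noun", {"base", "alkali"}),
    "n2": ("noun", {"basis", "base", "foundation", "fundament",
                    "groundwork", "cornerstone"}),
    "v1": ("verb", {"establish", "base", "ground", "found"}),
}


def build_index(synsets):
    """Map each word to the ids of all synsets that contain it."""
    index = defaultdict(list)
    for synset_id, (_pos, words) in synsets.items():
        for word in words:
            index[word].append(synset_id)
    return index


INDEX = build_index(SYNSETS)


def senses(word, pos=None):
    """Return the synset ids of `word`, optionally filtered by part of speech."""
    return [sid for sid in INDEX.get(word, [])
            if pos is None or SYNSETS[sid][0] == pos]


print(senses("base"))          # one word, several synsets (polysemy)
print(senses("base", "verb"))  # restricted to verb senses
```

An alphabetical dictionary would store one entry per word; here the unit of storage is the meaning, and a word is only a key into one or more meanings.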

semantic relations
Most synsets are connected to other synsets
through a number of semantic relations

noun synsets are related through hypernymy
(generalization), hyponymy (specialization),
holonymy (whole of), and meronymy (part of)
relations

semantic relations Example
artefact:
root synset

motor vehicle is a hypernym of motorcar;
motorcar is a hyponym of motor vehicle
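
The hypernym/hyponym relation can be sketched as a parent map. The chain below is a simplified toy hierarchy assumed for illustration (intermediate WordNet levels are omitted), and the function names are hypothetical.

```python
# Simplified toy hierarchy: child -> hypernym (parent).
# Real WordNet has more intermediate levels between these nodes.
HYPERNYM = {
    "motorcar": "motor vehicle",
    "motor vehicle": "vehicle",
    "vehicle": "artefact",
}


def hypernym_path(word):
    """Follow hypernym links upward until the root synset is reached."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def hyponyms(word):
    """Return the direct specializations of `word`."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == word)


print(hypernym_path("motorcar"))
print(hyponyms("motor vehicle"))
```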

WordNet domains
augmenting WordNet with domain labels

approximately 200 domain labels annotate
WordNet synsets

If none of the domain labels is adequate for a
specific synset, the label Factotum is assigned to
it (almost 35% of synsets)

Example
Fig. 1. Some senses of the word "plant" with their
corresponding domains

Step 1: Text Preprocessing
subtitles are segmented into sentences

POS tagger is applied to the words of each phrase

stop words are removed as they carry no
semantics and do not contribute to the
understanding of the main text concepts
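
A rough sketch of this preprocessing step using only the Python standard library. The stop-word list is a tiny illustrative sample, the sentence splitter is a naive regular expression, and the POS-tagging stage is omitted because it needs an external tagger; all function names are assumptions.

```python
import re

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "on", "in", "and", "is", "are",
              "to", "it", "he", "she", "they"}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", sentence.lower())


def preprocess(text):
    """Return one list of non-stop-word tokens per sentence."""
    return [[tok for tok in tokenize(sentence) if tok not in STOP_WORDS]
            for sentence in split_sentences(text)]


print(preprocess("He sat on the bank of the river. The river is wide!"))
```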

Step 2: Keyword Extraction
identify and select only the most important and
relevant subtitle words for further classifying the
video

implemented the TextRank algorithm

The number of keywords extracted is based on
the size of the text

TextRank
completely unsupervised graph-based ranking
model

used for keyword extraction or text summarization

works by voting: each word votes for its neighbors,
and the weight of a vote depends on the voter's own score

derived from Google’s PageRank algorithm
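
The voting idea can be sketched as a PageRank-style iteration over a word co-occurrence graph. This is a simplified illustration, not the presented system's implementation: the `window`, `damping`, `iterations` and `ratio` parameters are assumptions, and no POS filtering is applied.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score words on an undirected co-occurrence graph.

    Two distinct words are linked when they appear at most `window`
    positions apart in the token sequence.
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != word:
                neighbors[word].add(tokens[j])
                neighbors[tokens[j]].add(word)

    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each neighbor splits its current score evenly among its own neighbors.
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return scores


def top_keywords(tokens, ratio=1 / 3):
    """Keep a share of the ranked words, so longer texts yield more keywords."""
    scores = textrank(tokens)
    k = max(1, round(len(scores) * ratio))
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


tokens = ["river", "bank", "river", "water", "river", "fish", "river", "boat"]
print(top_keywords(tokens, ratio=0.2))
```

Making the keyword count a fraction of the vocabulary is one way to tie it to the size of the text; the actual rule used in the paper is not stated on the slide.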

Step 3: Word Sense
Disambiguation
Most words in natural language are characterized
by polysemy

Ex:
BANK
a financial institution (銀行)
a riverbank (河岸)
a slope (斜坡)

WSD algorithm
adaptation of Lesk’s algorithm for WSD

Lesk’s algorithm:

based on glosses found in traditional
dictionaries

assigns each word the sense whose gloss shares the
largest number of words with the glosses of
the other words in the context

Extend Lesk’s algorithm
using WordNet to include the related words’
glosses

through semantic relations, e.g. hyponymy and hypernymy

overlapping words are easier to find in the glosses of hypernyms or hyponyms

Example
he sat on the bank of the river

Lesk’s algorithm: the context words are "sit" and "river"

Extended version: related words are also considered, e.g.
stream, watercourse (for river) and lounge, sprawl (for sit)
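
The example sentence can be worked through with a simplified gloss-overlap sketch. All glosses below are hypothetical, abbreviated texts written for this illustration (they are not quoted from WordNet), and `lesk` is an assumed helper name rather than the adapted algorithm from the paper.

```python
STOP = {"a", "an", "the", "of", "or", "and", "that", "to", "in", "on",
        "for", "beside", "such", "as"}

# Hypothetical glosses for two senses of the target word.
SENSES = {
    "bank": {
        "financial institution":
            "an institution that accepts deposits and lends money",
        "river bank":
            "sloping land beside a body of water such as a river or stream",
    }
}
# Hypothetical glosses of the context words.
CONTEXT_GLOSSES = {
    "river": "a large natural stream of water",
    "sit": "be seated",
}
# Hypothetical glosses of related synsets (e.g. hypernyms), extended variant only.
RELATED_GLOSSES = {
    "river": ["a natural body of running water flowing on land"],
}


def content_words(text):
    """Lowercased words of a gloss without stop words."""
    return set(text.lower().split()) - STOP


def lesk(target, context, extended=False):
    """Pick the sense of `target` whose gloss overlaps most with the context glosses."""
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[target].items():
        gloss_words = content_words(gloss)
        overlap = 0
        for word in context:
            glosses = [CONTEXT_GLOSSES.get(word, "")]
            if extended:
                glosses += RELATED_GLOSSES.get(word, [])
            overlap += sum(len(gloss_words & content_words(g)) for g in glosses)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap


print(lesk("bank", ["sit", "river"]))
print(lesk("bank", ["sit", "river"], extended=True))
```

In this toy data the extended variant finds more overlapping words for the same sense, which is the motivation for pulling in the glosses of related synsets.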

Step 4: WordNet Domains
Extraction
derive the domains which these synsets
correspond to

calculate the occurrence score of each domain
label and sort them in decreasing order.

extract the WordNet domains with the highest
occurrence score

Illustration
keyword → synset → domain X
keyword → synset → domain Y
keyword → synset → domain Z

Wv: the video's domains sorted by occurrence score, e.g. [Dx, Dy, Dz]
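
Counting and ranking domain labels can be sketched with a `Counter`. The synset ids and the synset-to-domain table are hypothetical sample data, and breaking ties alphabetically is an assumption that the slides do not specify.

```python
from collections import Counter

# Hypothetical synset -> domain labels table; real labels would come
# from the WordNet Domains resource.
SYNSET_DOMAINS = {
    "river.n": ["geography"],
    "bank.n.river": ["geography", "geology"],
    "plant.n.flora": ["biology"],
}


def rank_domains(synsets, top_n=3):
    """Return the `top_n` domains sorted by decreasing occurrence score."""
    counts = Counter(domain
                     for synset in synsets
                     for domain in SYNSET_DOMAINS.get(synset, []))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [domain for domain, _count in ranked[:top_n]]


video_synsets = ["river.n", "bank.n.river", "plant.n.flora"]
print(rank_domains(video_synsets))
```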

Step 5: Definition of
correspondences between category
labels and WordNet domains
choose the most appropriate class label

First, we looked up in WordNet the senses related
to each category label

obtained the WordNet domains that correspond to
the senses of each category

calculated for each category the occurrence score
of each of the derived domains

[Diagram: for a category label c, each of its WordNet senses is mapped to its domains (Dc); counting the occurrences of these domains (e.g. Dx, Dx, Dy, Dz) and sorting them yields the ranked domain list Dc’.]

Step 6: Category label
assignment
Inputs: the top-ranked WordNet domains of each category (Step 5)

and the video’s set of WordNet domains (Step 4)

The final step assigns a category label to the video entity

Equation (1)
Let C be the set of all category labels

D is the set of the WordNet domain lists D'_c that
correspond to each category label c

$$D = \bigcup_{c \in C} \{ D'_c \}$$

[Diagram: the set D drawn as a table with one column per category (c1, c2, c3, …, cN); each column lists that category's ranked domains.]

Equation (2)
check which category c ∈ C satisfies the equation

$$D'_c[0] = W_v[0]$$

and classify video v under that category c

If there is more than one candidate, compare the second
elements, and so on

[Diagram: the video's ranked domains Wv = [Dx, Dy, Dz] are compared with each category column of D. The candidate set Cv first contains c1 and c3, which match on the first domain; after comparing further elements, only c1 remains.]
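
The assignment rule can be sketched as a position-by-position filter. The domain lists below are hypothetical values chosen to reproduce the c1/c3 tie shown in the diagram, and keeping the remaining tie when no candidate matches at a later position is an assumption that the slides do not cover.

```python
def assign_category(video_domains, category_domains):
    """Return the categories whose ranked domains best match the video's.

    video_domains: ranked list Wv for the video (Step 4).
    category_domains: dict mapping category c to its ranked list D'c (Step 5).
    """
    if not video_domains:
        return []
    # Equation (2): keep categories whose top domain equals the video's top domain.
    candidates = [c for c, domains in category_domains.items()
                  if domains and domains[0] == video_domains[0]]
    position = 1
    while len(candidates) > 1 and position < len(video_domains):
        narrowed = [c for c in candidates
                    if position < len(category_domains[c])
                    and category_domains[c][position] == video_domains[position]]
        if not narrowed:
            break  # assumption: keep the tie if nobody matches at this position
        candidates = narrowed
        position += 1
    return candidates


# Hypothetical ranked domain lists for three categories.
D = {
    "c1": ["Dx", "Dy", "Dz"],
    "c2": ["Da", "Dc", "Db"],
    "c3": ["Dx", "Db", "Dc"],
}
print(assign_category(["Dx", "Dy", "Dz"], D))
```

With these values, c1 and c3 both pass the first comparison and the second element resolves the tie in favor of c1.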

Experiment on documentary
36 documentaries, with general documentary
types as categories:

Geography, History, Animals, Politics…

easier to classify documentaries

usually restricted to a specific domain

contain narrative

statistical information
approximately 44% of all the WordNet domains extracted from each
video are assigned the label ‘Factotum’

Evaluation
Classification Accuracy reflects the proportion of
the classifier’s category assignments that
agree with the user’s assignments

used the Recall and F-measure performance
measures to evaluate the classification results for
each individual category
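
These measures can be computed as in the sketch below. The labels are made-up sample data, not the experiment's results; precision appears only because the F-measure is defined from precision and recall.

```python
def accuracy(gold, predicted):
    """Share of videos whose predicted category equals the user's category."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def precision_recall_f(gold, predicted, category):
    """Per-category precision, recall and F-measure (harmonic mean)."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == category and p == category for g, p in pairs)
    fp = sum(g != category and p == category for g, p in pairs)
    fn = sum(g == category and p != category for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


# Made-up labels for four videos.
gold = ["history", "history", "animals", "geography"]
predicted = ["history", "animals", "animals", "geography"]
print(accuracy(gold, predicted))
print(precision_recall_f(gold, predicted, "history"))
```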

comparison
results were compared to those obtained from the
decision tree classifier J4.8 of the WEKA tool

the results obtained are very promising, since the method achieved
an accuracy value of 69.4%

A gap relative to J4.8 is to be expected, since the proposed
method is unsupervised

POLYSEMA Platform
the work has been carried out in the context of the
POLYSEMA project

develops an end-to-end platform for interactive TV
services by exploiting the metadata of the
broadcast transmission

POLYSEMA Platform
the present work is part of the activity “Development
of semantics extraction techniques for
automatic annotation of audiovisual content”
Three kinds of techniques are currently investigated:

video summarization

domain ontology learning

video classification

Look back
an innovative method for unsupervised
classification of video content

applying natural language processing techniques
to its subtitles

promising experimental results using
documentaries, especially given the fact that no
training phase is required.

Improvement
video segments & Subtitle Segments

Compare with other text classification algorithms
(mainly unsupervised approaches)

define more knowledge domains that are closer to
movie classification

keywords extraction algorithm

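Comment
Subtitle-based text mining mostly relies on entity extraction;
more recently, approaches based on MWH (multi-wing harmoniums) and
analysis of entities' temporal features have also appeared

Although the method is unsupervised, the mapping between categories and domain
labels is constructed statically, so adjusting it dynamically
is probably not easy

The method currently assumes a single topic per video, but one video
may cover more than one topic. Automating video segmentation
may be difficult; is there a way to detect topic shifting?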

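Comment
Online educational resources keep appearing, but they are usually hard for ordinary people
to reach; an integrated system is missing.

If we can understand the semantics of videos, we may be able
to build useful applications, for example helping students find supplementary
learning materials.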
