The document describes an approach for semantically linking subtitles in videos to Wikipedia articles in real-time. It presents features used for ranking link candidates, including anchor text features, target article features, and context graph features that represent the context surrounding mention candidates. An evaluation on TV debate videos shows the approach achieves an R-precision of 0.7177 and MAP of 0.7884 using these features within a learning to rerank framework, outperforming the baseline retrieval model. Adding individual context graph features provides further improvements, though combining all three does not yield additional gains.
1. Inside the social video live player
Semantic linking based on subtitles
Daan Odijk, Edgar Meij & Maarten de Rijke
2.–5. Example: subtitles from a live TV broadcast:

Het zou de enige redding zijn voor de doodzieke club.
("It would be the only salvation for the terminally ill club.")
wenste hij bestuur en directie van Ajax per onmiddellijk naar huis.
("...he wished the Ajax board and management sent home immediately.")
In zijn wekelijkse Telegraaf-column...
("In his weekly Telegraaf column...")
Afgelopen maandag wierp Johan Cruijff vanuit zijn woonplaats Barcelona...
("Last Monday, Johan Cruijff, from his hometown of Barcelona...")
weer eens een granaat de Amsterdamse Arena in.
("...once again lobbed a grenade into the Amsterdam Arena.")

6.–16. Link candidates detected in these subtitles, with their estimated link probabilities:

Maandag (Monday): 3.9%
Johan Cruijff: 46.5%
Barcelona: 25.7%
FC Barcelona: 0.5%
Granaat (grenade): 10.7%
29.–33. Context as a graph

Vertices are subtitle chunks (t1, t2, t3), the anchors found in those chunks ((t2, a), (t3, a')), and target Wikipedia articles (w). Edges connect each chunk to the preceding chunk, each anchor to the chunk it occurs in, and each anchor to its target article.
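The graph construction sketched in slides 29–33 can be written down in a few lines of Python. This is a minimal illustration under our own naming (ContextGraph, add_chunk, add_anchor are ours), not the authors' implementation:

```python
# Minimal sketch of the context graph: vertices are subtitle chunks t_i,
# anchors (t_i, a), and target Wikipedia articles w. Edges connect each
# chunk to the previous chunk, each anchor to its chunk, and each anchor
# to its candidate target article. (Illustrative; not the authors' code.)

class ContextGraph:
    def __init__(self):
        self.edges = set()   # undirected edges, stored as frozensets
        self.chunks = []     # chunk vertices, in subtitle order

    def _add_edge(self, u, v):
        self.edges.add(frozenset((u, v)))

    def add_chunk(self, chunk_id):
        """Add a subtitle chunk; link it to the preceding chunk."""
        node = ("chunk", chunk_id)
        if self.chunks:
            self._add_edge(self.chunks[-1], node)
        self.chunks.append(node)

    def add_anchor(self, chunk_id, anchor_text, target):
        """Add an anchor found in a chunk, linked to its target article."""
        a = ("anchor", chunk_id, anchor_text)
        self._add_edge(("chunk", chunk_id), a)
        self._add_edge(a, ("target", target))

    def degree(self, node):
        return sum(1 for e in self.edges if node in e)


g = ContextGraph()
g.add_chunk("t1")
g.add_chunk("t2")
g.add_chunk("t3")
g.add_anchor("t2", "Johan Cruijff", "Johan Cruijff")
g.add_anchor("t3", "Barcelona", "Barcelona")
print(g.degree(("chunk", "t2")))  # t2 touches t1, t3 and one anchor -> 3
```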
34. Table 1: Features used for the learning to rerank approach.

Anchor features
LEN(a) = |a|: Number of terms in the n-gram a
IDF_f(a): Inverse document frequency of a in representation f, where f ∈ {title, anchor, content}
KEYPHRASE(a): Probability that a is used as an anchor text in Wikipedia (documents)
LINKPROB(a): Probability that a is used as an anchor text in Wikipedia (occurrences)
SNIL(a): Number of articles whose title equals a sub-n-gram of a
SNCL(a): Number of articles whose title matches a sub-n-gram of a

Target features
LINKS_f(w): Number of Wikipedia articles linking to or from w, where f ∈ {in, out} respectively
GEN(w): Depth of w in the Wikipedia category hierarchy
REDIRECT(w): Number of redirect pages linking to w
WIKISTATS_n(w): Number of times w was visited in the last n ∈ {7, 28, 365} days
WIKISTATSTREND_{n,m}(w): Number of times w was visited in the last n days divided by the number of times w was visited in the last m days, where (n, m) ∈ {(1, 7), (7, 28), (28, 365)}

Anchor + Target features
TF_f(a, w) = n_f(a, w) / |f|: Relative phrase frequency of a in representation f of w, normalized by the length of f, where f ∈ {title, first sentence, first paragraph}
POS1(a, w) = pos1(a) / |w|: Position of the first occurrence of a in w, normalized by the length of w
NCT(a, w): Does a contain the title of w?
TCN(a, w): Does the title of w contain a?
TEN(a, w): Does the title of w equal a?
COMMONNESS(a, w): Probability of w being the target of a link with anchor text a

The context graph consists of a set E of edges; vertices are either a chunk t_i, a target w, or an anchor (t_i, a). Edges link each chunk t_i to t_{i−1}.
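Two of the anchor statistics above, LINKPROB and COMMONNESS, are simple ratios over Wikipedia link counts and drive the percentages shown in the opening example. A short sketch with toy counts (the numbers are illustrative, not real Wikipedia statistics):

```python
# Toy estimates of two Table 1 features (hypothetical counts, not real data):
#   LINKPROB(a)      = #occurrences of a as anchor text / #occurrences of a
#   COMMONNESS(a, w) = #links with anchor text a pointing to w / #links with anchor a

def linkprob(anchor_occurrences, total_occurrences):
    """Probability that a phrase is used as anchor text in Wikipedia."""
    return anchor_occurrences / total_occurrences

def commonness(link_counts, target):
    """Probability that a link with this anchor text points to `target`."""
    total = sum(link_counts.values())
    return link_counts.get(target, 0) / total

# Hypothetical link-target counts for the anchor text "Barcelona":
links_for_barcelona = {"Barcelona": 900, "FC Barcelona": 80, "Barcelona (province)": 20}
print(commonness(links_for_barcelona, "Barcelona"))  # 0.9
print(linkprob(1000, 4000))                          # 0.25
```

With counts like these, the city article dominates the football club for the bare anchor "Barcelona", matching the 25.7% vs. 0.5% spread in the example slides.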
35. Table 2: Context features used for learning to rerank, on top of the features listed in Table 1.

Context features
DEGREE(w, G): Number of edges connected to the node representing Wikipedia article w in context graph G.
DEGREECENTRALITY(w, G): Centrality of Wikipedia article w in context graph G, computed as the ratio of edges connected to the node representing w in G.
PAGERANK(w, G): Importance of the node representing w in context graph G, measured using PageRank.

Computing features from context graphs. To feed our learning to rerank approach with information from the context graph, we compute a number of features for each link candidate. These features are described in Table 2. First, we compute the degree of the target Wikipedia article in the context graph.
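The three Table 2 features can be computed directly on a small adjacency-dict graph. A self-contained sketch (our own plain-Python implementation, not the authors' code; the tiny graph below is hypothetical):

```python
# Sketch of the Table 2 context features on a tiny undirected context graph,
# given as an adjacency dict. (Illustrative; not the authors' implementation.)

def degree(graph, node):
    return len(graph[node])

def degree_centrality(graph, node):
    """Ratio of edges at `node`, normalized by the n-1 possible edges."""
    return len(graph[node]) / (len(graph) - 1)

def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; each undirected edge acts as two directed ones."""
    n = len(graph)
    ranks = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        new = {}
        for v in graph:
            incoming = sum(ranks[u] / len(graph[u]) for u in graph[v])
            new[v] = (1 - damping) / n + damping * incoming
        ranks = new
    return ranks

# Chunks t1-t2-t3 in sequence, anchors a1 (in t1) and a2 (in t2),
# both pointing at the same target article w:
g = {
    "t1": ["t2", "a1"],
    "t2": ["t1", "t3", "a2"],
    "t3": ["t2"],
    "a1": ["t1", "w"],
    "a2": ["t2", "w"],
    "w":  ["a1", "a2"],
}
print(degree(g, "w"))                       # 2
print(round(degree_centrality(g, "w"), 2))  # 0.4
pr = pagerank(g)
```

A target mentioned by several anchors in nearby chunks accumulates degree and PageRank mass, which is exactly the contextual signal the reranker exploits.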
36. Evaluation

50 segments of DWDD (6h 3m 41s of video)
5,173 lines of subtitles, 6.97 terms on average per line
1,596 links, 1,446 with a known target Wikipedia article
897 unique Wikipedia articles (2.47 unique Wikipedia articles per minute)
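The results below are reported in R-precision and MAP. Both are standard ranking measures and easy to state in code; a minimal sketch over a hypothetical ranked list of link targets:

```python
# Sketch of the two evaluation measures reported in the results:
# R-precision and (mean) average precision, for one ranked list.

def r_precision(ranked, relevant):
    """Precision at rank R, where R = number of relevant targets."""
    r = len(relevant)
    return sum(1 for t in ranked[:r] if t in relevant) / r

def average_precision(ranked, relevant):
    """Mean of precision values at the ranks where relevant targets occur."""
    hits, score = 0, 0.0
    for i, t in enumerate(ranked, start=1):
        if t in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant)

# Hypothetical ranked output for one chunk and its ground-truth targets:
ranked = ["Johan Cruijff", "Granaat", "Barcelona", "Amsterdam Arena"]
relevant = {"Johan Cruijff", "Barcelona", "Amsterdam Arena"}
print(round(r_precision(ranked, relevant), 4))       # 0.6667
print(round(average_precision(ranked, relevant), 4)) # 0.8056
```

MAP is then the mean of the per-list average precision over all video segments.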
37.–40. Results

Table 4: Semantic linking results for the graph-based context model and an oracle run (indicating a ceiling). Significant differences, tested using a two-tailed paired t-test, are indicated for lines 3–6 with (none), △ (p < 0.05) and ▲ (p < 0.01); the position of the symbol indicates whether the comparison is against line 1 (left most) or line 2 (right most).

                                            Avg. classification time
                                            per chunk (in ms)    R-Prec      MAP
1. Baseline retrieval model                        54            0.5753      0.6235
2. Learning to rerank approach                     99            0.7177      0.7884
Learning to rerank (L2R) + one context graph feature
3. L2R+DEGREE                                     104            0.7375▲△    0.8252▲▲
4. L2R+DEGREECENTRALITY                           108            0.7454▲▲    0.8219▲▲
5. L2R+PAGERANK                                   119            0.7380▲△    0.8187▲▲
Learning to rerank (L2R) + three context graph features
6. L2R+DEGREE+PAGERANK+DEGREECENTRALITY           120            0.7341▲     0.8204▲▲
7. Oracle picking the best out of
   lines 3–5 for each video segment                              0.7636      0.8400

Takeaways:
• Lines 3–5: significant improvement over the baseline and over L2R.
• Line 6: the combination of the three context features does not yield further improvements in effectiveness over the individual context features.
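The oracle in line 7 of Table 4 can be read as picking, per video segment, whichever of the runs in lines 3–5 scores best, and averaging those best scores. A sketch with hypothetical per-segment scores (the numbers below are ours, not the paper's):

```python
# Oracle upper bound: for each video segment, take the best-scoring of
# several runs, then average over segments. (Hypothetical scores.)

def oracle_score(per_segment_scores):
    """per_segment_scores: one dict {run_name: score} per video segment."""
    best = [max(seg.values()) for seg in per_segment_scores]
    return sum(best) / len(best)

segments = [
    {"DEGREE": 0.70, "DEGREECENTRALITY": 0.75, "PAGERANK": 0.72},
    {"DEGREE": 0.80, "DEGREECENTRALITY": 0.78, "PAGERANK": 0.83},
]
print(oracle_score(segments))  # (0.75 + 0.83) / 2, approximately 0.79
```

That the oracle (0.7636 R-Prec, 0.8400 MAP) beats every individual run shows the context features are complementary per segment, even though naively combining all three (line 6) does not help.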
43.–45. Wrapping up

• Semantic linking based on retrieval and machine learning
• Real-time, highly effective and efficient
• Available as a web service and as open source software
  • xTAS, Elasticsearch
• Maarten de Rijke, derijke@uva.nl
47. Next steps

• Technical work
  • Extend to the output of a speech recognizer instead of subtitles
  • Online learning to rank for automatic optimization
  • Linking to other types of content, including social