Ukrainian Catholic University
Faculty of Applied Sciences
Data Science Master Program
January 22nd
Abstract. Generative adversarial networks (GANs) are among the most popular models capable of producing high-quality images. However, most works generate images from a vector of random values, without explicit control over the desired output properties. We study ways of introducing such control for a user-selected region of interest (RoI). First, we overview and analyze existing work in the areas of image completion (inpainting) and controllable generation. Second, we propose a GAN-based model, which unites approaches from the two mentioned areas, for controllable local content generation. Third, we evaluate the controllability of our model on three accessible datasets – CelebA, Cats, and Cars – and present numerical and visual results of our method.
7. Related Work
1. Rasiwasia et al. (2010) - cross-modal retrieval for Wikipedia articles. The dataset contains featured articles from the 10 most popular categories. Their approach exploits the correlation between text and image features, obtained via latent Dirichlet allocation and SIFT models respectively.
2. Hessel et al. (2018) - visual concreteness of particular topics in Wikipedia articles. The dataset contains the 192K most popular articles, specifically their included images and topics.
3. Dong et al. (2018) - cross-modal retrieval on the Flickr dataset, leveraging deep neural networks.
11. Collection
1. article
a. text content
b. title
2. images
a. raw images
b. metadata: description, title
c. only publicly available (a possible record layout is sketched below)
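Purely for illustration, the collected fields above could be represented roughly as follows; the class and field names are our own assumptions, not the actual schema of the released dataset.

```python
# Hypothetical record layout mirroring the bullet points above;
# class and field names are illustrative, not the released schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class WikiImage:
    raw_path: str       # path/URL of the raw image file
    title: str          # metadata: image title
    description: str    # metadata: image description

@dataclass
class WikiArticle:
    title: str
    text: str                                               # article text content
    images: List[WikiImage] = field(default_factory=list)   # publicly available images only
```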
12. Preprocessing
1. text:
a. wiki-markup removal
2. image:
a. converting everything to 600px-width JPEG
b. icon removal
c. title words parsing
d. storing image features (computed with ResNet152; see the sketch below)
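A minimal sketch of how the ResNet152 features in step d could be precomputed, assuming PyTorch/torchvision are used; the helper name and preprocessing constants follow standard ImageNet practice and are not taken from our pipeline.

```python
# Sketch: precompute a 2048-d ResNet152 feature vector per image.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Standard ImageNet preprocessing for ResNet-family models.
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Load ResNet152 and drop the final classification layer to keep pooled features.
resnet = models.resnet152(pretrained=True)
feature_extractor = torch.nn.Sequential(*list(resnet.children())[:-1])
feature_extractor.eval()

def image_features(path):
    """Return a 2048-dimensional feature vector for a single image file."""
    img = Image.open(path).convert("RGB")
    batch = preprocess(img).unsqueeze(0)       # shape: (1, 3, 224, 224)
    with torch.no_grad():
        feats = feature_extractor(batch)       # shape: (1, 2048, 1, 1)
    return feats.flatten().numpy()             # shape: (2048,)
```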
16. Evaluation Setting
1. image-level split
a. images from the same article might appear in both test and train subsets
b. theoretical model precision with a comprehensive fine-grained dataset
2. article-level split (both split strategies are sketched below)
a. images from the same article are always either in the test or in the train subset
b. real-world performance of this particular model
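Both split strategies can be sketched with scikit-learn, assuming a pandas DataFrame with one row per image and an article_id column; the file and column names are assumptions, not the actual pipeline.

```python
# Sketch of the two evaluation splits described above.
import pandas as pd
from sklearn.model_selection import train_test_split, GroupShuffleSplit

df = pd.read_csv("wiki_images.csv")  # hypothetical file with one row per image

# Image-level split: rows are shuffled independently, so images from the same
# article can land in both train and test.
img_train, img_test = train_test_split(df, test_size=0.2, random_state=42)

# Article-level split: all images of an article stay on the same side.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(gss.split(df, groups=df["article_id"]))
art_train, art_test = df.iloc[train_idx], df.iloc[test_idx]
```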
17. Baseline
An alternative to the multimodal approach is classical text-based
techniques. We will experiment with the following models and choose
the best one as our baseline (a similarity-ranking sketch follows the list):
● word2vec
● wikipedia2vec
● inferText
● co-occurrence
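As an illustration of how a text-only baseline of this kind might rank candidate images, the sketch below averages pretrained word2vec vectors and scores images by cosine similarity between the article text and each image's metadata; the model file and function names are assumptions, not our exact implementation.

```python
# Sketch of a text-similarity baseline using averaged word2vec vectors.
import numpy as np
import gensim.downloader as api

w2v = api.load("word2vec-google-news-300")  # pretrained word2vec vectors

def embed(text):
    """Average the word2vec vectors of in-vocabulary tokens."""
    vecs = [w2v[tok] for tok in text.lower().split() if tok in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

def rank_images(article_text, image_captions):
    """Return image indices sorted by cosine similarity to the article text."""
    a = embed(article_text)
    sims = []
    for cap in image_captions:
        c = embed(cap)
        denom = np.linalg.norm(a) * np.linalg.norm(c) + 1e-9
        sims.append(float(a @ c) / denom)
    return np.argsort(sims)[::-1]
```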
26. Contribution (Conclusions)
1. Dataset collection
a. 36.4K articles
b. 216K images
2. Identification of the best-performing text-similarity baseline
3. Adjustment of the Word2VisualVec model to our real-world data
a. image-level model outperformed baseline by 145%*
b. article-level model outperformed baseline by 37%*
* performance compared by averaging the R@1, R@3, and R@10 scores (a small recall sketch is shown below)
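A small sketch of how the averaged recall score in the footnote could be computed; the function names are ours and the snippet is illustrative rather than the exact evaluation code.

```python
# Sketch: Recall@K per query article, then the mean of R@1, R@3 and R@10.
import numpy as np

def recall_at_k(ranked_ids, relevant_id, k):
    """1.0 if the relevant image appears in the top-k ranked results, else 0.0."""
    return float(relevant_id in ranked_ids[:k])

def mean_recall_score(all_rankings, all_relevant, ks=(1, 3, 10)):
    """Average of R@1, R@3, R@10 over all query articles."""
    per_k = []
    for k in ks:
        scores = [recall_at_k(r, rel, k) for r, rel in zip(all_rankings, all_relevant)]
        per_k.append(np.mean(scores))
    return float(np.mean(per_k))
```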
27. Future Work
● create an API for our model to be accessible in real time
● adjust the evaluation metric to recognise all photos of the same entity as correct matches, not just the one mentioned in the article
● properly experiment with a compound Word2VisualVec + text-similarity model
● try a more complex model that learns the best feature representation rather than assuming one
● use more metadata, such as article topics
● retrain the model on a bigger “good articles” dataset
28. Review Comments
1. There are no implementation details described for the text encoding methods (see Section 4.3.2), even though they are crucial for proper performance
a. Rather disagree. All details are described in the model authors' original paper. We concentrated on covering our own contribution in the thesis, but we can see the benefit of replicating this information to make the thesis more self-contained
2. There are no dataset statistics, train/val split descriptions, and so on in the thesis or on the relevant Kaggle dataset page
a. Disagree. Statistics on article/image counts are available, and dataset selection, collection, cleaning, and formatting are described in detail. But we agree that additional EDA would be beneficial.
3. The problems with the presentation are small but numerous
a. Agree. The experimental section could be presented better.
31. Conclusions
1. Developed a simple cross-modal retrieval model, which
significantly outperforms our baseline
2. Showed that performance might be significantly better with a huge fine-grained dataset
3. Developed a simple text-similarity model to show that it contains supplementary predictive power
4. Created a real-world multimodal dataset, which is publicly
available