Seminar presentation about:
Automatic Image Annotation structure (shallow and deep),
pros and cons of different features and classification methods in AIA, and
useful information about databases, toolboxes, and authors
The document discusses image annotation. It begins by explaining what image annotation is and its motivations, which include summarization, applications like video search and retrieval, minimizing required storage, and video reconstruction. The document then outlines the general steps for image annotation, which include image capturing and pre-processing, feature extraction, and determining scene semantic concepts from extracted objects and features. It discusses challenges like data inaccuracy and time consumption, and potential solutions like ontology-directed annotation. Finally, it reviews recent research that uses techniques like ontologies, sensor data, and fuzzy models to perform semantic image and video annotation.
This document discusses a system for extracting text from images. It begins with an introduction describing the need for such a system. It then covers related work on text detection techniques. The proposed method involves converting images to grayscale, binarization, connected component analysis, horizontal/vertical projections, reconstruction and using OCR for recognition. Applications discussed include wearable devices, video coding, image indexing and license plate recognition. While the system is robust, OCR recognition of noisy extracted text remains a challenge.
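The early stages of the pipeline above (grayscale conversion, binarization, connected component analysis) can be sketched as follows. This is a minimal illustration, not the original system: the image format (nested lists of RGB tuples) and the fixed threshold are assumptions made for the example.

```python
def to_grayscale(rgb_image):
    """Convert an RGB image (rows of (r, g, b) tuples) to luminance values."""
    return [[int(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

def binarize(gray, threshold=128):
    """Mark dark pixels (likely text strokes) as 1, background as 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def connected_components(binary):
    """Label 4-connected foreground regions with an iterative flood fill."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] == 1 and labels[y][x] == 0:
                current += 1
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    if 0 <= cy < h and 0 <= cx < w \
                            and binary[cy][cx] == 1 and labels[cy][cx] == 0:
                        labels[cy][cx] = current
                        stack.extend([(cy + 1, cx), (cy - 1, cx),
                                      (cy, cx + 1), (cy, cx - 1)])
    return labels, current
```

Each labeled component is a candidate character region; the projections, reconstruction, and OCR stages mentioned above would then operate on these regions.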
The document discusses image processing and provides a brief overview of the topic:
Image processing involves processing or altering existing images in a desired manner. It has two main aspects - improving visual appearance for human viewers and preparing images for feature measurement and structure analysis. Image processing is needed to prepare digital images for viewing on output devices, optimize images for applications by enhancing structures, and allow computer-assisted analysis to detect important structures. It acquires images from scientific instruments and space missions to communicate results.
Machine learning has had a tremendous impact on healthcare, in both diagnosis and treatment. By employing image classification and image segmentation, diagnostic insights and solutions with automated report generation can be delivered in real time, leading to faster and more informed decisions and streamlined costs.
This document discusses image processing and its various applications and techniques. It defines image processing as processing images in a desired manner and explains it has two aspects: improving visual appearance for humans and preparing images for feature measurement. It describes why image processing is needed such as preparing digital images for viewing and optimizing images for applications. It also outlines different types of image processing like image-to-image, image-to-information, and information-to-image transformations.
This document discusses image processing. It begins by defining image processing as the conversion of an image to digital form and performing operations to enhance the image or extract useful information. The main steps are importing, analyzing/manipulating, and outputting the image. Types of image processing include analog and digital. Applications include computer vision, medical imaging, and document processing. Advantages include manipulation and compact storage, while limitations include cost, time consumption, and lack of professionals. The document provides details on several image processing techniques and applications.
Data Science - Part XVII - Deep Learning & Image Processing (Derek Kane)
This lecture provides an overview of Image Processing and Deep Learning for the applications of data science and machine learning. We will go through examples of image processing techniques using a couple of different R packages. Afterwards, we will shift our focus and dive into the topics of Deep Neural Networks and Deep Learning. We will discuss topics including Deep Boltzmann Machines, Deep Belief Networks, and Convolutional Neural Networks, and finish the presentation with a practical exercise in handwriting recognition.
This document discusses texture analysis in image processing. It defines texture as the spatial arrangement of color or intensities in an image that can help with image segmentation and classification. There are two main approaches to texture analysis: structural, which looks at regular patterns of texels, and statistical, which analyzes relationships between pixel intensities using methods like edge detection, co-occurrence matrices, and histograms. Statistical texture analysis captures the degrees of randomness and regularity in textures through metrics calculated from pixel intensity distributions and relationships.
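The statistical approach described above can be sketched with a gray-level co-occurrence matrix (GLCM) and two of the standard metrics derived from it. The horizontal one-pixel offset and the small number of gray levels are illustrative assumptions.

```python
def glcm(image, levels=4):
    """Normalized co-occurrence counts for (pixel, right-neighbor) gray pairs."""
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
    total = sum(sum(r) for r in counts)
    return [[c / total for c in r] for r in counts]

def contrast(p):
    """High when co-occurring levels differ a lot (busy, irregular texture)."""
    return sum(p[i][j] * (i - j) ** 2
               for i in range(len(p)) for j in range(len(p)))

def energy(p):
    """High for orderly, repetitive textures (few dominant pairs)."""
    return sum(p[i][j] ** 2
               for i in range(len(p)) for j in range(len(p)))
```

A perfectly flat patch yields zero contrast and maximal energy, while an alternating stripe pattern yields high contrast, which is how these metrics capture the randomness and regularity mentioned above.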
Content-based image retrieval (CBIR) uses visual image content to search large image databases according to user needs. CBIR systems represent images by extracting features related to color, shape, texture, and spatial layout. Features are extracted from regions of the image and compared to features of images in the database to find the most similar matches. CBIR has applications in medical imaging, fingerprints, photo collections, and more. Techniques include representing images with histograms of color and texture features extracted through transforms.
The document discusses content-based image retrieval (CBIR) systems. It describes how CBIR systems use feature extraction to search large image databases based on visual content. The key components of CBIR systems are feature extraction, indexing, and system design. Feature extraction involves extracting information about images' colors, textures, shapes, and spatial locations. Effective features and indexing techniques are needed to make CBIR scalable for large image collections. Performance is evaluated based on how well systems return relevant images.
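The histogram-based matching used in CBIR systems like those described above can be sketched as follows: each image is summarized as a normalized intensity histogram and compared with histogram intersection (1.0 means identical distributions). The 8-bin quantization and the flat pixel-list image format are illustrative assumptions.

```python
def color_histogram(pixels, bins=8):
    """Normalized histogram of intensity values in [0, 256)."""
    hist = [0] * bins
    for v in pixels:
        hist[v * bins // 256] += 1
    return [h / len(pixels) for h in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def rank_database(query_pixels, database):
    """Return database keys sorted by similarity to the query, best first."""
    q = color_histogram(query_pixels)
    scores = {name: histogram_intersection(q, color_histogram(px))
              for name, px in database.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

In a real system the same ranking would run over color, texture, and shape features together, with an index structure replacing the linear scan over the database.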
Clustering is an unsupervised machine learning technique used to group unlabeled data points. There are two main approaches: hierarchical clustering and partitioning clustering. Partitioning clustering algorithms like k-means and k-medoids attempt to partition data into k clusters by optimizing a criterion function. Hierarchical clustering creates nested clusters by merging or splitting clusters. Examples of hierarchical algorithms include agglomerative clustering, which builds clusters from bottom-up, and divisive clustering, which separates clusters from top-down. Clustering can group both numerical and categorical data.
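The partitioning approach described above can be sketched with a minimal k-means on 2-D points. Taking the first k points as initial centers is a naive illustrative choice; real implementations use smarter seeding such as k-means++.

```python
def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def mean(cluster):
    """Centroid of a non-empty list of 2-D points."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

def kmeans(points, k, iterations=20):
    centers = points[:k]  # naive seeding
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters
```

The criterion being optimized is the within-cluster sum of squared distances; each assignment/update pass cannot increase it, which is why the loop converges.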
Image Segmentation
Types of Image Segmentation
Semantic Segmentation
Instance Segmentation
Types of Image Segmentation Techniques based on the image properties:
Threshold Method.
Edge Based Segmentation.
Region-Based Segmentation.
Clustering Based Segmentation.
Watershed Based Method.
Artificial Neural Network Based Segmentation.
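The threshold method listed first above can be sketched with the classic iterative (ISODATA-style) scheme: split pixels at a threshold, then move the threshold to the midpoint of the two class means until it stabilizes. The flat pixel-list input is an illustrative simplification.

```python
def iterative_threshold(pixels, tol=0.5):
    """Find a global threshold separating foreground from background."""
    t = sum(pixels) / len(pixels)  # start from the global mean
    while True:
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:
            return t
        # Midpoint of the two class means becomes the new threshold.
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < tol:
            return new_t
        t = new_t

def segment(pixels, t):
    """Binary mask: 1 for pixels above the threshold."""
    return [1 if p > t else 0 for p in pixels]
```

This works well for bimodal intensity distributions; the edge-, region-, clustering-, and learning-based techniques in the list above address images where a single global threshold fails.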
1) Digital image processing involves improving, restoring, compressing, segmenting, and recognizing digital images. It has applications in industry, medicine, traffic control, entertainment, and more.
2) The origins of digital image processing date back to the 1920s in newspaper printing, but it developed significantly with the space program in the 1960s and medical CT scans in the 1970s.
3) A digital image processing system typically involves image acquisition, storage, processing, and display. Low-level processes improve image quality while mid- and high-level processes extract attributes and recognize objects.
Web scraping with Python allows users to automatically extract data from websites by specifying CSS or XML paths to grab content and store it in a database. Popular libraries for scraping in Python include lxml, BS4, and Scrapy. The document demonstrates building scrapers using Beautiful Soup and provides tips for making scrapers faster through techniques like threading, queues, profiling, and reducing redundant scraping with memcache.
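The libraries named above (lxml, BS4, Scrapy) are the usual tools; as a self-contained illustration of the same idea, this sketch extracts link targets using only the standard library's html.parser, mimicking what a CSS selector like `a[href]` would grab. The sample HTML in the usage is made up for the example.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href attributes from <a> tags as the parser streams the page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

For example, `extract_links('<p><a href="/a">x</a> <a href="/b">y</a></p>')` returns `["/a", "/b"]`. The speed-up techniques mentioned above (threading, queues, caching) sit around this parsing core, not inside it.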
Real-time pedestrian detection, tracking, and distance estimation (Omid Asudeh)
A combination of the HOG pedestrian detection method and the Lucas-Kanade tracking algorithm to detect and track people in a video stream in real time. A simple method based on a pinhole camera model is used for distance estimation.
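The core Lucas-Kanade step behind such a tracker estimates the translation of a patch between two frames by solving a 2x2 least-squares system built from image gradients. The sketch below is not the presented system: the synthetic bilinear patch is an illustrative assumption, chosen so that finite differences are exact and the true shift is recovered.

```python
def lucas_kanade(I1, I2):
    """Estimate (u, v) translation between two frames of a small patch."""
    h, w = len(I1), len(I1[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):          # interior pixels only
        for x in range(1, w - 1):
            Ix = (I1[y][x + 1] - I1[y][x - 1]) / 2.0   # horizontal gradient
            Iy = (I1[y + 1][x] - I1[y - 1][x]) / 2.0   # vertical gradient
            It = I2[y][x] - I1[y][x]                   # temporal difference
            sxx += Ix * Ix; sxy += Ix * Iy; syy += Iy * Iy
            sxt += Ix * It; syt += Iy * It
    det = sxx * syy - sxy * sxy        # structure tensor determinant
    # Solve [sxx sxy; sxy syy] [u v]^T = -[sxt syt]^T for the flow vector.
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v
```

In the full system, a HOG detector supplies the person bounding box and this step (applied per tracked point, with pyramids for large motions) follows it between frames.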
This document discusses digital image processing. It defines digital images as two-dimensional representations of values stored as pixels in computer memory. Digital image processing involves enhancing images, extracting information and features, and manipulating images using computer software. The document outlines common image processing techniques like image compression, enhancement, and measurement extraction. It also describes the basics of digital image editing using software to alter pixel values and change image properties.
This document provides an overview of web usage mining. It discusses that web usage mining applies data mining techniques to discover usage patterns from web data. The data can be collected at the server, client, or proxy level. The goals are to analyze user behavioral patterns and profiles, and understand how to better serve web applications. The process involves preprocessing data, pattern discovery using methods like statistical analysis and clustering, and pattern analysis including filtering patterns. Web usage mining can benefit applications like personalized marketing and increasing profitability.
This document provides guidance on principles of data visualization. It discusses why we visualize data, such as to communicate findings and inspire action. The visualization process involves getting and cleaning data, setting goals, and choosing visual types based on the data and audience. Effective use of color, narrative, and networks are also covered. The document emphasizes knowing the audience to select the right visual type and story to engage them. Overall it provides a helpful overview of best practices for data visualization design and communication.
Text detection and recognition from natural scenes (hemanthmcqueen)
Text characters in natural scenes and surroundings provide valuable information about a place and can even convey legal or otherwise important information, so detecting and recognizing such text is highly useful. It is not easy, however, because of the diverse backgrounds and fonts involved. In this paper, a method is proposed to extract text information from the surroundings. First, a character descriptor is designed from existing standard detectors and descriptors. Then, character structure is modeled for each character class by designing stroke configuration maps. In natural scenes, text is generally found on nearby signboards and other objects, and its extraction is difficult because of noisy backgrounds and varying fonts and text sizes, though many applications have proven effective at it. The method divides text extraction into two processes:
Text detection
Text recognition
Text mining refers to extracting knowledge from unstructured text data. It is needed because most biological knowledge exists in unstructured research papers, making it difficult for scientists to manually analyze large amounts of text. Challenges include dealing with noisy, unstructured data and complex relationships between concepts. The text mining process involves preprocessing text through steps like tokenization, feature selection, and parsing to extract meaningful features before analysis can be done through classification, clustering, or other techniques. Potential applications are wide-ranging across domains like customer profiling, trend analysis, and web search.
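The preprocessing steps described above can be sketched as tokenization, stop-word filtering, and bag-of-words term frequencies, which are then usable as features for classification or clustering. The tiny stop list and the sample sentence are illustrative assumptions.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "in", "and", "is"}  # illustrative subset

def tokenize(text):
    """Lowercase and split on runs of non-letter characters."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

def term_frequencies(text):
    """Counts of content-bearing tokens (stop words removed)."""
    return Counter(t for t in tokenize(text) if t not in STOP_WORDS)
```

Real pipelines add stemming or lemmatization and replace raw counts with weights such as TF-IDF, but the shape of the feature vector is the same.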
Image processing is a technique that involves performing operations on digital images to enhance, analyze, or otherwise process them. It has applications in many fields including medical imaging, astronomy, biometrics, and more. Key stages in image processing include image acquisition, enhancement, restoration, segmentation, representation/description, compression, and object recognition. Image processing can be used for security purposes like steganography, as well as in fields like medical imaging, traffic management, robotics, and more. It transforms images into digital formats and allows for manipulation of image data.
Image processing involves processing images in a desired manner by obtaining an image in a readable format from sources like the Internet. The digital image can then be optimized for the intended application by enhancing or altering structures within it based on factors like the body part, diagnostic task, or viewing preferences. Some examples of image processing include enhancing images to make them more useful or pleasing, restoring images by removing things like blurriness or grid lines, and decompressing compressed image data or reconstructing image slices from scans.
The Apriori algorithm is used to find frequent itemsets and generate association rules. It works in multiple passes over the transactional database: (1) Find frequent items in the database and derive frequent itemsets with a length of 1, (2) Join frequent itemsets from the previous pass to get candidate itemsets of the next length, (3) Prune the candidates that have a subset that is infrequent, (4) Count the support for remaining candidates and output frequent itemsets. This process is repeated until no frequent itemsets are found. The frequent itemsets are then used to generate association rules that satisfy minimum support and confidence thresholds.
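The passes described above (find frequent 1-itemsets, then repeatedly join, prune, and count) can be sketched compactly. Itemsets are frozensets and min_support is an absolute transaction count; rule generation from the returned itemsets is omitted for brevity.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions containing every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    # Pass 1: frequent itemsets of length 1.
    frequent = [{frozenset([i]) for i in items
                 if support(frozenset([i]), transactions) >= min_support}]
    k = 1
    while frequent[-1]:
        prev = frequent[-1]
        # Join step: combine frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k))}
        # Count step: keep candidates meeting min_support.
        frequent.append({c for c in candidates
                         if support(c, transactions) >= min_support})
        k += 1
    return [s for level in frequent for s in level]
```

The prune step is the key Apriori insight: any superset of an infrequent itemset is itself infrequent, so such candidates can be discarded without counting.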
An Introduction to Image Processing and Artificial Intelligence (Wasif Altaf)
This document provides an introduction to image processing and artificial intelligence. It defines what an image is from different perspectives including in literature, general terms, and in computer science as an exact replica of a storage device. It describes image processing as analyzing and manipulating images with three main steps: importing an image, manipulating or analyzing it, and outputting the result. It also discusses what noise is in images, methods to remove noise, color enhancement techniques, sharpening images to increase contrast, and segmentation and edge detection.
Asia contains a variety of landforms, climates, vegetation and animals. The major landform areas include mountains and plateaus, river and coastal plains, and islands. The mountains and plateaus occupy a large part of Asia, surrounding it on all sides with mountain ranges like the Himalayas. They contain the highest peak in the world, Mount Everest. Between the mountain ranges are fertile river valleys and plains that support millions through agriculture relying on rivers. Islands make travel within countries difficult, though the sea also acts as a barrier and means of migration throughout Asia's history.
This document provides an overview of Selena Mills' professional experience and skills. She has over 10 years of experience in digital media, content creation, social media marketing, project management, and community outreach. Testimonials from previous employers and clients praise her talents, creativity, work ethic, and ability to excel at various roles and tasks.
This document provides an overview of a course on algorithms and data structures. It outlines the course topics that will be covered over 15 weeks of lectures. These include data types, arrays, matrices, pointers, linked lists, stacks, queues, trees, graphs, sorting, and searching algorithms. Evaluation will be based on assignments, quizzes, projects, sessionals, and a final exam. The goal is for students to understand different algorithm techniques, apply suitable data structures to problems, and gain experience with classical algorithm problems.
This document discusses Motaz El Saban's research experience and interests which focus on analyzing, modeling, learning from, and predicting digital media content such as text, images, and speech. Some key areas of research include real-time video stitching, annotating mobile videos, object and activity recognition from videos, and facial expression recognition using deep learning techniques. The document also outlines El Saban's educational background and provides an agenda for his upcoming presentation.
Kim Steenstrup Pedersen, Associate Professor, Image Section, Department of Computer Science, University of Copenhagen
An overview of artificial intelligence and digital image analysis. Artificial intelligence, and in particular the analysis of digital images and film, is currently developing at a rapid pace. We regularly see stories in the press about fantastic new breakthroughs in artificial intelligence (many of these stories originate from large companies such as Google, Facebook and Amazon). The obvious question is: can I apply artificial intelligence to my image collection? In this talk I will give an overview of what artificial intelligence and digital image analysis are and what they can be used for. I will also offer insight into the strengths and weaknesses of existing methods, and in particular what to be aware of if you want to apply artificial intelligence to your own image collections.
This presentation discusses applications of artificial intelligence, machine learning, and deep learning in actuarial science. It provides an introduction to machine learning and deep learning, including different architectures like feedforward neural networks and embedding layers. It then discusses several potential applications of these techniques in actuarial problems, including non-life insurance pricing, IBNR reserving, analyzing telematics driving data, and mortality forecasting. The presentation concludes by noting that deep learning has the potential to enhance predictive modeling in actuarial science and that its application in the field seems to be an emerging area of research.
This document outlines the assessment scheme, learning outcomes, and content for a module on the Introduction to Artificial Intelligence. It includes:
- The assessment scheme which is 80% theory and 20% practical, with a 40% continuous assessment and 60% end term examination. The continuous assessment includes components like class tests, assignments, and presentations.
- The learning outcomes which are for students to understand AI, its applications, and analyze problems to identify computing solutions.
- An introduction to AI, its definitions, applications in games, vision, robotics, and other fields. It also discusses different philosophies of AI like thinking humanly versus rationally.
- Examples of AI in puzzles, games and how
Weave-D is a cognitive system that accumulates and fuses temporal, multi-modal data in an organized manner. It extracts features from images and text, learns incrementally using the IKASL algorithm, and generates links between data. The system aims to handle large amounts of information and prevent catastrophic interference during incremental learning. It will extract color, edge, and shape features from images and use text feature extraction techniques. Unsupervised learning algorithms like SOM, GSOM, and IKASL will be implemented and visualized.
Integrated Gradients provides a method for attributing the predictions of machine learning models to features of the input. It works by calculating the gradient of the model output with respect to the input across all points along the linear path between a baseline input and the actual input. This path integral attribution method satisfies several desirable properties. Integrated Gradients can be used for applications like generating explanations, debugging models, and analyzing model robustness.
Thesis report and full details: https://imatge.upc.edu/web/publications/contextless-object-recognition-shape-enriched-sift-and-bags-features
Author: Marcel Tella
Advisors: Xavier Giró-i-Nieto (UPC) and Matthias Zeppelzauer (TU Wien)
Degree: Telecommunications Engineering (5 years) at Telecom BCN-ETSETB (UPC)
Abstract:
Currently, there are highly competitive results in the field of object recognition based on the aggregation of point-based features. The aggregation process, typically with an average or max-pooling of the features generates a single vector that represents the image or region that contains the object.
The aggregated point-based features typically describe the texture around the points with descriptors such as SIFT. These descriptors present limitations for wired and textureless objects. A possible solution is the addition of shape-based information. Shape descriptors have been previously used to encode shape information and thus, recognise those types of objects. But generally an alignment step is required in order to match every point from one shape to other ones. The computational cost of the similarity assessment is high.
We purpose to enrich location and texture-based features with shape-based ones. Two main architectures are explored: On the one side, to enrich the SIFT descriptors with shape information before they are aggregated. On the other side, to create the standard Bag of Words histogram and concatenate a shape histogram, classifying them as a single vector.
We evaluate the proposed techniques and the novel features on the Caltech-101 dataset.
Results show that shape features increase the final performance. Our extension of the Bag of Words with a shape-based histogram(BoW+S) results in better performance. However, for a high number of shape features, BoW+S and enriched SIFT architectures tend to converge.
IRJET- Finding Dominant Color in the Artistic Painting using Data Mining ...IRJET Journal
This document discusses finding the dominant color in an artistic painting using data mining techniques. It proposes using k-means clustering via the OpenCV library in Python to cluster pixels in the image by color and determine the dominant color cluster. The document provides background on k-means clustering and other clustering algorithms. It then describes applying a faster k-means algorithm to the image pixels to efficiently identify the dominant color in 2-3 times fewer iterations than standard k-means. The proposed system architecture involves preprocessing the image, extracting pixel vectors, clustering the pixels into color groups using fast k-means, and identifying the dominant color cluster.
This document provides an overview of digital image processing (DIP) and discusses various topics related to it. It begins with welcoming remarks and introductions. It then discusses key areas of application for image processing like optical character recognition, security, compression, and medical imaging. Some main techniques covered include image acquisition, pre-processing, enhancement, segmentation, feature extraction, classification, and understanding. Application areas like remote sensing, astronomy, security, and OCR are also summarized. The document provides examples and illustrations of different image processing concepts.
The document discusses object recognition in computer vision. It begins with an overview of object recognition, describing it as the task of finding and identifying objects in images. It then discusses several specific applications of object recognition, including fingerprint recognition and license plate recognition. Fingerprint recognition involves extracting features called minutiae from fingerprint images, which are ridge endings and bifurcations. License plate recognition uses an ALPR system to segment character images, normalize them, and recognize the characters.
AN INTEGRATED APPROACH TO CONTENT BASED IMAGERETRIEVAL by MadhuMadhu Rock
This document summarizes an integrated approach to content-based image retrieval. It discusses extracting both color and texture features from images using color moments and local binary patterns. The system is tested on a database of 1000 images across 10 classes. Results show the integrated approach of using both color and texture features provides more accurate retrievals than using either feature alone. Evaluation metrics like precision, recall and accuracy are calculated to quantitatively analyze the system's performance. Overall, the proposed multi-feature approach is found to improve content-based image retrieval compared to single-feature methods.
1. The document discusses the rise of data as the currency of a new "data economy" driven by developments in AI, deep learning, fintech, and blockchain.
2. It notes that while traditional currencies have uniform value based on amount, the value of data depends on its structure and context.
3. The value of data is seen differently by domain experts, data scientists, systems architects, and consultants, who must work together to integrate their different perspectives and fill semantic gaps between data and intended meanings.
4. Examples are provided of potential applications and proofs-of-concept to facilitate conversations between roles and design of business-oriented AI systems, though obtaining initial datasets poses challenges due to privacy and other issues
Artificial Intelligence for Automated Software TestingLionel Briand
This document provides an overview of applying artificial intelligence techniques such as metaheuristic search, machine learning, and natural language processing to problems in automated software testing. It begins with introductions to software testing, relevant AI techniques including genetic algorithms, machine learning, and natural language processing. It then discusses search-based software testing (SBST) as an application of metaheuristic search to problems in test case generation and optimization. Examples are provided of representing test cases as chromosomes for genetic algorithms and defining fitness functions to guide the search for test cases that maximize code coverage.
Bibliotheca Digitalis. Reconstitution of Early Modern Cultural Networks. From Primary Source to Data.
DARIAH / Biblissima Summer School, 4-8 July 2017, Le Mans, France.
1st day, July 4th – Digital sources: theoretical fundamentals.
From pixels to content.
Jean-Yves Ramel – Professor of Computer Science, Computer Laboratory, University of Tours.
Abstract: https://bvh.hypotheses.org/3294#conf-JYRamel
This document discusses multimodal learning analytics (MLA), which examines learning through multiple modalities like video, audio, digital pens, etc. It provides examples of extracting features from these modalities to analyze problem solving, expertise levels, and presentation quality. Key challenges of MLA are integrating different modalities and developing tools to capture real-world learning outside online systems. While current accuracy is limited, MLA is an emerging field that could provide insights beyond traditional learning analytics.
Machine learning techniques can be applied in formal verification in several ways:
1) To enhance current formal verification tools by automating tasks like debugging, specification mining, and theorem proving.
2) To enable the development of new formal verification tools by applying machine learning to problems like SAT solving, model checking, and property checking.
3) Specific applications include using machine learning for debugging and root cause identification, learning specifications from runtime traces, aiding theorem proving by selecting heuristics, and tuning SAT solver parameters and selection.
Abstract: Image processing refers to a type of signal processing where the input is an image and output is an image or some of the characteristics of the image such as objects in image, contrast and many more. Edge Detection is considered as one of the most important process in the field of image processing. The existing edge detection algorithms like sobel, prewitt, canny, etc have various limitations. These limitations are overcome using a technique like fuzzy logic. This paper discusses about use of fuzzy logic for edge detection along with some other edge detection techniques incorporated as input the fuzzy system and provides an algorithm for the same.. The paper provides a comparison of the algorithm with varied inputs for real image.
This document discusses the digital circuit layout problem and approaches to solving it using graph partitioning techniques. It begins by introducing the digital circuit layout problem and how it has become more complex with increasing circuit sizes. It then discusses how the problem can be decomposed into subproblems using graph partitioning to assign geometric coordinates to circuit components. The document reviews several traditional approaches to solve the problem, such as the Kernighan-Lin algorithm, and discusses their limitations for larger circuit sizes. It also discusses more recent approaches using evolutionary algorithms and concludes by analyzing the contributions of various approaches.
Walmart Business+ and Spark Good for Nonprofits.pdfTechSoup
"Learn about all the ways Walmart supports nonprofit organizations.
You will hear from Liz Willett, the Head of Nonprofits, and hear about what Walmart is doing to help nonprofits, including Walmart Business and Spark Good. Walmart Business+ is a new offer for nonprofits that offers discounts and also streamlines nonprofits order and expense tracking, saving time and money.
The webinar may also give some examples on how nonprofits can best leverage Walmart Business+.
The event will cover the following::
Walmart Business + (https://business.walmart.com/plus) is a new shopping experience for nonprofits, schools, and local business customers that connects an exclusive online shopping experience to stores. Benefits include free delivery and shipping, a 'Spend Analytics” feature, special discounts, deals and tax-exempt shopping.
Special TechSoup offer for a free 180 days membership, and up to $150 in discounts on eligible orders.
Spark Good (walmart.com/sparkgood) is a charitable platform that enables nonprofits to receive donations directly from customers and associates.
Answers about how you can do more with Walmart!"
Communicating effectively and consistently with students can help them feel at ease during their learning experience and provide the instructor with a communication trail to track the course's progress. This workshop will take you through constructing an engaging course container to facilitate effective communication.
Strategies for Effective Upskilling is a presentation by Chinwendu Peace in a Your Skill Boost Masterclass organisation by the Excellence Foundation for South Sudan on 08th and 09th June 2024 from 1 PM to 3 PM on each day.
हिंदी वर्णमाला पीपीटी, hindi alphabet PPT presentation, hindi varnamala PPT, Hindi Varnamala pdf, हिंदी स्वर, हिंदी व्यंजन, sikhiye hindi varnmala, dr. mulla adam ali, hindi language and literature, hindi alphabet with drawing, hindi alphabet pdf, hindi varnamala for childrens, hindi language, hindi varnamala practice for kids, https://www.drmullaadamali.com
2. What is the goal of computer vision?
Perceive the story behind the picture. See the world!
But what exactly does it mean to see?
Source: Wall-E movie: Pixar, Walt Disney Pictures
3. Outline
Introduction to Image Annotation
• What?
• Why?
Story Behind AIA
• Components of AIA
• Progress of AIA
• Issues & Conclusions
Going Deeper!
• Feature Extraction
• Learning Methods
• Deep Learning
• Conclusions
Useful Information
• Recent Articles
• Toolbox
• Databases
• Authors
Conclusions
• References
4. Outline (same as slide 3)
5. What is Automatic Image Annotation?
Automatic image annotation is the task of automatically assigning words to an image that describe the content of the image. (Munirathnam Srikanth et al., "Exploiting Ontologies for Automatic Image Annotation")
Source: Personalizing Automated Image Annotation Using Cross-Entropy: https://ivi.fnwi.uva.nl/isis/publications/bibtexbrowser.php?key=LiICM2011&bib=all.bib
6. What is Automatic Image Annotation? (Cont.)
Source: MS COCO Captioning Challenge: http://mscoco.org/dataset/#captions-challenge2015
7. Why is Image Annotation Important?
3,000 photos are uploaded every second to Facebook.
Recently, we have witnessed an exponential growth of user-generated videos and images due to the booming of social networks such as Facebook and Flickr.
Source: http://petapixel.com/2012/02/01/3000-photos-are-uploaded-every-second-to-facebook/
8. Why is Image Annotation Important? (Cont.)
• Applications, e.g. photo organizer apps
• Image classification systems
Source: Barriuso, A., & Torralba, A. (2012). Notes on image annotation
9. Number of articles per year with "Automatic Image Annotation" in the title
[Chart: article counts per year, 2000 to 2015, rising from near 0 to about 70. Reported by: Google Scholar.]
10. Outline (same as slide 3)
18. An Example of Classical Approaches in AIA
Source: Zhang, D., Islam, M. M., & Lu, G. (2012). A review on automatic image annotation techniques. Pattern Recognition, 45(1), 346–362. doi:10.1016/j.patcog.2011.05.013
19. Issues of Classical Approaches
Theoretical Limitations of Shallow Architectures*
Functions that can be compactly represented by a depth-k architecture might require an exponential number of computational elements to be represented by a depth k − 1 architecture.
*Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning
21. Issues of Classical Approaches (Cont.)
Picture Source: Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning
22. Issues of Classical Approaches (Cont.)
Theoretical Limitations of Shallow Architectures
• Linear regression and logistic regression have depth 1, i.e., a single level.
• Ordinary multi-layer neural networks, with the most common choice of one hidden layer, have depth two.
• Decision trees can also be seen as having two levels.
• Boosting (Freund & Schapire, 1996) usually adds one level to its base learners: that level computes a vote or linear combination of the outputs of the base learners.
23. Issues of Classical Approaches (Cont.)
Theoretical Limitations of Shallow Architectures: key terms
• Shallow? Deep?
• Functions
• Compact
• Depth
• Computational elements
24. Issues of Classical Approaches
Theoretical Limitations of Shallow Architectures*
Functions that can be compactly represented by a depth-k architecture might require an exponential number of computational elements to be represented by a depth k − 1 architecture.
*Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning
25. Issues of Classical Approaches (Cont.)
• A two-layer circuit of logic gates can represent any boolean function (Mendelson, 1997).
• With depth-two logical circuits, most boolean functions require an exponential number of logic gates (with respect to input size) to be represented (Wegener, 1987).
• There are functions computable with a polynomial-size logic-gate circuit of depth k that require exponential size when restricted to depth k − 1 (Hastad, 1986). The proof of this theorem relies on earlier results (Yao, 1985) showing that d-bit parity circuits of depth 2 have exponential size.
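The d-bit parity result cited above can be checked numerically. A minimal sketch, under the assumption that "deep" means a chain of two-input XOR gates and "shallow" means a depth-2 AND-OR (DNF) form: the deep circuit needs d − 1 gates, while the shallow form needs one AND term ("minterm") per odd-parity input assignment, i.e. 2^(d−1) terms.

```python
from itertools import product

def parity_deep(bits):
    # Depth-d chain of two-input XOR gates: d - 1 gates for d bits.
    acc = 0
    for b in bits:
        acc ^= b
    return acc

def parity_dnf_terms(d):
    # Depth-2 (AND-OR) form: one AND term per input assignment
    # with odd parity, i.e. 2**(d - 1) terms.
    return sum(1 for bits in product([0, 1], repeat=d)
               if sum(bits) % 2 == 1)

for d in [2, 4, 8]:
    # gates in the deep form vs. terms in the shallow form
    print(d, d - 1, parity_dnf_terms(d))  # 2 1 2 / 4 3 8 / 8 7 128
```

The gap grows exponentially with d, which is exactly the depth-versus-size trade-off the slide describes.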
26. Issues of Classical Approaches (Cont.)
• One might wonder whether these computational complexity results for boolean circuits are relevant to machine learning. See Orponen (1994) for an early survey of theoretical results in computational complexity relevant to learning algorithms.
• Interestingly, many of the results for boolean circuits can be generalized to architectures whose computational elements are linear threshold units (also known as artificial neurons (McCulloch & Pitts, 1943)), which compute:
f(x) = 1[wᵀx + b ≥ 0]   (1)
with parameters w and b.
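Equation (1) is straightforward to implement. A minimal sketch of a linear threshold unit; the AND-gate weights below are illustrative values chosen by hand, not from the source:

```python
import numpy as np

def threshold_unit(x, w, b):
    # McCulloch-Pitts style linear threshold unit:
    # f(x) = 1 if w . x + b >= 0, else 0
    return int(np.dot(w, x) + b >= 0)

# With these hand-picked parameters the unit computes logical AND.
w, b = np.array([1.0, 1.0]), -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, threshold_unit(np.array(x), w, b))  # only (1, 1) -> 1
```

A single such unit is a depth-1 architecture; stacking layers of them is what the deep results above are about.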
27. Issues of Classical Approaches (Cont.)
1. Theoretical Limitations of Shallow Architectures
2. Theoretical Advantages of Deep Architectures
Which one?!
30. Components of AIA
How do we assign a word to an image? What are the components of an Automatic Image Annotation system?
• Feature Extraction
• Classification Methods
This is pattern recognition!
[Diagram: Components of AIA → Classical or Shallow Structure → Issues]
37. Color Features: Comparisons
• Histogram. Pros: simple to compute, intuitive. Cons: high dimension, no spatial info, sensitive to noise.
• Color Moments (CM). Pros: compact, robust. Cons: not enough to describe all colors, no spatial info.
• Color Coherence Vector (CCV). Pros: spatial info. Cons: high dimension, high computation cost.
• Correlogram. Pros: spatial info. Cons: very high computation cost; sensitive to noise, rotation and scale.
38. Color Features: Comparisons (Cont.)
• Dominant Color Descriptor (DCD). Pros: compact, robust, perceptual meaning. Cons: needs post-processing for spatial info.
• Color Structure Descriptor (CSD). Pros: spatial info. Cons: sensitive to noise, rotation and scale.
• Scalable Color Descriptor (SCD). Pros: compact on demand, scalability. Cons: no spatial info, less accurate if compact.
39. Spatial Texture: Comparisons
• Texton. Pros: intuitive. Cons: sensitive to noise, rotation and scale; difficult to define textons.
• GLCM-based methods. Pros: intuitive, compact, robust. Cons: high computation cost; not enough to describe all textures.
• Tamura. Pros: perceptually meaningful. Cons: too few features.
• SAR. Pros: compact, robust, rotation invariant. Cons: high computation cost; difficult to define pattern size.
• FD. Pros: compact, perceptually meaningful. Cons: high computation cost; sensitive to scale.
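A GLCM is simple to write down, which also makes its cost visible: one full pass over the image per displacement, and real descriptors use several displacements. A minimal sketch with hypothetical parameter choices (8 grey levels, displacement (1, 0)), plus the Haralick contrast statistic:

```python
import numpy as np

def glcm(img, levels=8, dx=1, dy=0):
    # Grey-level co-occurrence matrix for one pixel displacement
    # (dx, dy), normalized to a joint probability table.
    q = img.astype(int) * levels // 256   # quantize to `levels` grey levels
    m = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            m[q[y, x], q[y + dy, x + dx]] += 1
    return m / m.sum()

def glcm_contrast(m):
    # Haralick contrast: sum over (i - j)^2 * p(i, j); 0 for a flat patch.
    i, j = np.indices(m.shape)
    return ((i - j) ** 2 * m).sum()

flat = np.full((16, 16), 128)                                # uniform patch
noisy = np.random.default_rng(0).integers(0, 256, (16, 16))  # random texture
print(glcm_contrast(glcm(flat)), glcm_contrast(glcm(noisy)))
```

The flat patch gives contrast 0 while the random texture gives a large value, which is the intuition behind GLCM features.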
40. Spectral Texture: Comparisons
• FT/DCT. Pros: fast computation. Cons: sensitive to scale and rotation.
• Wavelet. Pros: fast computation, multi-resolution. Cons: sensitive to rotation; limited orientations.
• Gabor. Pros: multi-scale, multi-orientation, robust. Cons: needs rotation normalisation; loss of spectral information due to incomplete cover of the spectrum plane.
• Curvelet. Pros: multi-resolution, multi-orientation, robust. Cons: needs rotation normalisation.
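A hedged sketch of a Fourier-based spectral texture descriptor (the ring partition and names are my own illustration, not a method from the source): collect the energy of the 2-D power spectrum in concentric frequency rings, coarse to fine. It is fast, but rotating the image rotates the spectrum, which is the rotation sensitivity noted in the table.

```python
import numpy as np

def fft_ring_energy(img, n_rings=4):
    # Energy of the 2-D Fourier power spectrum in concentric
    # frequency rings, from low frequency (coarse) to high (fine).
    spec = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    cy, cx = img.shape[0] // 2, img.shape[1] // 2
    y, x = np.indices(spec.shape)
    r = np.hypot(y - cy, x - cx)
    edges = np.linspace(0, r.max() + 1e-9, n_rings + 1)
    return np.array([spec[(r >= lo) & (r < hi)].sum()
                     for lo, hi in zip(edges[:-1], edges[1:])])

img = np.random.default_rng(0).random((32, 32))
feats = fft_ring_energy(img)
print(feats.shape)  # (4,)
```

The rings partition the spectrum, so the four numbers sum to the total spectral energy of the image.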
44. Shape (Cont.)
• Contour-based techniques are more sensitive to noise than region-based techniques.
• Therefore, color image retrieval usually employs region-based shape features.
46. Learning Methods: Comparisons
• SVM. Pros: small sample, optimal class boundary, non-linear classification. Cons: single labelling, one class at a time, expensive trial and error, sensitive to noisy data, prone to over-fitting.
• ANN. Pros: multiclass outputs, non-linear classification, robust to noisy data, suitable for complex problems. Cons: single labelling, sub-optimal, expensive training, complex black-box classification.
• Decision trees (DT). Pros: intuitive, semantic rules, multiclass outputs, fast, allow missing values, handle both categorical and numerical values. Cons: single labelling, sub-optimal, need pruning, can be unstable.
47. Learning Methods: Comparisons (Cont.)
• Non-parametric. Pros: multi-labelling, model free, fast. Cons: large number of parameters, large sample, sensitive to noisy data.
• Parametric. Pros: multi-labelling, small sample, good approximation of unknown distributions. Cons: predefined distribution, expensive training, approximated boundary.
• Metadata. Pros: use of both textual and visual features. Cons: difficult to relate visual features with textual features; difficult textual feature extraction.
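The non-parametric row is the family that nearest-neighbour annotators such as TagProp (cited later in this deck) belong to. A minimal sketch of the basic idea only, not the actual TagProp algorithm: transfer the most frequent tags among the k visually nearest training images. Features and tags below are toy values for illustration.

```python
import numpy as np

def knn_annotate(query, train_feats, train_tags, k=3, n_out=2):
    # Non-parametric, multi-label annotation: vote over the tags of
    # the k training images closest in feature space.
    d = np.linalg.norm(train_feats - query, axis=1)
    nearest = np.argsort(d)[:k]
    votes = {}
    for i in nearest:
        for t in train_tags[i]:
            votes[t] = votes.get(t, 0) + 1
    # Most-voted tags first; ties broken alphabetically for determinism.
    return sorted(votes, key=lambda t: (-votes[t], t))[:n_out]

feats = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
tags = [{"sky", "sea"}, {"sky", "beach"}, {"grass", "tree"}, {"grass", "cow"}]
print(knn_annotate(np.array([0.85, 0.15]), feats, tags))
```

Note the table's cons in miniature: every training image must be stored and scanned (large sample), and a single noisy neighbour can inject a wrong tag.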
48. Deep Learning
• Deep belief networks
• Deep Boltzmann machines
• Deep convolutional neural networks
• Deep recurrent neural networks
• Hierarchical temporal memory
Source: https://en.wikipedia.org/wiki/List_of_machine_learning_concepts
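All the listed models share the same core idea: many stacked nonlinear computational layers, which is exactly the "depth" from the earlier slides. A minimal forward-pass sketch (the layer sizes and the ReLU choice are illustrative assumptions, not from the source):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def deep_forward(x, layers):
    # A deep architecture: each (W, b) pair is one computational
    # layer; depth = number of stacked nonlinear transformations.
    for W, b in layers:
        x = relu(W @ x + b)
    return x

rng = np.random.default_rng(0)
dims = [8, 16, 16, 16, 4]  # hypothetical sizes: a depth-4 network
layers = [(rng.standard_normal((m, n)) * 0.5, np.zeros(m))
          for n, m in zip(dims, dims[1:])]
out = deep_forward(rng.standard_normal(8), layers)
print(out.shape, len(layers))  # (4,) 4
```

Adding an entry to `dims` adds one level of depth, which by the earlier complexity results can shrink the number of units needed to represent some target functions exponentially.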
50. Deep Learning (Cont.)
• A potential problem with deep learning:* the optimization task.
• See Bengio's articles, and talks about deep learning on YouTube, e.g. Ranzato, 4 October 2013: https://www.youtube.com/watch?v=clgMTk5V2Sk
*Ranzato, 4 October 2013, slides
51. Outline (same as slide 3)
52. Useful Information: Recent Articles
2009, Shallow
Source: Venkatesh N. Murthy, S. Maji, R. Manmatha, "Automatic Image Annotation using Deep Learning Representations", 2015
53. Which one?!
1. Theoretical Limitations of Shallow Architectures
2. Theoretical Advantages of Deep Architectures
54. Useful Information: Recent Articles (Cont.)
Source: B. Klein, G. Lev, G. Sadeh, and L. Wolf, "Fisher Vectors Derived from Hybrid Gaussian-Laplacian Mixture Models for Image Annotation", 2015
55. Useful Information: Toolboxes
MatConvNet
• MatConvNet is a MATLAB toolbox implementing Convolutional Neural Networks (CNNs) for computer vision applications. It is simple, efficient, and can run and learn state-of-the-art CNNs. Several example CNNs are included to classify and encode images.
Caffe
• Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors. Yangqing Jia created the project during his PhD at UC Berkeley. Caffe is released under the BSD 2-Clause license.
56. Useful Information: Databases
Corel5k: an important benchmark for keyword-based image retrieval and image annotation; 5,000 images manually annotated with 1 to 5 keywords; the vocabulary contains 260 words.
ESP Game: this data set is obtained from an online game where two players, who cannot communicate outside the game, gain points by agreeing on words describing the image.
IAPR TC12: this set of 20,000 images accompanied by descriptions in several languages was initially published for cross-lingual retrieval.
57. Useful Information: Databases (Cont.)
• Other databases: the Flickr 8/10/30 collections
Table Source: M. Guillaumin, T. Mensink, J. Verbeek and C. Schmid, "TagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Auto-Annotation"
58. Useful Information: Authors
Cordelia Schmid
• Research director, INRIA
• Computer vision, object recognition, video recognition, learning
Li Fei-Fei
• Professor, Stanford University
• Artificial intelligence, machine learning, computer vision, neuroscience
Yoshua Bengio
• Professor, U. Montreal, Computer Science
• Machine learning, deep learning, artificial intelligence
Reported by: Google Scholar
59. Useful Information: Authors (Cont.)
Richard Socher
• MetaMind
• Deep learning, machine learning, natural language processing, computer vision
• PhD thesis: "Recursive Deep Learning for Natural Language Processing and Computer Vision", Computer Science Department, Stanford University; 2014 Arthur L. Samuel Best Computer Science PhD Thesis Award
Reported by: Google Scholar
60. Outline (same as slide 3)
61. Conclusions
How do we assign a word to an image? What are the components of an Automatic Image Annotation system?
• Feature Extraction
• Classification Methods
This is pattern recognition!
[Diagram: Components of AIA → Classical or Shallow Structure → Issues]
62. Conclusions (Cont.)
1. High-dimensional feature analysis.
2. How to build an effective annotation model?
3. Currently, annotation and ranking are done online simultaneously in multiple-labelling annotation approaches; this is not efficient for image retrieval.
4. Lack of a standard vocabulary and taxonomy.
5. There is no commonly accepted image database.
6. Insufficient depth of architectures, and locality of estimators [Bengio, 2009].
Picture Source: Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning
Source: Zhang, D., Islam, M. M., & Lu, G. (2012). A review on automatic image annotation techniques. Pattern Recognition, 45(1), 346–362. doi:10.1016/j.patcog.2011.05.013
Speaker notes:
So the main components of an AIA system follow pattern-recognition (PR) structures, which is why studying PR structures helps us.
More importantly, the root cause of AIA's problems can be traced to these same structures.
To what depth should we grow this tree? For example, for an image of a flower? Or an image of a busy intersection? How many levels should we go?
An example to clarify "function" and "computational element" could be a logic circuit: the output is the simplified form of our circuit, and each gate represents one computational element; an AI example could be … next slide.
But what does being "deep" mean? Do we say that from depth 10 onward counts as deep?! What is your opinion?
Now let us return to that sentence: note that we do not have a specific number in mind, because it depends on the problem. Our point is whether the target function can be represented compactly with depth k.
And Zisserman's paper says that the greater the depth, the better the results, but one should ask whether the improvement is worth it.
Also mention the argument that shallow structures are better than the classical ones.
Do not state the answer; say that Bengio's papers should be read more thoroughly to understand the reason for this claim, but for now think about it yourselves, and in the final conclusion I will give my own opinion based on what I have read.
Because contour-based techniques use only a portion of the region, they are more sensitive to noise than region-based techniques.
Here we describe a general deep structure, and to compare deep versus classic, and the deep models among themselves, the next slide shows results from a 2015 paper.
"Locality of estimators" is another problem that deep learning has solved.
Also say why we focused on this problem rather than the others (make a slide): because all AIA papers have pointed to the Semantic Gap.
Finally, return to the question: has the classical approach been abandoned entirely?