This document discusses methods for large-scale image annotation and categorization using weakly supervised training data. It describes how traditional methods do not scale well to large datasets. Recent methods exploit linear models and distance metric learning to better scale. Specifically, Canonical Contextual Distance learning finds linear transformations to maximize correlation between image and label features in a latent subspace, providing a probabilistic similarity measure. This allows image auto-annotation on large datasets.
1. Features and Learning Methods for Large-scale Image Annotation and Categorization
Hideki Nakayama
The University of Tokyo
Department of Creative Informatics
2013/1/15
2. My research interest
Generic image (object) recognition
Whole-image level recognition
Weakly supervised training samples
Image annotation: assigning multiple words to a whole image
(without region correspondence)
3. The era of big data
We can use gigantic weakly-labeled web data now!
Tags: Nikon, D200, DSLR, Nikkor, 60mmf28dmicro, Nature, Landscape, Lake, Idaho, Ice, Sunset, Sun, Mountain, Sky, Frozen, AnAwesomeShot, ImpressedBeauty, isawyoufirst, ABigFave, Ljomi, ljspot4, ColorPhotoAward
http://www.flickr.com/
Flickr: 6 billion images (2011)
Facebook: 3 billion images every year
YouTube: 8 years of video uploaded every day
4. More data helps recognition
Simple k-NN using Flickr images & tags
(Figure: a query image, its nearest neighbours, and annotation results with 100K / 1.6M / 12M training sets; among the predicted tags, e.g. football, soccer, travel, church, marchingband, the relevant ones win out as the dataset grows)
6. Challenge: scaling to large training data
Traditional methods are not scalable in training
Bag-of-visual words + kernel SVM (chi-square, etc)
complexity: O(N^2) ~ O(N^3), memory: O(N^2) ☹
cf. [Yang et al., CVPR’09]
Recent methods exploit linear models
with carefully designed image features, whose dot product (linear kernel)
approximates the similarity between instances
☺
complexity: O(N), memory: O(1)
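The O(N) / O(1) scaling of linear training can be sketched with a Pegasos-style SGD linear SVM. This is my illustration, not the talk's exact system; data and hyperparameters are made up. Each epoch touches every sample once (O(N) time) and only the weight vector is stored (O(1) extra memory), unlike the N×N kernel matrix a kernel SVM needs.

```python
import numpy as np

def train_linear_svm_sgd(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style SGD on the hinge loss; y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)          # only O(1) state beyond the data itself
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):   # one pass over samples: O(N)
            t += 1
            eta = 1.0 / (lam * t)      # standard Pegasos step size
            if y[i] * (X[i] @ w) < 1:  # hinge loss is active
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1 - eta * lam) * w
    return w

# Toy linearly separable data
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([-1] * 100 + [1] * 100)
w = train_linear_svm_sgd(X, y)
acc = np.mean(np.sign(X @ w) == y)
```

With features designed so the dot product approximates instance similarity, this simple loop replaces the O(N²)–O(N³) kernel machinery.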
8. Example-based image annotation
Standard approach to the image annotation problem:
search for similar training images (K-NN or kernel density estimation)
and transfer their labels to the query
(e.g. tiger, forest, grass, water from tiger images; plane, sky, jet from aircraft images)
MBRM [Feng et al., 2004], JEC [Makadia et al., 2008], TagProp [Guillaumin et al., 2009]
Problem: how to define similarity between image and label data (training samples)?
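A minimal sketch of the tag-transfer idea, in the spirit of the K-NN methods above. Euclidean distance on a toy feature is assumed here, which is exactly the "how to define similarity?" question the slide raises; features, tags, and the voting rule are illustrative only.

```python
import numpy as np

def knn_annotate(query_feat, train_feats, train_tags, k=3, n_out=2):
    """Transfer the most frequent tags of the K nearest training images."""
    d = np.linalg.norm(train_feats - query_feat, axis=1)   # placeholder metric
    nn = np.argsort(d)[:k]                                 # K nearest neighbours
    votes = {}
    for i in nn:
        for t in train_tags[i]:
            votes[t] = votes.get(t, 0) + 1
    ranked = sorted(votes, key=lambda t: (-votes[t], t))   # by vote count
    return ranked[:n_out]

train_feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
train_tags = [["tiger", "grass"], ["tiger", "water"], ["plane", "sky"], ["jet", "sky"]]
tags = knn_annotate(np.array([0.05, 0.02]), train_feats, train_tags, k=2)
```

Everything downstream of the distance function is trivial; the hard part, as the slide says, is the metric itself.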
9. Fundamental problem: Semantic gap
Visually similar ≠ semantically similar
“I look like my dog” contest:
http://www.hemmy.net/2006/06/25/i-look-like-my-dog-contest/
Solution: Distance metric learning
10. Canonical Contextual Distance [Nakayama+, BMVC’10]
Canonical Correlation Analysis (CCA)
x: image features (e.g. BoVW), y: binary label vector
CCA finds linear transformations
  s = Aᵀ(x − μx), t = Bᵀ(y − μy)
that maximize the correlation between s and t, via the eigenproblems
  Σxy Σyy⁻¹ Σyx A = Σxx A Λ², with AᵀΣxx A = I
  Σyx Σxx⁻¹ Σxy B = Σyy B Λ², with BᵀΣyy B = I
Σ: covariance matrices, Λ: canonical correlations
(image feature → canonical space ← label feature)
Similarity measure in the latent subspace using the probabilistic structure:
  latent variable z ~ N(0, I_d), min{p, q} ≥ d ≥ 1
  x|z ~ N(Wx z + μx, Ψx), Wx ∈ R^(p×d)
  y|z ~ N(Wy z + μy, Ψy), Wy ∈ R^(q×d)
Probabilistic interpretation of CCA [Bach and Jordan, 2005]
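The CCA eigenproblem for A can be sketched in a few lines of NumPy. The small ridge term and the toy data are my additions (for numerical stability and demonstration); this is not the talk's implementation.

```python
import numpy as np

def cca(X, Y, d, reg=1e-6):
    """Solve Sxx^-1 Sxy Syy^-1 Syx A = A Lambda^2, with A^T Sxx A = I."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    vals, vecs = np.linalg.eig(M)             # eigvals = squared correlations
    order = np.argsort(-vals.real)[:d]
    A = vecs[:, order].real
    A /= np.sqrt(np.sum(A * (Sxx @ A), axis=0))   # enforce A^T Sxx A = I
    corr = np.sqrt(np.clip(vals.real[order], 0, 1))
    return A, corr

# Toy data: one shared latent factor plus noise dimensions
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 1))])
Y = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 1))])
A, corr = cca(X, Y, d=1)
```

The recovered top canonical correlation is close to 1 here because the first coordinates of X and Y share the latent factor z, mirroring the probabilistic model on the slide.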
11. CCD for image auto-annotation
Posterior statistics of the latent variable (from the probabilistic CCA model):
  E[z|x] = Mxᵀ Aᵀ(x − μx)
  E[z|xi, yi] = [Mxᵀ Myᵀ] [[I, Λ], [Λ, I]]⁻¹ [Aᵀ(xi − μx); Bᵀ(yi − μy)]
  var(z|x) = I − Mx Mxᵀ
  var(z|xi, yi) = I − [Mxᵀ Myᵀ] [[I, Λ], [Λ, I]]⁻¹ [Mx; My]
Annotation score for word w given a query image xs:
  P(w|xs) = Σi P(w|li) P(li|xs)
  P(li|xs) = ∫ p(z|xi, yi) p(z|xs) dz / Σj ∫ p(z|xj, yj) p(z|xs) dz
  P(w|li) ∝ [w ∈ li] · IDF(w)
(w, li: annotations of the training samples)
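A hedged sketch of this annotation rule: the posterior-density integral is replaced here by a placeholder similarity score sims[i], so only the mixture-with-IDF structure of the slide is shown, not the actual CCD computation.

```python
import numpy as np

def annotate(sims, labels, vocab):
    """Score words as a similarity-weighted, IDF-weighted vote over images."""
    n_docs = len(labels)
    # IDF(w): down-weight words that appear in many training images
    idf = {w: np.log(n_docs / sum(w in l for l in labels)) for w in vocab}
    p_li = sims / sims.sum()                 # stands in for P(l_i | x_s)
    scores = {w: sum(p_li[i] * idf[w] for i in range(n_docs) if w in labels[i])
              for w in vocab}
    return max(scores, key=scores.get)       # best-scoring word

labels = [{"tiger", "grass"}, {"grass", "water"}, {"plane", "sky"}]
vocab = ["tiger", "grass", "water", "plane", "sky"]
sims = np.array([0.7, 0.2, 0.1])   # query is closest to the first image
best = annotate(sims, labels, vocab)
```

Note how the IDF term favours "tiger" (which occurs in only one training image) over the more common "grass", even though both appear in the nearest image.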
12. Features
Image features
BoVW, GIST, etc… (off-the-shelf ones)
Needs to be encoded in a Euclidean space
Label features
Binary occurrence vector cf. [Guillaumin et al., CVPR’10]
When the dictionary contains {plane, sea, sky, clouds, mountain}:
  Image Ij labeled {plane, sky, clouds} → yj = (1, 0, 1, 1, 0)
  Image Ik labeled {sky, clouds, mountain} → yk = (0, 0, 1, 1, 1)
  ⟨yj, yk⟩ = 2: the dot product counts the number of common labels.
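The dot-product-counts-common-labels example above, verified in code (the dictionary and tag sets are exactly the slide's):

```python
import numpy as np

vocab = ["plane", "sea", "sky", "clouds", "mountain"]

def to_vec(tags):
    """Binary occurrence vector over the fixed dictionary."""
    return np.array([1 if w in tags else 0 for w in vocab])

y_j = to_vec({"plane", "sky", "clouds"})      # (1, 0, 1, 1, 0)
y_k = to_vec({"sky", "clouds", "mountain"})   # (0, 0, 1, 1, 1)
common = int(y_j @ y_k)                       # counts shared labels: sky, clouds
```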
13. Evaluation
Benchmark datasets
                       Corel5K   IAPR-TC12   ESP Game
# of words                260         291        268
# of training images    4,500      17,665     18,689
# of testing images       499       1,962      2,081
words per image
(avg./max)              3.4/5      5.7/23     4.7/15
16. Basic pipeline
1. Local feature extraction
   1-1. feature detector (operator, grid)
   1-2. descriptor (SIFT, SURF, …)
2. Coding an image-level feature vector (e.g. (0.5, 1.2, 0.1, …))
How to encode similarity between distributions of local features?
17. Bag-of-Visual-Words (traditional) [Csurka et al. 2004]
Vector quantization → histogram
○ computationally efficient
× large reconstruction error
× non-linear property (must be used with non-linear kernel)
(Figure: local features from training images are vector-quantized into visual words, and a query image is described by its histogram over those words. Credit: K. Yanai)
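A toy sketch of the BoVW pipeline: k-means visual words (simplified Lloyd iterations with a naive evenly-spaced initialization, fine for this demo), then a normalised histogram of word assignments as the image signature. Real systems use far larger vocabularies and proper k-means.

```python
import numpy as np

def kmeans(X, k, iters=10):
    """Simple Lloyd iterations; init from evenly spaced samples (toy demo)."""
    centers = X[:: max(1, len(X) // k)][:k].copy()
    for _ in range(iters):
        assign = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(0)
    return centers

def bovw_histogram(descriptors, centers):
    """Vector quantization -> normalised word-count histogram."""
    assign = np.argmin(((descriptors[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(assign, minlength=len(centers)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(1)
# Training descriptors from three well-separated clusters -> three visual words
train_desc = np.vstack([rng.normal(m, 0.1, (50, 2)) for m in (0.0, 1.0, 2.0)])
words = kmeans(train_desc, k=3)
# Descriptors of one "image", all near the middle cluster
h = bovw_histogram(rng.normal(1.0, 0.1, (30, 2)), words)
```

Because all the query descriptors fall near one visual word, the histogram concentrates on a single bin, which is exactly the quantization behaviour (and the source of the large reconstruction error) that the slide criticises.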
18. New BoVW① sparse coding + max pooling
Reduce reconstruction error by using multiple bases (visual words)
Max pooling leads to linearly-separable image signatures
(taking max response for each visual word) cf. [Boureau et al., ICML’10]
[Yang+, CVPR’09] [Wang+, CVPR’10]
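The pooling step can be illustrated as follows; the thresholded random codes merely stand in for real sparse codes (no dictionary learning is performed here).

```python
import numpy as np

rng = np.random.default_rng(0)
codes = rng.random((100, 8))     # 100 local descriptors, 8 visual words
codes[codes < 0.8] = 0.0         # crude sparsification (placeholder for sparse coding)

max_pooled = codes.max(axis=0)   # max response per word -> image signature
avg_pooled = codes.mean(axis=0)  # BoVW-style average pooling, for comparison
```

Max pooling keeps the strongest response per visual word regardless of how many descriptors fired, which is the property [Boureau et al., ICML’10] connect to linearly separable signatures.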
19. New BoVW② encode higher-level statistics
N: # of visual words (10^3 ~ 10^4), d: dimension of descriptor (10 ~ 100)

Method                                 Statistics         Dim. of image signature
BoVW                                   count (ratio)      N
VLAD [Jegou+, CVPR’10]                 mean               Nd
Super vector [Zhou+, ECCV’10]          ratio + mean       N(d+1)
Fisher vector [Perronnin+, ECCV’10]    mean + variance    2Nd
Global Gaussian [Nakayama+, CVPR’10]   mean + covariance  d(d+1)/2 (N=1)
VLAT [Picard+, ICIP’11]                mean + covariance  Nd(d+1)/2

Each is encoded in a feature vector so that the dot product approximates
the distance between distributions of local features.
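A sketch of one row of the table, VLAD [Jegou+, CVPR’10]: for each visual word, accumulate the residuals of the descriptors assigned to it, giving an N·d signature. Hard assignment and L2 normalisation are assumed here (a common but not universal choice).

```python
import numpy as np

def vlad(descriptors, centers):
    """Sum of residuals per visual word, flattened and L2-normalised."""
    N, d = centers.shape
    assign = np.argmin(((descriptors[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    v = np.zeros((N, d))
    for j in range(N):
        if np.any(assign == j):
            v[j] = (descriptors[assign == j] - centers[j]).sum(0)
    v = v.flatten()                            # length N * d
    return v / (np.linalg.norm(v) + 1e-12)     # L2 normalisation

centers = np.array([[0.0, 0.0], [1.0, 1.0]])   # N = 2 visual words, d = 2
desc = np.array([[0.1, 0.0], [0.9, 1.1], [1.1, 0.9]])
sig = vlad(desc, centers)
```

Unlike the BoVW count, the residual sums retain first-order information about where descriptors sit relative to each word, which is why the signature grows to Nd dimensions.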
20. Global Gaussian Coding [Nakayama+, CVPR’10]
Exploit Riemannian manifold of Gaussian
using information geometry framework
Gaussian model of the local descriptors x:
  p(x; θ) = (2π)^(−d/2) |Σ|^(−1/2) exp(−(1/2)(x − μ)ᵀ Σ⁻¹ (x − μ))
Affine coordinates (expectation parameters):
  η = (μ1, …, μd, σ11 + μ1², σ12 + μ1μ2, …, σ1d + μ1μd, σ22 + μ2², …, σdd + μd²)ᵀ
Inner product from the inverse of the Fisher information matrix G(η):
  ⟨ηi, ηj⟩ = ηiᵀ G(η̄) ηj
We use G(η̄), the metric at the center of the samples, for the entire space;
this somewhat approximates the KL-divergence.
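The coordinate part of Global Gaussian coding can be sketched as below. The metric G(η̄) from the slide is omitted (identity metric), so this shows only the affine coordinates η built from the mean μ and the second moment E[xxᵀ] = Σ + μμᵀ; the unique upper-triangle entries give the d(d+1)/2 covariance terms from the table.

```python
import numpy as np

def global_gaussian(descriptors):
    """Flatten (mu, upper triangle of E[x x^T]) into one signature vector."""
    mu = descriptors.mean(0)
    second = (descriptors.T @ descriptors) / len(descriptors)  # E[x x^T]
    d = len(mu)
    iu = np.triu_indices(d)               # upper triangle: d(d+1)/2 entries
    return np.concatenate([mu, second[iu]])

rng = np.random.default_rng(0)
desc = rng.normal(0.0, 1.0, (200, 3))     # 200 local descriptors, d = 3
sig = global_gaussian(desc)               # length d + d(d+1)/2 = 3 + 6 = 9
```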
21. Competition
Large-scale visual recognition challenge 2010
1000-class categorization
1.2M training images, 150K testing images
Evaluated by top-5 classification accuracy
Part of ImageNet dataset [Deng et al.]
Labeled with Amazon Mechanical Turk
14M images, 22K categories (as of 2011)
Semantic structure according to WordNet
Credit: Fei-Fei Li
22. Result (2010)
11 teams participated
1. NEC+UIUC (72%): 80,000~260,000 dim ×6
2. Xerox Research (64%): 260,000 dim ×2
3. ISI (55%): 12,000 dim
4. UC Irvine (53%)
5. MIT (46%)
Examples
http://www.isi.imi.i.u-tokyo.ac.jp/pattern/ilsvrc/index.html
23. 2010 Winner: NEC-UIUC
LCC + super vector coding
Ensemble of six classifiers using different features
Parallelized feature extraction (Hadoop)
Linear SVM (Averaging SGD)
LCC → 2 days, super vector → 7 days (with an 8-core machine)
24. 2011 Winner: XRCE
Fisher vector
520K dim ×2 (SIFT, color)
2 days with a 16-core machine
Linear SVM (SGD)
1.5 days with a 16-core machine
25. 2012 Winner: Univ. Toronto
Deep learning
Huge convolutional neural network from raw images
Two GPUs, one week
About 10% more accurate than the runner-up
26. Summary
Large-scale image recognition is now a hot topic
Millions of training images, tens of thousands of categories
Scalability is the key issue
Linear training methods + compatibly designed features
If we can approximate the sample similarity with a dot product (linear
kernel), we can simply apply linear methods!
Explicit embedding
Fisher kernel
KPCA + Nystrom method
Personal interest: Can we do this with graph kernels?
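The Nyström idea from the list above can be sketched as follows: build an explicit embedding φ from a few landmark points so that φ(x)·φ(y) ≈ k(x, y), after which plain linear methods apply. The RBF kernel, landmark choice, and sizes here are all illustrative assumptions.

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """Gaussian RBF kernel matrix between row sets A and B."""
    d2 = ((A[:, None] - B[None]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_embed(X, landmarks, gamma=0.5):
    """Explicit map phi with phi(x) . phi(y) ~ k(x, y) (Nystrom)."""
    W = rbf(landmarks, landmarks, gamma)      # m x m kernel on landmarks
    vals, vecs = np.linalg.eigh(W)
    vals = np.clip(vals, 1e-12, None)         # guard tiny eigenvalues
    M = vecs / np.sqrt(vals)                  # V * Lambda^{-1/2}
    return rbf(X, landmarks, gamma) @ M       # K_nm W^{-1/2}-style embedding

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
phi = nystrom_embed(X, X[:10])                # first 10 samples as landmarks
K_approx = phi @ phi.T                        # = K_nm W^{-1} K_mn
K_exact = rbf(X, X)
err = np.abs(K_approx - K_exact).mean()
```

On the landmark points themselves the approximation is exact, and elsewhere it degrades gracefully; the payoff is that φ(X) can be fed to an O(N) linear trainer instead of an O(N²) kernel machine.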