G. Carneiro, A. Chan, P. Moreno, N. Vasconcelos
by: Lukáš Tencer
ECSE626 2012
Outline
• Introduction
• Prior techniques
  • Supervised OVA Labeling
  • Unsupervised Labeling
• Methodology
  • Supervised Multiclass Labeling
  • Semantic Distribution Estimation
  • Density Estimation
• Algorithm
  • Learning, Annotation, Retrieval
• Results
  • Quantitative
  • Qualitative
• Conclusion
Introduction
• Task
  • Assign labels to unknown images
  • Retrieve relevant images given labels
• Supervised Learning
  • Learning from labeled training data
  • Training data consist of pairs $\{(x_i, l_i)\},\ i = 1, \dots, n$
  • Multiple instance learning
• Semantic Classes
  • Labels representing common concepts (sky, bear, snow…)
• Image Annotation and Retrieval
  • Annotation: given an image, which labels are present in it?
  • Retrieval: given a label, which are the top n matching images?
Introduction
• Datasets:
  • Corel5K – 5000 images, 272 classes
  • Corel30K – 30000 images, 1120 classes
  • MIRFLICKR – 25000 images, 37 classes
  • (PSU) – no longer available
  • ImageCLEF – the CLEF (Cross Language Evaluation Forum) Cross Language Image Retrieval Track
    • Medical Image Retrieval
    • Photo Annotation
    • Plant Identification
    • Wikipedia Retrieval
    • Patent Image Retrieval and Classification
Introduction
[Figure: example images from Corel 5K (Bear), Corel 30K (New Zealand), and MIRFLICKR (Urban)]
Prior Techniques
• Supervised OVA (one-vs-all)
  • Binary decision problem: concept present / absent
  • Hidden variable $Y_i$
  • Decision rule: $P_{X|Y_i}(X|1)\,P_{Y_i}(1) \ge P_{X|Y_i}(X|0)\,P_{Y_i}(0)$
• Unsupervised Learning
  • Models the dependency between text labels and image features, expressed as a hidden variable $L$
  • $P_{X,W}(x,w) = \sum_{l=1}^{D} P_{X|L}(x|l)\,P_{W|L}(w|l)\,P_L(l)$
  • Considers just positive examples, densities for $Y_i = 1$
[Figure: graphical model in which the hidden variable L generates the words W and the features X; example: "bear" with words W1, W2, W3 (polar, grizzly) and features X]
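The OVA decision rule can be illustrated with a small sketch. It assumes 1-D Gaussian class-conditional densities and made-up parameters; the function names are hypothetical and are not taken from the slides or the paper.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian; stands in for the class-conditional densities."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def ova_concept_present(x, present, absent, prior_present):
    """Return True when P(x|Y=1) P(Y=1) >= P(x|Y=0) P(Y=0)."""
    lhs = gaussian_pdf(x, *present) * prior_present
    rhs = gaussian_pdf(x, *absent) * (1.0 - prior_present)
    return lhs >= rhs

# Assumed (mean, std) for "concept present" and "concept absent", equal priors.
decision_near_present = ova_concept_present(1.9, (2.0, 1.0), (0.0, 1.0), 0.5)
decision_near_absent = ova_concept_present(0.1, (2.0, 1.0), (0.0, 1.0), 0.5)
```

One such binary test is needed per concept, which is why the absent-class density has to be modeled for every label.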
Methodology
Supervised Multiclass Labeling (SML)
• Elements of the semantic vocabulary (W) are explicitly made the semantic classes (L)!
• Random variable W: $W = i,\ i \in \{1, \dots, T\}$, if and only if $x$ is a sample from concept $w_i$, with class-conditional density $P_{X|W}(x|i)$
• Bayes rule: $P_{W|X}(i|x) = \dfrac{P_{X|W}(x|i)\,P_W(i)}{P_X(x)}$
• Annotation and retrieval are then easy to do:
  • Annotation: $i^*(X) = \arg\max_i P_{W|X}(i|X)$
  • Retrieval: $j^*(w_i) = \arg\max_j P_{X|W}(X_j|i)$
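A minimal sketch of the two argmax rules, assuming the likelihoods $P_{X|W}(x|i)$ have already been evaluated. All numbers and function names are illustrative assumptions.

```python
def posterior(likelihoods, priors):
    """Bayes rule: P(i|x) = P(x|i) P(i) / sum_k P(x|k) P(k)."""
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

def annotate(likelihoods, priors):
    """Index i* of the class with the largest posterior P(i|x)."""
    post = posterior(likelihoods, priors)
    return max(range(len(post)), key=post.__getitem__)

def retrieve(image_likelihoods):
    """Index j* of the database image X_j with the largest P(X_j|i)."""
    return max(range(len(image_likelihoods)), key=image_likelihoods.__getitem__)

# Hypothetical values: three classes (sky, bear, snow), three database images.
post = posterior([0.02, 0.10, 0.04], [0.5, 0.2, 0.3])
best_class = annotate([0.02, 0.10, 0.04], [0.5, 0.2, 0.3])
best_image = retrieve([1e-5, 3e-4, 2e-6])
```

Because annotation compares all classes through one posterior, no "concept absent" model is needed, in contrast to the OVA formulation.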
Methodology
Estimation of Semantic Class Distributions
• Given the training set of $D_i$ images for class $i$, estimate $P_{X|W}(x|i)$
• Assumption: Gaussian distribution
• How to estimate?
  • Direct estimation
  • Model averaging
  • Naive averaging: $P_{X|W}(x|i) = \dfrac{1}{D_i} \sum_{l=1}^{D_i} P_{X|L,W}(x|l,i)$
• GMM model: $P_{X|L,W}(x|l,i) = \sum_{k} \pi_{i,l}^{k}\, G(x, \mu_{i,l}^{k}, \Sigma_{i,l}^{k})$
• Averaged: $P_{X|W}(x|i) = \dfrac{1}{D_i} \sum_{k} \sum_{l=1}^{D_i} \pi_{i,l}^{k}\, G(x, \mu_{i,l}^{k}, \Sigma_{i,l}^{k})$
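Naive averaging amounts to pooling the components of all per-image mixtures and dividing each weight by the number of images. The sketch below checks this on 1-D mixtures; the parameters and helper names are assumptions for illustration only.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmm_pdf(x, components):
    """components: list of (weight, mean, variance) for a 1-D mixture."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

def average_mixtures(image_gmms):
    """Pool all components, scaling each weight by 1 / D_i."""
    n_images = len(image_gmms)
    return [(w / n_images, m, v) for gmm in image_gmms for w, m, v in gmm]

# Three hypothetical per-image mixtures for one class.
image_gmms = [
    [(0.6, 0.0, 1.0), (0.4, 3.0, 0.5)],
    [(0.5, 1.0, 2.0), (0.5, 4.0, 1.0)],
    [(1.0, 2.0, 1.5)],
]
class_gmm = average_mixtures(image_gmms)
x = 1.7
direct = sum(gmm_pdf(x, g) for g in image_gmms) / len(image_gmms)
pooled = gmm_pdf(x, class_gmm)
```

The pooled model has as many components as all image models combined, so its size grows with the training set.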
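Methodology
Mixture hierarchies
• First step: get a GMM from each image with regular soft EM
  $P_{X|W}(x|I) = \sum_{k=1}^{8} \pi_I^{k}\, G(x, \mu_I^{k}, \Sigma_I^{k})$
• Likelihood: $P(x, z|\theta) = \prod_{i=1}^{n} \sum_{j=1}^{2} \mathbb{1}(z_i = j)\, \pi_j\, G(x_i; \mu_j, \Sigma_j)$
• E: $Q(\theta, \theta^{t}) = E_{z|x,\theta^{t}}[\log F(\theta; X, Z)]$
• M: $\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^{t})$
• Procedure: initialization (Euclidean distance, Mahalanobis distance) → initial parameter estimate → expectation → maximization
• Stop when the maximum of 200 iterations is reached or the change in likelihood between $P(x, z|\theta^{t})$ and $P(x, z|\theta^{t+1})$ is too small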
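The soft EM loop can be sketched for a 1-D mixture as follows. This is a simplified illustration, not the presenter's implementation: it uses scalar variances instead of full covariances, takes the initial means as given, and stops on the change in log-likelihood or after 200 iterations. Function and variable names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, init_means, max_iter=200, tol=1e-8):
    """Soft EM for a 1-D Gaussian mixture, started from the given means."""
    means = list(init_means)
    k = len(means)
    weights = [1.0 / k] * k
    variances = [1.0] * k
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of each component for each sample.
        resp, ll = [], 0.0
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([p / total for p in joint])
        # M-step: re-estimate weights, means, variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor avoids a collapsing component
        # Stop when the log-likelihood change is too small.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, variances

# Hypothetical data: two well-separated clusters.
data = [-0.2, 0.0, 0.1, 0.3, -0.1, 4.8, 5.0, 5.2, 5.1, 4.9]
weights, means, variances = em_gmm_1d(data, [1.0, 4.0])
```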
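Methodology
Mixture hierarchies for label
• Second step: get an HGMM for each label
  $P_{X|W}(x|w) = \sum_{k=1}^{64} \pi_w^{k}\, G(x, \mu_w^{k}, \Sigma_w^{k})$
• Likelihood: $P(x, z|\theta) = \prod_{i=1}^{n} \sum_{j=1}^{2} \mathbb{1}(z_i = j)\, \pi_j\, G(x_i; \mu_j, \Sigma_j)$
• E: $Q(\theta, \theta^{t}) = E_{z|x,\theta^{t}}[\log F(\theta; X, Z)]$
• M: $\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^{t})$
• Procedure: initialization (Bhattacharyya distance) → initial parameter estimate → expectation → maximization
• Stop when the maximum of 200 iterations is reached or the change in likelihood between $P(x, z|\theta^{t})$ and $P(x, z|\theta^{t+1})$ is too small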
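E and M step for HGMM
• Input: $\{\pi_j^{k}, \mu_j^{k}, \Sigma_j^{k}\},\ j = 1, \dots, D_i,\ k = 1, \dots, K$
• Output: $\{\pi_c^{m}, \mu_c^{m}, \Sigma_c^{m}\},\ m = 1, \dots, M$
• E-step:
  $h_{jk}^{m} = \dfrac{\left[G(\mu_j^{k}, \mu_c^{m}, \Sigma_c^{m})\, e^{-\frac{1}{2}\mathrm{trace}\{(\Sigma_c^{m})^{-1}\Sigma_j^{k}\}}\right]^{\pi_j^{k} N} \pi_c^{m}}{\sum_{l}\left[G(\mu_j^{k}, \mu_c^{l}, \Sigma_c^{l})\, e^{-\frac{1}{2}\mathrm{trace}\{(\Sigma_c^{l})^{-1}\Sigma_j^{k}\}}\right]^{\pi_j^{k} N} \pi_c^{l}}$
• M-step:
  $(\pi_c^{m})^{new} = \dfrac{\sum_{jk} h_{jk}^{m}}{D_i K}$
  $(\mu_c^{m})^{new} = \sum_{jk} w_{jk}^{m}\, \mu_j^{k}$, where $w_{jk}^{m} = \dfrac{h_{jk}^{m}\, \pi_j^{k}}{\sum_{jk} h_{jk}^{m}\, \pi_j^{k}}$
  $(\Sigma_c^{m})^{new} = \sum_{jk} w_{jk}^{m} \left[\Sigma_j^{k} + (\mu_j^{k} - \mu_c^{m})(\mu_j^{k} - \mu_c^{m})^{T}\right]$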
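One pass of the hierarchical E and M steps can be sketched in 1-D, where the trace term reduces to a ratio of variances. This is an illustration under stated assumptions: scalar variances, N treated as a tunable constant (`n_virtual`), the updated mean used inside the variance update, and hypothetical function names and parameters.

```python
import math

def log_gauss(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def hem_step(children, parents, n_virtual=10.0):
    """One E+M pass; children and parents are lists of (weight, mean, variance)."""
    # E-step: h[c][m], responsibility of parent m for child component c,
    # computed in the log domain and normalized over the parents.
    h = []
    for pi_c, mu_c, var_c in children:
        logs = [pi_c * n_virtual * (log_gauss(mu_c, mu_m, var_m) - 0.5 * var_c / var_m)
                + math.log(pi_m) for pi_m, mu_m, var_m in parents]
        top = max(logs)
        unnorm = [math.exp(v - top) for v in logs]
        total = sum(unnorm)
        h.append([u / total for u in unnorm])
    # M-step: weights from h, means and variances from w = h * pi (normalized).
    new_parents = []
    for m in range(len(parents)):
        h_sum = sum(row[m] for row in h)
        w_norm = sum(row[m] * c[0] for row, c in zip(h, children))
        w = [row[m] * c[0] / w_norm for row, c in zip(h, children)]
        mean = sum(wi * c[1] for wi, c in zip(w, children))
        var = sum(wi * (c[2] + (c[1] - mean) ** 2) for wi, c in zip(w, children))
        new_parents.append((h_sum / len(children), mean, var))
    return new_parents

# Hypothetical child components: 3 images x 2 components each (D_i * K = 6).
children = [(0.5, 0.0, 0.2), (0.5, 5.0, 0.3), (0.6, 0.2, 0.1),
            (0.4, 5.1, 0.2), (0.5, -0.1, 0.2), (0.5, 4.9, 0.1)]
parents = [(0.5, 1.0, 1.0), (0.5, 4.0, 1.0)]
for _ in range(20):
    parents = hem_step(children, parents)
```

The update touches only the child mixture parameters, never the original feature vectors, which is what makes the label-level step cheap compared with rerunning EM on all data.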
Algorithm - learning
• Training
  • For each training set of images I for label w:
    • Decompose each image (192 px × 128 px) into 8×8 regions with a sliding window that moves 2 pixels at a time
    • Calculate the DCT for each window (8×8×3), giving a 192-d feature vector
    • Calculate a mixture of 8 Gaussians for each image using EM:
      $P_{X|W}(x|I) = \sum_{k=1}^{8} \pi_I^{k}\, G(x, \mu_I^{k}, \Sigma_I^{k})$
    • Calculate a mixture of 64 Gaussians for each label using H-EM:
      $P_{X|W}(x|w) = \sum_{k=1}^{64} \pi_w^{k}\, G(x, \mu_w^{k}, \Sigma_w^{k})$
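The feature extraction step can be sketched in pure Python. Assumptions not stated on the slide: windows must lie fully inside the image, an orthonormal DCT-II is used, and the three channel blocks are simply concatenated. All names are hypothetical.

```python
import math

N = 8  # window side, as on the slide

# Orthonormal DCT-II basis: DCT[u][n].
DCT = [[math.sqrt((1.0 if u == 0 else 2.0) / N)
        * math.cos(math.pi * (2 * n + 1) * u / (2 * N))
        for n in range(N)] for u in range(N)]

def dct2(block):
    """2-D DCT of an N x N block, computed as DCT * block * DCT^T."""
    tmp = [[sum(DCT[u][n] * block[n][c] for n in range(N)) for c in range(N)]
           for u in range(N)]
    return [[sum(tmp[u][n] * DCT[v][n] for n in range(N)) for v in range(N)]
            for u in range(N)]

def window_count(width, height, step=2):
    """Number of fully contained N x N windows for the given step."""
    return ((width - N) // step + 1) * ((height - N) // step + 1)

def extract_features(image, step=2):
    """image[channel][row][col] -> list of 192-d vectors (3 channels x 64)."""
    height, width = len(image[0]), len(image[0][0])
    features = []
    for top in range(0, height - N + 1, step):
        for left in range(0, width - N + 1, step):
            vec = []
            for channel in image:
                block = [row[left:left + N] for row in channel[top:top + N]]
                vec.extend(v for row in dct2(block) for v in row)
            features.append(vec)
    return features

# Hypothetical constant test image: 3 channels, 12 rows, 16 columns.
image = [[[1.0] * 16 for _ in range(12)] for _ in range(3)]
features = extract_features(image)
```

For a constant block only the first (DC) coefficient of each channel is non-zero, which gives a quick sanity check of the transform.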
Algorithm – annotation, retrieval
• Annotation
  • Get the n (5) best labels for image I
  • Get features from the image ((192*128/2)*192)
  • Get the log-likelihood for each label and choose the best n:
    $\log P_{X|W}(I|w_i) = \sum_{x \in I} \log P_{X|W}(x|w_i)$
• Retrieval
  • For images $I_T$ and label w:
  • Annotate $I_T$ and rank the images by decreasing score $P_{X|W}(I_T|w_i)$
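A minimal sketch of annotation and retrieval by summed log-likelihood, assuming 1-D features and already trained label mixtures. The label models, feature values, and function names are made up for illustration.

```python
import math

def log_gmm_pdf(x, components):
    """log of sum_k w_k G(x; mean_k, var_k) for a 1-D mixture (log-sum-exp)."""
    terms = [math.log(w) - 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
             for w, m, v in components]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def image_score(features, gmm):
    """Sum of per-feature log-likelihoods under one label model."""
    return sum(log_gmm_pdf(x, gmm) for x in features)

def annotate(features, label_models, n_best=5):
    """Return the n_best labels with the highest summed log-likelihood."""
    scores = {label: image_score(features, gmm) for label, gmm in label_models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_best]

def retrieve(database, label, label_models):
    """Rank image ids by decreasing score under the given label's model."""
    gmm = label_models[label]
    return sorted(database, key=lambda name: image_score(database[name], gmm), reverse=True)

# Hypothetical label models and images.
label_models = {
    "sky": [(1.0, 0.0, 1.0)],
    "bear": [(0.5, 5.0, 1.0), (0.5, 6.0, 1.0)],
    "snow": [(1.0, 10.0, 1.0)],
}
labels = annotate([4.8, 5.5, 6.1], label_models, n_best=2)
database = {"img_a": [0.1, -0.2, 0.3], "img_b": [5.2, 5.9, 5.5], "img_c": [9.8, 10.1, 10.3]}
ranking = retrieve(database, "bear", label_models)
```

Summing log-likelihoods treats the feature vectors of an image as independent samples, matching the sum over x in the formula.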
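Results-quantitative
• Database: Corel 5k
• 4000 training, 1000 testing
• Precision: $\text{precision} = \dfrac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|} = \dfrac{|w_C|}{|w_{auto}|}$
• Recall: $\text{recall} = \dfrac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|} = \dfrac{|w_C|}{|w_H|}$
• $w_H$: human annotated images, $w_{auto}$: automatically annotated images, $w_C$: correctly annotated images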
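The per-word precision and recall can be computed as in the sketch below. The annotations are invented for illustration, and the function name is hypothetical.

```python
def word_precision_recall(word, human, automatic):
    """human / automatic: dict mapping image id -> set of labels."""
    w_h = {img for img, labels in human.items() if word in labels}
    w_auto = {img for img, labels in automatic.items() if word in labels}
    w_c = w_h & w_auto  # images annotated correctly with this word
    precision = len(w_c) / len(w_auto) if w_auto else 0.0
    recall = len(w_c) / len(w_h) if w_h else 0.0
    return precision, recall

# Hypothetical ground truth and system output for four images.
human = {1: {"bear", "snow"}, 2: {"bear"}, 3: {"sky"}, 4: {"bear", "sky"}}
automatic = {1: {"bear", "sky"}, 2: {"snow"}, 3: {"bear", "sky"}, 4: {"bear"}}

bear = word_precision_recall("bear", human, automatic)
sky = word_precision_recall("sky", human, automatic)
snow = word_precision_recall("snow", human, automatic)
```

Averaging these values over all words gives the mean per-word recall and precision, and counting words with recall above zero gives the third reported quantity.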
Results-quantitative
Annotation
[Figure: panels titled Non zero recall, mean Recall, mean Precision]
                        1      2      3      4      5      6
w with Recall > 0       140    121    110    125    90     131
Mean Recall per w       0.27   0.25   0.25   0.26   0.23   0.27
Mean Precision per w    0.25   0.24   0.23   0.23   0.2    0.23
Results-quantitative
Retrieval
[Figure: panels titled Recall > 0 Precision, All precision]
                          1      2      3      4      5      6
Mean Recall all w         0.23   0.21   0.20   0.21   0.19   0.24
Mean Recall per w, R>0    0.45   0.40   0.40   0.41   0.37   0.41
Results-qualitative
plane jet f-14 sky
-----------------------
sky plane clouds smoke snow

coast waves water hills
-----------------------
water sky ocean mountain clouds

polar bear bars cage
-----------------------
bear snow texture sunrise closeup

people cheese market street
-----------------------
people wall sand flower bird
Results-qualitative
[Figure: image panels labeled Blooms, Mountain, Pool, Smoke, Woman]
Conclusions
• Pros
  • Nice segmentation as a byproduct of annotation
  • Great for general concepts with lots of samples
  • Only weakly annotated data is required (multi-instance learning)
  • Allows hierarchical representation (adding images, speed)
• Cons
  • Fixed number of labels per image
  • Learning is time consuming
  • Parameter tuning is time consuming
  • Weakly represented classes could be associated with wrong concepts
Resources
• Carneiro, G., Chan, A.B., Moreno, P.J., Vasconcelos, N.: Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence. 29, 394–410 (2007).
• Gudivada, V.N., Raghavan, V.V.: Content based image retrieval systems. Computer. 28, 18–22 (1995).
• Belongie, S., Carson, C., Greenspan, H., Malik, J.: Color- and texture-based image segmentation using EM and its application to content-based image retrieval. Sixth International Conference on Computer Vision. pp. 675–682. IEEE (1998).
• Cappé, O., Moulines, E.: On-line expectation–maximization algorithm for latent data models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 71, 593–613 (2009).
• Datta, R., Joshi, D., Li, J., Wang, J.Z.: Image Retrieval: Ideas, Influences, and Trends of the New Age. ACM Computing Surveys. 40, 1–60 (2008).
lukas.tencer@gmail.com
http://tencer.hustej.net
@lukastencer
accuratelyrandom.blogspot.com
facebook.com/lukas.tencer
Google labeling game

Supervised Learning of Semantic Classes for Image Annotation and Retrieval

  • 1. G. Carneiro, A. Chan, P. Moreno, N. Vasconcelos • by: Lukáš Tencer, ECSE626, 2012
  • 2. Outline • Introduction • Prior techniques (Supervised OVA Labeling, Unsupervised Labeling) • Methodology (Supervised Multiclass Labeling, Semantic Distribution Estimation, Density Estimation) • Algorithm (Learning, Annotation, Retrieval) • Results (Quantitative, Qualitative) • Conclusion
  • 3. Introduction • Task • Assign labels to unknown images • Retrieve relevant images given labels • Supervised Learning • Learning from labeled training data • Training data consist of pairs $\{x_i, l_i\},\ i = 1, \dots, n$ • Multiple instance learning • Semantic Classes • Labels representing common concepts (sky, bear, snow, …) • Image Annotation and Retrieval • Annotation: given the image D, which labels are present in the image? • Retrieval: given a label, what are the top n matching images?
  • 4. Introduction • Datasets: • Corel5K: 5000 images, 272 classes • Corel30K: 30000 images, 1120 classes • MIRFLICKR: 25000 images, 37 classes • (PSU): not available anymore • ImageCLEF: the CLEF (Cross Language Evaluation Forum) Cross Language Image Retrieval Track • Medical Image Retrieval • Photo Annotation • Plant Identification • Wikipedia Retrieval • Patent Image Retrieval and Classification
  • 5. Introduction • Example images (not reproduced) from Corel 5K, Corel 30K and MIRFLICKR, captioned Bear, New Zealand and Urban
  • 6. Prior Techniques • Supervised OVA • Binary decision problem, concept present / absent • Hidden variable $Y_i$ • Decision rule (concept present when): $P_{X|Y_i}(X|1)\,P_{Y_i}(1) \ge P_{X|Y_i}(X|0)\,P_{Y_i}(0)$ • Unsupervised Learning • Modeling dependency between text label and image features, expressed as hidden variable $L$: $P_{X,W}(x, w) = \sum_{l=1}^{D} P_{X|L}(x|l)\,P_{W|L}(w|l)\,P_L(l)$ • Considering just positive examples, densities for $Y_i = 1$ • Figure (graphical models, not reproduced): node labels L, W, X and W1, W2, W3, X; example captions: bear; polar, grizzly; features
  • 7. Methodology: Supervised Multiclass Labeling (SML) • Elements of the semantic vocabulary (W) are explicitly made the semantic classes (L)! • Random variable W: $W = i$, $i \in \{1, \dots, T\}$, if and only if $x$ is a sample from concept $w_i$, with class-conditional density $P_{X|W}(x|i)$ • Annotation and retrieval are then easy to do: $P_{W|X}(i|x) = \frac{P_{X|W}(x|i)\,P_W(i)}{P_X(x)}$ • Annotation: $i^*(X) = \arg\max_i P_{W|X}(i|X)$ • Retrieval: $j^*(w_i) = \arg\max_j P_{X|W}(X_j|i)$
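The annotation and retrieval rules on this slide (Bayes posterior over classes, then an argmax) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `posterior`, `annotate` and `retrieve` are hypothetical, and the class-conditional log-likelihoods are assumed to have been computed elsewhere.

```python
import math

def posterior(log_lik, prior):
    """Return P(W = i | x) for every class i via Bayes' rule.

    log_lik[i] is log P(x | W = i); prior[i] is P(W = i).
    """
    joint = [ll + math.log(p) for ll, p in zip(log_lik, prior)]
    top = max(joint)                       # subtract the max for numerical stability
    unnorm = [math.exp(j - top) for j in joint]
    total = sum(unnorm)                    # plays the role of P(x), up to a constant
    return [u / total for u in unnorm]

def annotate(log_lik, prior):
    """Annotation: index of the class with the largest posterior."""
    post = posterior(log_lik, prior)
    return max(range(len(post)), key=post.__getitem__)

def retrieve(log_lik_per_image, i):
    """Retrieval: index j of the image whose features are most likely under class i.

    log_lik_per_image[j][i] is log P(X_j | W = i).
    """
    return max(range(len(log_lik_per_image)),
               key=lambda j: log_lik_per_image[j][i])
```

With a uniform prior the annotation reduces to picking the class with the largest likelihood; a strongly skewed prior can change the winner.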
  • 8. Methodology: Estimation of Semantic Class Distributions • Given $D_i$, the training set of images for class $i$, estimate $P_{X|W}(x|i)$ • Assumption: Gaussian distribution • How to estimate? • Direct estimation • Model averaging: $P_{X|W}(x|i) = \frac{1}{D_i} \sum_{l=1}^{D_i} P_{X|L,W}(x|l,i)$ • Naive averaging • GMM model: $P_{X|L,W}(x|l,i) = \sum_{k} \pi_{i,l}^{k}\, G(x, \mu_{i,l}^{k}, \Sigma_{i,l}^{k})$ • Averaged: $P_{X|W}(x|i) = \frac{1}{D_i} \sum_{l=1}^{D_i} \sum_{k} \pi_{i,l}^{k}\, G(x, \mu_{i,l}^{k}, \Sigma_{i,l}^{k})$
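Naive averaging amounts to pooling the components of all per-image mixtures of a class and dividing every weight by the number of images. The sketch below illustrates this under a simplifying assumption that is not in the slides: features are 1-D with scalar variances, so that only the standard library is needed. The names `gauss`, `gmm_pdf` and `naive_average` are hypothetical.

```python
import math

def gauss(x, mean, var):
    """Density of a 1-D Gaussian (scalar variance keeps the sketch short)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmm_pdf(x, components):
    """Evaluate a mixture; components is a list of (weight, mean, variance)."""
    return sum(pi * gauss(x, mu, var) for pi, mu, var in components)

def naive_average(image_gmms):
    """Pool the per-image mixtures of one class into a single class mixture.

    Every weight is divided by the number of images, so the pooled weights
    still sum to one. The result has (number of images) * K components.
    """
    n_images = len(image_gmms)
    return [(pi / n_images, mu, var)
            for gmm in image_gmms
            for pi, mu, var in gmm]
```

The pooled mixture grows linearly with the number of training images, which is the motivation for the hierarchical estimate on the following slides.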
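  • 9. Methodology: Mixture hierarchies • First step: get a GMM from each image with regular soft EM, $P_{X|W}(x|I) = \sum_{k=1}^{8} \pi_I^{k}\, G(x, \mu_I^{k}, \Sigma_I^{k})$ • Flowchart: initialization (Euclidean distance, Mahalanobis distance), initial parameter estimate, expectation, maximization; stop after at most 200 iterations or when the change in likelihood between $P(x, z|\theta^{t})$ and $P(x, z|\theta^{t+1})$ is too small • Complete-data likelihood (two-component example): $P(x, z|\theta) = \prod_{i=1}^{n} \sum_{j=1}^{2} \mathbb{1}(z_i = j)\, \pi_j\, G(x_i; \mu_j, \Sigma_j)$ • E: $Q(\theta, \theta^{t}) = E_{z|x,\theta^{t}}[\log F(\theta; X, Z)]$ • M: $\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^{t})$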
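The soft EM loop for a Gaussian mixture (E-step responsibilities, M-step re-estimation, stop on a small likelihood change or after 200 iterations) can be sketched as follows. This is a simplified illustration, not the deck's code: it assumes 1-D features with scalar variances, takes the initial means from the caller instead of the distance-based initialization mentioned on the slide, and uses the hypothetical names `gauss` and `em_gmm`.

```python
import math

def gauss(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm(xs, init_means, n_iter=200, tol=1e-6):
    """Fit a 1-D Gaussian mixture to xs with soft EM."""
    k = len(init_means)
    weights = [1.0 / k] * k
    means = list(init_means)
    variances = [1.0] * k
    prev = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp, loglik = [], 0.0
        for x in xs:
            p = [w * gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            loglik += math.log(total)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)   # floor avoids a collapsed component
        # Stop when the log-likelihood no longer improves noticeably.
        if loglik - prev < tol:
            break
        prev = loglik
    return weights, means, variances
```

A call such as `em_gmm(samples, [-1.0, 1.0])` fits two components; a real system would use full 192-dimensional features and 8 components per image, as stated on the slide.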
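  • 10. Methodology: Mixture hierarchies for label • Second step: get an HGMM for each label, $P_{X|W}(x|w) = \sum_{k=1}^{64} \pi_w^{k}\, G(x, \mu_w^{k}, \Sigma_w^{k})$ • Flowchart: initialization (Bhattacharyya distance), initial parameter estimate, expectation, maximization; stop after at most 200 iterations or when the change in likelihood between $P(x, z|\theta^{t})$ and $P(x, z|\theta^{t+1})$ is too small • Complete-data likelihood (two-component example): $P(x, z|\theta) = \prod_{i=1}^{n} \sum_{j=1}^{2} \mathbb{1}(z_i = j)\, \pi_j\, G(x_i; \mu_j, \Sigma_j)$ • E: $Q(\theta, \theta^{t}) = E_{z|x,\theta^{t}}[\log F(\theta; X, Z)]$ • M: $\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^{t})$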
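  • 11. E and M step for HGMM • Input: $\{\pi_j^{k}, \mu_j^{k}, \Sigma_j^{k}\},\ j = 1, \dots, D_i,\ k = 1, \dots, K$ • Output: $\{\pi_c^{m}, \mu_c^{m}, \Sigma_c^{m}\},\ m = 1, \dots, M$ • E-step: $h_{jk}^{m} = \frac{\left[ G(\mu_j^{k}, \mu_c^{m}, \Sigma_c^{m})\, e^{-\frac{1}{2} \mathrm{trace}\{(\Sigma_c^{m})^{-1} \Sigma_j^{k}\}} \right]^{\pi_j^{k} N} \pi_c^{m}}{\sum_{l} \left[ G(\mu_j^{k}, \mu_c^{l}, \Sigma_c^{l})\, e^{-\frac{1}{2} \mathrm{trace}\{(\Sigma_c^{l})^{-1} \Sigma_j^{k}\}} \right]^{\pi_j^{k} N} \pi_c^{l}}$ • M-step: $(\pi_c^{m})^{new} = \frac{\sum_{jk} h_{jk}^{m}}{D_i K}$ • $(\mu_c^{m})^{new} = \sum_{jk} w_{jk}^{m}\, \mu_j^{k}$, where $w_{jk}^{m} = \frac{h_{jk}^{m}\, \pi_j^{k}}{\sum_{jk} h_{jk}^{m}\, \pi_j^{k}}$ • $(\Sigma_c^{m})^{new} = \sum_{jk} w_{jk}^{m} \left[ \Sigma_j^{k} + (\mu_j^{k} - \mu_c^{m})(\mu_j^{k} - \mu_c^{m})^{T} \right]$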
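One iteration of the hierarchical EM update can be sketched as below: image-level components are softly assigned to class-level components, which are then re-estimated from the weighted image-level parameters. Several simplifications are assumptions of this sketch and not taken from the slides: features are 1-D (so the trace term becomes a ratio of variances), the number of virtual samples `n_virtual` is set arbitrarily to 10, the covariance update uses the freshly updated mean, and the names `log_gauss` and `hem_step` are hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def hem_step(children, parents, n_virtual=10.0):
    """One hierarchical-EM iteration in 1-D.

    children: image-level components (pi, mu, var), flattened over images
              and components, so len(children) equals D_i * K.
    parents:  current class-level components (pi, mu, var).
    """
    # E-step: responsibility h[jk][m] of parent m for child jk, in the log domain.
    h = []
    for pi_j, mu_j, var_j in children:
        logs = [pi_j * n_virtual * (log_gauss(mu_j, mu_m, var_m) - 0.5 * var_j / var_m)
                + math.log(pi_m)
                for pi_m, mu_m, var_m in parents]
        top = max(logs)
        unnorm = [math.exp(v - top) for v in logs]
        total = sum(unnorm)
        h.append([u / total for u in unnorm])
    # M-step: re-estimate each parent from the children it is responsible for.
    updated = []
    for m in range(len(parents)):
        pi_new = sum(row[m] for row in h) / len(children)
        raw = [row[m] * child[0] for row, child in zip(h, children)]
        norm = sum(raw)
        w = [r / norm for r in raw]
        mu_new = sum(wi * child[1] for wi, child in zip(w, children))
        # Assumption: the spread term is measured around the updated mean.
        var_new = sum(wi * (child[2] + (child[1] - mu_new) ** 2)
                      for wi, child in zip(w, children))
        updated.append((pi_new, mu_new, var_new))
    return updated
```

Because only mixture parameters are touched, the cost of an iteration depends on the number of components rather than on the number of feature vectors. The sketch does not guard against a parent that receives no responsibility at all.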
  • 12. Algorithm: learning • Training • For each training set I for label w • Decompose the image (192 px × 128 px) into 8 × 8 regions with a sliding window moving every 2 pixels • Calculate the DCT of each window: 8 × 8 × 3 = 192-d feature vector • Calculate a mixture of 8 Gaussians for each image using EM: $P_{X|W}(x|I) = \sum_{k=1}^{8} \pi_I^{k}\, G(x, \mu_I^{k}, \Sigma_I^{k})$ • Calculate a mixture of 64 Gaussians for each label using H-EM: $P_{X|W}(x|w) = \sum_{k=1}^{64} \pi_w^{k}\, G(x, \mu_w^{k}, \Sigma_w^{k})$
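The feature-extraction step (8 × 8 sliding window, step of 2 pixels, DCT per colour channel, 192 values per window) can be sketched with the standard library alone. The details below are assumptions of this sketch: the image is a nested list `image[row][col][channel]` with 3 channels, only windows that fit entirely inside the image are kept, the DCT is the orthonormal DCT-II, and the names `dct2` and `extract_features` are hypothetical.

```python
import math

N = 8  # window side length

# Orthonormal DCT-II basis: C[u][n] = a(u) * cos(pi * (2n + 1) * u / (2N))
C = [[(math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N))
      * math.cos(math.pi * (2 * n + 1) * u / (2 * N))
      for n in range(N)]
     for u in range(N)]

def dct2(block):
    """2-D DCT of an N x N block, computed as C * block * C^T."""
    tmp = [[sum(C[u][n] * block[n][c] for n in range(N)) for c in range(N)]
           for u in range(N)]
    return [[sum(tmp[u][n] * C[v][n] for n in range(N)) for v in range(N)]
            for u in range(N)]

def extract_features(image, step=2):
    """Return one 192-d vector (3 channels x 64 DCT coefficients) per window."""
    rows, cols = len(image), len(image[0])
    feats = []
    for r in range(0, rows - N + 1, step):
        for c in range(0, cols - N + 1, step):
            vec = []
            for ch in range(3):
                block = [[image[r + i][c + j][ch] for j in range(N)]
                         for i in range(N)]
                vec.extend(v for row in dct2(block) for v in row)
            feats.append(vec)
    return feats
```

Under these assumptions a 192 × 128 image yields 93 × 61 = 5673 windows. A production version would use an array library for speed; the pure-Python loops here are only meant to make each step explicit.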
  • 13. Algorithm: annotation, retrieval • Annotation • Get the n (= 5) best labels for image I • Get features from the image ((192*128/2)*192) • Get the log-likelihood for each label and choose the best n: $\log P_{X|W}(I|w_i) = \sum_{x \in I} \log P_{X|W}(x|w_i)$ • Retrieval • For images $I_T$ and label w: • Annotate $I_T$ and rank the images by decreasing score $P_{X|W}(I_T|w_i)$
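Annotation sums the per-feature log-likelihoods of an image under each label model and keeps the n best labels; retrieval ranks images by the same score for a fixed label. The sketch below assumes 1-D features, label models given as lists of (weight, mean, variance), and images with equal numbers of features so that summed scores are comparable. All function names are hypothetical.

```python
import math

def log_gmm(x, components):
    """Log-density of a 1-D Gaussian mixture, using log-sum-exp for stability."""
    logs = [math.log(pi) - 0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
            for pi, mu, var in components]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

def bag_log_likelihood(bag, components):
    """Sum of log-likelihoods over all feature values of one image."""
    return sum(log_gmm(x, components) for x in bag)

def annotate_top_n(bag, label_models, n=5):
    """Return the n labels with the highest log-likelihood for this image."""
    scores = {w: bag_log_likelihood(bag, gmm) for w, gmm in label_models.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]

def retrieve(bags, label_models, label):
    """Return image indices ordered by decreasing score for the given label."""
    scores = [bag_log_likelihood(bag, label_models[label]) for bag in bags]
    return sorted(range(len(bags)), key=scores.__getitem__, reverse=True)
```

For example, `annotate_top_n(features, models, n=5)` returns five label names, matching the fixed number of labels per image noted in the conclusions.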
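  • 14. Results-quantitative • Database: Corel 5k • 4000 training, 1000 testing • Precision: $\frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|}$, here $\text{precision} = \frac{w_C}{w_{auto}}$ • Recall: $\frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}$, here $\text{recall} = \frac{w_C}{w_H}$ • $w_H$: human annotated images, $w_{auto}$: automatically annotated images, $w_C$: correctly annotated images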
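Per-word precision and recall follow directly from the three counts on this slide: images annotated with the word by humans, images annotated with it automatically, and images where both agree. The sketch below assumes annotations are given as one set of labels per test image; the function name `precision_recall` is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
def precision_recall(human, auto, word):
    """Return (precision, recall) for one word.

    human[i] and auto[i] are the label sets of test image i from the
    ground truth and from the automatic annotation respectively.
    """
    w_h = sum(word in labels for labels in human)        # human annotated
    w_auto = sum(word in labels for labels in auto)      # automatically annotated
    w_c = sum(word in h and word in a                    # correctly annotated
              for h, a in zip(human, auto))
    precision = w_c / w_auto if w_auto else 0.0
    recall = w_c / w_h if w_h else 0.0
    return precision, recall
```

Averaging these values over all words, or only over words with recall greater than zero, gives the kind of summary figures reported in the tables that follow.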
  • 15. Results-quantitative: Annotation • Chart labels: non-zero recall, mean recall, mean precision
      Column                    1      2      3      4      5      6
      w with recall > 0       140    121    110    125     90    131
      Mean recall per w      0.27   0.25   0.25   0.26   0.23   0.27
      Mean precision per w   0.25   0.24   0.23   0.23   0.2    0.23
  • 16. Results-quantitative: Retrieval • Chart labels: Recall > 0 precision, All precision
      Column                        1      2      3      4      5      6
      Mean recall, all w         0.23   0.21   0.20   0.21   0.19   0.24
      Mean recall per w, R > 0   0.45   0.40   0.40   0.41   0.37   0.41
  • 18. Results-qualitative (images not reproduced; for each example, the labels above the separator, then the five labels below it) • plane, jet, f-14, sky / sky, plane, clouds, smoke, snow • coast, waves, water, hills / water, sky, ocean, mountain, clouds • polar, bear, bars, cage / bear, snow, texture, sunrise, closeup • people, cheese, market, street / people, wall, sand, flower, bird
  • 22. Conclusions • Pros • Nice segmentation as a byproduct of annotation • Great for general concepts with lots of samples • Only weakly annotated data is required (multi-instance learning) • Allows hierarchical representation (adding images, speed) • Cons • Fixed number of labels per image • Learning is time consuming • Parameter tuning is time consuming • Weakly represented classes could be associated with wrong concepts
  • 23. Resources • Carneiro, G., Chan, A.B., Moreno, P.J., Vasconcelos, N.: Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence. 29, 394–410 (2007). • Gudivada, V.N., Raghavan, V.V.: Content based image retrieval systems. Computer. 28, 18–22 (1995). • Belongie, S., Carson, C., Greenspan, H., Malik, J.: Color- and texture-based image segmentation using EM and its application to content-based image retrieval. Sixth International Conference on Computer Vision. pp. 675–682. IEEE (1998). • Cappé, O., Moulines, E.: On-line expectation–maximization algorithm for latent data models. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 71, 593–613 (2009). • Datta, R., Joshi, D., Li, J., Wang, J.Z.: Image Retrieval: Ideas, Influences, and Trends of the New Age. ACM Computing Surveys. 40, 1–60 (2008).