ACM Multimedia 2008




Lei Wu, Xian-Sheng Hua, Nenghai Yu, Wei-Ying Ma, Shipeng Li

                     Microsoft Resear...
Multimedia
     ……
Information
   Indexing
   Ranking
  Clustering
  Annotation
 Recommendation

  Retrieval




         ...
Indexing




                                 Ranking
Annotation           Image
                   Similarity/
          ...
Image Similarity/Distance


         Image
       Similarity/
        Distance


        Concept
       Similarity/
      ...
Image Similarity/Distance




Numerous efforts have been made.
 Concept Similarity/Distance


           Concept
         ...
Image Similarity/Distance




   Numerous efforts have been made.




    Concept Similarity/Distance
              Olympi...
WordNet Distance

Google Distance

Tag Concurrence Distance




                           7
WordNet
150,000 words

WordNet Distance
Quite a few methods to get it in WordNet
Basic idea is to measure the length of th...
Normalized Google Distance (NGD)
Reflects the concurrency of two words in Web documents
Defined as




Pros and Cons
Pros:...
Concept Pairs Google Distance
 Airplane – Dog      0.2562

Football – Soccer    0.1905

 Horse – Donkey      0.2147

Airpl...
Image Tag Concurrence Distance (Qi, Hua, et al. ACMMM07)
Reflects the frequency of two tags
occur in the same images
Based...
Concept Pairs Google Distance
                   Tag Concurrence Distance
 Airplane – Dog      0.2562   0.8532

Football –...
table — ping-         horse — donkey       car — wheel   airplane — airport
tennis   pong




 Synonymy             Visual...
Image tag concurrence
                   distance implicitly uses
                   image information, but
Concept Distan...
Can we mine concept distance
    from image content?




                               15
Semantic concept distance is based on human’s cognition
To of humanconcept distanceinformation
80%
    mine cognition come...
Concept A: Airplane                            Concept B: Airport




 Concept Model A                                Conc...
FlickrDistance
                                                               0.5151

                                    ...
R1: A Good Image Collection
Large
High coverage, especially on real-life world
With tags




                             ...
R2: A Good Concept Representation or Model
Based on image content
Can cover wider concept relationships
Can handle large-c...
VLM – Visual Language Model
       Spatial-relation sensitive
       Efficient
Concept Models object variations
      Can ...
I am talking about statistical language model.




Unigram Model     Pwx w1w2  wn   Pwx 

Bigram Model      Pwx w1w...
Visual Word Generation

                                                                                  1 0 0 1 0 0 1 0
...
Comparison on Image Categorization
Caltech 8 categories / 5097 images

                          Accuracy (%)
  100       ...
Why Latent-Topic


Latent-Topic VLM
Visual variations of concept are taken as latent topics




                          ...
Latent-Topic VLM Training
       Solved by EM algorithm,
       The objective function is to maximize the joint distributi...
Comparison on Image Categorization
Caltech 8 categories / 5097 images

                              Accuracy (%)
        ...
Kullback – Leibler (KL) divergence
Good, but not symmetric

                                 topic distance

Jensen –Shann...
Concept A: Airplane                            Concept B: Airport
                         Tag search in
                 ...
Evaluation
Objective evaluation
Subjective evaluation

Applications
Concept clustering
Image annotation
Tag recommendation...
Images
6,400,000 from Flickr

Concepts
130,000,000 different tags
10,000,000 filtered tags
1,000 randomly-selected tags

C...
Ground-Truth
12 persons are asked to score semantic correlation of each concept pair
Average scores are taken as ground-tr...
Ground-Truth
WordNet Distance
Only 497 concepts (overlap of WordNet and the 1000 concepts)

Evaluate Accuracy of “Relative...
Concept Clustering
   23 concepts;
   3 groups – (1) outer space, (2) animal and (3) sports

Normalized Google Distance   ...
Based on an approach using concept relation
Dual Cross-Media Relevance Model (DCMRM, J. Liu et al. ACMMM 2007)
On 79 conce...
To Improve Tagging Quality
Eliminating tag incompletion, noises, and ambiguity
500 images / 10 recommended tags per image
...
Why VLM divergence can estimate concept distance?

Why FD works well even tags are not complete?



                      ...
If we find similar patterns in the images
   associated with different concepts,
      the corresponding concept
   relati...
Flickr Distance
  A novel approach to discover semantic relationships
  from image content
   based on real-life images fr...
Flickr Distance as a Service.




                                40
41
Upcoming SlideShare
Loading in …5
×

Flickr Distance

4,277 views

Published on

The presentation of "Flickr Distance" given by Xian-Sheng Hua in the best paper session of ACM Multimedia 2008

Published in: Lifestyle, Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
4,277
On SlideShare
0
From Embeds
0
Number of Embeds
66
Actions
Shares
0
Downloads
130
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Flickr Distance

  1. ACM Multimedia 2008 Lei Wu, Xian-Sheng Hua, Nenghai Yu, Wei-Ying Ma, Shipeng Li Microsoft Research Asia University of Science and Technology of China October 28, 2008
  2. Multimedia …… Information Indexing Ranking Clustering Annotation Recommendation Retrieval 2
  3. Indexing Ranking Annotation Image Similarity/ Distance Multimedia Information Retrieval Concept Similarity/ Recommendation Distance Clustering …… 3
  4. Image Similarity/Distance Image Similarity/ Distance Concept Similarity/ Distance 4
  5. Image Similarity/Distance Numerous efforts have been made. Concept Similarity/Distance Concept Similarity/ Distance 5
  6. Image Similarity/Distance Numerous efforts have been made. Concept Similarity/Distance Olympic Cat Sports Paw Tiger More and more used, but not well studied. 6
  7. WordNet Distance Google Distance Tag Concurrence Distance 7
  8. WordNet 150,000 words WordNet Distance Quite a few methods to get it in WordNet Basic idea is to measure the length of the path between two words Pros and Cons Pros: Built by human experts, so close to human perception Cons: Coverage is limited and difficult to extend 8
  9. Normalized Google Distance (NGD) Reflects the concurrency of two words in Web documents Defined as Pros and Cons Pros: Easy to get and huge coverage Cons: Only reflects concurrency in textual documents. Not really concept distance (semantic relationship) 9
  10. Concept Pairs Google Distance Airplane – Dog 0.2562 Football – Soccer 0.1905 Horse – Donkey 0.2147 Airplane – Airport 0.3094 Car – Wheel 0.3146 10
  11. Image Tag Concurrence Distance (Qi, Hua, et al. ACMMM07) Reflects the frequency of two tags occur in the same images Based on the same idea of NGD Mostly is sparse (> 95% are zero in the similarity matrix) Pros and Cons Pros: Images are taken into account Cons: a) Tags are sparse so visual concurrency is not well reflected b) Training data is difficult to get similarity matrix: 500 tags similarity matrix: 50 tags 11
  12. Concept Pairs Google Distance Tag Concurrence Distance Airplane – Dog 0.2562 0.8532 Football – Soccer 0.1905 0.1739 Horse – Donkey 0.2147 0.4513 Airplane – Airport 0.3094 0.1833 Car – Wheel 0.3146 0.9617 12
  13. table — ping- horse — donkey car — wheel airplane — airport tennis pong Synonymy Visually Similar Meronymy Concurrency different words but similar things or part and exist at the same the same meaning things of same type the whole scene/place 13
  14. Image tag concurrence distance implicitly uses image information, but Concept Distance tags are too sparse Mine from image tags Google distance’s coverage is very high, but it is for text domain Mine from text documents WordNet distance is good, but coverage is too low Mine from ontology 14
  15. Can we mine concept distance from image content? 15
  16. Semantic concept distance is based on human’s cognition To of humanconcept distanceinformation 80% mine cognition comes from visual from a large tagged image collection There are around 2.8 billion photos on Flickr (by Sep 08) based on image content In average each Flickr image has around 8 tags bear, fur, grass, tree polar bear, water, sea polar bear, fighting, usa 16
  17. Concept A: Airplane Concept B: Airport Concept Model A Concept Model B Flickr Distance (A, B) 17
  18. FlickrDistance 0.5151 0.0315 0.4231 0.0576 0.0708 Flickr Distance is able to cover the four different semantic relationships Synonymy, Visually Similar, Meronymy, and Concurrency 18
  19. R1: A Good Image Collection Large High coverage, especially on real-life world With tags Real-life photos reflect human’s perception Based on collective intelligence from grassroots 19
  20. R2: A Good Concept Representation or Model Based on image content Can cover wider concept relationships Can handle large-concept set Concept Models Discriminative SVM, Boosting, … Generative Global Feature Local Feature w/o Spatial Relation Bag-of-Words (pLSA, LDA), … w/ Spatial Relation 2D HMM, MRF, … 20
  21. VLM – Visual Language Model Spatial-relation sensitive Efficient Concept Models object variations Can handle Discriminative SVM, Boosting, … Generative Global Feature Local Feature w/o Spatial Relation Bag-of-Words, … w/ Spatial Relation 2D HMM, MRF, … 21
  22. I am talking about statistical language model. Unigram Model Pwx w1w2  wn   Pwx  Bigram Model Pwx w1w2  wn   pwx wx1  Trigram Model Pwx w1w2  wn   Pwx wx1 wx 2  22
  23. Visual Word Generation 1 0 0 1 0 0 1 0 Patch  Gradient Texture Histogram Hashing  Visual Word Image  Patch Unigram Model Pwxy w11w12  wmn   Pwxy  Bigram Model Pwxy w11w12  wmn   P wxy wx 1, y   Trigram Model Pwxy w11w12  wmn   pw xy wx1, y wx, y 1  Trigram VLM is estimated by directly counting from sufficient samples of each category. To avoid the bias in the sampling, back-off smoothing method is adopted. 23
  24. Comparison on Image Categorization Caltech 8 categories / 5097 images Accuracy (%) 100 88 90 90 80 64 59 60 40 20 0 pLSA (BOW) LDA (BOW) 2D MHMM SVM VLM Training Time (sec/image) 3.00 2.44 2.50 2.00 1.50 1.11 1.00 0.84 0.44 0.50 0.14 0.00 pLSA (BOW) LDA (BOW) 2D MHMM SVM VLM 24
  25. Why Latent-Topic Latent-Topic VLM Visual variations of concept are taken as latent topics 25
  26. Latent-Topic VLM Training Solved by EM algorithm, The objective function is to maximize the joint distribution of concept and its visual word arrangement Aw maximize p Aw , C  =   P wxy wx 1, y wx , y 1 , d C j  d C C j x, y Estimate the posteriors of the hidden topics Maximize the likelihood of visual arrangement 26
  27. Comparison on Image Categorization Caltech 8 categories / 5097 images Accuracy (%) 90 90 94 100 88 80 64 59 60 40 20 0 pLSA (BOW) LDA (BOW) 2D MHMM SVM VLM LT-VLM Training Time (sec/image) 3.00 2.44 2.50 2.00 1.50 1.11 0.84 1.00 0.44 0.50 0.14 0.24 0.00 pLSA (BOW) LDA (BOW) 2D MHMM SVM VLM LT-VLM 27
  28. Kullback – Leibler (KL) divergence Good, but not symmetric topic distance Jensen –Shannon (JS) divergence Better, as it is symmetric And, square root of JS divergence is a metric, so is Flickr Distance topic distance concept distance 28
  29. Concept A: Airplane Concept B: Airport Tag search in Flickr Concept Model A LT-VLM Concept Model B Jensen-Shannon Divergence Flickr Distance (A, B) 29
  30. Evaluation Objective evaluation Subjective evaluation Applications Concept clustering Image annotation Tag recommendation 30
  31. Images 6,400,000 from Flickr Concepts 130,000,000 different tags 10,000,000 filtered tags 1,000 randomly-selected tags Comparison Normalized Google Distance (NGD) Tag Concurrence Distance (TCD) Flickr Distance (FD) 31
  32. Ground-Truth 12 persons are asked to score semantic correlation of each concept pair Average scores are taken as ground-truth Evaluate Accuracy of “Relative Distance Pairs” Step 1: Find all distance pairs D(a,b) and D(c,d) Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth Correct Rate 0.56 0.55 0.54 0.53 0.52 0.51 0.5 0.49 0.48 0.47 NGD TCD FD 32
  33. Ground-Truth WordNet Distance Only 497 concepts (overlap of WordNet and the 1000 concepts) Evaluate Accuracy of “Relative Distance Pairs” Step 1: Find all distance pairs D(a,b) and D(c,d) Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth Correct Rate 0.54 0.53 0.52 0.51 0.5 0.49 0.48 0.47 0.46 0.45 NGD TCD FD 33
  34. Concept Clustering 23 concepts; 3 groups – (1) outer space, (2) animal and (3) sports Normalized Google Distance Tag Concurrence Distance Flickr Distance Group1 Group2 Group3 Group 1 Group2 Group3 Group1 Group2 Group3 bears bowling baseball moon baseball basketball moon bears baseball horses dolphin basketball space donkey bears Saturn dolphin basketball bowling moon donkey football Venus softball space donkey football dolphin space Saturn golf whale wolf football Venus golf snake sharks soccer golf horses soccer snake tennis horses sharks bowling softball volleyball Saturn spiders softball spiders sharks tennis volleyball soccer turtle whale spiders Venus tennis wolf whale turtle wolf volleyball 34
  35. Based on an approach using concept relation Dual Cross-Media Relevance Model (DCMRM, J. Liu et al. ACMMM 2007) On 79 concepts / 79,000 images NGD-DCMRM TCD-DCMRM FD-DCMRM 1200 960 1000 correct keywords Total number of 800 600 423 400 354 301 310 212 186 212 193 200 55 53 57 0 1 2 3 4 NGD-DCMRM 55 212 212 301 TCD-DCMRM 53 186 193 310 FD-DCMRM 57 354 423 960 The number of correctly annotated keywords at the first N words 35
  36. To Improve Tagging Quality Eliminating tag incompletion, noises, and ambiguity 500 images / 10 recommended tags per image 0.78 0.758 0.76 0.74 0.72 0.7 0.68 0.665 0.66 0.652 0.64 0.62 0.6 0.58 NGD Tag Concurrent Distance Flickr Distance Precision @ 10 36
  37. Why VLM divergence can estimate concept distance? Why FD works well even tags are not complete? room patterns Computer VLM: computer patterns of trigrams distribution other patterns room patterns TV patterns other patterns TV room patterns screen patterns other patterns Office 37
  38. If we find similar patterns in the images associated with different concepts, the corresponding concept relationships can be discovered. Computer Office 38
  39. Flickr Distance A novel approach to discover semantic relationships from image content based on real-life images from the Web based on collective intelligence from grassroots A distance more consistent with human’s perception A measurement more effective in many applications 39
  40. Flickr Distance as a Service. 40
  41. 41

×