Semantics in Digital Photos: A Contextual Analysis


Interpreting the semantics of an image is a hard problem. However, for storing and indexing large multimedia collections, it is essential to build systems that can automatically extract semantics from images. In this research we show how content and context can be fused to extract semantics from digital photographs. Our experiments show that if we properly model the context associated with media, we can interpret semantics using only a fraction of the high-dimensional content data.


  • Global features: color, texture, shapes. Local features: edges, salient points, objects.
  • B is a random variable in the pixel-feature space.

    1. Semantics in Digital Photos: A Contextual Analysis
       Authors / Pinaki Sinha, Ramesh Jain
       Conference / The IEEE International Conference on Semantic Computing, 2008, pp. 58-65
       Presenter / Meng-Lun Wu
    2. Outline
       - Introduction
       - Related Work
       - The Optical Context Layer
       - Photo Clustering
       - Photo Classification
       - Annotation in Digital Photos
       - Results
       - Conclusion
    3. Introduction
       - Most research is concerned with extracting semantics using content information only.
       - All search engines rely on the text associated with images to search for them.
       - The authors fuse the content of photos with two types of context using a probabilistic model.
    4. Introduction (cont.)
    5. Introduction (cont.)
       - This paper classifies photos into mutually exclusive classes and automatically tags new photos.
       - The authors collected the photo dataset from Flickr, which publishes popular tags.
    6. Related Work
       - Most research uses content-based pixel features, either global or local.
       - Image search using an example input image, or querying with low-level features, can be difficult and unintuitive for most people.
       - Correlations between image features and human tags or labels have been studied.
       - The semantic gap in image retrieval cannot be overcome using pixel features alone.
    7. Related Work (cont.)
       - Recent research has used the optical context layer to classify photos.
       - Boutell and Luo [3] use pixel values and optical metadata for classification.
         - [3] M. Boutell and J. Luo. Bayesian fusion of camera metadata cues in semantic scene classification. In Proc. IEEE CVPR, 2004.
       - The translation model of [6] learns a lexicon for a fixed image vocabulary.
         - [6] P. Duygulu, K. Barnard, N. de Freitas, and D. Forsyth. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In Proc. ECCV, 2002.
    8. The Optical Context Layer
       - The Exchangeable Image File Format (EXIF) standard specifies which camera parameters are recorded.
       - Fundamental parameters:
         - Exposure Time, Focal Length, F-number, Flash, Metering Mode, and ISO.
    9. Photo Clustering
       - The LogLight metric takes a small value when the ambient light is high.
       - Conversely, it takes a large value when the ambient light is low.
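The slide does not reproduce the exact LogLight formula. A minimal sketch, assuming a plausible form (log of exposure time times ISO over the squared F-number, which is small for bright scenes and large for dim ones), illustrates the behaviour described above; the function name and the EXIF values below are hypothetical:

```python
import math

def log_light(exposure_time: float, iso: float, f_number: float) -> float:
    """Assumed LogLight sketch: log((ExposureTime * ISO) / F-number^2).

    Bright scenes need short exposures and low ISO, so the value is small
    (very negative); dim scenes push it upward. The paper's exact formula
    may differ -- this only illustrates the monotonic behaviour.
    """
    return math.log((exposure_time * iso) / (f_number ** 2))

# Hypothetical EXIF values: a sunny outdoor shot vs. a dim indoor shot.
outdoor_day = log_light(exposure_time=1 / 500, iso=100, f_number=8.0)
indoor = log_light(exposure_time=1 / 30, iso=800, f_number=2.8)
```

Under this assumed form, the bright outdoor shot gets a much smaller LogLight value than the dim indoor one, matching the description on the slide.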
    10. Photo Clustering (cont.)
       - The LogLight distribution of photos shot with and without flash is modeled as a mixture of Gaussians.
       - Bayesian model selection finds the optimal model, and the Expectation-Maximization (EM) algorithm fits the model parameters.
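The EM step above can be sketched for the simplest case: a two-component 1-D Gaussian mixture fitted to synthetic LogLight-like values (the Bayesian model-selection step that picks the number of components is omitted; the initialisation and data are illustrative assumptions):

```python
import math
import random

def em_gmm_1d(data, iters=50):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances). Initialisation splits the sorted
    data at its median; a real implementation would use several restarts.
    """
    data = sorted(data)
    n = len(data)
    halves = (data[: n // 2], data[n // 2:])
    means = [sum(h) / len(h) for h in halves]
    variances = [sum((x - m) ** 2 for x in h) / len(h) + 1e-6
                 for h, m in zip(halves, means)]
    weights = [0.5, 0.5]

    def pdf(x, m, v):
        return math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)

    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(p)
            resp.append([pi / s for pi in p])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = sum(r[k] * (x - means[k]) ** 2
                               for r, x in zip(resp, data)) / nk + 1e-6
    return weights, means, variances

# Two well-separated synthetic populations (e.g. flash vs. no-flash shots):
random.seed(0)
sample = ([random.gauss(-5, 0.5) for _ in range(200)]
          + [random.gauss(2, 0.5) for _ in range(200)])
w, m, v = em_gmm_1d(sample)
```

With well-separated data, EM recovers means near the two generating centres and roughly equal mixture weights.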
    11. Photo Clustering (cont.)
       - The above method generates 8 clusters.
       - We choose 3,500 tagged photos.
       - For each photo, we compute its probability under each cluster.
       - We assign the photo to the cluster with maximum probability.
       - We assign all of the photo's tags to that cluster.
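The assignment-and-tag-propagation steps above can be sketched as follows; the data layout and the `likelihood` stand-in for the fitted Gaussian densities are assumptions for illustration:

```python
def assign_to_cluster(photo, clusters, likelihood):
    """Assign a photo to the cluster with maximum probability and
    propagate its tags to that cluster's tag pool.

    `clusters` maps cluster id -> list of accumulated tags;
    `likelihood(photo, cid)` stands in for P(photo | cluster).
    """
    best = max(clusters, key=lambda cid: likelihood(photo, cid))
    clusters[best].extend(photo["tags"])
    return best

# Hypothetical example: two clusters, likelihood favouring "flash".
clusters = {"flash": [], "no_flash": []}
photo = {"tags": ["party", "night"], "loglight": 2.1}
lik = lambda p, cid: 0.9 if cid == "flash" else 0.1
assigned = assign_to_cluster(photo, clusters, lik)
```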
    12. Photo Clustering (cont.)
       - Cluster with high-exposure-time shots
       - Cluster with no flash
    13. Photo Clustering (cont.)
       - Cluster with indoor shots
    14. Photo Classification
       - The intent of the photographer is partly hidden in the optical data.
       - The classes are outdoor day, outdoor night, and indoor.
       - The classes should represent different lighting conditions under the LogLight metric.
    15. Photo Classification (cont.)
       - The classification problem is solved using optical context only, and again using optical context plus thumbnail pixel features.
       - The classification algorithm is a decision tree.
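A hand-written stand-in for the learned decision tree shows how optical context alone can separate the three classes; the thresholds and the flash rule are illustrative assumptions, not the splits learned in the paper:

```python
def classify_shot(loglight: float, flash_fired: bool) -> str:
    """Stand-in decision rule: classify a photo as outdoor day,
    outdoor night, or indoor from optical context only.

    Thresholds are illustrative assumptions, not learned splits.
    """
    if loglight < -3.0:       # lots of ambient light
        return "outdoor_day"
    if flash_fired:           # dim scene, flash used
        return "indoor"
    return "outdoor_night"    # dim scene, no flash
```

A real decision tree would learn these splits from labelled training photos rather than hard-coding them.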
    16. Photo Classification (cont.)
    17. Annotation in Digital Photos
       - The goal of automatic annotation is to predict words for tagging untagged photos.
       - The relevance-model approach has become quite popular for automatic annotation and retrieval of images.
       - Automatic annotation is modeled as a language-translation problem.
       - The baseline is the continuous relevance model (CRM).
    18. Annotation in Digital Photos (cont.)
       - We divide the whole image into rectangular blocks.
       - For each block, we compute color, texture, and shape features.
       - Each feature vector has 42 dimensions.
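The block-division step can be sketched on a toy grayscale image; the paper's 42-dimensional colour/texture/shape features are replaced here by a two-value (mean, variance) feature per block, purely for illustration:

```python
def block_features(image, rows=4, cols=4):
    """Divide an image (a 2-D list of grayscale pixels, for brevity)
    into rows x cols rectangular blocks and return one feature vector
    per block. This sketch computes just mean and variance per block;
    the paper uses richer 42-dimensional features.
    """
    h, w = len(image), len(image[0])
    bh, bw = h // rows, w // cols
    feats = []
    for r in range(rows):
        for c in range(cols):
            pixels = [image[y][x]
                      for y in range(r * bh, (r + 1) * bh)
                      for x in range(c * bw, (c + 1) * bw)]
            mean = sum(pixels) / len(pixels)
            var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
            feats.append([mean, var])
    return feats

# A flat 8x8 "image" yields 16 blocks, each with zero variance.
feats = block_features([[10] * 8 for _ in range(8)])
```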
    19. Annotation in Digital Photos (cont.)
       - The goal is to predict the words W associated with an untagged image based on its blocks B.
       - B is the observed variable.
       - We estimate the conditional probability of a word given a set of blocks.
    20. Annotation in Digital Photos (cont.)
       - During the clustering process, the optical clusters are learned without using tags.
       - Whenever a new image X arrives, we assign it to the cluster O_j with maximum value of P(X|O_j).
       - We estimate the probability of a word given the pixel-feature blocks and the optical context information.
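One way to sketch the fused estimate is to marginalise a per-cluster word model over the optical-cluster posterior. The factorisation below is an assumption for illustration; the paper's exact model may differ, but the point is that optical context re-weights the content-based estimate:

```python
def fused_word_prob(word, blocks, clusters):
    """Sketch of P(word | blocks, optical context).

    `clusters` is a list of dicts with:
      - 'posterior': P(cluster | photo) from the optical mixture model
      - 'word_given_blocks': callable estimating P(word | blocks) using
        only that cluster's training photos (the CRM-style model).
    Marginalises over clusters; an assumed factorisation, not the
    paper's exact one.
    """
    return sum(c["posterior"] * c["word_given_blocks"](word, blocks)
               for c in clusters)

# Hypothetical two-cluster example with toy word models:
clusters = [
    {"posterior": 0.8,
     "word_given_blocks": lambda w, b: 0.6 if w == "beach" else 0.1},
    {"posterior": 0.2,
     "word_given_blocks": lambda w, b: 0.05},
]
p = fused_word_prob("beach", blocks=None, clusters=clusters)
```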
    21. Results
       - Experimental dataset: Flickr
         - Train
         - Evaluation
         - Test
       - Performance evaluation
         - Precision: the number of correctly predicted tags divided by the number of predicted tags.
         - Recall: the number of correctly predicted tags divided by the number of photos annotated with that tag in the real data.
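The precision and recall definitions above can be sketched per photo (the function name and example tags are illustrative):

```python
def precision_recall(predicted, ground_truth):
    """Per-photo tag precision and recall.

    precision = |correct predicted tags| / |predicted tags|
    recall    = |correct predicted tags| / |ground-truth tags|
    """
    correct = len(set(predicted) & set(ground_truth))
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    return precision, recall

p, r = precision_recall(["beach", "sky", "dog"],
                        ["beach", "sky", "sea", "sand"])
```

Here two of the three predicted tags are correct, giving precision 2/3 and recall 2/4.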
    22. Results (cont.)
       - Predicted tag: wildlife
         - Optical Context: 0.71
         - Image Features (CRM): 0.16
         - Thumbnail-Context: 0.44
    23. Using Ontology to Improve Tagging
       - CIDE word-similarity ontology.
       - Wu-Palmer distance between two tags.
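The Wu-Palmer measure compares two concepts by the depth of their least common subsumer (LCS) in the taxonomy. A minimal sketch, with a toy taxonomy of assumed depths:

```python
def wu_palmer(depth_lcs: int, depth_a: int, depth_b: int) -> float:
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b)).

    LCS is the least common subsumer of the two concepts. Similarity is
    1.0 for identical concepts and falls toward 0 as the deepest common
    ancestor gets shallower.
    """
    return 2.0 * depth_lcs / (depth_a + depth_b)

# Toy taxonomy: animal (depth 1) -> mammal (2) -> {dog (3), cat (3)}.
sim = wu_palmer(depth_lcs=2, depth_a=3, depth_b=3)
```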
    24. Using Ontology to Improve Tagging
       - Shrink this estimate using semantic similarity.
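The slide does not reproduce the shrinkage formula. One common form, assumed here for illustration, mixes a tag's own estimate with similarity-weighted estimates of related tags; the mixing weight `lam` and all names below are hypothetical:

```python
def shrink(word, probs, sim, lam=0.3):
    """Shrink P(word) toward similarity-weighted neighbours:

        P'(w) = (1 - lam) * P(w) + lam * sum_v sim(w, v) * P(v) / Z

    `probs` maps tag -> raw estimate; `sim(w, v)` is in [0, 1].
    An assumed form -- the paper's exact estimator may differ.
    """
    others = [v for v in probs if v != word]
    z = sum(sim(word, v) for v in others) or 1.0
    neighbour = sum(sim(word, v) * probs[v] for v in others) / z
    return (1 - lam) * probs[word] + lam * neighbour

# Toy example: "dog" borrows probability mass from the similar tag "puppy".
probs = {"dog": 0.1, "puppy": 0.5, "car": 0.4}
sim = lambda a, b: 0.9 if {a, b} == {"dog", "puppy"} else 0.1
p = shrink("dog", probs, sim)
```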
    25. Results (cont.)
    26. Conclusion
       - Optical context data is only a small fraction of a photo's data, yet it carries invaluable information about the shooting environment.
       - Fusing ontological models of photo semantics also improves precision.
       - Future work:
         - Fuse other types of context with the content and optical context features.