Categorization of natural images

A slide-deck of my graduate thesis presentation at IIT-Bombay (2006).


Novel Approaches to Natural Scene Categorization

     Amit Prabhudesai
     Roll No. 04307002
     amitp@ee.iitb.ac.in

     M.Tech Thesis Defence
     Under the guidance of Prof. Subhasis Chaudhuri
     Indian Institute of Technology, Bombay

Overview of topics to be covered

     • Natural Scene Categorization: Challenges
     • Our contribution
        ◦ Qualitative visual environment description
          • Portable, real-time system to aid the visually impaired
          • System has peripheral vision!
        ◦ Model-based approaches
          • Use of stochastic models to capture semantics
          • pLSA and maximum entropy models
     • Conclusions and Future Work

Natural Scene Categorization

     • An interesting application of a CBIR system
     • Images from a broad image domain: diverse and often ambiguous
     • Bridging the semantic gap
     • Grouping scenes into semantically meaningful categories could aid further retrieval
     • Efficient schemes for grouping images into semantic categories

Qualitative Visual Environment Retrieval

     [Figure: an omnidirectional view of a scene (building, sky, lawn, woods, water body) partitioned into view sectors FR, LT, RT, LB, RB around observer positions P1-P3]

     • Use of omnidirectional images
     • Challenges
        ◦ Unstructured environment
        ◦ No prior learning (unlike navigation/localization)
     • Target application and objective
        ◦ Wearable computing community, with emphasis on visually challenged people
        ◦ Real-time operation

Qualitative Visual Environment System: Overview

     • Environment representation
     • Environment retrieval
        ◦ View partitioning
        ◦ Feature extraction
        ◦ Node annotation
        ◦ Dynamic node annotation
        ◦ Real-time operation
     • Results

System Overview (contd.)

     • Environment representation
        ◦ Image database containing images belonging to 6 classes: Lawns (L), Woods (W), Buildings (B), Water-bodies (H), Roads (R) and Traffic (T)
        ◦ Moderately large intra-class variance (in the feature space) in images of each category
        ◦ Description relative to the person using the system: e.g., 'to the left of', 'in the front', etc.
        ◦ Topological relationships indicated by a graph
        ◦ Each node annotated by an identifier associated with a class

System Overview (contd.)

     • Environment Retrieval
        ◦ View Partitioning
          [Figure: the omnicam view is partitioned into sectors FR, LT, RT, LB, RB and BS relative to the forward direction, alongside the corresponding graphical representation]
        ◦ Feature Extraction
          • Feature must be invariant to scaling, viewpoint and illumination changes, and to the geometric warping introduced by omnicam images
          • Colour histogram selected as the feature for performing CBIR

System Overview (contd.)

     • Environment Retrieval
        ◦ Node annotation
          • Objective: Robust retrieval against illumination changes and intra-class variations
          • Solution: Annotation decided by a simple voting scheme
        ◦ Dynamic node annotation
          • Temporal evolution of graph Gn with time tn
          • Complete temporal evolution of the graph given by G, obtained by concatenating the subgraphs Gn, i.e., G = {G1, G2, . . . , Gk, . . .}

System Overview (contd.)

     • Environment Retrieval
        ◦ Real-time operation (a sketch of the histogram-and-voting step follows below)
          • Colour histogram: compact feature vector
          • Pre-computed histograms of all the database images
          • Linear time complexity (O(N)): on a P-IV 2.0 GHz machine, ~100 ms for a single omnicam image
        ◦ Portable, low-cost system for the visually impaired
          • Modest hardware and software requirements
          • Easily put together using off-the-shelf components

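The annotation step described above is simple enough to sketch. The following is a minimal illustration, not the thesis implementation: the 8-bins-per-channel joint RGB histogram, the L1 distance, and the 5-nearest-neighbour majority vote are all choices assumed for the sketch.

```python
import numpy as np
from collections import Counter

def colour_histogram(image, bins=8):
    """Quantize each RGB channel into `bins` levels and build a
    normalized joint colour histogram with bins**3 entries."""
    quant = image.astype(np.uint32) * bins // 256            # H x W x 3
    idx = (quant[..., 0] * bins + quant[..., 1]) * bins + quant[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def annotate_sector(sector_img, db_hists, db_labels, k=5):
    """Label one view sector: linear O(N) scan over the pre-computed
    database histograms, then a simple majority vote over the k
    closest matches."""
    h = colour_histogram(sector_img)
    dists = np.abs(db_hists - h).sum(axis=1)   # L1 distance to each image
    nearest = np.argsort(dists)[:k]
    votes = Counter(db_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]          # 'L', 'W', 'B', 'H', 'R' or 'T'
```

Because retrieval is a single linear scan over N compact pre-computed vectors, the O(N), ~100 ms figure quoted above follows directly.
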
System Overview (contd.)

     • Results
        ◦ Cylindrical concentric mosaics
          [Result images not reproduced here]

System Overview (contd.)

     • Results
        ◦ Still omnicam image
          [Result images not reproduced here]

System Overview (contd.)

     • Results
        ◦ Omnivideo sequence
          [Figure: per-frame node annotations (W, B, X, R, L) plotted against frame index n for the forward and backward directions]

Analyzing our results

     • System accuracy: close to 70%. This is not enough!
     • Some scenes are inherently ambiguous!
     • Often the second-best class is the correct class
     • Limitations
        1. Limited discriminating power of the global colour histogram (GCH)
        2. A local colour histogram (LCH) based on tiling cannot be used
        3. Each frame is analyzed independently
     • Possible solutions
        1. Adding memory to the system
        2. A clustering scheme before computing the similarity measure

Method I. Adding memory to the system

     • The system uses only the current observation in labeling
     • It is a good idea to use all observations up to the current one
     • Desired: a recursive implementation to calculate the posterior (should be able to do it in real time!)
     • Hidden Markov Model: parameter estimation using Kevin Murphy's HMM toolkit (a sketch of the recursive update follows below)
     • Challenges
        1. Estimation of the transition matrix: a possible solution is to use a limited set of classes
        2. Enormous training data required

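The desired recursive posterior is the standard HMM forward-filtering update, sketched below; the transition matrix and the per-frame likelihoods here are placeholder values, not estimated parameters.

```python
import numpy as np

def update_belief(belief, A, likelihood):
    """One recursive step: predict with the transition model, weight by
    the current observation likelihood, renormalize.
    belief[k] = P(class_k | all observations up to now)."""
    predicted = A.T @ belief            # P(class_t | o_1 .. o_{t-1})
    posterior = predicted * likelihood  # fold in the current frame
    return posterior / posterior.sum()

# Example with K = 3 classes; the sticky diagonal encodes the fact that
# the scene category changes only rarely between consecutive frames.
A = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])
belief = np.full(3, 1.0 / 3.0)           # uniform prior
for lik in [np.array([0.6, 0.3, 0.1]),   # placeholder per-frame likelihoods
            np.array([0.5, 0.4, 0.1])]:
    belief = update_belief(belief, A, lik)
print(belief.argmax())                   # most probable current class
```

Each update costs only O(K^2) for K classes, which keeps the real-time requirement within reach; the hard part, as noted above, is estimating the transition matrix from limited labeled sequences.
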
Adding memory. . . (Results)

     • Improved confidence in the results; however, negligible improvement in the accuracy
     • Reasons for poor performance
        ◦ Limited number of transitions between categories (as opposed to locations)
        ◦ Typical training data for HMMs runs into thousands of labels: it is difficult to collect such vast data
     • Limitation: makes the system dependent on the training sequence

Method II. Preclustering the image

     • Presence of clutter; images from a broad domain
     • Premise: the part of the image indicative of the semantic category forms a distinct cluster in the feature space
       [Figure: some test images belonging to the 'Water-bodies' category]
     • Possible solution: segment out the clutter in the scene

Preclustering the image. . .

     • K-means clustering of the image (see the sketch below)
     • Use only pixels from the largest cluster to compute the colour histogram
       [Figure: results of K-means clustering on the test images]
     • Results
        ◦ Accuracy improves significantly: for the 'water-bodies' class, from 25% to about 72%
     • Limitation: what about, say, a traffic scene?!

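A minimal sketch of the preclustering step, assuming plain K-means on raw RGB pixel values with an illustrative K = 3; scikit-learn is used here for brevity and is not necessarily what the thesis used.

```python
import numpy as np
from sklearn.cluster import KMeans

def largest_cluster_histogram(image, k=3, bins=8):
    """Cluster the pixels with K-means, keep only the largest cluster
    (assumed to carry the semantic content), and compute the colour
    histogram from just those pixels."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    labels = KMeans(n_clusters=k, n_init=5).fit_predict(pixels)
    keep = labels == np.bincount(labels).argmax()    # dominant cluster only
    quant = pixels[keep].astype(np.uint32) * bins // 256
    idx = (quant[:, 0] * bins + quant[:, 1]) * bins + quant[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3).astype(float)
    return hist / hist.sum()
```
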
Model-based approaches

     • Stochastic models used to learn semantic concepts from training images
     • Use of normal perspective images
     • Use of local image features
     • Two models examined
        1. probabilistic Latent Semantic Analysis (pLSA)
        2. Maximum entropy models
     • Use of the 'bag of words' approach

Bag of words approach

     • Local features are more robust to occlusions and spatial variations
     • The image is represented as a collection of local patches
     • Image patches are members of a learned (visual) vocabulary
     • Positional relationships are not considered!
     • Data represented by a co-occurrence matrix (see the sketch below)
     • Notation
        ◦ D = {d1, . . . , dN}: corpus of documents
        ◦ W = {w1, . . . , wM}: dictionary of words
        ◦ Z = {z1, . . . , zK}: (latent) topic variables
        ◦ N = {n(w, d)}: co-occurrence table

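A minimal sketch of how the co-occurrence table can be built, assuming SIFT descriptors have already been extracted for each image. The 125-entry vocabulary matches the experiment described later; learning the codebook by K-means over pooled training descriptors is an assumption of the sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_cooccurrence(descriptor_sets, vocab_size=125):
    """descriptor_sets: one (n_patches x 128) SIFT array per image.
    Learns a visual vocabulary by K-means over all pooled patches, then
    counts word occurrences per image: returns the
    (n_images x vocab_size) co-occurrence table n(d, w)."""
    codebook = KMeans(n_clusters=vocab_size, n_init=3)
    codebook.fit(np.vstack(descriptor_sets))         # pooled training patches
    n = np.zeros((len(descriptor_sets), vocab_size), dtype=int)
    for d, descriptors in enumerate(descriptor_sets):
        words = codebook.predict(descriptors)        # patch -> visual word
        n[d] = np.bincount(words, minlength=vocab_size)
    return n
```

Only the word counts survive: patch positions are discarded, exactly as the bag-of-words assumption requires.
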
pLSA model . . .

     • Generative model
        ◦ select a document d with probability P(d)
        ◦ select a latent class z with probability P(z|d)
        ◦ select a word w with probability P(w|z)
     • Joint observation probability
       P(d, w) = P(d) P(w|d), where P(w|d) = Σ_{z∈Z} P(w|z) P(z|d)
     • Modeling assumptions
        1. Observation pairs (d, w) are generated independently
        2. Conditional independence: P(w, d|z) = P(w|z) P(d|z)

pLSA model . . .

     • Model fitting
        ◦ Maximize the log-likelihood function
          L = Σ_{d∈D} Σ_{w∈W} n(d, w) log P(d, w)
        ◦ Equivalent to minimizing the KL divergence between the empirical distribution and the model
        ◦ EM algorithm to learn the model parameters (a sketch follows below)
     • Evaluating the model on unseen test images
        ◦ P(w|z) and P(z|d) learned from the training dataset
        ◦ 'Fold-in' heuristic for categorization: the learned factors P(w|z) are kept fixed, and the mixing coefficients P(z|d_test) are estimated using the EM iterations

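The EM fitting can be written directly from the update equations (E-step: P(z|d,w) ∝ P(w|z) P(z|d); M-step: re-estimate P(w|z) and P(z|d) from responsibility-weighted counts). A minimal sketch, with deliberately random initialization, which reproduces the run-dependence analyzed later:

```python
import numpy as np

def plsa_em(n_dw, K, iters=100, seed=0):
    """n_dw: (D x W) co-occurrence counts; K: number of latent topics.
    Returns P(w|z) as a (W x K) array and P(z|d) as a (K x D) array."""
    rng = np.random.default_rng(seed)
    D, W = n_dw.shape
    p_wz = rng.random((W, K)); p_wz /= p_wz.sum(axis=0)   # P(w|z)
    p_zd = rng.random((K, D)); p_zd /= p_zd.sum(axis=0)   # P(z|d)
    for _ in range(iters):
        new_wz = np.zeros_like(p_wz)
        new_zd = np.zeros_like(p_zd)
        for d in range(D):
            # E-step: responsibilities P(z|d,w) ∝ P(w|z) P(z|d)
            resp = p_wz * p_zd[:, d]                       # (W x K)
            resp /= resp.sum(axis=1, keepdims=True) + 1e-12
            # M-step accumulators: n(d,w) P(z|d,w)
            weighted = resp * n_dw[d][:, None]
            new_wz += weighted
            new_zd[:, d] = weighted.sum(axis=0)
        p_wz = new_wz / (new_wz.sum(axis=0) + 1e-12)       # normalize over w
        p_zd = new_zd / (new_zd.sum(axis=0) + 1e-12)       # normalize over z
    return p_wz, p_zd
```

For the fold-in heuristic on a test image, the same loop is run with `p_wz` held fixed so that only the mixing coefficients P(z|d_test) are re-estimated.
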
pLSA model . . .

     • Details of the experiment to evaluate the model
        ◦ 5 categories: houses, forests, mountains, streets and beaches
        ◦ Image dataset: COREL photo CDs, images from internet search engines, and personal image collections
        ◦ 100 images of each category
        ◦ Rob Fergus's code, modified for these experiments
        ◦ 128-dim SIFT feature used to represent a patch
        ◦ Visual codebook with 125 entries
     • Image annotation: ẑ = arg max_i P(z_i | d_test)

pLSA model. . . Results

     • 50 runs of the experiment, with random partitioning on each run
     • Vastly different accuracy on different runs: best case ~46%, worst case 5%
     • Analysis of the results
        ◦ The confusion matrix gives us further insights
        ◦ Most of the labeling errors occur between houses and streets
        ◦ Ambiguity between mountains and forests

Results using the pLSA model

     [Figure: some images that were wrongly annotated by our system]

Results of the pLSA model . . .

     • Comparison with the naive Bayes classifier
       [Figure: confusion matrices for the pLSA and naive Bayes models]
     • 10-fold cross-validation test on the same dataset: mean accuracy ~66%

Analysis of our results

     • Reasons for poor performance
        ◦ Model convergence!
        ◦ Local optima problem in the EM algorithm
        ◦ The optimum value of the objective function depends on the initialized values
        ◦ We initialize the algorithm randomly at each run!
     • Possible solution: Deterministic annealing EM (DAEM) algorithm
     • Even with DAEM there is no guarantee of converging to the globally optimal solution

Maximum entropy models

     • Maximum entropy prefers a uniform distribution when no data are available
     • The best model is the one that:
        1. Is consistent with the constraints imposed by the training data
        2. Makes as few assumptions as possible
     • Training dataset: {(x1, y1), (x2, y2), . . . , (xN, yN)}, where xi represents an image and yi represents a label
     • Predicate functions
        ◦ Unigram predicate: co-occurrence statistics of a word and a label
          f_{v1, LABEL}(x, y) = 1 if y = LABEL and v1 ∈ x, and 0 otherwise

Maximum entropy models . . .

     • Notation
        ◦ f : predicate function
        ◦ p̃(x, y): empirical distribution of the observed pairs
        ◦ p(y|x): stochastic model to be learnt
     • Model fitting: the expected value of the predicate function w.r.t. the stochastic model should equal the expected value of the predicate measured from the training data
     • Constrained optimization problem
       Maximize H(p) = − Σ_{x,y} p̃(x) p(y|x) log p(y|x)
       s.t. Σ_{x,y} p̃(x, y) f(x, y) = Σ_{x,y} p̃(x) p(y|x) f(x, y)
     • The solution has the exponential form (see the sketch below)
       p(y|x) = (1/Z(x)) exp( Σ_{i=1}^{k} λ_i f_i(x, y) )

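Once the weights λ_i are trained (the thesis used Zhang Le's maximum entropy toolkit), evaluating the exponential-form solution takes a few lines. A minimal sketch assuming binary unigram predicates and a hypothetical (n_labels x vocab_size) weight layout:

```python
import numpy as np

def maxent_posterior(word_counts, lam, labels):
    """p(y|x) = exp(sum_i lam_i f_i(x, y)) / Z(x), with one weight per
    (visual word, label) pair. word_counts: (vocab_size,) counts for
    image x; lam: (n_labels x vocab_size) weight matrix."""
    # With binary unigram predicates, sum_i lam_i f_i(x, y) reduces to a
    # dot product between the label's weight row and the indicator of
    # which visual words occur in the image.
    scores = lam @ (word_counts > 0).astype(float)
    scores -= scores.max()                  # for numerical stability
    p = np.exp(scores)
    return dict(zip(labels, p / p.sum()))   # the normalization is Z(x)

# Hypothetical usage with the five categories from the experiment:
# maxent_posterior(counts, lam,
#                  ['houses', 'forests', 'mountains', 'streets', 'beaches'])
```
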
Results for the maximum entropy model

     • Same dataset, feature and codebook as used for the pLSA experiment
     • Evaluation using Zhang Le's maximum entropy toolkit
     • 25-fold cross-validation accuracy: ~70%
     • The second-best label is often the correct label: accuracy improves to 85%
       [Figure: confusion matrices for the maximum entropy and naive Bayes models]

A comparative study

     Method                    # of catg.   Training # per catg.   Perf. (%)
     Maximum entropy           5            50                     70
     pLSA                      5            50                     46
     Naive Bayes classifier    5            50                     66
     Fei-Fei                   13           100                    64
     Vogel                     6            ~100                   89.3
     Vogel                     6            ~100                   67.2
     Oliva                     8            250-300                89

     Table: A performance comparison with other studies reported in the literature.

Future Work

     • Further investigations into the pLSA model
     • The issue of model convergence
     • The DAEM algorithm is not the ideal solution
     • Using a richer feature set, e.g., a bank of Gabor filters
     • For maximum entropy models, ways to define predicates that capture semantic information better

THANK YOU
