A slide-deck of my graduate thesis presentation at IIT-Bombay (2006).

- 1. Novel Approaches to Natural Scene Categorization Amit Prabhudesai Roll No. 04307002 amitp@ee.iitb.ac.in M.Tech Thesis Defence Under the guidance of Prof. Subhasis Chaudhuri Indian Institute of Technology, Bombay Natural Scene Categorization – p.1/32
- 2. Overview of topics to be covered • Natural Scene Categorization: Challenges • Our contribution ◦ Qualitative visual environment description • Portable, real-time system to aid the visually impaired • System has peripheral vision! ◦ Model-based approaches • Use of stochastic models to capture semantics • pLSA and maximum entropy models • Conclusions and Future Work Natural Scene Categorization – p.2/32
- 3. Natural Scene Categorization • Interesting application of a CBIR system • Images from a broad image domain: diverse and often ambiguous • Bridging the semantic gap • Grouping scenes into semantically meaningful categories could aid further retrieval • Efﬁcient schemes for grouping images into semantic categories Natural Scene Categorization – p.3/32
- 4. Qualitative Visual Environment Retrieval [Figure: omnidirectional view of an environment with regions labelled BUILDING, SKY, LAWN, WOODS and WATER BODY, view sectors FR/LT/RT/LB/RB, and positions P1–P3] • Use of omnidirectional images • Challenges ◦ Unstructured environment ◦ No prior learning (unlike navigation/localization) • Target application and objective ◦ Wearable computing community, emphasis on visually challenged people ◦ Real-time operation Natural Scene Categorization – p.4/32
- 5. Qualitative Visual Environment System: Overview • Environment representation • Environment retrieval ◦ View partitioning ◦ Feature extraction ◦ Node annotation ◦ Dynamic node annotation ◦ Real-time operation • Results Natural Scene Categorization – p.5/32
- 6. System Overview (contd.) • Environment representation ◦ Image database containing images belonging to 6 classes: Lawns(L), Woods(W), Buildings(B), Waterbodies(H), Roads(R) and Trafﬁc(T) ◦ Moderately large intra-class variance (in the feature space) in images of each category ◦ Description relative to the person using the system: e.g., ‘to left of’, ‘in the front’, etc. ◦ Topological relationships indicated by a graph ◦ Each node annotated by an identiﬁer associated with a class Natural Scene Categorization – p.6/32
- 7. System Overview (contd.) • Environment Retrieval ◦ View Partitioning [Figure: the omnicam view partitioned into sectors FR, LT, RT, LB, RB and BS about the forward/backward direction, alongside its graphical (node) representation] Natural Scene Categorization – p.7/32
- 8. System Overview (contd.) • Environment Retrieval ◦ View Partitioning [Figure: the omnicam view partitioned into sectors FR, LT, RT, LB, RB and BS about the forward/backward direction, alongside its graphical (node) representation] ◦ Feature Extraction • Feature invariant to scaling, viewpoint, illumination changes, and geometric warping introduced by omnicam images • Colour histogram selected as the feature for performing CBIR Natural Scene Categorization – p.7/32
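As a concrete illustration of the feature-extraction step, here is a minimal sketch of a global colour histogram computed with OpenCV. The 8×8×8 binning in BGR and the normalization are assumptions made for illustration; the slides only state that a colour histogram is the CBIR feature.

```python
# Minimal sketch of the colour-histogram feature (assumed 8x8x8 binning in BGR;
# the bin counts and colour space actually used in the thesis may differ).
import cv2
import numpy as np

def colour_histogram(image_bgr, bins=(8, 8, 8)):
    """Return a normalized global colour histogram as a flat feature vector."""
    hist = cv2.calcHist([image_bgr], [0, 1, 2], None, list(bins),
                        [0, 256, 0, 256, 0, 256])
    hist = hist.flatten()
    return hist / (hist.sum() + 1e-9)   # normalize so differently sized views remain comparable
```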
- 9. System Overview (contd.) • Environment Retrieval ◦ Node annotation • Objective: Robust retrieval against illumination changes and intra-class variations • Solution: Annotation decided by a simple voting scheme Natural Scene Categorization – p.8/32
- 10. System Overview (contd.) • Environment Retrieval ◦ Node annotation • Objective: Robust retrieval against illumination changes and intra-class variations • Solution: Annotation decided by a simple voting scheme ◦ Dynamic node annotation • Temporal evolution of graph Gn with time tn • Complete temporal evolution of the graph given by G, obtained by concatenating the subgraphs Gn , i.e.,G = {G1 , G2 , . . . , Gk , . . .} Natural Scene Categorization – p.8/32
- 11. System Overview (contd.) • Environment Retrieval ◦ Real-time operation • Colour histogram: compact feature vector • Pre-computed histograms of all the database images • Linear time complexity (O(N)): on P-IV 2.0 GHz, ∼ 100 ms for single omnicam image Natural Scene Categorization – p.9/32
- 12. System Overview (contd.) • Environment Retrieval ◦ Real-time operation • Colour histogram: compact feature vector • Pre-computed histograms of all the database images • Linear time complexity (O(N)): on P-IV 2.0 GHz, ∼ 100 ms for single omnicam image ◦ Portable, low-cost system for visually impaired • Modest hardware and software requirements • Easily put together using off-the-shelf components Natural Scene Categorization – p.9/32
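The retrieval loop can be pictured as in the sketch below: each view sector's histogram is compared against the pre-computed database histograms in one linear scan, and the node is annotated by a majority vote among the best matches. Histogram intersection as the similarity measure and k = 5 neighbours are assumptions of this sketch; the slides only state that a simple voting scheme over colour-histogram matches is used.

```python
# Sketch: annotate one view-sector node by voting over its nearest database images.
import numpy as np
from collections import Counter

def annotate_node(query_hist, db_hists, db_labels, k=5):
    """db_hists: (N, D) pre-computed histograms; db_labels: length-N class identifiers."""
    sims = np.minimum(db_hists, query_hist).sum(axis=1)   # histogram intersection, O(N) scan
    top_k = np.argsort(sims)[::-1][:k]                    # k best matches
    votes = Counter(db_labels[i] for i in top_k)
    return votes.most_common(1)[0][0]                     # majority label annotates the node
```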
- 13. System Overview (contd.) • Results ◦ Cylindrical concentric mosaics Natural Scene Categorization – p.10/32
- 14. System Overview (contd.) • Results ◦ Cylindrical concentric mosaics Natural Scene Categorization – p.10/32
- 15. System Overview (contd.) • Results ◦ Still omnicam image Natural Scene Categorization – p.11/32
- 16. System Overview (contd.) • Results ◦ Still omnicam image Natural Scene Categorization – p.11/32
- 17. System Overview (contd.) • Results ◦ Omnivideo sequence [Figure: annotation graph evolving over the omnivideo sequence, with sector nodes labelled W (Woods), B (Buildings) or X (unknown) along the forward/backward directions for frames n = 1 to 20] Natural Scene Categorization – p.12/32
- 18. System Overview (contd.) • Results ◦ Omnivideo sequence [Figure: annotation graph evolving over the omnivideo sequence, with sector nodes labelled W (Woods), B (Buildings) or X (unknown) along the forward/backward directions for frames n = 1 to 20] [Figure: frame-wise labels R (Roads) and L (Lawns) over frames n = 1 to 25] Natural Scene Categorization – p.12/32
- 19. Analyzing our results • System accuracy: close to 70%– This is not enough! • Some scenes are inherently ambiguous! • Often the second best class is the correct class Natural Scene Categorization – p.13/32
- 20. Analyzing our results • System accuracy: close to 70%– This is not enough! • Some scenes are inherently ambiguous! • Often the second best class is the correct class • Limitations 1. Limited discriminating power of global colour histogram (GCH) 2. Local colour histogram (LCH) based on tiling cannot be used 3. Each frame analyzed independently Natural Scene Categorization – p.13/32
- 21. Analyzing our results • System accuracy: close to 70%– This is not enough! • Some scenes are inherently ambiguous! • Often the second best class is the correct class • Limitations 1. Limited discriminating power of global colour histogram (GCH) 2. Local colour histogram (LCH) based on tiling cannot be used 3. Each frame analyzed independently • Possible solutions 1. Adding memory to the system 2. Clustering scheme before computing similarity measure Natural Scene Categorization – p.13/32
- 22. Method I. Adding memory to the system • System uses only the current observation in labeling • Good idea to use all observations up to the current one • Desired: A recursive implementation to calculate the posterior (should be able to do it in real-time!) • Hidden Markov Model: Parameter estimation using Kevin Murphy's HMM toolkit Natural Scene Categorization – p.14/32
- 23. Method I. Adding memory to the system • System uses only the current observation in labeling • Good idea to use all observations up to the current one • Desired: A recursive implementation to calculate the posterior (should be able to do it in real-time!) • Hidden Markov Model: Parameter estimation using Kevin Murphy's HMM toolkit • Challenges 1. Estimation of the transition matrix: a possible solution is to use limited classes 2. Enormous training data required Natural Scene Categorization – p.14/32
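The recursive posterior mentioned above is essentially HMM forward filtering. The sketch below shows the idea, assuming a class-transition matrix A and per-frame class likelihoods (e.g. derived from the histogram-matching scores) are available; the thesis itself estimates the HMM parameters with Kevin Murphy's MATLAB toolkit rather than code like this.

```python
# Sketch: forward (filtering) recursion, so the label at frame t uses all observations up to t.
import numpy as np

def filter_posterior(prior, A, likelihoods):
    """prior: (C,) initial class probabilities, A: (C, C) row-stochastic transition matrix,
    likelihoods: (T, C) per-frame P(observation | class). Returns (T, C) posteriors."""
    posteriors = []
    belief = prior
    for like in likelihoods:
        belief = like * (A.T @ belief)   # predict with the transition model, weigh by the evidence
        belief = belief / belief.sum()   # renormalize -> P(class_t | obs_1..t)
        posteriors.append(belief)
    return np.array(posteriors)
```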
- 24. Adding memory. . . (Results) • Improved confidence in the results. However, negligible improvement in the accuracy • Reasons for poor performance ◦ Limited number of transitions in categories (as opposed to locations) ◦ Typical training data for HMMs is thousands of labels: difficult to collect such vast data • Limitation: Makes the system dependent on the training sequence Natural Scene Categorization – p.15/32
- 25. Method II. Preclustering the image • Presence of clutter, images from a broad domain • Premise: The part of the image indicative of the semantic category forms a distinct region in the feature space [Figure: some test images belonging to the 'Water-bodies' category] • Possible solution: segment out the clutter in the scene Natural Scene Categorization – p.16/32
- 26. Preclustering the image. . . • K-means clustering of the image • Use only pixels from the largest cluster to compute the colour histogram Natural Scene Categorization – p.17/32
- 27. Preclustering the image. . . • K-means clustering of the image • Use only pixels from the largest cluster to compute the colour histogram [Figure: results of K-means clustering on the test images] Natural Scene Categorization – p.17/32
- 28. Preclustering the image. . . • K-means clustering of the image • Use only pixels from the largest cluster to compute the colour histogram [Figure: results of K-means clustering on the test images] • Results ◦ Accuracy improves significantly: for the 'water-bodies' class, from 25% to about 72% • Limitations: What about, say, a traffic scene?! Natural Scene Categorization – p.17/32
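A sketch of the preclustering step described above: cluster the pixels with K-means and keep only the largest cluster before computing the histogram. K = 3 clusters and clustering in raw RGB space are assumptions made here for illustration.

```python
# Sketch: K-means preclustering, then a colour histogram over only the largest cluster.
import numpy as np
from sklearn.cluster import KMeans

def largest_cluster_histogram(image_rgb, k=3, bins=8):
    pixels = image_rgb.reshape(-1, 3).astype(float)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pixels)
    largest = np.bincount(labels).argmax()                 # cluster containing the most pixels
    kept = pixels[labels == largest]                       # discard the (presumed) clutter
    hist, _ = np.histogramdd(kept, bins=(bins,) * 3, range=[(0, 256)] * 3)
    hist = hist.flatten()
    return hist / hist.sum()
```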
- 29. Model-based approaches • Stochastic models used to learn semantic concepts from training images • Use of normal perspective images • Use of local image features • Two models examined 1. probabilistic Latent Semantic Analysis (pLSA) 2. Maximum entropy models • Use of the ‘bag of words’ approach Natural Scene Categorization – p.18/32
- 30. Bag of words approach • Local features more robust to occlusions and spatial variations • Image represented as a collection of local patches • Image patches are members of a learned (visual) vocabulary • Positional relationships not considered! • Data representation by a co-occurrence matrix Natural Scene Categorization – p.19/32
- 31. Bag of words approach • Local features more robust to occlusions and spatial variations • Image represented as a collection of local patches • Image patches are members of a learned (visual) vocabulary • Positional relationships not considered! • Data representation by a co-occurrence matrix • Notation ◦ D = {d1 , . . . , dN } : corpus of documents ◦ W = {w1 , . . . , wM } : dictionary of words ◦ Z = {z1 , . . . , zK } : (latent) topic variables ◦ N = {n(w, d)}: co-occurrence table Natural Scene Categorization – p.19/32
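To make the co-occurrence representation concrete, the sketch below builds the word-document count table n(w, d) by assigning each local patch descriptor to its nearest codebook entry. The codebook (e.g. learned beforehand by clustering training descriptors) and the patch detector are assumed to exist already.

```python
# Sketch: bag-of-words co-occurrence table n(w, d); positional relationships are ignored.
import numpy as np

def cooccurrence_table(image_descriptors, codebook):
    """image_descriptors: list of (n_i, 128) SIFT arrays, one per image (document).
    codebook: (M, 128) visual-word centres. Returns an (M, N) count matrix n(w, d)."""
    M, N = codebook.shape[0], len(image_descriptors)
    n_wd = np.zeros((M, N), dtype=int)
    for d, desc in enumerate(image_descriptors):
        dists = ((desc[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        words = dists.argmin(axis=1)          # nearest visual word for each patch
        np.add.at(n_wd[:, d], words, 1)       # accumulate word counts for this document
    return n_wd
```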
- 32. pLSA model . . . • Generative model ◦ select a document d with probability P(d) ◦ select a latent class z with probability P(z|d) ◦ select a word w with probability P(w|z) Natural Scene Categorization – p.20/32
- 33. pLSA model . . . • Generative model ◦ select a document d with probability P(d) ◦ select a latent class z with probability P(z|d) ◦ select a word w with probability P(w|z) • Joint observation probability P(d, w) = P(d) P(w|d), where P(w|d) = Σ_{z∈Z} P(w|z) P(z|d) Natural Scene Categorization – p.20/32
- 34. pLSA model . . . • Generative model ◦ select a document d with probability P(d) ◦ select a latent class z with probability P(z|d) ◦ select a word w with probability P(w|z) • Joint observation probability P(d, w) = P(d) P(w|d), where P(w|d) = Σ_{z∈Z} P(w|z) P(z|d) • Modeling assumptions 1. Observation pairs (d, w) generated independently 2. Conditional independence assumption P(w, d|z) = P(w|z) P(d|z) Natural Scene Categorization – p.20/32
- 35. pLSA model . . . • Model fitting ◦ Maximize the log-likelihood function L = Σ_{d∈D} Σ_{w∈W} n(d, w) log P(d, w) ◦ Equivalently, minimize the KL divergence between the empirical distribution and the model ◦ EM algorithm to learn model parameters Natural Scene Categorization – p.21/32
- 36. pLSA model . . . • Model fitting ◦ Maximize the log-likelihood function L = Σ_{d∈D} Σ_{w∈W} n(d, w) log P(d, w) ◦ Equivalently, minimize the KL divergence between the empirical distribution and the model ◦ EM algorithm to learn model parameters • Evaluating model on unseen test images ◦ P(w|z) and P(z|d) learned from the training dataset ◦ 'Fold-in' heuristic for categorization: learned factors P(w|z) are kept fixed, mixing coefficients P(z|d_test) are estimated using the EM iterations Natural Scene Categorization – p.21/32
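For reference, a compact sketch of the pLSA EM updates and the fold-in step is given below, using the n(w, d) table from the bag-of-words sketch. This is illustrative only: the actual experiments were run with a modified version of Rob Fergus's code, and the iteration counts and random initialization here are assumptions.

```python
# Sketch: pLSA fitting by EM, and the 'fold-in' heuristic for unseen test images.
import numpy as np

def plsa_em(n_wd, K, iters=100, seed=0):
    """n_wd: (M, N) word-document counts. Returns P(w|z) of shape (M, K) and P(z|d) of shape (K, N)."""
    rng = np.random.default_rng(seed)
    M, N = n_wd.shape
    Pwz = rng.random((M, K)); Pwz /= Pwz.sum(axis=0)
    Pzd = rng.random((K, N)); Pzd /= Pzd.sum(axis=0)
    for _ in range(iters):
        # E-step: P(z|d,w) proportional to P(w|z) P(z|d)
        joint = Pwz[:, :, None] * Pzd[None, :, :]                  # (M, K, N)
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate the factors from the expected counts
        weighted = n_wd[:, None, :] * post                         # (M, K, N)
        Pwz = weighted.sum(axis=2); Pwz /= Pwz.sum(axis=0, keepdims=True) + 1e-12
        Pzd = weighted.sum(axis=0); Pzd /= Pzd.sum(axis=0, keepdims=True) + 1e-12
    return Pwz, Pzd

def fold_in(n_wd_test, Pwz, iters=50, seed=0):
    """Keep P(w|z) fixed and estimate P(z|d_test) by the same EM iterations; return argmax topic index."""
    rng = np.random.default_rng(seed)
    K, N = Pwz.shape[1], n_wd_test.shape[1]
    Pzd = rng.random((K, N)); Pzd /= Pzd.sum(axis=0)
    for _ in range(iters):
        joint = Pwz[:, :, None] * Pzd[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        Pzd = (n_wd_test[:, None, :] * post).sum(axis=0)
        Pzd /= Pzd.sum(axis=0, keepdims=True) + 1e-12
    return Pzd.argmax(axis=0)    # z_hat = argmax_i P(z_i | d_test), as used for annotation on the next slide
```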
- 37. pLSA model . . . • Details of experiment to evaluate model ◦ 5 categories: houses, forests, mountains, streets and beaches ◦ Image dataset: COREL photo CDs, images from internet search engines, and personal image collections ◦ 100 images of each category ◦ Modifications in Rob Fergus's code for the experiments ◦ 128-dim SIFT feature used to represent a patch ◦ Visual codebook with 125 entries • Image annotation ẑ = arg max_i P(z_i | d_test) Natural Scene Categorization – p.22/32
- 38. pLSA model. . . Results • 50 runs of the experiment: with random partitioning on each run • Vastly different accuracy on different runs: best case ∼ 46%, and worst case 5% Natural Scene Categorization – p.23/32
- 39. pLSA model. . . Results • 50 runs of the experiment: with random partitioning on each run • Vastly different accuracy on different runs: best case ∼ 46%, and worst case 5% • Analysis of the results ◦ Confusion matrix gives us further insights ◦ Most of the labeling errors occur between houses and streets ◦ Ambiguity between mountains and forests Natural Scene Categorization – p.23/32
- 40. Results using the pLSA model Figure: Some images that were wrongly annotated by our system Natural Scene Categorization – p.24/32
- 41. Results of the pLSA model . . . • Comparison with the naive Bayes' classifier Figure: Confusion matrices for the pLSA and naive Bayes models • 10-fold cross-validation test on the same dataset: mean accuracy ∼ 66% Natural Scene Categorization – p.25/32
- 42. Analysis of our results • Reasons for poor performance ◦ Model convergence! ◦ Local optima problem in the EM algorithm ◦ Optimum value of the objective function depends on the initialized values ◦ We initialize the algorithm randomly at each run! Natural Scene Categorization – p.26/32
- 43. Analysis of our results • Reasons for poor performance ◦ Model convergence! ◦ Local optima problem in the EM algorithm ◦ Optimum value of the objective function depends on the initialized values ◦ We initialize the algorithm randomly at each run! • Possible solution: Deterministic annealing EM (DAEM) algorithm • Even with DAEM no guarantee of converging to the global optimal solution Natural Scene Categorization – p.26/32
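To illustrate the DAEM idea in the pLSA setting: the E-step responsibilities are raised to an inverse temperature β that is gradually annealed towards 1, which smooths the objective early on and reduces (but, as noted above, does not remove) the sensitivity to random initialization. The sketch reuses Pwz and Pzd from the pLSA sketch; the annealing schedule is an assumption.

```python
# Sketch of a tempered (DAEM-style) E-step for pLSA; only the E-step changes.
import numpy as np

def daem_e_step(Pwz, Pzd, beta):
    """Tempered responsibilities: P(z|d,w) proportional to (P(w|z) P(z|d))^beta."""
    joint = (Pwz[:, :, None] * Pzd[None, :, :]) ** beta
    return joint / (joint.sum(axis=1, keepdims=True) + 1e-12)

# An assumed annealing schedule: run EM to convergence at each beta in 0.2, 0.4, ..., 1.0.
```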
- 44. Maximum entropy models • Maximum entropy prefers a uniform distribution when no data are available • Best model is the one that: 1. Is consistent with the constraints imposed by training data 2. Makes as few assumptions as possible • Training dataset: {(x_1, y_1), (x_2, y_2), . . . , (x_N, y_N)}, where x_i represents an image and y_i represents a label • Predicate functions ◦ Unigram predicate: co-occurrence statistics of a word and a label: f_{v1,LABEL}(x, y) = 1 if y = LABEL and v_1 ∈ x, and 0 otherwise Natural Scene Categorization – p.27/32
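A unigram predicate can be written directly as a small indicator function. In the sketch below an image x is assumed to be represented by the set of visual words it contains; the word index and label used in the usage line are hypothetical placeholders.

```python
# Sketch: a unigram predicate f_{v1,LABEL}(x, y) as an indicator function.
def unigram_predicate(v1, label):
    def f(x, y):
        # x: set of visual words present in the image, y: candidate label
        return 1 if (y == label and v1 in x) else 0
    return f

# e.g. (hypothetical word index and label): f = unigram_predicate(17, "beaches"); f({3, 17, 42}, "beaches") == 1
```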
- 45. Maximum entropy models . . . • Notation ◦ f : predicate function ◦ p̃(x, y): empirical distribution of the observed pairs ◦ p(y|x): stochastic model to be learnt Natural Scene Categorization – p.28/32
- 46. Maximum entropy models . . . • Notation ◦ f : predicate function ◦ p̃(x, y): empirical distribution of the observed pairs ◦ p(y|x): stochastic model to be learnt • Model fitting: expected value of the predicate function w.r.t. the stochastic model should equal the expected value of the predicate measured from the training data • Constrained optimization problem: Maximize H(p) = −Σ_{x,y} p̃(x) p(y|x) log p(y|x) s.t. Σ_{x,y} p̃(x, y) f(x, y) = Σ_{x,y} p̃(x) p(y|x) f(x, y) Natural Scene Categorization – p.28/32
- 47. Maximum entropy models . . . • Notation ◦ f : predicate function ◦ p̃(x, y): empirical distribution of the observed pairs ◦ p(y|x): stochastic model to be learnt • Model fitting: expected value of the predicate function w.r.t. the stochastic model should equal the expected value of the predicate measured from the training data • Constrained optimization problem: Maximize H(p) = −Σ_{x,y} p̃(x) p(y|x) log p(y|x) s.t. Σ_{x,y} p̃(x, y) f(x, y) = Σ_{x,y} p̃(x) p(y|x) f(x, y) • Solution: p(y|x) = (1/Z(x)) exp(Σ_{i=1}^{k} λ_i f_i(x, y)) Natural Scene Categorization – p.28/32
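The closed form above is a log-linear model, so evaluating it for a given image reduces to a weighted sum of predicate values followed by normalization, as in the sketch below. The predicates and weights λ_i are placeholders: in the thesis the weights are fitted with Zhang Le's maximum entropy toolkit rather than hand-rolled code.

```python
# Sketch: evaluating p(y|x) = exp(sum_i lambda_i f_i(x, y)) / Z(x) for a log-linear model.
import numpy as np

def maxent_conditional(x, labels, predicates, lambdas):
    """predicates: list of functions f_i(x, y); lambdas: matching weights. Returns {y: p(y|x)}."""
    scores = np.array([sum(l * f(x, y) for l, f in zip(lambdas, predicates)) for y in labels])
    expw = np.exp(scores - scores.max())          # subtract the max for numerical stability
    return dict(zip(labels, expw / expw.sum()))   # expw.sum() plays the role of Z(x)
```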
- 48. Results for the maximum entropy model • Same dataset, feature and codebook as used for the pLSA experiment • Evaluation using Zhang Le’s maximum entropy toolkit Natural Scene Categorization – p.29/32
- 49. Results for the maximum entropy model • Same dataset, feature and codebook as used for the pLSA experiment • Evaluation using Zhang Le’s maximum entropy toolkit • 25-fold cross-validation accuracy: ∼ 70% • The second best label is often the correct label: accuracy improves to 85% Natural Scene Categorization – p.29/32
- 50. Results for the maximum entropy model • Same dataset, feature and codebook as used for the pLSA experiment • Evaluation using Zhang Le's maximum entropy toolkit • 25-fold cross-validation accuracy: ∼ 70% • The second best label is often the correct label: accuracy improves to 85% Figure: Confusion matrices for the maximum entropy and naive Bayes models Natural Scene Categorization – p.29/32
- 51. A comparative study

Method | # of categories | Training images per category | Performance (%)
Maximum entropy | 5 | 50 | 70
pLSA | 5 | 50 | 46
Naive Bayes' classifier | 5 | 50 | 66
Fei-Fei | 13 | 100 | 64
Vogel | 6 | ∼100 | 89.3
Vogel | 6 | ∼100 | 67.2
Oliva | 8 | 250–300 | 89

Table: A performance comparison with other studies reported in the literature. Natural Scene Categorization – p.30/32
- 52. Future Work • Further investigations into the pLSA model • Issue of model convergence • DAEM algorithm is not the ideal solution • Using a richer feature set, e.g., bank of Gabor ﬁlters • For maximum entropy models, ways to deﬁne predicates that will capture semantic information better Natural Scene Categorization – p.31/32
- 53. THANK YOU Natural Scene Categorization – p.32/32
