ANOVA COSINE SIMILARITY BASED
IMAGE RECOMMENDATION
Presented by
Thakur Ganeshsingh
CONTENTS
Introduction: content retrieval methods and the need for them
Motivation: limitations of the existing methods
Related Work: prior technologies and their shortcomings
Architecture: proposed system architecture and its components
Algorithm: ANOVA Cosine algorithm
Experimental Results: comparison with existing methods to prove accuracy
Conclusions: conclusions based on the observed results
References: references used throughout the research
Tuesday, August 16, 2016, Department of CSE, UVCE, Bangalore
Introduction
• The digitization revolution and the growing number of web users have made a huge amount of multimedia content available on the web.
• As online content grows, so does the need for faster and more efficient information retrieval methods.
• Various solutions have been proposed for content retrieval, including text-based and content-based search engines.
• Despite advances in technologies and methodologies, these engines still fail to meet users' search requirements because of:
  - Improper search queries
  - Users' lack of understanding of how to phrase a query
  - Wrongly tagged images or content
Motivation
• Search engines and their limitations:
  - Text-based search engines:
    - Accept textual user queries to perform the search
    - Fast, with low response time; easy to use and widely used
    - Fail to retrieve content when the query term is not present in the content metadata
    - Give unreliable results when images are wrongly tagged
  - Content-based search engines:
    - Accept an image as the user query to perform the search
    - Application-specific, and need a suitable query image to give accurate results
    - High computation time, hence limited scope
    - Give unreliable results when the query image is not present in the database
Motivation (contd.)
• Vertical search engines
  - Site-specific search engines that list results for product-specific query terms
  - Rely on product-specific query terms, hence a larger semantic gap
• To overcome these problems
  - To remove the limitations of the existing methods, we need a system that:
    - Is a text-based search utility
    - Has a lower response time
    - Has a low computation cost
    - Reduces the semantic gap and gives higher relevance
    - Is easily scalable and configurable
Related Work
• Advanced text-based search
  - Cui et al. [1] proposed a hybrid method to re-rank Google search results.
  - Based on an intention category model
  - Integrates visual features adaptively to the input image
  - Image features are combined with a similarity measure to re-rank images.
  - A generic classifier is developed to classify images.
• Hybrid image search [2][3][4][5][6]
  - Based on clustering algorithms
  - Images are loaded from clusters based on the input query term
  - Requires cluster management, since one image can belong to multiple clusters
Related Work (contd.)
• Content-based hybrid search methods [7][8]:
  - Features are extracted from each image to form an offline feature dataset
  - Results are retrieved based on the visual features and compared with k-means clusters to reduce the semantic gap, where k is the number of images
  - Visual meaning is extracted by computing p-values using the Kolmogorov-Smirnov test
  - Visual synonyms are formed from the visual meaning and used to build expanded queries
Architecture
• Problem Definition
• ANOVA Cosine Framework
  - Data Collection Phase
  - Weight Calculation
    - Image processor
    - Weight calculator
    - Term similarity calculator
  - Similarity Module
    - Visual synonyms calculator
  - Search Module
    - Text-based search
    - Pair-wise image similarity calculator and ranking of images by similarity score
  - Search User Interface
Architecture (contd.)
• Problem Definition
  - On a domain-specific website, given a user input query q, the objective is to recommend products.
• ANOVA Cosine Framework
1. Data Collection Phase
  - A customized crawler-parser pair fetches product-specific pages from an online retailer's website.
  - Product details that are not relevant to search, along with stop-words, are removed from the metadata using a customized text parser.
2. Weight Calculation
  • Image processor
    - For each crawled image, a gray-level co-occurrence matrix (GLCM) is used to extract texture features. Haralick, Tamura, Gabor and color features are also extracted and stored for future use.
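The texture step of the image processor can be illustrated with a small sketch. This is not the authors' implementation: the pixel offset (one step to the right), the three statistics chosen (contrast, energy, homogeneity) and the function names `glcm` and `texture_features` are assumptions made for illustration, and the 4x4 image is a toy input.

```python
from collections import Counter


def glcm(image, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix as {(i, j): probability}.

    Counts how often gray level i has gray level j at offset (dx, dy).
    The default offset (right-hand neighbour) is an assumption of this sketch.
    """
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


def texture_features(p):
    """A few Haralick-style statistics computed from a normalized GLCM."""
    contrast = sum(prob * (i - j) ** 2 for (i, j), prob in p.items())
    energy = sum(prob ** 2 for prob in p.values())
    homogeneity = sum(prob / (1 + abs(i - j)) for (i, j), prob in p.items())
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}


# Toy 4x4 image with gray levels 0..3 (illustrative only).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
matrix = glcm(image)
features = texture_features(matrix)
print(features)
```

In practice a library routine would normally replace the hand-written loop; the sketch only shows what the co-occurrence counts and the derived texture values look like.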
Architecture (contd.)
  • Weight calculator
    - ANOVA p-values are calculated for each feature value per image; the inverted p-values are used as the weights of the visual features.
    - The extracted p-values are extremely small, so they are scaled to make them usable as weights.
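A minimal sketch of turning p-values into weights follows. The slides do not say how the inversion or the scaling is done, so the log-scale inversion and the normalization to a sum of 1 used here are assumptions, as are the function name `pvalues_to_weights` and the example p-values. The p-values themselves are taken as given; they could come from any one-way ANOVA routine.

```python
import math


def pvalues_to_weights(p_values, floor=1e-300):
    """Map per-feature ANOVA p-values to weights (smaller p gives larger weight).

    Assumption of this sketch: invert on a log scale, -log10(p) = log10(1/p),
    because 1/p for extremely small p-values is too large to be usable,
    then normalize so the weights sum to 1.
    """
    raw = {name: -math.log10(max(p, floor)) for name, p in p_values.items()}
    total = sum(raw.values())
    if total == 0:
        # Every p-value equals 1: no feature is informative, use equal weights.
        return {name: 1 / len(raw) for name in raw}
    return {name: value / total for name, value in raw.items()}


# Hypothetical p-values for three texture features (not from the study).
example_p = {"contrast": 1e-12, "energy": 1e-6, "homogeneity": 1e-2}
weights = pvalues_to_weights(example_p)
print(weights)
```

With these placeholder inputs the feature with the smallest p-value receives the largest share of the weight.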
  • Term similarity calculator
    - The pair-wise term semantic similarity score Tsim is computed by the term similarity calculator.
3. Similarity Module
  • Visual synonyms calculator
    - For each term Ta, the maximum similarity score maxsim over all pairs (Ta, Tb) is selected. This maxsim is used to set a selection threshold Tsel, calculated using Equation 2.
    - All term pairs whose similarity score exceeds Tsel are selected as visually similar semantic synonyms of term Ta.
Tsel = Th * max(Tsim(Ta, Tb))    (2)
Here, Th is a range threshold set for the selection of similarity scores.
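The selection rule in Equation 2 can be sketched as below. The data layout (a dictionary keyed by ordered term pairs), the function name `visual_synonyms`, the value Th = 0.8 and the example terms and scores are all assumptions for illustration.

```python
def visual_synonyms(tsim, th=0.8):
    """Select visual synonyms per term using Tsel = Th * max(Tsim(Ta, Tb)).

    tsim: {(ta, tb): similarity score} for ordered term pairs.
    Returns {ta: sorted list of terms tb whose score exceeds Tsel}.
    """
    synonyms = {}
    for ta in {a for a, _ in tsim}:
        scores = {tb: s for (a, tb), s in tsim.items() if a == ta and tb != ta}
        if not scores:
            continue
        tsel = th * max(scores.values())  # Equation 2
        synonyms[ta] = sorted(tb for tb, s in scores.items() if s > tsel)
    return synonyms


# Hypothetical similarity scores (not from the study).
tsim = {
    ("kurta", "tunic"): 0.9,
    ("kurta", "kurti"): 0.8,
    ("kurta", "jeans"): 0.2,
    ("jeans", "denim"): 0.7,
    ("jeans", "kurta"): 0.2,
}
result = visual_synonyms(tsim, th=0.8)
print(result)
```

A value of Th closer to 1 keeps only the terms nearly as similar as the best match; a lower value admits more synonyms per term.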
Architecture (contd.)
4. Search Module
  • Consists of a two-phase image recommendation algorithm:
  • Text-based search
    - Text queries entered in the search UI are used to perform a text-based search that yields the initial sample set.
    - Pair-wise image similarity is computed with the cosine similarity method over the visual features of each pair of images in the initial sample space.
  • Pair-wise image similarity calculator and ranking of images by similarity score
    - Search results are re-ranked by matching against the pivotal image pair, applying iterated cosine similarity to the images in the sample space.
5. Search User Interface
  - A generic search engine user interface accepts textual input from the user.
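The two phases of the search module can be sketched roughly as follows. This is a simplified illustration, not the algorithm from the slides: the slides mention a pivotal image pair and iterated cosine similarity without giving details, so this sketch assumes a single pivot image (the one most similar to the rest of the sample set) and one re-ranking pass. The catalog layout, the feature weights and the names `weighted_cosine` and `recommend` are hypothetical.

```python
import math


def weighted_cosine(u, v, w):
    """Cosine similarity of feature vectors u and v under per-feature weights w."""
    dot = sum(wi * ui * vi for wi, ui, vi in zip(w, u, v))
    norm_u = math.sqrt(sum(wi * ui * ui for wi, ui in zip(w, u)))
    norm_v = math.sqrt(sum(wi * vi * vi for wi, vi in zip(w, v)))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(query, catalog, weights, synonyms=None, top_k=10):
    # Phase 1: text-based search over keywords, expanded with visual synonyms.
    terms = set(query.lower().split())
    for term in list(terms):
        terms.update((synonyms or {}).get(term, []))
    sample = [item for item in catalog if terms & set(item["keywords"])]
    if not sample:
        return []

    # Phase 2: re-rank by weighted cosine similarity to a pivot image.
    # Assumption: the pivot is the image most similar to all others in the sample.
    def total_similarity(item):
        return sum(
            weighted_cosine(item["features"], other["features"], weights)
            for other in sample
            if other is not item
        )

    pivot = max(sample, key=total_similarity)
    ranked = sorted(
        sample,
        key=lambda it: weighted_cosine(pivot["features"], it["features"], weights),
        reverse=True,
    )
    return [it["id"] for it in ranked[:top_k]]


# Hypothetical catalog and weights (illustrative only).
catalog = [
    {"id": "a", "keywords": ["kurta", "red"], "features": [1.0, 0.0, 0.0]},
    {"id": "b", "keywords": ["kurta"], "features": [0.9, 0.1, 0.0]},
    {"id": "c", "keywords": ["tunic"], "features": [0.0, 1.0, 0.0]},
    {"id": "d", "keywords": ["jeans"], "features": [0.0, 0.0, 1.0]},
]
feature_weights = [0.6, 0.3, 0.1]
ranking = recommend("kurta", catalog, feature_weights, synonyms={"kurta": ["tunic"]})
print(ranking)
```

The synonym expansion is what lets image "c" enter the sample set even though its metadata never mentions the query term; the visual re-ranking then places it last because it looks unlike the other matches.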
Algorithm
• Search Algorithm
Experimental Results
• Data Collection
  - In this experiment, 5582 images were crawled from the e-commerce website myntra.com using a customized crawler. The crawled images were passed through the image processor to compute image features, and 589 keywords were extracted from the metadata using a customized parser.
  - The process is repeated when newly added images exceed a consistency threshold.
• Experimental Set-up
  - ACS is a text-based search method with visual synonyms; iLike was chosen for performance comparison.
Experimental Results (contd.)
• Performance Evaluation
  - The experiments were carried out on a system with 4 GB DDR2 RAM and an Intel(R) Core(TM) i5 processor @ 2.40 GHz.
  - The ranking of the top-10 recommended images is used as the performance metric. One hundred test queries are used for evaluation, and 100 graduate students were invited to evaluate the relevance of the recommended images.
  - Each user is allocated two queries and asked to rate the relevance of the ranked recommended images with a score between 0 and 1.
  - Here, 0 indicates a totally irrelevant image and 1 a highly relevant image.
  - Mean values of the users' relevance scores are computed for top-1 to top-10 images.
  - The mean relevance score of the images ranked by ACS is 15.26% better than that of the iLike method for top-10 images.
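One plausible way to compute such a comparison is sketched below. The slides do not state the exact formula behind the reported figure, so the averaging scheme (mean over queries of the mean top-k score) and the relative-improvement formula are assumptions, and the ratings are placeholder numbers, not the study's data.

```python
from statistics import mean


def mean_relevance_at_k(ratings, k):
    """Mean over queries of the mean relevance score of the top-k ranked images.

    ratings: one list per query, holding user scores in [0, 1] in rank order.
    """
    return mean(mean(scores[:k]) for scores in ratings)


def relative_improvement(new, baseline):
    """Percentage by which `new` exceeds `baseline`."""
    return (new - baseline) / baseline * 100


# Placeholder ratings for two queries and two methods (not real results).
method_a = [[1.0, 0.8, 0.6], [0.9, 0.7, 0.5]]
method_b = [[0.8, 0.6, 0.4], [0.7, 0.5, 0.3]]

score_a = mean_relevance_at_k(method_a, k=3)
score_b = mean_relevance_at_k(method_b, k=3)
gain = relative_improvement(score_a, score_b)
print(round(score_a, 2), round(score_b, 2), round(gain, 2))
```

Repeating the call for k = 1 to 10 would give the top-1 to top-10 curve described above.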
Conclusions
• We have proposed the ANOVA Cosine Similarity (ACS) framework to recommend images in vertical image search.
• Experiments were conducted on image data crawled from the myntra.com website, and the results were compared with the iLike method.
• The relevance score, assigned manually by users, is used to evaluate the quality of the ranked images.
• The mean relevance score of ACS is 15.26% higher than that of iLike for the top-10 recommended images.
References
1. J. Cui, F. Wen and X. Tang, Real Time Google and Live Image Search Re-ranking, In the Proceedings of the 16th ACM International Conference on Multimedia, pp. 729–732, (2008).
2. F. Jing, C. Wang, Y. Yao, K. Deng, L. Zhang and W. Y. Ma, IGroup: A Web Image Search Engine with Semantic Clustering of Search Results, In the Proceedings of the 14th Annual ACM International Conference on Multimedia, pp. 497–498, (2006).
3. B. Luo, X. Wang and X. Tang, World Wide Web based Image Search Engine using Text and Image Content Features, Electronic Imaging 2003, pp. 123–130, (2003).
4. N. Ben-Haim, B. Babenko and S. Belongie, Improving Web-based Image Search via Content based Clustering, In the Proceedings of International Conference on Computer Vision and Pattern Recognition Workshop, pp. 106–106, (2006).
5. D. Sejal, V. Rashmi, D. Anveker, K. R. Venugopal, S. S. Iyengar and L. M. Patnaik, IRAbMC: Image Recommendation with Absorbing Markov Chain, In 2015 Annual IEEE India Conference (INDICON), pp. 1–6, December (2015).
6. K. B. Raja, N. Shankar, K. R. Venugopal and L. M. Patnaik, Steganalysis of LSB Embedded Images using Variable Threshold Color Pair Analysis, International Journal of Information Processing, vol. 1, no. 1, pp. 24–31, (2007).
7. Y. Chen, N. Yu, B. Luo and X. W. Chen, iLike: Integrating Visual and Textual Features for Vertical Search, In the Proceedings of the International Conference on Multimedia, pp. 221–230, (2010).
8. Y. Chen, H. Sampathkumar, B. Luo and X. W. Chen, iLike: Bridging the Semantic Gap in Vertical Image Search by Integrating Text and Visual Features, IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 10, pp. 2257–2270, (2013).
Queries?
Thank you
