An efficient search algorithm for vertical image search that keeps the utility of text-based search while addressing high response time, high computation cost, the semantic gap, and low relevance.
CONTENTS
Introduction: content retrieval methods and the need for them
Motivation: limitations of the existing methods
Related Work: prior technologies and their shortcomings
Architecture: proposed system architecture and its components
Algorithm: ANOVA Cosine Algorithm
Experimental Results: comparison with existing methods to prove accuracy
Conclusions: conclusions based on the observed results
References: references used throughout research
Tuesday, August 16, 2016, Department of CSE, UVCE, Bangalore
Introduction
The revolution in digitization and the increasing number of web users have made a huge amount of multimedia content available on the web.
As the content available online increases, so does the need for faster and more efficient methods of information retrieval.
To address the problem of content retrieval, various solutions have been proposed, including text-based and content-based search engines.
Despite advances in technologies and methodologies, these engines still fail to meet users' search requirements for the following reasons:
Improper search queries
Users' lack of understanding of the query
Wrongly tagged images / content
Motivation
Search engines and their limitations:
Text-based search engines:
Accept textual user queries to perform the search
Fast, with low response time; easy to use and widely used
Fail to retrieve content when the query term is not present in the content metadata
Cannot account for wrongly tagged images
Content-based search engines:
Accept an image as the user query to perform the search
Application specific, and demand a proper query image to return accurate results
High computation time, hence limited scope
Cannot account for a query image that is not present in the database
Vertical Search Engines
Site-specific search engines that list product-specific query terms
Product-specific query terms are used, hence a higher semantic gap
To overcome the problems
To remove the limitations of the existing methods, we need a system with the following properties:
Text Based Search Utility
Lower Response Time
Low Computation Cost
Reduced Semantic Gap with Higher Relevance
Easily Scalable and configurable
Related Work
Advanced Text Based Search
Cui et al. [1] have proposed a hybrid method to re-rank Google search results.
Based on an intention category model
Used to integrate visual features adaptive to the input image.
Image features are combined with a similarity measure to re-rank images.
A generic classifier is developed to classify images.
Hybrid Image Search [2][3][4][5][6]
Based on Clustering algorithms
Images are loaded from clusters based on input query term
Involves cluster management, since one image can belong to multiple clusters.
Content Based Hybrid Search Methods [7][8]:
Features are extracted from each image to form the offline feature dataset
Results are extracted based on the visual features and compared with k-means clusters to reduce the semantic gap, where k is the number of images
Visual meaning is extracted by computing p-values using the Kolmogorov-Smirnov test
Based on the visual meaning, visual synonyms are formed and used to build expanded queries
Architecture
Problem Definition
ANOVA Cosine Framework
1. Data Collection Phase
2. Weight Calculation
Image processor
Weight calculator
3. Similarity Module
Term similarity calculator
Visual synonyms calculator
4. Search Module
Text Based Search
Pair-wise image similarity calculator and ranking images based on similarity score
5. Search User Interface
Problem Definition
On a domain-specific website, given a user input query q, the objective is to recommend products.
ANOVA Cosine Framework
1. Data Collection Phase
A customized crawler-parser pair is used to fetch product-specific pages from an online retailer website.
Product details that are not relevant to search, along with stop-words, are removed from the metadata using a customized text parser.
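The metadata clean-up step can be sketched as follows. This is a minimal illustration only: the stop-word list, the set of non-search fields, and the function name `extract_keywords` are assumptions made for the example and are not taken from the presented system.

```python
import re

# Hypothetical lists; the actual stop-words and excluded fields are not given in the slides.
STOP_WORDS = {"a", "an", "the", "for", "with", "and", "of", "in"}
NON_SEARCH_FIELDS = {"price", "sku", "delivery"}


def extract_keywords(metadata):
    """Return unique lowercase keywords from a product metadata dict."""
    keywords = []
    for field, text in metadata.items():
        if field in NON_SEARCH_FIELDS:
            continue  # skip product details that are not useful for search
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOP_WORDS and token not in keywords:
                keywords.append(token)
    return keywords


product = {
    "title": "Blue Denim Jacket for Men",
    "description": "A slim-fit jacket with a button closure",
    "price": "Rs. 1999",
}
print(extract_keywords(product))
```

Keeping the keywords in first-seen order makes the output easy to inspect; a real parser would likely also handle stemming and multi-word product terms.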
2. Weight Calculation
Image processor
For each crawled image, a gray-level co-occurrence matrix (GLCM) is used to extract texture features. Haralick, Tamura, Gabor and color features are also extracted and stored for future use.
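As a rough illustration of the texture step, the sketch below builds a gray-level co-occurrence matrix for a tiny quantized image and derives one Haralick-style feature (contrast) from it. The function names, the horizontal offset, and the 4-level toy image are assumptions for the example; the slides do not state which offsets or quantization the image processor uses.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels at offset (dx, dy) in a 2D list image."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """Contrast feature: squared gray-level difference weighted by co-occurrence probability."""
    total = sum(sum(row) for row in matrix)
    weighted = sum((i - j) ** 2 * count
                   for i, row in enumerate(matrix)
                   for j, count in enumerate(row))
    return weighted / total


# Toy 4x4 image with gray levels 0..3 (illustrative data only).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
matrix = glcm(image, levels=4)
print(matrix, contrast(matrix))
```

In practice a library routine such as scikit-image's `graycomatrix` would normally replace the hand-written loop.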
Weight calculator
ANOVA p-values are calculated for each feature value per image; the inverted p-values are used as the weights of the visual features.
The extracted p-values are extremely small, so they are scaled to make them effective.
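A hedged sketch of the weighting idea follows. It computes the one-way ANOVA F statistic from which a p-value is derived, then turns a list of p-values into weights. Two assumptions are made here that the slides do not confirm: "inverted" is read as 1/p, and "scaled" is read as normalizing the weights to sum to 1. The grouping of feature values (for example, images with a keyword versus images without it) is also hypothetical. Converting F to a p-value needs the F distribution, which `scipy.stats.f_oneway` provides; it is left out to keep the example dependency-free.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def inverse_p_weights(p_values):
    """Assumed weighting: 1/p per feature, normalized so the weights sum to 1."""
    inverted = [1.0 / p for p in p_values]
    total = sum(inverted)
    return [v / total for v in inverted]


# Toy feature values for three hypothetical image groups.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])

# Toy p-values for two features; the smaller p-value receives the larger weight.
weights = inverse_p_weights([1e-6, 1e-3])
print(f_stat, weights)
```

Normalizing keeps the weights on a comparable scale even when the raw p-values differ by several orders of magnitude.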
3. Similarity Module
Term similarity calculator
The pair-wise term semantic similarity score Tsim is computed with the term similarity calculator.
Visual synonyms calculator
For each term Ta, the maximum similarity score maxsim over all pairs (Ta, Tb) is selected. This maxsim is used to set a selection threshold Tsel, calculated using Equation 2.
All term pairs with a similarity score exceeding Tsel are selected as visually similar semantic synonyms of term Ta.
Tsel = Th ∗ max(Tsim(Ta, Tb))    (2)
Here, Th is a range threshold set for the selection of similarity scores.
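The selection rule Tsel = Th ∗ max(Tsim(Ta, Tb)) can be sketched as below. The function name `visual_synonyms`, the dictionary layout of the similarity scores, the default Th = 0.8, and the toy scores are all assumptions for illustration.

```python
def visual_synonyms(term, tsim, th=0.8):
    """Select terms Tb whose similarity to `term` exceeds Tsel = th * max similarity.

    `tsim` maps (Ta, Tb) pairs to a pair-wise similarity score.
    """
    scores = {b: s for (a, b), s in tsim.items() if a == term}
    if not scores:
        return []
    t_sel = th * max(scores.values())
    return sorted(b for b, s in scores.items() if s > t_sel)


# Toy similarity scores (illustrative only).
tsim = {
    ("jacket", "coat"): 0.9,
    ("jacket", "blazer"): 0.8,
    ("jacket", "sock"): 0.2,
    ("coat", "jacket"): 0.9,
}
print(visual_synonyms("jacket", tsim))
```

Because the threshold is relative to each term's own maximum score, a term with uniformly low similarities still gets its closest neighbors, while Th controls how far below the maximum a synonym may fall.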
4. Search Module
Consists of a two-phase image recommendation algorithm, explained as follows:
Text Based Search
Text queries entered by the user are accepted from the search UI, and a text-based search is performed on them to obtain the initial sample set.
Pair-wise image similarity is calculated with the cosine similarity method, based on the visual features of each image pair in the initial sample space.
Pair-wise image similarity calculator and ranking images based on similarity score
Search results are re-ranked by matching against the pivotal image pair, applying iterated cosine similarity to the images in the sample space.
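The two phases can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the presented algorithm: the catalog layout, the function names, and in particular the choice of a single pivot image (the one with the highest total similarity to the rest of the sample set) stand in for the "pivotal image pair" and "iterated" steps, whose details are not given in the slides.

```python
import math


def cosine_similarity(u, v, weights=None):
    """Weighted cosine similarity between two equal-length feature vectors."""
    if weights is None:
        weights = [1.0] * len(u)
    dot = sum(w * a * b for w, a, b in zip(weights, u, v))
    norm_u = math.sqrt(sum(w * a * a for w, a in zip(weights, u)))
    norm_v = math.sqrt(sum(w * b * b for w, b in zip(weights, v)))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def text_search(query_terms, catalog):
    """Phase 1: ids of images whose keywords contain any query term."""
    wanted = set(query_terms)
    return [img_id for img_id, item in catalog.items()
            if wanted & set(item["keywords"])]


def rerank(sample_ids, catalog, weights=None):
    """Phase 2: rank the sample set by cosine similarity to an assumed pivot image."""
    if not sample_ids:
        return []
    feats = {i: catalog[i]["features"] for i in sample_ids}

    def total_similarity(i):
        return sum(cosine_similarity(feats[i], feats[j], weights)
                   for j in sample_ids if j != i)

    pivot = max(sample_ids, key=total_similarity)  # assumption: most central image
    return sorted(sample_ids,
                  key=lambda i: cosine_similarity(feats[pivot], feats[i], weights),
                  reverse=True)


# Toy catalog with made-up keywords and 3-dimensional feature vectors.
catalog = {
    "a": {"keywords": ["jacket", "blue"], "features": [1.0, 0.0, 0.0]},
    "b": {"keywords": ["jacket"], "features": [0.9, 0.1, 0.0]},
    "c": {"keywords": ["jacket", "red"], "features": [0.0, 1.0, 0.0]},
    "d": {"keywords": ["shoe"], "features": [0.0, 0.0, 1.0]},
}
sample = text_search(["jacket"], catalog)
print(rerank(sample, catalog))
```

The optional `weights` argument is where per-feature weights from the weight calculator would plug in; with no weights the function reduces to plain cosine similarity.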
5. Search User Interface
A generic search engine user interface is provided to accept textual input from the user.
Algorithm
Search Algorithm
Experimental Results
Data Collection
In this experiment, 5582 images were crawled from the e-commerce website myntra.com using a customized crawler. The crawled images were passed through the image processor to compute image features. A customized parser extracted 589 keywords from the metadata.
The process is repeated for newly added images over the consistency threshold.
Experimental Set-up
ACS (ANOVA Cosine Similarity) is a text-based search method with visual synonyms; iLike was used for performance comparison.
Performance Evaluation
The experiments were carried out on a system with 4 GB of DDR2 RAM and an Intel(R) Core(TM) i5 processor @ 2.40 GHz.
The ranking of the top-10 recommended images is used as the performance metric. One hundred test queries are used for evaluation, and 100 graduate students were invited to evaluate the relevance of the recommended images.
Each user is allocated two queries and asked to rate the relevance of the ranked recommended images with a relevance score between 0 and 1.
Here, 0 and 1 indicate totally irrelevant and highly relevant images respectively.
Mean values of the users' relevance scores are computed for the top-1 to top-10 images.
The mean relevance score of the images ranked by the ACS method is 15.26% better than that of the iLike method for the top-10 images.
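The evaluation arithmetic can be illustrated with the sketch below. The scores are made-up toy values, not the experimental data, and reading the reported gain as a relative difference of mean relevance scores is an assumption; the slides do not state the exact formula.

```python
def mean_relevance_at_k(scores_per_query, k):
    """Mean user relevance score over the top-k ranked images of every query."""
    top = [s for scores in scores_per_query for s in scores[:k]]
    return sum(top) / len(top)


def relative_improvement(new, baseline):
    """Percentage by which `new` exceeds `baseline`."""
    return 100.0 * (new - baseline) / baseline


# Toy relevance scores in [0, 1] for two queries, ranked best-first (illustrative only).
acs_scores = [[0.9, 0.8, 0.7], [1.0, 0.6, 0.5]]
ilike_scores = [[0.8, 0.7, 0.5], [0.9, 0.5, 0.4]]

acs_mean = mean_relevance_at_k(acs_scores, k=2)
ilike_mean = mean_relevance_at_k(ilike_scores, k=2)
print(acs_mean, ilike_mean, relative_improvement(acs_mean, ilike_mean))
```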
Conclusions
In this work, we have proposed the ANOVA Cosine Similarity (ACS) framework to recommend images in vertical image search.
Experiments were conducted on image data crawled from the myntra.com website, and the results were compared with the iLike method.
The relevance score, assigned manually by users, is used to evaluate the quality of the ranked images.
The mean relevance score of ACS is 15.26% higher than that of iLike for the top-10 recommended images.
References
1. J. Cui, F. Wen and X. Tang, Real Time Google and Live Image Search Re-ranking, In the Proceedings of the
16th ACM International Conference on Multimedia, pp. 729–732, (2008).
2. F. Jing, C. Wang, Y. Yao, K. Deng, L. Zhang and W. Y. Ma, IGroup: A Web Image Search Engine with
Semantic Clustering of Search Results, In the Proceedings of the 14th Annual ACM International Conference
on Multimedia, pp. 497–498, (2006).
3. B. Luo, X. Wang and X. Tang, World Wide Web based Image Search Engine using Text and Image Content
Features, Electronic Imaging 2003, pp. 123–130, (2003).
4. N. Ben-Haim, B. Babenko and S. Belongie, Improving Web-based Image Search via Content based
Clustering, In the Proceedings of International Conference on Computer Vision and Pattern Recognition
Workshop, pp. 106–106, (2006).
5. D. Sejal, V. Rashmi, D. Anveker, K. R. Venugopal, S. S. Iyengar and L. M. Patnaik, IRAbMC: Image
Recommendation with Absorbing Markov Chain, In 2015 Annual IEEE India Conference (INDICON), pp. 1–6,
December (2015).
6. K. B. Raja, N. Shankar, K. R. Venugopal and L. M. Patnaik, Steganalysis of LSB Embedded Images using
Variable Threshold Color Pair Analysis, International Journal of Information Processing, vol. 1, no. 1, pp. 24–31, (2007).
7. Y. Chen, N. Yu, B. Luo and X. W. Chen, iLike: Integrating Visual and Textual Features for Vertical Search,
In the Proceedings of the International Conference on Multimedia, pp. 221–230, (2010).
8. Y. Chen, H. Sampathkumar, B. Luo and X. W. Chen, iLike: Bridging the Semantic Gap in Vertical Image
Search by Integrating Text and Visual Features, IEEE Transactions on Knowledge and Data Engineering, vol.
25, no. 10, pp. 2257–2270, (2013).