
  1. Learning to Hash for Large-Scale Search. Xu Jiaming, Chinese Academy of Sciences. 2014-07-04 @CUHK
  2. Motivation
      - Similarity-based search has become popular in many applications:
        - Image/video search and retrieval: finding the most similar images/videos
        - Audio search: finding similar songs
        - Product search: finding shoes with a similar style but a different color
        - Patient search: finding patients with a similar diagnostic status
      - Two key components: a similarity/distance measure and an indexing scheme
      - Whittlesearch (Kovashka et al. 2013); slide source: 2013 CIKM tutorial by Jun Wang
  3. A Conceptual Diagram for a Hashing-Based Image Search System
      - Diagram components: Image Database; Hash Function Design; Indexing and Search; Similarity Search & Retrieval; Reranking; Refinement; Visual Search Applications
      - Designing compact yet accurate hash codes is critical to making the search effective.
      - Source: 2013 CIKM tutorial by Jun Wang
  4. Outline
      - Background (data-independent)
        - Locality Sensitive Hashing [1999-VLDB, 2006-FOCS, 2008-Communications]
        - SimHash [2002-STOC, 2007-WWW]
      - Learning to Hash (data-dependent)
        - Unsupervised vs. supervised: STH [2010-SIGIR] vs. SHK [2012-CVPR]
        - One-step vs. two-step: ITQ [2011-CVPR, 2013-TPAMI] vs. TSH [2013-ICCV]
      - Others (data-dependent)
        - Smart Hashing Update for Fast Response [2013-IJCAI]
        - Two-Stage Hashing [2014-ACL]
        - Semantic Hashing with Topics and Tags [2013-SIGIR]
        - Dual-View Hashing [2013-ICML]
        - Multiple View Hashing [2011-SIGIR]
      - LSH in MapReduce
  5. Outline (as in slide 4)
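  6. LSH [1999-VLDB, 2006-FOCS, 2008-Communications]: Locality Sensitive Hashing (LSH)
      - Figure: random hash functions each split the database items into a 0 side and a 1 side; the query, hashed by the same functions, receives the code 101.
      - Source: 2013 CIKM tutorial by Jun Wang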
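  7. SimHash [2002-STOC, 2007-WWW]
      - Step 1 (compute TF-IDF): weight the observed features of the text, W1, W2, …, Wn.
      - Step 2 (hash function): hash each feature to a binary code, e.g. W1 → 100110, W2 → 110000, Wn → 001001.
      - Step 3 (signature): replace each 1 bit with +Wi and each 0 bit with -Wi, e.g. 100110 → W1, -W1, -W1, W1, W1, -W1; 110000 → W2, W2, -W2, -W2, -W2, -W2; 001001 → -Wn, -Wn, Wn, -Wn, -Wn, Wn.
      - Step 4 (sum): add the signatures position by position, e.g. 13, 108, -22, -5, -32, 55.
      - Step 5 (generate fingerprint): set each bit to 1 if its sum is positive and 0 otherwise, e.g. 1, 1, 0, 0, 0, 1.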
  8. Outline (as in slide 4)
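  9. STH [2010-SIGIR]: Self-Taught Hashing (STH)
      - Unsupervised learning (Laplacian Eigenmap): learn k-bit codes $y_i$ for the n training items from the similarity matrix $S$:
        $\min \sum_{ij} S_{ij} \|y_i - y_j\|^2 \quad \text{s.t.}\ y_i \in \{-1,1\}^k,\ \sum_i y_i = 0,\ \frac{1}{n}\sum_i y_i y_i^T = I$
      - Equivalently (up to a factor of 2), in matrix form with $Y = [y_1, \dots, y_n]^T$ and the degree matrix $D_{ii} = \sum_j S_{ij}$:
        $\min \operatorname{trace}(Y^T (D - S) Y) \quad \text{s.t.}\ Y(i,j) \in \{-1,1\},\ Y^T \mathbf{1} = 0,\ \frac{1}{n} Y^T Y = I$
      - Supervised learning: the learned codes then serve as training labels for hash functions that predict codes for new items.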
  10. SHK [2012-CVPR]: Supervised Hashing with Kernels
      - Input: pairwise similarity labels
      - Code inner products approximate the pairwise similarities.
      - Source: 2013 CIKM tutorial by Jun Wang
  11. Outline (as in slide 4)
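  12. ITQ [2011-CVPR, 2013-TPAMI]: Iterative Quantization
      - Apply PCA for dimensionality reduction: for zero-centered data $X \in \mathbb{R}^{n \times d}$, find $W$ to maximize $\operatorname{tr}(W^T X^T X W)$ subject to $W^T W = I$.
      - Keep the top $c$ eigenvectors of the data covariance matrix to obtain $W \in \mathbb{R}^{d \times c}$; the projected data is $V = XW$.
      - Note that if $W$ is an optimal solution, then $WR$ is also optimal for any $c \times c$ orthogonal matrix $R$.
      - Key idea: find binary codes $B \in \{-1,1\}^{n \times c}$ and a rotation $R$ that minimize the quantization loss $Q(B, R) = \|B - VR\|_F^2 = \|B\|_F^2 + \|V\|_F^2 - 2\operatorname{tr}(BR^TV^T)$.
      - Since $\|B\|_F^2 = nc$ and $V$ is fixed, this is equivalent to maximizing $\operatorname{tr}(BR^TV^T) = \sum_{i=1}^{n}\sum_{j=1}^{c} B_{ij}\tilde{V}_{ij}$, where $\tilde{V} = VR$.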
  13. TSH [2013-ICCV]: Two-Step Hashing
  14. Outline (as in slide 4)
  15. SHU [2013-IJCAI]: Smart Hashing Update
      1. Consistency-based selection: $\mathrm{Diff}(k, j) = \min\{\mathrm{num}(k, j, -1), \mathrm{num}(k, j, 1)\}$
      2. Similarity-based selection: with the codes $H_l$ learned from $\min_{H_l \in \{-1,1\}^{l \times r}} Q = \|\frac{1}{r} H_l H_l^T - S\|_F^2$, choose $\min_{k \in \{1,2,\dots,r\}} R_k = \|rS - H^k_{r-1} (H^k_{r-1})^T\|_F^2$
  16. TSH [2014-ACL]: Two-Stage Hashing
      - LSH for neighbor candidate pruning; ITQ for effective re-ranking.
      - LSH captures term similarity; ITQ captures topic similarity.
      - Advantages:
        - The LSH stage attains a high hash lookup success rate.
        - The ITQ re-ranking stage gives high search precision.
        - Only a small portion of the entire dataset is scanned.
        - Two similarity measures are integrated.
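  17. SHTTM [2013-SIGIR]: Semantic Hashing Using Tags and Topic Modeling
      - Hash code learning: the objective $\min_{Y,U}$ combines a tag-consistency term (between the tag matrix $T$ and $U^T Y$, weighted by the matrix $C$) with a similarity-preservation term based on the topic features $\theta$. $\gamma$ is a trade-off weight. The constraints are $Y \in \{-1,1\}^{k \times n}$ and $Y\mathbf{1} = 0$.
      - Hash function learning: $y_j = f(x_j) = W x_j$, with
        $W^* = \arg\min_W \sum_{j=1}^{n} \|y_j - W x_j\|^2 + \lambda \|W\|^2 \;\Rightarrow\; W^* = Y X^T (X X^T + \lambda I)^{-1}$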
  18. DVH [2013-ICML]: Predictable Dual-View Hashing
      - The goal is to find two sets of hyperplanes that map the visual and textual spaces into a common subspace.
      - Diagram components: CCA; Multi-SVM
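  19. MVH [2011-SIGIR]: Composite Hashing with Multiple Information Sources
      - Overall objective: $J(Y, W, \alpha) = J_S + C \cdot J_C$, with sums over the $M$ information sources $k = 1, \dots, M$. It combines a term $J_S$, built from the source-specific graph Laplacians $L^{(k)}$ weighted by $\alpha$, with a term $J_C$ (weight $C$) that relates the codes $Y$ to the source-specific projections $W^{(k)T} X^{(k)}$.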
  20. Outline (as in slide 4)
  21. LSH in MapReduce: Key Idea
  22. LSH in MapReduce: First Round of MapReduce
  23. LSH in MapReduce: Second Round of MapReduce
  24. References
      [1] Gionis A, Indyk P, Motwani R. Similarity search in high dimensions via hashing[C]//VLDB. 1999, 99: 518-529.
      [2] Andoni A, Indyk P. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions[C]//Foundations of Computer Science, 2006. FOCS'06. 47th Annual IEEE Symposium on. IEEE, 2006: 459-468.
      [3] Andoni A, Indyk P. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions[J]. Communications of the ACM, 2008, 51(1): 117.
      [4] Charikar M S. Similarity estimation techniques from rounding algorithms[C]//Proceedings of the thirty-fourth annual ACM symposium on Theory of computing. ACM, 2002: 380-388.
      [5] Manku G S, Jain A, Das Sarma A. Detecting near-duplicates for web crawling[C]//Proceedings of the 16th international conference on World Wide Web. ACM, 2007: 141-150.
      [6] Zhang D, Wang J, Cai D, et al. Self-taught hashing for fast similarity search[C]//Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 2010: 18-25.
      [7] Liu W, Wang J, Ji R, et al. Supervised hashing with kernels[C]//Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012: 2074-2081.
  25. References (continued)
      [8] Gong Y, Lazebnik S. Iterative quantization: A procrustean approach to learning binary codes[C]//Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 2011: 817-824.
      [9] Gong Y, Lazebnik S, Gordo A, et al. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval[J]. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2013, 35(12): 2916-2929.
      [10] Lin G, Shen C, Suter D, et al. A general two-step approach to learning-based hashing[C]//Computer Vision (ICCV), 2013 IEEE International Conference on. IEEE, 2013: 2552-2559.
      [11] Yang Q, Huang L K, Zheng W S, et al. Smart hashing update for fast response[C]//Proceedings of the Twenty-Third international joint conference on Artificial Intelligence. AAAI Press, 2013: 1855-1861.
      [12] Li H, Liu W, Ji H. Two-stage hashing for fast document retrieval[C]//ACL. 2014.
      [13] Wang Q, Zhang D, Si L. Semantic hashing using tags and topic modeling[C]//Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval. ACM, 2013: 213-222.
      [14] Rastegari M, Choi J, Fakhraei S, et al. Predictable dual-view hashing[C]//Proceedings of The 30th International Conference on Machine Learning. 2013: 1328-1336.
  26. References (continued)
      [15] Zhang D, Wang F, Si L. Composite hashing with multiple information sources[C]//Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval. ACM, 2011: 225-234.
      [16] Szmit R. Locality sensitive hashing for similarity search using MapReduce on large scale data[C]//Language Processing and Intelligent Information Systems. Springer Berlin Heidelberg, 2013: 171-178.
      [17] Blog: Location Sensitive Hashing in Map Reduce. http://horicky.blogspot.hk/2012/09/location-sensitive-hashing-in-map-reduce.html
      [18] Likelike Project. https://github.com/takahi-i/likelike
      [19] Wang J. Learning to Hash for Large-Scale Search. CIKM 2013 Tutorial.
  27. Discussions and Questions? Thank you! 2014-07-04
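Slide 6 shows random hash functions that give every database item, and the query, one bit each. The sketch below assumes one common LSH family: random-hyperplane hashing for cosine similarity, the construction analyzed by Charikar (reference [4]). The slide does not name a specific family. The helper names (`make_hyperplanes`, `lsh_codes`, `build_table`, `query`), the 8-bit code length, and the single hash table are illustrative choices, not taken from the talk.

```python
from collections import defaultdict

import numpy as np


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes through the origin (one normal vector per column)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, n_bits))


def lsh_codes(X, planes):
    """Bit b of a point is 1 if it lies on the positive side of hyperplane b, else 0."""
    return (np.atleast_2d(X) @ planes > 0).astype(np.uint8)


def build_table(codes):
    """Single hash table: the bucket key is the packed code, the value is a list of item ids."""
    table = defaultdict(list)
    for i, code in enumerate(codes):
        table[code.tobytes()].append(i)
    return table


def query(table, planes, q):
    """Return the ids of database items that share the query's bucket."""
    key = lsh_codes(q, planes)[0].tobytes()
    return table.get(key, [])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((1000, 32))
    planes = make_hyperplanes(32, n_bits=8)
    table = build_table(lsh_codes(X, planes))
    # A slightly perturbed copy of item 42 usually lands in the same bucket.
    print(42 in query(table, planes, X[42] + 0.01 * rng.standard_normal(32)))
```

Practical LSH deployments usually build several tables and probe all of them to raise recall. A single table keeps the example short.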
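The five SimHash steps on slide 7 can be sketched as follows. The sketch assumes the TF-IDF weights from Step 1 are already computed and passed in as a dictionary. MD5 is an arbitrary stand-in for the feature hash function of Step 2. The 64-bit fingerprint length is a common choice, not a value from the slide. `feature_hash`, `simhash`, and `hamming_distance` are hypothetical names.

```python
import hashlib


def feature_hash(feature: str, n_bits: int) -> int:
    """Step 2: hash a feature to an n_bits-bit integer (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << n_bits) - 1)


def simhash(weights: dict, n_bits: int = 64) -> int:
    """Steps 3-5: +w/-w signatures, position-wise sum, then sign -> fingerprint."""
    totals = [0.0] * n_bits
    for feature, w in weights.items():
        h = feature_hash(feature, n_bits)
        for i in range(n_bits):
            bit = (h >> (n_bits - 1 - i)) & 1  # i-th bit, most significant first
            totals[i] += w if bit else -w      # Step 3 (signature) and Step 4 (sum)
    fingerprint = 0
    for t in totals:                           # Step 5: 1 where the sum is positive
        fingerprint = (fingerprint << 1) | (1 if t > 0 else 0)
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Step 1 is assumed done: these TF-IDF weights are made-up illustrative values.
    doc1 = {"hashing": 2.1, "binary": 1.3, "codes": 0.9, "search": 1.7}
    doc2 = {"hashing": 2.0, "binary": 1.3, "codes": 0.9, "retrieval": 1.5}
    print(hamming_distance(simhash(doc1), simhash(doc2)))
```

Near-duplicate documents are then detected by comparing fingerprints with Hamming distance. With a single positively weighted feature, the fingerprint equals that feature's hash, because every position sum is just +w or -w.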
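For the unsupervised stage of STH on slide 9, a minimal sketch works in three steps:

- Relax the binary constraint.
- Take the eigenvectors of the graph Laplacian $L = D - S$ with the smallest non-trivial eigenvalues.
- Threshold each eigenvector at its median, so every bit is balanced in the spirit of $\sum_i y_i = 0$.

The k-NN heat-kernel graph, its bandwidth heuristic, and the unnormalized eigenproblem are assumptions made for illustration. The paper's exact graph and normalization may differ. The supervised stage that learns hash functions for new items is not shown.

```python
import numpy as np


def knn_similarity(X, k=5):
    """Symmetric k-NN affinity S with heat-kernel weights (an assumed graph construction)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                # exclude self-neighbors
    sigma = np.median(d2[np.isfinite(d2)])      # bandwidth heuristic (assumption)
    S = np.zeros_like(d2)
    for i in range(len(X)):
        nbrs = np.argsort(d2[i])[:k]
        S[i, nbrs] = np.exp(-d2[i, nbrs] / sigma)
    return np.maximum(S, S.T)


def laplacian_eigenmap_codes(S, n_bits):
    """Relax y_i in {-1,1}^k, take the bottom non-trivial eigenvectors of L = D - S,
    then binarize each dimension at its median so that every bit is balanced."""
    L = np.diag(S.sum(axis=1)) - S
    _, vecs = np.linalg.eigh(L)                 # eigenvalues in ascending order
    Y = vecs[:, 1:n_bits + 1]                   # skip the (constant) first eigenvector
    return np.where(Y > np.median(Y, axis=0), 1, -1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((60, 10))
    Y = laplacian_eigenmap_codes(knn_similarity(X), n_bits=8)
    print(Y.shape, Y.sum(axis=0))               # column sums close to 0 (balanced bits)
```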
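Slide 10 states that code inner products approximate pairwise similarity. The sketch below only evaluates that objective, $Q = \|\frac{1}{r} H H^T - S\|_F^2$, in the form that also appears on slide 15, with $S_{ij} = +1$ for similar pairs and $-1$ for dissimilar ones. It also shows why, for $\pm 1$ codes, inner products and Hamming distances carry the same information. SHK's kernelized hash functions and its optimization procedure are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def shk_objective(H, S):
    """Q = || (1/r) H H^T - S ||_F^2 for codes H in {-1,1}^(l x r), labels S in {-1,1}^(l x l)."""
    r = H.shape[1]
    return float(np.linalg.norm(H @ H.T / r - S, "fro") ** 2)


def hamming_from_inner_product(H):
    """For +/-1 codes of length r: h_i . h_j = r - 2 * Hamming(h_i, h_j)."""
    r = H.shape[1]
    return (r - H @ H.T) // 2
```

Codes that are identical within a class and opposite across classes reproduce $S$ exactly and give $Q = 0$.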
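ITQ minimizes the quantization loss on slide 12 by alternating two exact sub-steps:

- With $R$ fixed, set $B = \operatorname{sgn}(VR)$.
- With $B$ fixed, take the SVD $V^T B = U \Sigma \hat{U}^T$ and set $R = U \hat{U}^T$. This is the optimal orthogonal $R$ (an orthogonal Procrustes problem).

A minimal NumPy sketch of that loop follows. The random orthogonal initialization, the iteration count, and the function names are illustrative choices.

```python
import numpy as np


def itq(X, n_bits, n_iter=50, seed=0):
    """PCA to n_bits dims, then alternate B = sgn(VR) and an orthogonal Procrustes
    update of R, each step reducing Q(B, R) = ||B - V R||_F^2."""
    mean = X.mean(axis=0)
    Xc = X - mean                                   # ITQ works on zero-centered data
    _, vecs = np.linalg.eigh(Xc.T @ Xc)             # ascending eigenvalues
    W = vecs[:, ::-1][:, :n_bits]                   # top-c principal directions
    V = Xc @ W                                      # projected data
    rng = np.random.default_rng(seed)
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))  # random orthogonal start
    losses = []
    for _ in range(n_iter):
        B = np.where(V @ R >= 0, 1.0, -1.0)         # fix R: B = sgn(VR)
        U, _, Vt = np.linalg.svd(V.T @ B)           # fix B: maximize tr(B R^T V^T)
        R = U @ Vt
        losses.append(float(np.linalg.norm(B - V @ R, "fro") ** 2))
    B = np.where(V @ R >= 0, 1.0, -1.0)
    return B, (mean, W, R), losses


def itq_encode(X, model):
    """Hash (new) data with the learned mean, PCA projection, and rotation."""
    mean, W, R = model
    return np.where((X - mean) @ W @ R >= 0, 1.0, -1.0)
```

Neither sub-step can increase $Q(B, R)$, so the recorded losses are non-increasing, which makes a convenient sanity check.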
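The two selection criteria on slide 15 can be evaluated with the sketch below. It rests on two assumptions, since the slide defines neither quantity:

- num(k, j, b) counts the class-k training points whose j-th bit equals b.
- $H^k_{r-1}$ is the code matrix with bit k removed.

How SHU then combines the scores and retrains the selected bits is not shown. The function names are hypothetical.

```python
import numpy as np


def consistency_diff(H, labels):
    """Diff[k, j] = min(num(k, j, -1), num(k, j, +1)); num(k, j, b) is assumed to count
    the class-k points whose j-th bit equals b. Larger values mean a less consistent bit."""
    classes = np.unique(labels)
    diff = np.zeros((len(classes), H.shape[1]), dtype=int)
    for row, k in enumerate(classes):
        Hk = H[labels == k]
        diff[row] = np.minimum((Hk == -1).sum(axis=0), (Hk == 1).sum(axis=0))
    return diff


def similarity_residuals(H, S):
    """R_k = || r S - H^k (H^k)^T ||_F^2, where H^k drops bit k from the r-bit codes H."""
    r = H.shape[1]
    res = np.empty(r)
    for k in range(r):
        Hk = np.delete(H, k, axis=1)
        res[k] = np.linalg.norm(r * S - Hk @ Hk.T, "fro") ** 2
    return res
```

For instance, take two classes and codes made of two label-aligned bits plus one bit that splits each class. The misaligned bit gets both the largest Diff values and the smallest residual $R_k$.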
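Slide 16 describes LSH for candidate pruning followed by ITQ re-ranking. A hypothetical sketch of that pipeline:

- Several random-hyperplane tables collect candidates.
- The candidates are re-ranked by Hamming distance on a second, precomputed binary code that stands in for the ITQ codes.

The class name, table count, and code lengths are assumptions. The paper's feature representations (term vs. topic features) and its parameters are not reproduced.

```python
from collections import defaultdict

import numpy as np


class TwoStageIndex:
    """Hypothetical two-stage index: LSH buckets for candidate pruning, then Hamming
    re-ranking of the candidates on a second, precomputed binary code (e.g. ITQ codes)."""

    def __init__(self, X, rerank_codes, n_tables=4, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        # One set of random hyperplanes per hash table.
        self.planes = [rng.standard_normal((X.shape[1], n_bits)) for _ in range(n_tables)]
        self.tables = []
        for P in self.planes:
            buckets = defaultdict(list)
            for i, key in enumerate(self._keys(X, P)):
                buckets[key].append(i)
            self.tables.append(buckets)
        self.rerank_codes = np.asarray(rerank_codes)

    @staticmethod
    def _keys(X, P):
        # Bucket key = packed sign bits of the random projections.
        return [row.tobytes() for row in (X @ P > 0).astype(np.uint8)]

    def search(self, q, q_code, top=10):
        # Stage 1 (LSH): union of the buckets the query falls into across all tables.
        candidates = set()
        for P, buckets in zip(self.planes, self.tables):
            candidates.update(buckets.get(self._keys(q[None, :], P)[0], []))
        if not candidates:
            return []
        cand = np.array(sorted(candidates))
        # Stage 2 (re-ranking): sort candidates by Hamming distance on the second code.
        dists = np.count_nonzero(self.rerank_codes[cand] != q_code, axis=1)
        return cand[np.argsort(dists, kind="stable")][:top].tolist()
```

Only the candidates from stage 1 are compared in stage 2, which mirrors the slide's point that only a small portion of the dataset is scanned.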
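The hash function learning step on slide 17 is a ridge regression with the closed form $W^* = Y X^T (X X^T + \lambda I)^{-1}$. The sketch assumes $X$ is $d \times n$ with one document per column and $Y$ is $k \times n$ with entries in $\{-1, 1\}$. It solves the linear system instead of forming the inverse. Hashing a new document by the sign of $W x$ is an assumption here, and the function names are hypothetical.

```python
import numpy as np


def learn_hash_function(X, Y, lam=1.0):
    """Ridge solution W* = Y X^T (X X^T + lam I)^{-1}.
    X: d x n features (one column per document); Y: k x n codes in {-1, 1}."""
    d = X.shape[0]
    A = X @ X.T + lam * np.eye(d)                 # symmetric positive definite for lam > 0
    # W A = Y X^T  <=>  A W^T = X Y^T, since A is symmetric.
    return np.linalg.solve(A, X @ Y.T).T


def hash_documents(W, X):
    """Assumed out-of-sample rule: sign of the linear prediction W x, mapped to {-1, 1}."""
    return np.where(W @ X >= 0, 1, -1)
```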
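Slides 21 to 23 give only the titles of a key idea and two MapReduce rounds. The sketch below assumes one common two-round layout and simulates it in memory rather than on a real MapReduce cluster:

- Round one buckets items by bands of random-hyperplane bits and emits the pairs that share a bucket.
- Round two groups duplicate candidate pairs and keeps those whose exact cosine similarity passes a threshold.

The band layout, threshold, and function names are illustrative assumptions, not the exact design of the slides or of reference [17].

```python
import itertools
from collections import defaultdict

import numpy as np


def run_round(records, mapper, reducer):
    """In-memory stand-in for one MapReduce round: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


def lsh_mapreduce(X, n_bands=4, rows=4, threshold=0.8, seed=0):
    """Round 1: bucket items by LSH band and emit candidate pairs.
    Round 2: de-duplicate the pairs and keep those whose exact cosine passes the threshold."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bands * rows))
    bits = (X @ planes > 0).astype(np.uint8)  # random-hyperplane signature per item

    def map1(i):
        # Key = (band index, bits of that band); items sharing a key collide in that band.
        for b in range(n_bands):
            yield (b, bits[i, b * rows:(b + 1) * rows].tobytes()), i

    def reduce1(_key, items):
        return itertools.combinations(sorted(items), 2)

    def map2(pair):
        yield pair, None  # the same pair may arrive from several bands

    def reduce2(pair, _values):
        i, j = pair
        cos = float(X[i] @ X[j] / (np.linalg.norm(X[i]) * np.linalg.norm(X[j])))
        return [(i, j, cos)] if cos >= threshold else []

    candidates = run_round(range(len(X)), map1, reduce1)
    return sorted(run_round(candidates, map2, reduce2))
```

Grouping by the pair in round two plays the role of the shuffle that removes duplicates. The exact similarity is therefore computed once per candidate pair rather than once per colliding band.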
