SlideShare a Scribd company logo
Variable Bit Quantisation for Large Scale Search 
Sean Moran 
Final year PhD student 
Institute of Language, Cognition and Computation 
12th September 2014
Variable Bit Quantisation for Large Scale Search 
Nearest Neighbour Search 
Variable Bit Quantisation for LSH 
Evaluation 
Summary
Variable Bit Quantisation for Large Scale Search 
Nearest Neighbour Search 
Variable Bit Quantisation for LSH 
Evaluation 
Summary
Nearest Neighbour Search 
I Given a query q
nd the nearest neighbour NN(q) from 
X = fx1; x2 : : : ; xNg where NN(q) = argminx2Xdist(q; x) 
I dist(q,x) is a distance measure - e.g. Euclidean, Cosine etc 
q ? 
y 
1-NN 
x 
1-NN 
q 
x 
y 
I Generalised variant: K-nearest neighbour search (KNN(q)) 
I Compare query to all N database items - O(N) query time
Example: First Story Detection in Twitter 
I Real-time detection of
rst stories in the Twitter stream 
I Haiti earthquake struck 21:53,
rst story at 22:17 UTC 
I 22:17:43 justinholtweb NOT expecting Tsunami on east coast 
after haiti earthquake. good news. 
I State-of-the-art FSD uses NN search under the bonnet 
I Problems: dimensionality (1 million+) and data volume 
(250Gb/day) 
I Hashing-based approximate NN operates in O(1) time [1] 
[1] Real-Time Detection, Tracking and Monitoring of Discovered Events in 
Social Media. S. Moran et al. In ACL'14.
Example: First Story Detection in Twitter 
Time 
Volume 
T 1 
Rapper Lil Wayne ends up in 
hospital after a skateboarding accident
Example: First Story Detection in Twitter 
Time 
Thor Hushovd wins stage 13 of Tour de 
France 
T 2 
Volume
Example: First Story Detection in Twitter 
Time 
T 3 
Volume 
Magnitude 5.4 earthquake hits western 
Japan
Example: First Story Detection in Twitter 
USGS and Nat Weather Service NOT 
expecting Tsunami on east coast after 
haiti earthquake. 
Time 
T 8 
Volume
Hashing-based approximate NN search 
H Database 
Tweets from T 1 - T 7
Hashing-based approximate NN search 
H Database 
110101 
010111 
Hash Table 
010101 
111101 
.....
Hashing-based approximate NN search 
H 
Query 
Tweet from T 8 
H Database 
110101 
010111 
Hash Table 
010101 
111101 
.....
Hashing-based approximate NN search 
H 
Query 
H Database 
110101 
010111 
Hash Table 
010101 
111101 
.....
Hashing-based approximate NN search 
H 
Query Compute 
Similarity 
Query 
H Database 
110101 
010111 
010101 
111101 
..... 
Hash Table
Hashing-based approximate NN search 
H 
Query 
Nearest 
Neighbours 
H Database 
110101 
010111 
010101 
111101 
..... 
Compute 
Similarity 
Query 
Hash Table
Hashing-based approximate NN search 
H 
Query 
H Database 
110101 
010111 
010101 
111101 
..... 
Compute 
Similarity 
Query 
First Story! 
Hash Table
Locality Sensitive Hashing (LSH) 
x 
y
Locality Sensitive Hashing (LSH) 
x 
y 
n2 
n1 
h1 
h2 
11 
10 
00 01
Locality Sensitive Hashing (LSH) 
x 
y 
h1(q) h2(q) 
q 
n2 
n1 
h1 
h2 
11 
10 
00 01
Step 1: Projection 
n2 
n1 
q
Step 2: Single Bit Quantisation (SBQ) 
t 
0 1 
n2 
q 
I Threshold typically zero (sign function): sgn(n2:q) 
I Generate full 2 bit hash key (bitcode) by concatenation: 
g(q) = h1(q)  h2(q) = sgn(n1:q)  sgn(n2:q) 
= 1  1 = 11
Many more methods exist... 
I Very active area of research: 
I Kernel methods [3] 
I Spectral methods [4] [5] 
I Neural networks [6] 
I Loss based methods [7] 
I Commonality: all use single bit quantisation (SBQ) 
[3] M. Raginsky and S. Lazebnik. Locality-sensitive binary codes from shift-invariant kernels. In NIPS '09. 
[4] Y. Weiss and A. Torralba and R. Fergus. Spectral Hashing. NIPS '08. 
[5] J. Wang and S. Kumar and SF. Chang. Semi-supervised hashing for large-scale search. PAMI '12. 
[6] R. Salakhutdinov and G. Hinton. Semantic Hashing. NIPS '08. 
[7] B. Kulis and T. Darrell. Learning to Hash with Binary Reconstructive Embeddings. NIPS '09.
Variable Bit Quantisation for Large Scale Search 
Nearest Neighbour Search 
Variable Bit Quantisation for LSH 
Evaluation 
Summary
Problem 1: SBQ leads to high quantisation errors 
I Threshold at zero can separate many related Tweets: 
6000 
5000 
4000 
3000 
2000 
1000 
0 
−1.5 −1 −0.5 0 0.5 1 1.5 
  Projected Value 
Count
Problem 2: some hyperplanes are better than others 
10 
Projected Dimension 2 n2 
n1 
Projected Dimension 1 
x 
y 
n2 
n1 
h1 
h2 
11 
00 01
Problem 2: some hyperplanes are better than others 
10 
Projected Dimension 2 n2 
n1 
Projected Dimension 1 
x 
y 
n2 
n1 
h1 
h2 
11 
00 01
Problem 2: some hyperplanes are better than others 
10 
Projected Dimension 2 n2 
n1 
Projected Dimension 1 
x 
y 
n2 
n1 
h1 
h2 
11 
00 01
Threshold Positioning 
I Multiple bits per hyperplane requires multiple thresholds [8] 
I F-score optimisation: maximise # related tweets falling inside 
the same thresholded regions: 
F1 score: 1.00 
t1 t2 t3 
00 01 10 11 
n2 
F1 score: 0.44 
t1 t2 t3 
00 01 10 11 
n2 
[8] Neighbourhood Preserving Quantisation for LSH. S. Moran et al. In 
SIGIR'13
Variable Bit Allocation 
I F-score is a measure of the neighbourhood preservation [9]: 
F1 score: 1.00 
F1 score: 0.25 
t1 t2 t3 
00 01 10 11 
0 bits assigned: 0 thresholds do just as well as one or more thresholds 
n2 
n1 
2 bits assigned: 3 thresholds perfectly preserve the neighbourhood structure 
I Compute bit allocation that maximises the cumulative F-score 
I Bit allocation solved as a binary integer linear program (BILP) 
[9] Variable Bit Quantisation for LSH. S. Moran et al. In ACL'13
Variable Bit Allocation 
max kF  Zk 
subject to kZhk = 1 h 2 f1 : : : Bg 
kZ  Dk  B 
Z is binary 
I F contains the F scores per hyperplane, per bit count 
I Z is an indicator matrix specifying the bit allocation 
I D is a constraint matrix 
I B is the bit budget 
I k:k denotes the Frobenius L1 norm 
I  the Hadamard product 
[9] Variable Bit Quantisation for LSH. S. Moran et al. In ACL'13
Variable Bit Allocation 
max kF  Zk 
subject to kZhk = 1 h 2 f1 : : : Bg 
kZ  Dk  B 
Z is binary 
I F contains the F scores per hyperplane, per bit count 
I Z is an indicator matrix specifying the bit allocation 
I D is a constraint matrix 
I B is the bit budget 
I k:k denotes the Frobenius L1 norm 
I  the Hadamard product 
[9] Variable Bit Quantisation for LSH. S. Moran et al. In ACL'13
Variable Bit Allocation 
max kF  Zk 
subject to kZhk = 1 h 2 f1 : : : Bg 
kZ  Dk  B 
Z is binary 
I F contains the F scores per hyperplane, per bit count 
I Z is an indicator matrix specifying the bit allocation 
I D is a constraint matrix 
I B is the bit budget 
I k:k denotes the Frobenius L1 norm 
I  the Hadamard product 
[9] Variable Bit Quantisation for LSH. S. Moran et al. In ACL'13
Variable Bit Allocation 
max kF  Zk 
subject to kZhk = 1 h 2 f1 : : : Bg 
kZ  Dk  B 
Z is binary 
I F contains the F scores per hyperplane, per bit count 
I Z is an indicator matrix specifying the bit allocation 
I D is a constraint matrix 
I B is the bit budget 
I k:k denotes the Frobenius L1 norm 
I  the Hadamard product 
[9] Variable Bit Quantisation for LSH. S. Moran et al. In ACL'13
Variable Bit Allocation 
max kF  Zk 
subject to kZhk = 1 h 2 f1 : : : Bg 
kZ  Dk  B 
Z is binary 
F 0 
h1 h2 
b0 0:25 0:25 
b1 @ 
0:35 0:50 
b2 0:40 1:00 
1 
A 
0 
@ 
D 
0 0 
1 1 
2 2 
1 
A 
0 
@ 
Z 
1 0 
0 0 
0 1 
1 
A 
I Sparse solution: lower quality hyperplanes discarded [9]. 
[9] Variable Bit Quantisation for LSH. S. Moran et al. In ACL'13
Variable Bit Quantisation for Large Scale Search 
Nearest Neighbour Search 
Variable Bit Quantisation for LSH 
Evaluation 
Summary
Evaluation Protocol 
I Task: Text and image retrieval 
I Projections: LSH [2], Shift-invariant kernel hashing (SIKH) 
[3], Spectral Hashing (SH) [4] and PCA-Hashing (PCAH) [5]. 
I Baselines: Single Bit Quantisation (SBQ), Manhattan 
Hashing (MQ)[10], Double-Bit quantisation (DBQ) [11]. 
I Evaluation: how well do we retrieve the NN of queries? 
[2] P. Indyk and R. Motwani. Approximate nearest neighbors: removing the curse of dimensionality. In STOC '98. 
[3] M. Raginsky and S. Lazebnik. Locality-sensitive binary codes from shift-invariant kernels. In NIPS '09. 
[4] Y. Weiss and A. Torralba and R. Fergus. Spectral Hashing. NIPS '08. 
[9] J. Wang and S. Kumar and SF. Chang. Semi-supervised hashing for large-scale search. PAMI '12. 
[10] W. Kong and W. Li and M. Guo. Manhattan hashing for large-scale image retrieval. SIGIR '12. 
[11] W. Kong and W. Li. Double Bit Quantisation for Hashing. AAAI '12.
AUPRC across dierent projections (variable # bits) 
Dataset Images (32 bits) Text (128 bits) 
SBQ MQ DBQ VBQ SBQ MQ DBQ VBQ 
SIKH 0.042 0.046 0.047 0.161 0.102 0.112 0.087 0.389 
LSH 0.119 0.091 0.066 0.207 0.276 0.201 0.175 0.538 
SH 0.051 0.144 0.111 0.202 0.033 0.028 0.030 0.154 
PCAH 0.036 0.132 0.107 0.219 0.095 0.034 0.027 0.154 
I Variable bit allocation yields substantial gains in retrieval 
accuracy 
I VBQ is an eective multimodal quantisation scheme
AUPRC for LSH across a broad bit range 
0.7 
0.6 
0.5 
0.4 
0.3 
0.2 
0.1 
0 
SBQ MQ VBQ 
32 48 64 96 128 256 
Number of Bits 
AUPRC 
8 16 24 32 40 48 56 64 
0.35 
0.3 
0.25 
0.2 
0.15 
0.1 
0.05 
0 
SBQ MQ VBQ 
Number of Bits 
AUPRC 
(a) Text (b) Images 
I VBQ is eective for both long and short bit codes (hash keys)

More Related Content

What's hot

Cnq1
Cnq1Cnq1
Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits o...
Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits o...Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits o...
Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits o...
Kento Aoyama
 
CNIT 141 8. Public-Key Cryptosystems Based on the DLP
CNIT 141 8. Public-Key Cryptosystems Based on the DLPCNIT 141 8. Public-Key Cryptosystems Based on the DLP
CNIT 141 8. Public-Key Cryptosystems Based on the DLP
Sam Bowne
 
Knn solution
Knn solutionKnn solution
Knn solution
dvbtunisia
 
information theory
information theoryinformation theory
information theory
Dr Naim R Kidwai
 
Data Compression - Text Compression - Run Length Encoding
Data Compression - Text Compression - Run Length EncodingData Compression - Text Compression - Run Length Encoding
Data Compression - Text Compression - Run Length Encoding
MANISH T I
 
第四次课程 Chap8
第四次课程 Chap8第四次课程 Chap8
第四次课程 Chap8
Emma2013
 
Data compression introduction
Data compression introductionData compression introduction
Data compression introduction
Rahul Khanwani
 
Source coding theorem
Source coding theoremSource coding theorem
Source coding theorem
priyadharshini murugan
 
Data Communication & Computer network: Shanon fano coding
Data Communication & Computer network: Shanon fano codingData Communication & Computer network: Shanon fano coding
Data Communication & Computer network: Shanon fano coding
Dr Rajiv Srivastava
 
Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationMask-RCNN for Instance Segmentation
Mask-RCNN for Instance Segmentation
Dat Nguyen
 
Lec7 8 9_10 coding techniques
Lec7 8 9_10 coding techniquesLec7 8 9_10 coding techniques
Lec7 8 9_10 coding techniques
Dom Mike
 
Network security R.Rathna Deepa 2nd M.sc.,Computer Science
Network security R.Rathna Deepa 2nd M.sc.,Computer ScienceNetwork security R.Rathna Deepa 2nd M.sc.,Computer Science
Network security R.Rathna Deepa 2nd M.sc.,Computer Science
RathnaDeepa1
 
WFST
WFSTWFST
Pr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentationPr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentation
Taeoh Kim
 
Comparative Study of RBFS & ARBFS Algorithm
Comparative Study of RBFS & ARBFS AlgorithmComparative Study of RBFS & ARBFS Algorithm
Comparative Study of RBFS & ARBFS Algorithm
IOSR Journals
 
Conventional Encryption NS2
Conventional Encryption NS2Conventional Encryption NS2
Conventional Encryption NS2
koolkampus
 
Presentation of 'Reliable Rate-Optimized Video Multicasting Services over LTE...
Presentation of 'Reliable Rate-Optimized Video Multicasting Services over LTE...Presentation of 'Reliable Rate-Optimized Video Multicasting Services over LTE...
Presentation of 'Reliable Rate-Optimized Video Multicasting Services over LTE...
Andrea Tassi
 
Huffman and Arithmetic coding - Performance analysis
Huffman and Arithmetic coding - Performance analysisHuffman and Arithmetic coding - Performance analysis
Huffman and Arithmetic coding - Performance analysis
Ramakant Soni
 
Symmetric ciphers questions and answers
Symmetric ciphers questions and answersSymmetric ciphers questions and answers
Symmetric ciphers questions and answers
prdpgpt
 

What's hot (20)

Cnq1
Cnq1Cnq1
Cnq1
 
Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits o...
Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits o...Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits o...
Reading: "Pi in the sky: Calculating a record-breaking 31.4 trillion digits o...
 
CNIT 141 8. Public-Key Cryptosystems Based on the DLP
CNIT 141 8. Public-Key Cryptosystems Based on the DLPCNIT 141 8. Public-Key Cryptosystems Based on the DLP
CNIT 141 8. Public-Key Cryptosystems Based on the DLP
 
Knn solution
Knn solutionKnn solution
Knn solution
 
information theory
information theoryinformation theory
information theory
 
Data Compression - Text Compression - Run Length Encoding
Data Compression - Text Compression - Run Length EncodingData Compression - Text Compression - Run Length Encoding
Data Compression - Text Compression - Run Length Encoding
 
第四次课程 Chap8
第四次课程 Chap8第四次课程 Chap8
第四次课程 Chap8
 
Data compression introduction
Data compression introductionData compression introduction
Data compression introduction
 
Source coding theorem
Source coding theoremSource coding theorem
Source coding theorem
 
Data Communication & Computer network: Shanon fano coding
Data Communication & Computer network: Shanon fano codingData Communication & Computer network: Shanon fano coding
Data Communication & Computer network: Shanon fano coding
 
Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationMask-RCNN for Instance Segmentation
Mask-RCNN for Instance Segmentation
 
Lec7 8 9_10 coding techniques
Lec7 8 9_10 coding techniquesLec7 8 9_10 coding techniques
Lec7 8 9_10 coding techniques
 
Network security R.Rathna Deepa 2nd M.sc.,Computer Science
Network security R.Rathna Deepa 2nd M.sc.,Computer ScienceNetwork security R.Rathna Deepa 2nd M.sc.,Computer Science
Network security R.Rathna Deepa 2nd M.sc.,Computer Science
 
WFST
WFSTWFST
WFST
 
Pr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentationPr045 deep lab_semantic_segmentation
Pr045 deep lab_semantic_segmentation
 
Comparative Study of RBFS & ARBFS Algorithm
Comparative Study of RBFS & ARBFS AlgorithmComparative Study of RBFS & ARBFS Algorithm
Comparative Study of RBFS & ARBFS Algorithm
 
Conventional Encryption NS2
Conventional Encryption NS2Conventional Encryption NS2
Conventional Encryption NS2
 
Presentation of 'Reliable Rate-Optimized Video Multicasting Services over LTE...
Presentation of 'Reliable Rate-Optimized Video Multicasting Services over LTE...Presentation of 'Reliable Rate-Optimized Video Multicasting Services over LTE...
Presentation of 'Reliable Rate-Optimized Video Multicasting Services over LTE...
 
Huffman and Arithmetic coding - Performance analysis
Huffman and Arithmetic coding - Performance analysisHuffman and Arithmetic coding - Performance analysis
Huffman and Arithmetic coding - Performance analysis
 
Symmetric ciphers questions and answers
Symmetric ciphers questions and answersSymmetric ciphers questions and answers
Symmetric ciphers questions and answers
 

Viewers also liked

Finding similar items in high dimensional spaces locality sensitive hashing
Finding similar items in high dimensional spaces  locality sensitive hashingFinding similar items in high dimensional spaces  locality sensitive hashing
Finding similar items in high dimensional spaces locality sensitive hashing
Dmitriy Selivanov
 
Binary Similarity : Theory, Algorithms and Tool Evaluation
Binary Similarity :  Theory, Algorithms and  Tool EvaluationBinary Similarity :  Theory, Algorithms and  Tool Evaluation
Binary Similarity : Theory, Algorithms and Tool Evaluation
Liwei Ren任力偉
 
Probabilistic Data Structures and Approximate Solutions Oleksandr Pryymak
Probabilistic Data Structures and Approximate Solutions Oleksandr PryymakProbabilistic Data Structures and Approximate Solutions Oleksandr Pryymak
Probabilistic Data Structures and Approximate Solutions Oleksandr Pryymak
PyData
 
EMNLP2016読み会@黒橋研
EMNLP2016読み会@黒橋研EMNLP2016読み会@黒橋研
EMNLP2016読み会@黒橋研
Motoki Sato
 
[DL輪読会]Combining Fully Convolutional and Recurrent Neural Networks for 3D Bio...
[DL輪読会]Combining Fully Convolutional and Recurrent Neural Networks for 3D Bio...[DL輪読会]Combining Fully Convolutional and Recurrent Neural Networks for 3D Bio...
[DL輪読会]Combining Fully Convolutional and Recurrent Neural Networks for 3D Bio...
Deep Learning JP
 
Jeff Dean at AI Frontiers: Trends and Developments in Deep Learning Research
Jeff Dean at AI Frontiers: Trends and Developments in Deep Learning ResearchJeff Dean at AI Frontiers: Trends and Developments in Deep Learning Research
Jeff Dean at AI Frontiers: Trends and Developments in Deep Learning Research
AI Frontiers
 
Matching networks for one shot learning
Matching networks for one shot learningMatching networks for one shot learning
Matching networks for one shot learning
Kazuki Fujikawa
 
[DL輪読会]Exploiting Cyclic Symmetry in Convolutional Neural Networks
[DL輪読会]Exploiting Cyclic Symmetry in Convolutional Neural Networks[DL輪読会]Exploiting Cyclic Symmetry in Convolutional Neural Networks
[DL輪読会]Exploiting Cyclic Symmetry in Convolutional Neural Networks
Deep Learning JP
 
[261] 실시간 추천엔진 머신한대에 구겨넣기
[261] 실시간 추천엔진 머신한대에 구겨넣기[261] 실시간 추천엔진 머신한대에 구겨넣기
[261] 실시간 추천엔진 머신한대에 구겨넣기
NAVER D2
 
[DL輪読会]QUASI-RECURRENT NEURAL NETWORKS
[DL輪読会]QUASI-RECURRENT NEURAL NETWORKS[DL輪読会]QUASI-RECURRENT NEURAL NETWORKS
[DL輪読会]QUASI-RECURRENT NEURAL NETWORKS
Deep Learning JP
 
DeNAにおける機械学習・深層学習活用
DeNAにおける機械学習・深層学習活用DeNAにおける機械学習・深層学習活用
DeNAにおける機械学習・深層学習活用
Kazuki Fujikawa
 
Learning to remember rare events
Learning to remember rare eventsLearning to remember rare events
Learning to remember rare events
홍배 김
 
実世界の人工知能@DeNA TechCon 2017
実世界の人工知能@DeNA TechCon 2017 実世界の人工知能@DeNA TechCon 2017
実世界の人工知能@DeNA TechCon 2017
Preferred Networks
 
[DL輪読会]Wasserstein GAN/Towards Principled Methods for Training Generative Adv...
[DL輪読会]Wasserstein GAN/Towards Principled Methods for Training Generative Adv...[DL輪読会]Wasserstein GAN/Towards Principled Methods for Training Generative Adv...
[DL輪読会]Wasserstein GAN/Towards Principled Methods for Training Generative Adv...
Deep Learning JP
 
IPAB2017 深層学習を使った新薬の探索から創造へ
IPAB2017 深層学習を使った新薬の探索から創造へIPAB2017 深層学習を使った新薬の探索から創造へ
IPAB2017 深層学習を使った新薬の探索から創造へ
Preferred Networks
 

Viewers also liked (15)

Finding similar items in high dimensional spaces locality sensitive hashing
Finding similar items in high dimensional spaces  locality sensitive hashingFinding similar items in high dimensional spaces  locality sensitive hashing
Finding similar items in high dimensional spaces locality sensitive hashing
 
Binary Similarity : Theory, Algorithms and Tool Evaluation
Binary Similarity :  Theory, Algorithms and  Tool EvaluationBinary Similarity :  Theory, Algorithms and  Tool Evaluation
Binary Similarity : Theory, Algorithms and Tool Evaluation
 
Probabilistic Data Structures and Approximate Solutions Oleksandr Pryymak
Probabilistic Data Structures and Approximate Solutions Oleksandr PryymakProbabilistic Data Structures and Approximate Solutions Oleksandr Pryymak
Probabilistic Data Structures and Approximate Solutions Oleksandr Pryymak
 
EMNLP2016読み会@黒橋研
EMNLP2016読み会@黒橋研EMNLP2016読み会@黒橋研
EMNLP2016読み会@黒橋研
 
[DL輪読会]Combining Fully Convolutional and Recurrent Neural Networks for 3D Bio...
[DL輪読会]Combining Fully Convolutional and Recurrent Neural Networks for 3D Bio...[DL輪読会]Combining Fully Convolutional and Recurrent Neural Networks for 3D Bio...
[DL輪読会]Combining Fully Convolutional and Recurrent Neural Networks for 3D Bio...
 
Jeff Dean at AI Frontiers: Trends and Developments in Deep Learning Research
Jeff Dean at AI Frontiers: Trends and Developments in Deep Learning ResearchJeff Dean at AI Frontiers: Trends and Developments in Deep Learning Research
Jeff Dean at AI Frontiers: Trends and Developments in Deep Learning Research
 
Matching networks for one shot learning
Matching networks for one shot learningMatching networks for one shot learning
Matching networks for one shot learning
 
[DL輪読会]Exploiting Cyclic Symmetry in Convolutional Neural Networks
[DL輪読会]Exploiting Cyclic Symmetry in Convolutional Neural Networks[DL輪読会]Exploiting Cyclic Symmetry in Convolutional Neural Networks
[DL輪読会]Exploiting Cyclic Symmetry in Convolutional Neural Networks
 
[261] 실시간 추천엔진 머신한대에 구겨넣기
[261] 실시간 추천엔진 머신한대에 구겨넣기[261] 실시간 추천엔진 머신한대에 구겨넣기
[261] 실시간 추천엔진 머신한대에 구겨넣기
 
[DL輪読会]QUASI-RECURRENT NEURAL NETWORKS
[DL輪読会]QUASI-RECURRENT NEURAL NETWORKS[DL輪読会]QUASI-RECURRENT NEURAL NETWORKS
[DL輪読会]QUASI-RECURRENT NEURAL NETWORKS
 
DeNAにおける機械学習・深層学習活用
DeNAにおける機械学習・深層学習活用DeNAにおける機械学習・深層学習活用
DeNAにおける機械学習・深層学習活用
 
Learning to remember rare events
Learning to remember rare eventsLearning to remember rare events
Learning to remember rare events
 
実世界の人工知能@DeNA TechCon 2017
実世界の人工知能@DeNA TechCon 2017 実世界の人工知能@DeNA TechCon 2017
実世界の人工知能@DeNA TechCon 2017
 
[DL輪読会]Wasserstein GAN/Towards Principled Methods for Training Generative Adv...
[DL輪読会]Wasserstein GAN/Towards Principled Methods for Training Generative Adv...[DL輪読会]Wasserstein GAN/Towards Principled Methods for Training Generative Adv...
[DL輪読会]Wasserstein GAN/Towards Principled Methods for Training Generative Adv...
 
IPAB2017 深層学習を使った新薬の探索から創造へ
IPAB2017 深層学習を使った新薬の探索から創造へIPAB2017 深層学習を使った新薬の探索から創造へ
IPAB2017 深層学習を使った新薬の探索から創造へ
 

Similar to Data Science Research Day (Talk)

ACL Variable Bit Quantisation Talk
ACL Variable Bit Quantisation TalkACL Variable Bit Quantisation Talk
ACL Variable Bit Quantisation Talk
Sean Moran
 
PhD thesis defence slides
PhD thesis defence slidesPhD thesis defence slides
PhD thesis defence slides
Sean Moran
 
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Intel® Software
 
Neighbourhood Preserving Quantisation for LSH SIGIR Poster
Neighbourhood Preserving Quantisation for LSH SIGIR PosterNeighbourhood Preserving Quantisation for LSH SIGIR Poster
Neighbourhood Preserving Quantisation for LSH SIGIR Poster
Sean Moran
 
New zealand bloom filter
New zealand bloom filterNew zealand bloom filter
New zealand bloom filter
xlight
 
Branch and Bound.pptx
Branch and Bound.pptxBranch and Bound.pptx
Branch and Bound.pptx
JoshipavanEdduluru1
 
Graph Regularised Hashing (ECIR'15 Talk)
Graph Regularised Hashing (ECIR'15 Talk)Graph Regularised Hashing (ECIR'15 Talk)
Graph Regularised Hashing (ECIR'15 Talk)
Sean Moran
 
cryptography and network security chap 3
cryptography and network security chap 3cryptography and network security chap 3
cryptography and network security chap 3
Debanjan Bhattacharya
 
Cryptography and Network Security William Stallings Lawrie Brown
Cryptography and Network Security William Stallings Lawrie BrownCryptography and Network Security William Stallings Lawrie Brown
Cryptography and Network Security William Stallings Lawrie Brown
Information Security Awareness Group
 
Scalable Recommendation Algorithms with LSH
Scalable Recommendation Algorithms with LSHScalable Recommendation Algorithms with LSH
Scalable Recommendation Algorithms with LSH
Maruf Aytekin
 
Fcv learn yu
Fcv learn yuFcv learn yu
Fcv learn yu
zukun
 
EIS_REVIEW_1.pptx
EIS_REVIEW_1.pptxEIS_REVIEW_1.pptx
EIS_REVIEW_1.pptx
01fe20bec143
 
Getting the Most out of Transition-based Dependency Parsing
Getting the Most out of Transition-based Dependency ParsingGetting the Most out of Transition-based Dependency Parsing
Getting the Most out of Transition-based Dependency Parsing
Jinho Choi
 
fully convolutional networks for semantic segmentation
fully convolutional networks for semantic segmentationfully convolutional networks for semantic segmentation
fully convolutional networks for semantic segmentation
XinyangLi16
 
Deep Learning-Based Universal Beamformer for Ultrasound Imaging
Deep Learning-Based Universal Beamformer for Ultrasound ImagingDeep Learning-Based Universal Beamformer for Ultrasound Imaging
Deep Learning-Based Universal Beamformer for Ultrasound Imaging
Shujaat Khan
 
Lightweight Address Hopping forDefending the IPv6 IoT
Lightweight Address Hopping forDefending the IPv6 IoTLightweight Address Hopping forDefending the IPv6 IoT
Lightweight Address Hopping forDefending the IPv6 IoT
José Francisco Chávez Carreón
 
Universal plane wave compounding for high quality us imaging using deep learning
Universal plane wave compounding for high quality us imaging using deep learningUniversal plane wave compounding for high quality us imaging using deep learning
Universal plane wave compounding for high quality us imaging using deep learning
Shujaat Khan
 
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Sean Moran
 
ch03 network security in computer sys.ppt
ch03 network security in computer sys.pptch03 network security in computer sys.ppt
ch03 network security in computer sys.ppt
ubaidullah75790
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised Hashing
Sean Moran
 

Similar to Data Science Research Day (Talk) (20)

ACL Variable Bit Quantisation Talk
ACL Variable Bit Quantisation TalkACL Variable Bit Quantisation Talk
ACL Variable Bit Quantisation Talk
 
PhD thesis defence slides
PhD thesis defence slidesPhD thesis defence slides
PhD thesis defence slides
 
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
Massively Parallel K-Nearest Neighbor Computation on Distributed Architectures
 
Neighbourhood Preserving Quantisation for LSH SIGIR Poster
Neighbourhood Preserving Quantisation for LSH SIGIR PosterNeighbourhood Preserving Quantisation for LSH SIGIR Poster
Neighbourhood Preserving Quantisation for LSH SIGIR Poster
 
New zealand bloom filter
New zealand bloom filterNew zealand bloom filter
New zealand bloom filter
 
Branch and Bound.pptx
Branch and Bound.pptxBranch and Bound.pptx
Branch and Bound.pptx
 
Graph Regularised Hashing (ECIR'15 Talk)
Graph Regularised Hashing (ECIR'15 Talk)Graph Regularised Hashing (ECIR'15 Talk)
Graph Regularised Hashing (ECIR'15 Talk)
 
cryptography and network security chap 3
cryptography and network security chap 3cryptography and network security chap 3
cryptography and network security chap 3
 
Cryptography and Network Security William Stallings Lawrie Brown
Cryptography and Network Security William Stallings Lawrie BrownCryptography and Network Security William Stallings Lawrie Brown
Cryptography and Network Security William Stallings Lawrie Brown
 
Scalable Recommendation Algorithms with LSH
Scalable Recommendation Algorithms with LSHScalable Recommendation Algorithms with LSH
Scalable Recommendation Algorithms with LSH
 
Fcv learn yu
Fcv learn yuFcv learn yu
Fcv learn yu
 
EIS_REVIEW_1.pptx
EIS_REVIEW_1.pptxEIS_REVIEW_1.pptx
EIS_REVIEW_1.pptx
 
Getting the Most out of Transition-based Dependency Parsing
Getting the Most out of Transition-based Dependency ParsingGetting the Most out of Transition-based Dependency Parsing
Getting the Most out of Transition-based Dependency Parsing
 
fully convolutional networks for semantic segmentation
fully convolutional networks for semantic segmentationfully convolutional networks for semantic segmentation
fully convolutional networks for semantic segmentation
 
Deep Learning-Based Universal Beamformer for Ultrasound Imaging
Deep Learning-Based Universal Beamformer for Ultrasound ImagingDeep Learning-Based Universal Beamformer for Ultrasound Imaging
Deep Learning-Based Universal Beamformer for Ultrasound Imaging
 
Lightweight Address Hopping forDefending the IPv6 IoT
Lightweight Address Hopping forDefending the IPv6 IoTLightweight Address Hopping forDefending the IPv6 IoT
Lightweight Address Hopping forDefending the IPv6 IoT
 
Universal plane wave compounding for high quality us imaging using deep learning
Universal plane wave compounding for high quality us imaging using deep learningUniversal plane wave compounding for high quality us imaging using deep learning
Universal plane wave compounding for high quality us imaging using deep learning
 
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
 
ch03 network security in computer sys.ppt
ch03 network security in computer sys.pptch03 network security in computer sys.ppt
ch03 network security in computer sys.ppt
 
Graph Regularised Hashing
Graph Regularised HashingGraph Regularised Hashing
Graph Regularised Hashing
 

Recently uploaded

DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
a9qfiubqu
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
bmucuha
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
Monthly Management report for the Month of May 2024
Monthly Management report for the Month of May 2024Monthly Management report for the Month of May 2024
Monthly Management report for the Month of May 2024
facilitymanager11
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
taqyea
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
ElizabethGarrettChri
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
mkkikqvo
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
bmucuha
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Kaxil Naik
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
yuvarajkumar334
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 

Recently uploaded (20)

DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
Monthly Management report for the Month of May 2024
Monthly Management report for the Month of May 2024Monthly Management report for the Month of May 2024
Monthly Management report for the Month of May 2024
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
 
Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024Open Source Contributions to Postgres: The Basics POSETTE 2024
Open Source Contributions to Postgres: The Basics POSETTE 2024
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 

Data Science Research Day (Talk)

  • 1. Variable Bit Quantisation for Large Scale Search Sean Moran Final year PhD student Institute of Language, Cognition and Computation 12th September 2014
  • 2. Variable Bit Quantisation for Large Scale Search Nearest Neighbour Search Variable Bit Quantisation for LSH Evaluation Summary
  • 3. Variable Bit Quantisation for Large Scale Search Nearest Neighbour Search Variable Bit Quantisation for LSH Evaluation Summary
  • 4. Nearest Neighbour Search I Given a query q
  • 5. nd the nearest neighbour NN(q) from X = fx1; x2 : : : ; xNg where NN(q) = argminx2Xdist(q; x) I dist(q,x) is a distance measure - e.g. Euclidean, Cosine etc q ? y 1-NN x 1-NN q x y I Generalised variant: K-nearest neighbour search (KNN(q)) I Compare query to all N database items - O(N) query time
  • 6. Example: First Story Detection in Twitter I Real-time detection of
  • 7. rst stories in the Twitter stream I Haiti earthquake struck 21:53,
  • 8. rst story at 22:17 UTC I 22:17:43 justinholtweb NOT expecting Tsunami on east coast after haiti earthquake. good news. I State-of-the-art FSD uses NN search under the bonnet I Problems: dimensionality (1 million+) and data volume (250Gb/day) I Hashing-based approximate NN operates in O(1) time [1] [1] Real-Time Detection, Tracking and Monitoring of Discovered Events in Social Media. S. Moran et al. In ACL'14.
  • 9. Example: First Story Detection in Twitter Time Volume T 1 Rapper Lil Wayne ends up in hospital after a skateboarding accident
  • 10. Example: First Story Detection in Twitter Time Thor Hushovd wins stage 13 of Tour de France T 2 Volume
  • 11. Example: First Story Detection in Twitter Time T 3 Volume Magnitude 5.4 earthquake hits western Japan
  • 12. Example: First Story Detection in Twitter USGS and Nat Weather Service NOT expecting Tsunami on east coast after haiti earthquake. Time T 8 Volume
  • 13. Hashing-based approximate NN search H Database Tweets from T 1 - T 7
  • 14. Hashing-based approximate NN search H Database 110101 010111 Hash Table 010101 111101 .....
  • 15. Hashing-based approximate NN search H Query Tweet from T 8 H Database 110101 010111 Hash Table 010101 111101 .....
  • 16. Hashing-based approximate NN search H Query H Database 110101 010111 Hash Table 010101 111101 .....
  • 17. Hashing-based approximate NN search H Query Compute Similarity Query H Database 110101 010111 010101 111101 ..... Hash Table
  • 18. Hashing-based approximate NN search H Query Nearest Neighbours H Database 110101 010111 010101 111101 ..... Compute Similarity Query Hash Table
  • 19. Hashing-based approximate NN search H Query H Database 110101 010111 010101 111101 ..... Compute Similarity Query First Story! Hash Table
  • 21. Locality Sensitive Hashing (LSH) x y n2 n1 h1 h2 11 10 00 01
  • 22. Locality Sensitive Hashing (LSH) x y h1(q) h2(q) q n2 n1 h1 h2 11 10 00 01
  • 24. Step 2: Single Bit Quantisation (SBQ) t 0 1 n2 q I Threshold typically zero (sign function): sgn(n2:q) I Generate full 2 bit hash key (bitcode) by concatenation: g(q) = h1(q) h2(q) = sgn(n1:q) sgn(n2:q) = 1 1 = 11
  • 25. Many more methods exist... I Very active area of research: I Kernel methods [3] I Spectral methods [4] [5] I Neural networks [6] I Loss based methods [7] I Commonality: all use single bit quantisation (SBQ) [3] M. Raginsky and S. Lazebnik. Locality-sensitive binary codes from shift-invariant kernels. In NIPS '09. [4] Y. Weiss and A. Torralba and R. Fergus. Spectral Hashing. NIPS '08. [5] J. Wang and S. Kumar and SF. Chang. Semi-supervised hashing for large-scale search. PAMI '12. [6] R. Salakhutdinov and G. Hinton. Semantic Hashing. NIPS '08. [7] B. Kulis and T. Darrell. Learning to Hash with Binary Reconstructive Embeddings. NIPS '09.
  • 26. Variable Bit Quantisation for Large Scale Search Nearest Neighbour Search Variable Bit Quantisation for LSH Evaluation Summary
  • 27. Problem 1: SBQ leads to high quantisation errors I Threshold at zero can separate many related Tweets: 6000 5000 4000 3000 2000 1000 0 −1.5 −1 −0.5 0 0.5 1 1.5 Projected Value Count
  • 28. Problem 2: some hyperplanes are better than others 10 Projected Dimension 2 n2 n1 Projected Dimension 1 x y n2 n1 h1 h2 11 00 01
  • 29. Problem 2: some hyperplanes are better than others 10 Projected Dimension 2 n2 n1 Projected Dimension 1 x y n2 n1 h1 h2 11 00 01
  • 30. Problem 2: some hyperplanes are better than others 10 Projected Dimension 2 n2 n1 Projected Dimension 1 x y n2 n1 h1 h2 11 00 01
  • 31. Threshold Positioning I Multiple bits per hyperplane requires multiple thresholds [8] I F-score optimisation: maximise # related tweets falling inside the same thresholded regions: F1 score: 1.00 t1 t2 t3 00 01 10 11 n2 F1 score: 0.44 t1 t2 t3 00 01 10 11 n2 [8] Neighbourhood Preserving Quantisation for LSH. S. Moran et al. In SIGIR'13
  • 32. Variable Bit Allocation I F-score is a measure of the neighbourhood preservation [9]: F1 score: 1.00 F1 score: 0.25 t1 t2 t3 00 01 10 11 0 bits assigned: 0 thresholds do just as well as one or more thresholds n2 n1 2 bits assigned: 3 thresholds perfectly preserve the neighbourhood structure I Compute bit allocation that maximises the cumulative F-score I Bit allocation solved as a binary integer linear program (BILP) [9] Variable Bit Quantisation for LSH. S. Moran et al. In ACL'13
  • 33. Variable Bit Allocation max kF Zk subject to kZhk = 1 h 2 f1 : : : Bg kZ Dk B Z is binary I F contains the F scores per hyperplane, per bit count I Z is an indicator matrix specifying the bit allocation I D is a constraint matrix I B is the bit budget I k:k denotes the Frobenius L1 norm I the Hadamard product [9] Variable Bit Quantisation for LSH. S. Moran et al. In ACL'13
  • 34. Variable Bit Allocation max kF Zk subject to kZhk = 1 h 2 f1 : : : Bg kZ Dk B Z is binary I F contains the F scores per hyperplane, per bit count I Z is an indicator matrix specifying the bit allocation I D is a constraint matrix I B is the bit budget I k:k denotes the Frobenius L1 norm I the Hadamard product [9] Variable Bit Quantisation for LSH. S. Moran et al. In ACL'13
  • 35. Variable Bit Allocation max kF Zk subject to kZhk = 1 h 2 f1 : : : Bg kZ Dk B Z is binary I F contains the F scores per hyperplane, per bit count I Z is an indicator matrix specifying the bit allocation I D is a constraint matrix I B is the bit budget I k:k denotes the Frobenius L1 norm I the Hadamard product [9] Variable Bit Quantisation for LSH. S. Moran et al. In ACL'13
  • 36. Variable Bit Allocation max kF Zk subject to kZhk = 1 h 2 f1 : : : Bg kZ Dk B Z is binary I F contains the F scores per hyperplane, per bit count I Z is an indicator matrix specifying the bit allocation I D is a constraint matrix I B is the bit budget I k:k denotes the Frobenius L1 norm I the Hadamard product [9] Variable Bit Quantisation for LSH. S. Moran et al. In ACL'13
  • 37. Variable Bit Allocation max kF Zk subject to kZhk = 1 h 2 f1 : : : Bg kZ Dk B Z is binary F 0 h1 h2 b0 0:25 0:25 b1 @ 0:35 0:50 b2 0:40 1:00 1 A 0 @ D 0 0 1 1 2 2 1 A 0 @ Z 1 0 0 0 0 1 1 A I Sparse solution: lower quality hyperplanes discarded [9]. [9] Variable Bit Quantisation for LSH. S. Moran et al. In ACL'13
  • 38. Variable Bit Quantisation for Large Scale Search Nearest Neighbour Search Variable Bit Quantisation for LSH Evaluation Summary
  • 39. Evaluation Protocol I Task: Text and image retrieval I Projections: LSH [2], Shift-invariant kernel hashing (SIKH) [3], Spectral Hashing (SH) [4] and PCA-Hashing (PCAH) [5]. I Baselines: Single Bit Quantisation (SBQ), Manhattan Hashing (MQ)[10], Double-Bit quantisation (DBQ) [11]. I Evaluation: how well do we retrieve the NN of queries? [2] P. Indyk and R. Motwani. Approximate nearest neighbors: removing the curse of dimensionality. In STOC '98. [3] M. Raginsky and S. Lazebnik. Locality-sensitive binary codes from shift-invariant kernels. In NIPS '09. [4] Y. Weiss and A. Torralba and R. Fergus. Spectral Hashing. NIPS '08. [9] J. Wang and S. Kumar and SF. Chang. Semi-supervised hashing for large-scale search. PAMI '12. [10] W. Kong and W. Li and M. Guo. Manhattan hashing for large-scale image retrieval. SIGIR '12. [11] W. Kong and W. Li. Double Bit Quantisation for Hashing. AAAI '12.
  • 40. AUPRC across dierent projections (variable # bits) Dataset Images (32 bits) Text (128 bits) SBQ MQ DBQ VBQ SBQ MQ DBQ VBQ SIKH 0.042 0.046 0.047 0.161 0.102 0.112 0.087 0.389 LSH 0.119 0.091 0.066 0.207 0.276 0.201 0.175 0.538 SH 0.051 0.144 0.111 0.202 0.033 0.028 0.030 0.154 PCAH 0.036 0.132 0.107 0.219 0.095 0.034 0.027 0.154 I Variable bit allocation yields substantial gains in retrieval accuracy I VBQ is an eective multimodal quantisation scheme
  • 41. AUPRC for LSH across a broad bit range 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 SBQ MQ VBQ 32 48 64 96 128 256 Number of Bits AUPRC 8 16 24 32 40 48 56 64 0.35 0.3 0.25 0.2 0.15 0.1 0.05 0 SBQ MQ VBQ Number of Bits AUPRC (a) Text (b) Images I VBQ is eective for both long and short bit codes (hash keys)
  • 42. Variable Bit Quantisation for Large Scale Search Nearest Neighbour Search Variable Bit Quantisation for LSH Evaluation Summary
  • 43. Summary I Proposed a novel data-driven scheme (VBQ) to adaptively assign variable bits per LSH hyperplane I Hyperplanes better preserving the neighbourhood structure are aorded more bits from budget I VBQ substantially increased LSH retrieval performance across text and image datasets I Current work: method to couple the quantisation and projection stages of LSH
  • 44. Thank you for your attention! I FSD live system: goo.gl/Q7WQOk [1] I Running over live Twitter stream in real time I Sub-second detection latency via ANN search (1 CPU!) I Papers: www.seanjmoran.com [1][8][9] I Contact: sean.moran@ed.ac.uk [1] Real-Time Detection, Tracking and Monitoring of Discovered Events in Social Media. S. Moran et al. In ACL'14. [8] Variable Bit Quantisation for LSH. S. Moran et al. In ACL'13 [9] Neighbourhood Preserving Quantisation for LSH. S. Moran et al. In SIGIR'13