The document describes a method for accelerating set similarity joins using graphics processing units (GPUs). It proposes using MinHash to estimate the Jaccard similarity between sets and generate signature matrices for the sets. These signature matrices are then processed on the GPU in parallel to perform the similarity join and detect similar records above a given threshold. Experiments show the GPU implementation achieves speedups of up to 150x over serial CPU processing and 25x over parallel CPU processing.
More Related Content

A database application differs from regular applications in that some of its inputs may be database queries. The program will execute the queries on a database and may use any result values in its subsequent program logic. This means that a user-supplied query may determine the values that the application will use in subsequent branching conditions. At the same time, a new database application is often required to work well on a body of existing data stored in some large database. For systematic testing of database applications, recent techniques replace the existing database with carefully crafted mock databases. Mock databases return values that will trigger as many execution paths in the application as possible and thereby maximize overall code coverage of the database application.
In this paper we offer an alternative approach to database application testing. Our goal is to support software engineers in focusing testing on the existing body of data the application is required to work well on. For that, we propose to side-step mock database generation and instead generate queries for the existing database. Our key insight is that we can use the information collected during previous program executions to systematically generate new queries that will maximize the coverage of the application under test, while guaranteeing that the generated test cases focus on the existing data.
Evaluating Classification Algorithms Applied To Data Streams (Esteban Donato)
This document summarizes and evaluates several algorithms for classification of data streams: VFDTc, UFFT, and CVFDT. It describes their approaches for handling concept drift and for detecting outliers and noise. The algorithms were tested on synthetic data streams generated with configurable attributes such as drift frequency and noise percentage. Results show VFDTc and UFFT performed best in accuracy, while CVFDT and UFFT were fastest. The study aims to help choose algorithms suited to different data stream characteristics, such as gradual versus sudden drift or frequent versus infrequent drift.
The document discusses different machine learning techniques including regression, classification, clustering, anomaly detection, and recommendation. It then provides examples of data and labels that could be used for training models with these techniques. It also discusses topics like updating model weights, learning rates, and derivatives or gradients of cost functions. Finally, it provides examples of using Azure machine learning services to train models with cloud resources and deploy them for consumption.
On Improving the Performance of Data Leak Prevention using White-list Approach (Patrick Nguyen)
This document proposes improving data leak prevention performance using a white-list approach and Bloom filters. It summarizes previous related work using blacklists and keywords to detect leaks. The authors then improve upon prior work by Fang Hao et al. that used CRC to create fingerprints, by using hash functions with Bloom filters instead to generate fingerprints faster while maintaining accuracy. Experiments test five hash functions on a 9.3GB dataset to evaluate system throughput and percentage of leaked files.
The document discusses stacks as a data structure. It defines a stack as a list where all insertions and deletions are made at one end, called the top. Stacks follow the LIFO (last in, first out) principle. The document provides examples of stack implementations using arrays in C++ and describes the basic stack operations like push, pop, peek, and isEmpty. It also gives examples of real-world stacks like stacks of books, chairs, and cups.
Data Wrangling and Visualization Using Python (MOHITKUMAR1379)
Python is open source and has many libraries for data wrangling and visualization that make data scientists' lives easier. For data wrangling, pandas is the usual choice: it represents tabular data and provides functions for parsing data from different sources, cleaning data, handling missing values, merging data sets, and more. For visualization, the low-level matplotlib library can be used directly, but it also serves as the base for higher-level packages such as seaborn, which draws well-customized plots in a single line of code. Python's Dash framework makes it possible to build interactive web applications in pure Python, without JavaScript or HTML. These Dash applications can be published on any server as well as on clouds such as Google Cloud, and for free on the Heroku cloud.
Entity Resolution is the task of disambiguating manifestations of real world entities through linking and grouping and is often an essential part of the data wrangling process. There are three primary tasks involved in entity resolution: deduplication, record linkage, and canonicalization; each of which serves to improve data quality by reducing irrelevant or repeated data, joining information from disparate records, and providing a single source of information to perform analytics upon. However, due to data quality issues (misspellings or incorrect data), schema variations in different sources, or simply different representations, entity resolution is not a straightforward process and most ER techniques utilize machine learning and other stochastic approaches.
Introduction to the R Statistical Computing Environment (izahn)
Get an introduction to R, the open-source system for statistical computation and graphics. With hands-on exercises, learn how to import and manage datasets, create R objects, and conduct basic statistical analyses. Full workshop materials can be downloaded from http://projects.iq.harvard.edu/rtc/event/introduction-r
Experiments on Design Pattern Discovery (Tim Menzies)
The document describes experiments conducted to discover design patterns from source code. It outlines the approach taken by DP-Miner tool, presents experiment data on four Java systems, and evaluates results by calculating precision and recall values. Benchmarks are lacking for accurately evaluating design pattern discovery techniques.
The document introduces distributed stream processing. It discusses maintaining synopses of streams using single-pass, small space and time algorithms. Distributed queries can be one-shot or continuous, requiring approximation to minimize communication. Tree-based aggregation and decentralized gossiping are introduced for in-network processing. Handling message loss and node failures is also important. Future work includes stream mining queries and compressing XML streams.
This document summarizes a presentation on analyzing word co-occurrences in text data using network analysis techniques. It discusses counting the frequency of word combinations, representing the co-occurrence data as a network with nodes for words and edges for co-occurrences, and visualizing the network in Gephi. It also provides an example analysis of tweets about a political debate, examining which topics were emphasized by each candidate based on word associations on Twitter.
SPSS (Statistical Package for the Social Sciences) is software used for data analysis. It can process questionnaires, report data in tables and graphs, and analyze means, chi-squares, regression, and more. Originally its own company, SPSS is now owned by IBM and integrated into their software portfolio. The document provides an overview of using SPSS, including entering data from questionnaires, different question/response formats, and descriptive statistical analysis functions in SPSS like frequencies, cross-tabs, and graphs.
The document describes an activity analysis and visualization project with the following objectives:
1. Build a system to support groups in learning how to work more effectively through visualizing collaboration data logs.
2. Develop different types of visualizations like activity radars and interaction networks to provide insights into participation, interactions, and timelines of events.
3. Apply data mining techniques to find frequent patterns and sequences of events that characterize aspects of teamwork.
The document describes the automated construction of a large semantic network called SemNet. It analyzes a large text corpus to extract terms and relations using n-gram analysis, part-of-speech tagging, and pattern matching. SemNet contains over 2.7 million terms and 37.5 million relations. The document evaluates SemNet by comparing it to WordNet and ConceptNet, finding that it contains over 77% of WordNet synsets and over 82% of ConceptNet nouns.
This document discusses the process of compiling programs from source code to executable code. It covers lexical analysis, parsing, semantic analysis, code optimization, and code generation. The overall compilation process involves breaking the source code into tokens, generating an abstract syntax tree, performing semantic checks, translating to intermediate representations, optimizing the code, and finally generating target machine code.
This paper describes the outcome of an attempt to implement the same transitive closure (TC) algorithm for Apache MapReduce running on different Apache Hadoop distributions. Apache MapReduce is a software framework used with Apache Hadoop, which has become the de facto standard platform for processing and storing large amounts of data in a distributed computing environment. The research presented here focuses on the variations observed among the results of an efficient iterative transitive closure algorithm when run against different distributed environments. The results from these comparisons were validated against the benchmark results from OYSTER, an open source Entity Resolution system. The experiment results highlighted the inconsistencies that can occur when using the same codebase with different implementations of MapReduce.
VARIATIONS IN OUTCOME FOR THE SAME MAP REDUCE TRANSITIVE CLOSURE ALGORITHM IM... (ijcsit)
This document summarizes research that implemented the same transitive closure algorithm for entity resolution on three different Apache Hadoop distributions: a local HDFS cluster, Cloudera Enterprise, and Talend Big Data Sandbox. The algorithm was run on a synthetic dataset to discover entity clusters. While the local HDFS cluster produced consistent results matching the baseline, the Cloudera and Talend platforms had inconsistent results due to differences in configuration requirements, load balancing, and blocking behavior across nodes. The experiments highlighted scalability issues for entity resolution processes in distributed environments due to inconsistencies introduced by differences in platform implementations.
Visual data mining combines traditional data mining methods with information visualization techniques to explore large datasets. There are three levels of integration between visualization and automated mining methods - no/limited integration, loose integration where methods are applied sequentially, and full integration where methods are applied in parallel. Different visualization methods exist for univariate, bivariate and multivariate data based on the type and dimensions of the data. The document describes frameworks and algorithms for visual data mining, including developing new algorithms interactively through a visual interface. It also summarizes a document on using data mining and visualization techniques for selective visualization of large spatial datasets.
This document discusses the concepts of parallel programming including processes, communication, and synchronization. It describes different approaches to parallel programming such as monitors, message passing, and synchronous communication. The document then introduces SR (Synchronizing Resources) as a language that unifies these approaches and is well-suited for conventional and distributed systems. It provides examples of basic SR concepts like resources, processes, and statements, as well as examples of communication and parallel matrix multiplication in SR.
The document discusses building human-based software estimation models that are accurate, intuitive, and easy to understand. It presents an approach using correlation and scale factors between estimated and actual effort. Experiments on a dataset of 178 samples show that combining correlation and scale factors into a decision tree achieves up to 93.3% accuracy. The resulting model bridges expert and algorithmic estimation methods.
The document outlines a presentation on regression analysis using Stata. It discusses Stata's features and windows. It covers data structure types like cross-sectional, panel, and time series data. Regression diagnostics like normality, heteroskedasticity, multicollinearity, and specification are explained. Other regression models like logistic, probit, and Poisson are also covered. The presentation concludes with suggestions for presenting results and suggested readings.
"An Evaluation of Models for Runtime Approximation in Link Discovery" as presented in the IEEE/WIC/ACM WI, August 25th, 2017, held in Leipzig, Germany.
This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).
The document discusses some of the promises and perils of mining software repositories like Git and GitHub for research purposes. It notes that while these sources contain rich data on software development, there are also challenges to consider. For example, decentralized version control systems like Git allow private collaboration that may be missed. And most GitHub projects are personal and inactive, while it is also used for storage and hosting. The document recommends researchers approach these data sources carefully and provides lessons on how to properly analyze and interpret the data from repositories like Git and GitHub.
Inverted Index Based Multi-Keyword Public-key Searchable Encryption with Stro... (Mateus S. H. Cruz)
This document summarizes a research paper that proposes an encrypted search scheme using an inverted index to allow for multi-keyword queries on encrypted data. The key contributions are: (1) supporting the reuse of the same encrypted index for multiple queries while preserving query privacy, (2) enabling conjunctive multi-keyword searches, and (3) providing efficiency by only using multiplication and exponentiation operations. The proposed scheme uses an encrypted inverted index along with trapdoor generation and private set intersection techniques to enable accurate yet private searches on outsourced encrypted data.
Privacy-Preserving Search for Chemical Compound Databases (Mateus S. H. Cruz)
Presentation about the paper "Privacy-Preserving Search for Chemical Compound Databases"*.
This presentation is based on the uploader's understanding of the paper and may contain inaccurate interpretations.
A summary of the paper is available at: https://mshcruz.wordpress.com/2016/09/02/summary-privacy-preserving-search-for-chemical-compound-databases/
*Shimizu et al.: "Privacy-Preserving Search for Chemical Compound Databases". BMC Bioinformatics 2015.
Privacy-Preserving Multi-Keyword Fuzzy Search over Encrypted Data in the Cloud (Mateus S. H. Cruz)
The document proposes a method for privacy-preserving multi-keyword fuzzy search over encrypted data. It uses Bloom filters to represent encrypted indexes and queries, and locality sensitive hashing functions to allow fuzzy matching of keywords. An inner product calculation is used to determine similarity between encrypted indexes and queries. The proposal includes an enhanced scheme that adds a pseudorandom function for additional security against background knowledge attacks. Experiments demonstrate the performance and accuracy of the approach.
Fuzzy Keyword Search over Encrypted Data in Cloud Computing (Mateus S. H. Cruz)
The document proposes a wildcard-based approach for efficient fuzzy keyword search over encrypted data stored in the cloud. It aims to address the large fuzzy sets and high storage costs of the straightforward approach by using wildcards to denote edit operations. This allows for a more efficient construction of smaller fuzzy sets and reduced storage requirements, while still maintaining search privacy.
Fast, Private and Verifiable: Server-aided Approximate Similarity Computation... (Mateus S. H. Cruz)
Presentation given at the SWIM seminar (University of Tsukuba) about the paper "Fast, Private and Verifiable: Server-aided Approximate Similarity Computation over Large-Scale Datasets"*.
This presentation is based on the uploader's understanding of the paper and may contain inaccurate interpretations.
A summary of the paper is available at: https://mshcruz.wordpress.com/2016/08/05/summary-fast-private-and-verifiable-server-aided-approximate-similarity-computation-over-large-scale-datasets/
*Qiu et al.: "Fast, Private and Verifiable: Server-aided Approximate Similarity Computation over Large-Scale Datasets". SCC 2016.
Realizing Fine-Grained and Flexible Access Control to Outsourced Data with At... (Mateus S. H. Cruz)
Presentation given at the SWIM Seminar (University of Tsukuba) about the paper "Realizing Fine-Grained and Flexible Access Control to Outsourced Data with Attribute-Based Cryptosystems"*.
This presentation is based on the uploader's understanding of the paper and may contain inaccurate interpretations.
A summary of the paper is available at: https://mshcruz.wordpress.com/2016/07/22/summary-fine-grained-access-control-using-abe-and-abs/
*Zhao et al.: "Realizing Fine-Grained and Flexible Access Control to Outsourced Data with Attribute-Based Cryptosystems". ISPEC 2011.
DBMask: Fine-Grained Access Control on Encrypted Relational Databases (Mateus S. H. Cruz)
Presentation given at the SWIM Seminar (University of Tsukuba) about DBMask*.
This presentation is based on the uploader's understanding of the paper and may contain inaccurate interpretations.
A summary of the paper is available at: https://mshcruz.wordpress.com/2016/07/15/summary-dbmask/
*Nabeel et al.: "DBMask: Fine-Grained Access Control on Encrypted Relational Databases". CODASPY 2015.
ENKI: Access Control for Encrypted Query Processing (Mateus S. H. Cruz)
Presentation given at the SWIM Seminar (University of Tsukuba) about ENKI*.
This presentation is based on the uploader's understanding of the paper and may contain inaccurate interpretations.
A summary of the paper is available at: https://mshcruz.wordpress.com/2016/07/11/summary-enki/
*Hang et al.: "ENKI: Access Control for Encrypted Query Processing". SIGMOD 2015.
Presentation given at the SWIM Seminar (University of Tsukuba) about MONOMI*.
This presentation is based on the uploader's understanding of the paper and may contain inaccurate interpretations.
A summary of the paper is available at: https://mshcruz.wordpress.com/2016/07/01/summary-monomi/
*Tu et al.: "Processing Analytical Queries over Encrypted Data". VLDB 2013.
Presentation given at the KDE Seminar (University of Tsukuba) about CryptDB*.
This presentation is based on the uploader's understanding of the paper and may contain inaccurate interpretations.
A summary of the paper is available at: https://mshcruz.wordpress.com/2016/06/24/summary-cryptdb/
The official website for CryptDB is: http://css.csail.mit.edu/cryptdb/
*Popa et al.: "CryptDB: Protecting Confidentiality with Encrypted Query Processing". SOSP 2011.
9. Set Similarity Join
Outline: Introduction · Tools · Proposal · Preprocessing · Signature Matrix · Join · Experiments · Summary

Find similar records given a similarity threshold (δ)
Strings can be seen as sets of words (tokens)
Set similarity metric: Jaccard similarity (JS)
Problem: expensive processing

Example (δ = 0.6): join the Student and University tables on the university name.

Student
  Name | Univ. Name
  Bob  | Tsukuba Univ.
  Mary | Harvard Univ.
  John | Harvard Univ.
  Anna | Univ. of Berlin

University
  Univ. Name       | Country
  Univ. of Tsukuba | Japan
  Harvard Univ.    | USA
  Univ. of Berlin  | Germany
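To make the metric concrete, here is a minimal host-side sketch (illustrative; not code from the slides) that tokenizes two of the strings from the example above and computes their Jaccard similarity:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split a string into its set of word tokens.
std::set<std::string> tokenize(const std::string& s) {
    std::istringstream in(s);
    std::set<std::string> tokens;
    std::string tok;
    while (in >> tok) tokens.insert(tok);
    return tokens;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
double jaccard(const std::set<std::string>& a, const std::set<std::string>& b) {
    std::vector<std::string> inter;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(inter));
    double uni = a.size() + b.size() - inter.size();
    return uni == 0 ? 0.0 : inter.size() / uni;
}

int main() {
    // {Tsukuba, Univ.} vs. {Univ., of, Tsukuba}: 2 shared of 3 distinct tokens.
    std::cout << jaccard(tokenize("Tsukuba Univ."),
                         tokenize("Univ. of Tsukuba")) << "\n";  // 0.666...
}

At δ = 0.6 this pair qualifies as a match (JS ≈ 0.67); evaluating JS over every record pair is what makes a naive join expensive.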
13. Related Work

Serial similarity joins:
  Xiao et al., Efficient Similarity Joins for Near Duplicate Detection, TODS 2011
Parallel similarity joins using MapReduce:
  Vernica et al., Efficient Parallel Set-similarity Joins Using MapReduce, SIGMOD 2010
Parallel similarity joins using GPU:
  Lieberman et al., A Fast Similarity Join Algorithm Using Graphics Processing Units, ICDE 2008 (normed metric)
  Böhm et al., Index-supported Similarity Join on Graphics Processors, BTW 2009 (Euclidean distance)
17. MinHash¹

Estimates Jaccard similarity
Apply hash functions to the sets and keep the minimum hash value
Similar sets are likely to share the same minimum hash value
Use the hash values to create signatures
  Parts of signatures: bins
Good coupling with GPU (Li et al., GPU-based Minwise Hashing, WWW 2012)
  Efficient storage
  Suitable for parallel processing

¹ Broder, On the Resemblance and Containment of Documents, Compression and Complexity of Sequences: Proceedings 1997
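The sketch below illustrates the estimation idea on the host side (a reconstruction, not the authors' GPU code; the seeded hash family is an assumption): each of k hash functions contributes the minimum hash value over a set's tokens, and the fraction of positions where two signatures agree approximates their Jaccard similarity.

#include <algorithm>
#include <cstdint>
#include <functional>
#include <iostream>
#include <set>
#include <string>
#include <vector>

// One cheap parameterized hash per signature position (assumed for illustration).
uint64_t h(const std::string& tok, uint64_t seed) {
    return std::hash<std::string>{}(tok) ^ (seed * 0x9e3779b97f4a7c15ULL);
}

// MinHash signature: for each seed, the minimum hash over the set's tokens.
std::vector<uint64_t> signature(const std::set<std::string>& s, int k) {
    std::vector<uint64_t> sig(k, UINT64_MAX);
    for (int i = 0; i < k; ++i)
        for (const auto& tok : s)
            sig[i] = std::min(sig[i], h(tok, i + 1));
    return sig;
}

// Fraction of agreeing positions approximates the Jaccard similarity.
double estimate(const std::vector<uint64_t>& a, const std::vector<uint64_t>& b) {
    int eq = 0;
    for (size_t i = 0; i < a.size(); ++i) eq += (a[i] == b[i]);
    return double(eq) / a.size();
}

int main() {
    std::set<std::string> r = {"Univ.", "of", "Tsukuba"}, s = {"Tsukuba", "Univ."};
    std::cout << estimate(signature(r, 128), signature(s, 128)) << "\n";
}

For the two university strings from the earlier example, the printed estimate is roughly the exact JS of 2/3, and it sharpens as k grows.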
32. Result Output

The result size is initially unknown
  Cannot allocate memory beforehand
  Write conflicts between blocks
Three-phase scheme for result output²:
  1. Execute the join and find the number of similar pairs for each block (e.g., per-block counts: 4 2 0 2)
  2. Execute a scan over these counts to obtain the initial writing position for each block (e.g., 0 4 6 6)
  3. Allocate the result array, execute the join again, and output the similar pairs

² He et al., Relational Joins on Graphics Processors, SIGMOD 2008
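A hedged CUDA sketch of this three-phase scheme (reconstructed from the description above, not the authors' code): is_similar is a placeholder predicate standing in for the real signature comparison, and one thread block handles one record of R.

#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/scan.h>

// Placeholder for the real similarity test over signatures (assumption).
__device__ bool is_similar(int r, int s) { return ((r + s) & 255) == 0; }

// Phase 1: run the join, only counting the matches found by each block.
__global__ void count_pairs(int nS, int* blockCounts) {
    __shared__ int total;                        // per-block pair counter
    if (threadIdx.x == 0) total = 0;
    __syncthreads();
    int r = blockIdx.x, local = 0;               // one block per record of R
    for (int s = threadIdx.x; s < nS; s += blockDim.x)
        if (is_similar(r, s)) ++local;
    atomicAdd(&total, local);
    __syncthreads();
    if (threadIdx.x == 0) blockCounts[r] = total;
}

// Phase 3: run the join again, writing pairs at this block's reserved offset.
__global__ void write_pairs(int nS, const int* writePos, int2* out) {
    __shared__ int cursor;                       // next free slot for this block
    if (threadIdx.x == 0) cursor = writePos[blockIdx.x];
    __syncthreads();
    int r = blockIdx.x;
    for (int s = threadIdx.x; s < nS; s += blockDim.x)
        if (is_similar(r, s))
            out[atomicAdd(&cursor, 1)] = make_int2(r, s);
}

int main() {
    int nR = 1024, nS = 1024;
    thrust::device_vector<int> counts(nR), pos(nR);
    count_pairs<<<nR, 256>>>(nS, thrust::raw_pointer_cast(counts.data()));
    // Phase 2: exclusive scan turns counts into initial writing positions.
    thrust::exclusive_scan(counts.begin(), counts.end(), pos.begin());
    int total = pos.back() + counts.back();      // exact result size
    thrust::device_vector<int2> result(total);
    write_pairs<<<nR, 256>>>(nS, thrust::raw_pointer_cast(pos.data()),
                             thrust::raw_pointer_cast(result.data()));
    cudaDeviceSynchronize();
    std::printf("similar pairs: %d\n", total);
}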
42. Detailed Setup
(Experiments: Environment · Parameters · MinHash Alg. · Join Alg.)

Compilers: GCC 4.4.7 (-O3), NVCC 6.5 (-O3 -use_fast_math), OpenMP 4.0

Component         | Specification
CPU               | Intel Xeon CPU E5-1650
CPU cores         | 6 (12 threads with Hyper-Threading)
CPU clock         | 3.50 GHz
Main memory       | 32 GB
GPU               | NVIDIA Tesla K20Xm
Scalar processors | 2688
Processor clock   | 732 MHz
Global memory     | 6 GB
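For reference, a build invocation consistent with these flags might look as follows (the file and binary names are assumptions, not taken from the slides):

nvcc -O3 -use_fast_math -Xcompiler -fopenmp ssjoin.cu -o ssjoin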
43. Parameters

Not much impact on performance/accuracy:
  Threads per block
  Similarity threshold
  Join selectivity

[Three plots of elapsed time (s): vs. number of threads per block, 32 to 1024 (GPU); vs. similarity threshold, 0.2 to 1.0 (GPU); vs. join selectivity, 0.01 to 0.5 (CPU serial, CPU parallel, GPU). Abstracts dataset, |R| = |S| = 131,072.]
44. Parallel MinHash Algorithm

Algorithm 1: Parallel MinHash
  input : characteristic matrix CM of size t × d (t tokens, d documents); number of bins b
  output: signature matrix SM of size d × b (d documents, b bins)

  binSize ← t/b
  for i ← 0 to d in parallel do        // executed by GPU blocks
    for j ← 0 to t in parallel do      // executed by GPU threads
      if CM[j][i] = 1 then
        h ← hash(j)                    // hash of the token index
        binIdx ← h / binSize
        SM[i][binIdx] ← min(SM[i][binIdx], h)
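A hedged CUDA rendering of Algorithm 1 (my reconstruction: it assumes hash(·) is applied to the token index j and behaves like a permutation of 0..t-1, so that bin indices stay within range; the hash constants are illustrative, not the authors'):

#include <cstdio>
#include <cuda_runtime.h>

// Illustrative hash over token indices 0..t-1 (assumed): an affine map that
// acts like a pseudo-random permutation when the multiplier is coprime to t.
__device__ unsigned hashToken(unsigned j, unsigned t) {
    return (2654435761u * j + 12345u) % t;
}

// Algorithm 1: one block per document i, threads stride over the t tokens.
// CM is t x d (CM[j*d + i]); SM is d x b, pre-filled with UINT_MAX.
__global__ void minhash(const char* CM, unsigned* SM, int t, int d, int b) {
    int i = blockIdx.x;
    int binSize = t / b;                        // assumes b divides t
    for (int j = threadIdx.x; j < t; j += blockDim.x)
        if (CM[j * d + i] == 1) {
            unsigned h = hashToken(j, t);       // hash of the token index
            int binIdx = h / binSize;           // signature bin for this hash
            atomicMin(&SM[i * b + binIdx], h);  // keep the per-bin minimum
        }
}

int main() {
    const int t = 8, d = 2, b = 2;
    char hCM[t * d] = {0};
    int ones0[] = {0, 2, 3, 5}, ones1[] = {0, 2, 4, 5};  // two toy documents
    for (int k = 0; k < 4; ++k) {
        hCM[ones0[k] * d + 0] = 1;
        hCM[ones1[k] * d + 1] = 1;
    }
    char* CM; unsigned* SM;
    cudaMalloc(&CM, sizeof hCM);
    cudaMalloc(&SM, d * b * sizeof(unsigned));
    cudaMemcpy(CM, hCM, sizeof hCM, cudaMemcpyHostToDevice);
    cudaMemset(SM, 0xFF, d * b * sizeof(unsigned));      // UINT_MAX sentinel
    minhash<<<d, 128>>>(CM, SM, t, d, b);
    unsigned hSM[d * b];
    cudaMemcpy(hSM, SM, sizeof hSM, cudaMemcpyDeviceToHost);
    for (int i = 0; i < d; ++i)
        std::printf("doc %d signature: %u %u\n", i, hSM[i * b], hSM[i * b + 1]);
}

Because hashing is independent per token and per document, the min-reductions parallelize naturally across blocks and threads, which is the "good coupling with GPU" noted earlier.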
45. Parallel NLJ Algorithm

Algorithm 2: Parallel nested-loop join
  input : collections R and S; similarity threshold δ
  output: pairs of documents whose similarity is at least δ

  foreach r ∈ R in parallel do         // executed by GPU blocks
    foreach s ∈ S do
      if Sim(r, s) ≥ δ then
        output(r, s)
5/5