Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Vectorized VByte
Decoding
Jeff Plaisance, Nathan Kurz, Daniel Lemire
Inverted Indexes
Inverted Index
● Like index in the back of a book
● words = terms, page numbers = doc ids
● Term list is sorted
● Doc list...
doc id query country impressions clicks
0 software Canada 10 1
1 blank Canada 10 0
2 sales US 5 0
3 software US 8 1
4 blan...
Constructing an Inverted Index
query country impression clicks
doc id blank sales software Canada US 5 8 10 0 1
0 ✔ ✔ ✔ ✔
...
Constructing an Inverted Index
field term 0 1 2 3 4
query blank ✔ ✔
sales ✔
software ✔ ✔
country Canada ✔ ✔
US ✔ ✔ ✔
impre...
Inverted Index
field term doc list
query blank 1, 4
sales 2
software 0, 3
country Canada 0, 1
US 2, 3, 4
impressions 5 2
8...
Inverted Indexes
Allow you to:
● Quickly find all documents containing
a term
● Intersect several terms to perform
boolean...
Inverted Index Optimizations
● Compressed data structures
○ Better use of RAM and processor cache
○ Better use of memory b...
Delta / VByte Encoding
● Doc id lists are sorted
● Delta between a doc id and the previous doc
id is sufficient
● Deltas a...
Delta Encoding
field term doc list
query nursing 34, 86, 247, 301, 674, 714
Delta Encoding
field term doc list
query nursing 34, 86, 247, 301, 674, 714
34, 52, 161, 54, 373, 40
Small Integer Compression
● Golomb/Rice
● VByte (or Varint)
● Binary Packing
● PForDelta
Small Integer Compression
● Golomb/Rice
● VByte
● Bit Packing
● PForDelta
VByte Encoding
9838
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
? 1 1 0 1 1 1 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9838
? 1 1 0 1 1 1 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
? 1 1 0 1 1 1 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
1 1 1 0 1 1 1 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
1 1 1 0 1 1 1 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
1 1 1 0 1 1 1 0
? 1 0 0 1 1 0 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
1 1 1 0 1 1 1 0
? 1 0 0 1 1 0 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
1 1 1 0 1 1 1 0
? 1 0 0 1 1 0 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
1 1 1 0 1 1 1 0
0 1 0 0 1 1 0 0
VByte Encoding
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
9838
1 1 1 0 1 1 1 0
0 1 0 0 1 1 0 0
VByte
Pros:
● Compression
● Can fit more of index in RAM
● Higher information throughput per byte read
from disk
VByte
Cons:
● Decodes one byte at a time
● Lots of branch mispredictions
● Not fast to decode
● Largest ints expand to 5 b...
Vectorized VByte Decoding
Optimized decoder implemented using x86_64
intrinsics
Vectorized VByte Decoding
01001010 11001000 01110001 01001110
10011011 01101010 10110101 00010111
01110110 10001101 101100...
Vectorized VByte Decoding
01001010 11001000 01110001 01001110
10011011 01101010 10110101 00010111
01110110 10001101 101100...
Vectorized VByte Decoding
01001010 11001000 01110001 01001110
10011011 01101010 10110101 00010111
01110110 10001101 101100...
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of l...
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of l...
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of l...
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of l...
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of l...
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of l...
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of l...
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of l...
010010100111
Pattern of leading bits determines:
● how many varints to decode
● sizes and offsets of varints
● length of l...
010010100111
Decoding options for:
● sixteen 1 byte varints
● six 1-2 byte varints
● four 1-3 byte varints
● two 1-5 byte ...
010010100111
Decoding options for:
● sixteen 1 byte varints - special case
● six 1-2 byte varints - 2^6, 64 possibilities
...
010010100111
Data Distribution:
● Longer doc id lists are necessarily
composed of smaller deltas
● Most deltas in real dat...
Most Significant Bit Decoding
● We separate most significant bit decoding
from integer decoding
● Reduces duplicate most s...
010010100111
● If most significant bits of next 16 bytes are all
0, handle sixteen 1 byte ints case
● Otherwise lookup mos...
010010100111
Lookup table contains:
● Shuffle vector index from 0-169 representing
which possibility we are decoding
● Num...
010010100111
Branch on shuffle vector index to determine
which case we are decoding
● 0-63 - six 1-2 byte ints
● 64-144 - ...
Six 1-2 Byte Ints
01001010 11001000 01110001 01001110
10011011 01101010 10110101 00010111
01110110 10001101 10110011 11000...
Expected Positions
2 1 2 1 2 1 2 1 2 1 2 1 0 0 0 0
0 3 2 1 0 3 2 1 0 3 2 1 0 3 2 1
1 5 0 4 0 3 0 2 1 5 0 4 0 3 0 2
Six 1-2...
Six 1-2 Byte Ints
01001010 11001000 01110001 01001110
10011011 01101010 10110101 00010111
01110110 10001101 10110011 11000...
Six 1-2 Byte Ints
01001010 00000000 11001000 01110001
01001110 00000000 10011011 01101010
10110101 00010111 01110110 00000...
Shuffle input
● Use index to lookup appropriate shuffle
vector
● Shuffle input bytes to get them in the
expected positions
for (i = 0; i < 16; i++) {
if (mask[i] & 0x80) {
dest[i] = 0;
} else {
dest[i] = src[mask[i] & 0xF];
}
}
pshufb
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0 A6
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0 A6
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0 A6 0
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0 A6 0
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0 A6 0
src
mask
dest
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0 A6 0 43
src
mask
...
pshufb
DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48
0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5
DE 56 27 0 A6 0 43 7C DE 45 ...
Shuffle input
● Use index to lookup appropriate shuffle
vector
● Shuffle input bytes to get them in the
expected positions
Six 1-2 Byte Ints
01001010 00000000 11001000 01110001
01001110 00000000 10011011 01101010
10110101 00010111 01110110 00000...
Six 1-2 Byte Ints
00000000 01001010 01110001 11001000
00000000 01001110 01101010 10011011
00010111 10110101 00000000 01110...
Six 1-2 Byte Ints
00000000 01001010 01110001 11001000
00000000 01001110 01101010 10011011
00010111 10110101 00000000 01110...
Six 1-2 Byte Ints
00000000 01001010 01110001 01001000
00000000 01001110 01101010 00011011
00010111 00110101 00000000 01110...
Six 1-2 Byte Ints
00000000 01001010 01110001 01001000
00000000 01001110 01101010 00011011
00010111 00110101 00000000 01110...
Six 1-2 Byte Ints
00000000 01001010 00111000 11001000
00000000 01001110 00110101 00011011
00001011 10110101 00000000 01110...
Six 1-2 Byte Ints
00000000 01001010 00111000 11001000
00000000 01001110 00110101 00011011
00001011 10110101 00000000 01110...
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 0111...
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 0111...
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 0111...
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 0111...
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 0111...
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 0111...
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 0111...
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 0111...
Four 1-3 Byte Ints
11101110 00011101 11110101 11101101
01111001 11111000 01101001 00100001
00001011 10110101 10111001 0111...
Four 1-3 Byte Ints
11101110 00011101 00000000 00000000
11110101 11101101 01111001 00000000
11111000 01101001 00000000 0000...
Four 1-3 Byte Ints
00000000 00000000 00011101 11101110
00000000 01111001 11101101 11110101
00000000 00000000 01101001 1111...
Four 1-3 Byte Ints
00000000 00000000 00011101 11101110
00000000 01111001 11101101 11110101
00000000 00000000 01101001 1111...
Four 1-3 Byte Ints
00000000 00000000 00011101 01101110
00000000 01111001 01101101 01110101
00000000 00000000 01101001 0111...
Four 1-3 Byte Ints
00000000 00000000 00011101 01101110
00000000 01111001 01101101 01110101
00000000 00000000 01101001 0111...
Four 1-3 Byte Ints
00000000 00000000 00001110 11101110
00000000 01111001 00110110 11110101
00000000 00000000 00110100 1111...
Four 1-3 Byte Ints
00000000 00000000 00001110 11101110
00000000 01111001 00110110 11110101
00000000 00000000 00110100 1111...
Four 1-3 Byte Ints
00000000 00000000 00001110 11101110
00000000 00011110 01110110 11110101
00000000 00000000 00110100 1111...
Four 1-3 Byte Ints
00000000 00000000 00001110 11101110
00000000 00011110 01110110 11110101
00000000 00000000 00110100 1111...
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110...
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110...
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110...
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110...
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110...
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110...
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110...
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110...
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110...
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110...
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110...
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110...
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110...
Two 1-5 Byte Ints
11101110 10011101 11110101 11101101
00000011 11111000 11101001 10100001
00001011 10110101 10111001 01110...
Two 1-5 Byte Ints
11101110 00000011
00000000 11101101
00000000 11110101
00000000 10011101
Clear top bit of each byte
Two 1-5 Byte Ints
01101110 00000011
00000000 01101101
00000000 01110101
00000000 00011101
Clear top bit of each byte
Two 1-5 Byte Ints
01101110 00000011 * 16 (<< 4)
00000000 01101101 * 32 (<< 5)
00000000 01110101 * 64 (<< 6)
00000000 00011...
Two 1-5 Byte Ints
01101110 00000011 * 16 (<< 4)
00000000 01101101 * 32 (<< 5)
00000000 01110101 * 64 (<< 6)
00000000 00011...
Two 1-5 Byte Ints
11100000 00110000 * 16 (<< 4)
00001101 10100000 * 32 (<< 5)
00011101 01000000 * 64 (<< 6)
00001110 10000...
Two 1-5 Byte Ints
11100000 00110000 00001101 10100000
00011101 01000000 00001110 10000000
Two 1-5 Byte Ints
11100000 00110000 00001101 10100000
00011101 01000000 00001110 10000000
Left shift everything by 8 bits
Two 1-5 Byte Ints
11100000 00110000 00001101 10100000
00011101 01000000 00001110 10000000
Left shift everything by 8 bits
...
Two 1-5 Byte Ints
11100000 00110000 00001101 10100000
00011101 01000000 00001110 10000000
Bitwise OR pre-shifted and shift...
Two 1-5 Byte Ints
11100000 00110000 00001101 10100000
00011101 01000000 00001110 10000000
Bitwise OR pre-shifted and shift...
Two 1-5 Byte Ints
11110000 00110000 00001101 10100000
00011101 01000000 00001110 10000000
Bitwise OR pre-shifted and shift...
Two 1-5 Byte Ints
11110000 00110000 00001101 10100000
00011101 01000000 00001110 10000000
Bitwise OR pre-shifted and shift...
Two 1-5 Byte Ints
11110000 00111101 10101101 10100000
00011101 01000000 00001110 10000000
Bitwise OR pre-shifted and shift...
Two 1-5 Byte Ints
11110000 00111101 10101101 10100000
00011101 01000000 00001110 10000000
Bitwise OR pre-shifted and shift...
Two 1-5 Byte Ints
11110000 00111101 10101101 10111101
01011101 01000000 00001110 10000000
Bitwise OR pre-shifted and shift...
Two 1-5 Byte Ints
11110000 00111101 10101101 10111101
01011101 01000000 00001110 10000000
Bitwise OR pre-shifted and shift...
Two 1-5 Byte Ints
11110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Bitwise OR pre-shifted and shift...
Two 1-5 Byte Ints
11110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other ...
Two 1-5 Byte Ints
11110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other ...
Two 1-5 Byte Ints
11110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other ...
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other ...
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other ...
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other ...
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other ...
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other ...
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other ...
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
Extract result from every other ...
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 10000000
OR in low 7 bits of least signif...
Two 1-5 Byte Ints
00110000 00111101 10101101 10111101
01011101 01001110 10001110 11101110
OR in low 7 bits of least signif...
Two 1-5 Byte Ints
00111101 10111101 01001110 11101110
Final result!
Two 1-5 Byte Ints
00111101 10111101 01001110 11101110
Final result!
Checking my work against initial varint:
11101110 1001...
Two 1-5 Byte Ints
00111101 10111101 01001110 11101110
Final result!
Checking my work against initial varint:
11101110 1001...
Two 1-5 Byte Ints
00111101 10111101 01001110 11101110
Final result!
Checking my work against initial varint:
11101110 1001...
Two 1-5 Byte Ints
00111101 10111101 01001110 11101110
Final result!
Checking my work against initial varint:
11101110 1001...
Two 1-5 Byte Ints
00111101 10111101 01001110 11101110
Final result!
Checking my work against initial varint:
11101110 1001...
Two 1-5 Byte Ints
00111101 10111101 01001110 11101110
Final result!
Checking my work against initial varint:
11101110 1001...
Results
Results
Q&A
Upcoming SlideShare
Loading in …5
×

MaskedVByte: SIMD-accelerated VByte

763 views

Published on

We consider the ubiquitous technique of VByte compression, which represents each integer as a variable length sequence of bytes. The low 7 bits of each byte encode a portion of the integer, and the high bit of each byte is reserved as a continuation flag. This flag is set to 1 for all bytes except the last, and the decoding of each integer is complete when a byte with a high bit of 0 is encountered. VByte decoding can be a performance bottleneck especially when the unpredictable lengths of the encoded integers cause frequent branch mispredictions. Previous attempts to accelerate VByte decoding using SIMD vector instructions have been disappointing, prodding search engines such as Google to use more complicated but faster-to-decode formats for performance-critical code. Our decoder (Masked VByte) is 2 to 4 times faster than a conventional scalar VByte decoder, making the format once again competitive with regard to speed.

Jeff Plaisance, Nathan Kurz, Daniel Lemire, Vectorized VByte Decoding, International Symposium on Web Algorithms 2015, 2015.

http://arxiv.org/pdf/1503.07387.pdf

Published in: Technology
  • How i get my ex girfriend back: Email Dr Trust for urgent love spell Ultimatespellcast@gmail.com Am Travis Bob Jason from United State, I really appreciate what Dr. Trust has done in my life by bringing back my girlfriend to me after all we pass through i was not having it on mind that my girlfriend will come back to me again because of the situation and little misunderstanding we had, so i was thinking of what to do cause i was very confuse,so one day i was checking something on internet when i met a comment of testimony on the site how a great man called Dr. Trust help her and solve all her problem, it was amazing and i was thinking if this is real, i contacted the woman who posted it and i ask her about the man so she told me everything about the great man, that the man is very powerful and also he is a generous person, so i said let me give a try, really i contacted the great man and i told him everything that happened, he just laughed and told me not to worry anymore, and he did everything i told him to do for me, so after two day later i saw Vivian my girlfriend coming in front of me crying and begging me to forgive her, so i was very exacted we are now together and we are going to celebrate the Christmas and New year together in California, i said now, i believe that everybody are not the same this man called Dr. Trust is really powerful, thank you very much sir,contact him through his email. Ultimatespellcast@yahoo.com OR Ultimatespellcast@gmail.com WhatApp him or call +2348156885231
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

MaskedVByte: SIMD-accelerated VByte

  1. 1. Vectorized VByte Decoding Jeff Plaisance, Nathan Kurz, Daniel Lemire
  2. 2. Inverted Indexes
  3. 3. Inverted Index ● Like index in the back of a book ● words = terms, page numbers = doc ids ● Term list is sorted ● Doc list for each term is sorted
  4. 4. doc id query country impressions clicks 0 software Canada 10 1 1 blank Canada 10 0 2 sales US 5 0 3 software US 8 1 4 blank US 10 1 Standard Index
  5. 5. Constructing an Inverted Index query country impression clicks doc id blank sales software Canada US 5 8 10 0 1 0 ✔ ✔ ✔ ✔ 1 ✔ ✔ ✔ ✔ 2 ✔ ✔ ✔ ✔ 3 ✔ ✔ ✔ ✔ 4 ✔ ✔ ✔ ✔
  6. 6. Constructing an Inverted Index field term 0 1 2 3 4 query blank ✔ ✔ sales ✔ software ✔ ✔ country Canada ✔ ✔ US ✔ ✔ ✔ impressions 5 ✔ 8 ✔ 10 ✔ ✔ ✔ clicks 0 ✔ ✔ 1 ✔ ✔ ✔
  7. 7. Inverted Index field term doc list query blank 1, 4 sales 2 software 0, 3 country Canada 0, 1 US 2, 3, 4 impressions 5 2 8 3 10 0, 1, 4 clicks 0 1, 2 1 0, 3, 4
  8. 8. Inverted Indexes Allow you to: ● Quickly find all documents containing a term ● Intersect several terms to perform boolean queries
  9. 9. Inverted Index Optimizations ● Compressed data structures ○ Better use of RAM and processor cache ○ Better use of memory bandwidth ○ Increased CPU usage and time ● Optimizations matter!
  10. 10. Delta / VByte Encoding ● Doc id lists are sorted ● Delta between a doc id and the previous doc id is sufficient ● Deltas are usually small integers
  11. 11. Delta Encoding field term doc list query nursing 34, 86, 247, 301, 674, 714
  12. 12. Delta Encoding field term doc list query nursing 34, 86, 247, 301, 674, 714 34, 52, 161, 54, 373, 40
  13. 13. Small Integer Compression ● Golomb/Rice ● VByte (or Varint) ● Binary Packing ● PForDelta
  14. 14. Small Integer Compression ● Golomb/Rice ● VByte ● Bit Packing ● PForDelta
  15. 15. VByte Encoding 9838
  16. 16. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838
  17. 17. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838
  18. 18. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838
  19. 19. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838
  20. 20. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 ? 1 1 0 1 1 1 0
  21. 21. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9838 ? 1 1 0 1 1 1 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0
  22. 22. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 ? 1 1 0 1 1 1 0
  23. 23. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 1 1 1 0 1 1 1 0
  24. 24. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 1 1 1 0 1 1 1 0
  25. 25. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 1 1 1 0 1 1 1 0 ? 1 0 0 1 1 0 0
  26. 26. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 1 1 1 0 1 1 1 0 ? 1 0 0 1 1 0 0
  27. 27. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 1 1 1 0 1 1 1 0 ? 1 0 0 1 1 0 0
  28. 28. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 1 1 1 0 1 1 1 0 0 1 0 0 1 1 0 0
  29. 29. VByte Encoding 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 9838 1 1 1 0 1 1 1 0 0 1 0 0 1 1 0 0
  30. 30. VByte Pros: ● Compression ● Can fit more of index in RAM ● Higher information throughput per byte read from disk
  31. 31. VByte Cons: ● Decodes one byte at a time ● Lots of branch mispredictions ● Not fast to decode ● Largest ints expand to 5 bytes
  32. 32. Vectorized VByte Decoding Optimized decoder implemented using x86_64 intrinsics
  33. 33. Vectorized VByte Decoding 01001010 11001000 01110001 01001110 10011011 01101010 10110101 00010111 01110110 10001101 10110011 11000001
  34. 34. Vectorized VByte Decoding 01001010 11001000 01110001 01001110 10011011 01101010 10110101 00010111 01110110 10001101 10110011 11000001 pmovmskb: Extract top bit of each byte
  35. 35. Vectorized VByte Decoding 01001010 11001000 01110001 01001110 10011011 01101010 10110101 00010111 01110110 10001101 10110011 11000001 pmovmskb: Extract top bit of each byte 010010100111
  36. 36. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  37. 37. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  38. 38. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  39. 39. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  40. 40. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  41. 41. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  42. 42. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  43. 43. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  44. 44. 010010100111 Pattern of leading bits determines: ● how many varints to decode ● sizes and offsets of varints ● length of longest varint in bytes ● number of bytes to consume
  45. 45. 010010100111 Decoding options for: ● sixteen 1 byte varints ● six 1-2 byte varints ● four 1-3 byte varints ● two 1-5 byte varints
  46. 46. 010010100111 Decoding options for: ● sixteen 1 byte varints - special case ● six 1-2 byte varints - 2^6, 64 possibilities ● four 1-3 byte varints - 3^4, 81 possibilities ● two 1-5 byte varints - 5^2, 25 possibilities 170 total possibilities
  47. 47. 010010100111 Data Distribution: ● Longer doc id lists are necessarily composed of smaller deltas ● Most deltas in real datasets (ClueWeb09, Indeed’s internal datasets) fall into 1 byte case or 1-2 byte case
  48. 48. Most Significant Bit Decoding ● We separate most significant bit decoding from integer decoding ● Reduces duplicate most significant bit decoding work if we don’t consume all 12 bytes ● Better instruction level parallelism
  49. 49. 010010100111 ● If most significant bits of next 16 bytes are all 0, handle sixteen 1 byte ints case ● Otherwise lookup most significant bits of next 12 bytes in 4096 entry lookup table
  50. 50. 010010100111 Lookup table contains: ● Shuffle vector index from 0-169 representing which possibility we are decoding ● Number of bytes of input that will be consumed
  51. 51. 010010100111 Branch on shuffle vector index to determine which case we are decoding ● 0-63 - six 1-2 byte ints ● 64-144 - four 1-3 byte ints ● 145-169 - two 1-5 byte ints
  52. 52. Six 1-2 Byte Ints 01001010 11001000 01110001 01001110 10011011 01101010 10110101 00010111 01110110 10001101 10110011 11000001 Decode 6 varints from 9 bytes
  53. 53. Expected Positions 2 1 2 1 2 1 2 1 2 1 2 1 0 0 0 0 0 3 2 1 0 3 2 1 0 3 2 1 0 3 2 1 1 5 0 4 0 3 0 2 1 5 0 4 0 3 0 2 Six 1-2 byte ints Four 1-3 byte ints Two 1-5 byte ints
  54. 54. Six 1-2 Byte Ints 01001010 11001000 01110001 01001110 10011011 01101010 10110101 00010111 01110110 10001101 10110011 11000001 Pad out 1 byte ints to 2 bytes
  55. 55. Six 1-2 Byte Ints 01001010 00000000 11001000 01110001 01001110 00000000 10011011 01101010 10110101 00010111 01110110 00000000 Pad out 1 byte ints to 2 bytes
  56. 56. Shuffle input ● Use index to lookup appropriate shuffle vector ● Shuffle input bytes to get them in the expected positions
  57. 57. for (i = 0; i < 16; i++) { if (mask[i] & 0x80) { dest[i] = 0; } else { dest[i] = src[mask[i] & 0xF]; } } pshufb
  58. 58. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 src mask dest
  59. 59. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 src mask dest
  60. 60. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 src mask dest
  61. 61. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE src mask dest
  62. 62. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE src mask dest
  63. 63. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE src mask dest
  64. 64. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 src mask dest
  65. 65. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 src mask dest
  66. 66. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 src mask dest
  67. 67. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 src mask dest
  68. 68. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 src mask dest
  69. 69. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 src mask dest
  70. 70. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 src mask dest
  71. 71. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 src mask dest
  72. 72. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 A6 src mask dest
  73. 73. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 A6 src mask dest
  74. 74. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 A6 0 src mask dest
  75. 75. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 A6 0 src mask dest
  76. 76. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 A6 0 src mask dest
  77. 77. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 A6 0 43 src mask dest
  78. 78. pshufb DE DF 27 E3 7C A9 60 55 1C EA 45 56 A6 43 C9 48 0 11 2 -1 12 -1 13 4 0 10 2 6 4 3 13 5 DE 56 27 0 A6 0 43 7C DE 45 27 60 7C E3 43 A9 src mask dest
  79. 79. Shuffle input ● Use index to lookup appropriate shuffle vector ● Shuffle input bytes to get them in the expected positions
  80. 80. Six 1-2 Byte Ints 01001010 00000000 11001000 01110001 01001110 00000000 10011011 01101010 10110101 00010111 01110110 00000000 Reverse bytes in 2 byte varints *not actually necessary since x86 is little endian
  81. 81. Six 1-2 Byte Ints 00000000 01001010 01110001 11001000 00000000 01001110 01101010 10011011 00010111 10110101 00000000 01110110 Reverse bytes in 2 byte varints *not actually necessary since x86 is little endian
  82. 82. Six 1-2 Byte Ints 00000000 01001010 01110001 11001000 00000000 01001110 01101010 10011011 00010111 10110101 00000000 01110110 Mask out leading purple 1’s
  83. 83. Six 1-2 Byte Ints 00000000 01001010 01110001 01001000 00000000 01001110 01101010 00011011 00010111 00110101 00000000 01110110 Mask out leading purple 1’s
  84. 84. Six 1-2 Byte Ints 00000000 01001010 01110001 01001000 00000000 01001110 01101010 00011011 00010111 00110101 00000000 01110110 Shift top bytes of each varint 1 bit right (mask/shift/or)
  85. 85. Six 1-2 Byte Ints 00000000 01001010 00111000 11001000 00000000 01001110 00110101 00011011 00001011 10110101 00000000 01110110 Shift top bytes of each varint 1 bit right (mask/shift/or)
  86. 86. Six 1-2 Byte Ints 00000000 01001010 00111000 11001000 00000000 01001110 00110101 00011011 00001011 10110101 00000000 01110110 Done!
  87. 87. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110
  88. 88. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110 101101000110
  89. 89. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110 101101000110
  90. 90. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110 101101000110
  91. 91. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110 101101000110
  92. 92. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110 101101000110
  93. 93. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110 101101000110
  94. 94. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110 Decode 4 varints from 8 bytes
  95. 95. Four 1-3 Byte Ints 11101110 00011101 11110101 11101101 01111001 11111000 01101001 00100001 00001011 10110101 10111001 01110110 Pad ints to 4 bytes
  96. 96. Four 1-3 Byte Ints 11101110 00011101 00000000 00000000 11110101 11101101 01111001 00000000 11111000 01101001 00000000 00000000 00100001 00000000 00000000 00000000 Pad ints to 4 bytes
  97. 97. Four 1-3 Byte Ints 00000000 00000000 00011101 11101110 00000000 01111001 11101101 11110101 00000000 00000000 01101001 11111000 00000000 00000000 00000000 00100001 Reverse bytes *not actually necessary since x86 is little endian
  98. 98. Four 1-3 Byte Ints 00000000 00000000 00011101 11101110 00000000 01111001 11101101 11110101 00000000 00000000 01101001 11111000 00000000 00000000 00000000 00100001 Clear top bit of each byte
  99. 99. Four 1-3 Byte Ints 00000000 00000000 00011101 01101110 00000000 01111001 01101101 01110101 00000000 00000000 01101001 01111000 00000000 00000000 00000000 00100001 Clear top bit of each byte
  100. 100. Four 1-3 Byte Ints 00000000 00000000 00011101 01101110 00000000 01111001 01101101 01110101 00000000 00000000 01101001 01111000 00000000 00000000 00000000 00100001 Shift 2nd least significant bytes over by 1 bit (mask/shift/or)
  101. 101. Four 1-3 Byte Ints 00000000 00000000 00001110 11101110 00000000 01111001 00110110 11110101 00000000 00000000 00110100 11111000 00000000 00000000 00000000 00100001 Shift 2nd least significant bytes over by 1 bit (mask/shift/or)
  102. 102. Four 1-3 Byte Ints 00000000 00000000 00001110 11101110 00000000 01111001 00110110 11110101 00000000 00000000 00110100 11111000 00000000 00000000 00000000 00100001 Shift 3rd least significant bytes over by 2 bits (mask/shift/or)
  103. 103. Four 1-3 Byte Ints 00000000 00000000 00001110 11101110 00000000 00011110 01110110 11110101 00000000 00000000 00110100 11111000 00000000 00000000 00000000 00100001 Shift 3rd least significant bytes over by 2 bits (mask/shift/or)
  104. 104. Four 1-3 Byte Ints 00000000 00000000 00001110 11101110 00000000 00011110 01110110 11110101 00000000 00000000 00110100 11111000 00000000 00000000 00000000 00100001 Done!
  105. 105. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 111101110110
  106. 106. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 111101110110
  107. 107. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 111101110110
  108. 108. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 111101110110
  109. 109. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 Decode 2 varints from 9 bytes
  110. 110. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 ● Could handle the same way as other cases ● Would require 5 AND operations, 4 shift operations, and 4 OR operations
  111. 111. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 ● We can simulate shifting by different amounts with multiplication ● Only needs 1 multiplication, 1 shift, 1 OR, 1 shuffle
  112. 112. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 1 5 0 4 0 3 0 2 1 5 0 4 0 3 0 2 Two 1-5 byte ints Treat SIMD register as eight 16 bit registers, loading 1 byte into each. First byte doesn’t need to be shifted.
  113. 113. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  114. 114. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 11101110 00000000 00000000 00000000 00000000 00000000 00000000 00000000
  115. 115. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 11101110 00000000 00000000 00000000 00000000 00000000 00000000 10011101
  116. 116. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 11101110 00000000 00000000 00000000 00000000 11110101 00000000 10011101
  117. 117. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 11101110 00000000 00000000 11101101 00000000 11110101 00000000 10011101
  118. 118. Two 1-5 Byte Ints 11101110 10011101 11110101 11101101 00000011 11111000 11101001 10100001 00001011 10110101 10111001 01110110 11101110 00000011 00000000 11101101 00000000 11110101 00000000 10011101
  119. 119. Two 1-5 Byte Ints 11101110 00000011 00000000 11101101 00000000 11110101 00000000 10011101 Clear top bit of each byte
  120. 120. Two 1-5 Byte Ints 01101110 00000011 00000000 01101101 00000000 01110101 00000000 00011101 Clear top bit of each byte
  121. 121. Two 1-5 Byte Ints 01101110 00000011 * 16 (<< 4) 00000000 01101101 * 32 (<< 5) 00000000 01110101 * 64 (<< 6) 00000000 00011101 * 128 (<< 7) Multiply to shift bits into place
  122. 122. Two 1-5 Byte Ints 01101110 00000011 * 16 (<< 4) 00000000 01101101 * 32 (<< 5) 00000000 01110101 * 64 (<< 6) 00000000 00011101 * 128 (<< 7) Multiply to shift bits into place
  123. 123. Two 1-5 Byte Ints 11100000 00110000 * 16 (<< 4) 00001101 10100000 * 32 (<< 5) 00011101 01000000 * 64 (<< 6) 00001110 10000000 * 128 (<< 7) Multiply to shift bits into place
  124. 124. Two 1-5 Byte Ints 11100000 00110000 00001101 10100000 00011101 01000000 00001110 10000000
  125. 125. Two 1-5 Byte Ints 11100000 00110000 00001101 10100000 00011101 01000000 00001110 10000000 Left shift everything by 8 bits
  126. 126. Two 1-5 Byte Ints 11100000 00110000 00001101 10100000 00011101 01000000 00001110 10000000 Left shift everything by 8 bits 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  127. 127. Two 1-5 Byte Ints 11100000 00110000 00001101 10100000 00011101 01000000 00001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  128. 128. Two 1-5 Byte Ints 11100000 00110000 00001101 10100000 00011101 01000000 00001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  129. 129. Two 1-5 Byte Ints 11110000 00110000 00001101 10100000 00011101 01000000 00001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  130. 130. Two 1-5 Byte Ints 11110000 00110000 00001101 10100000 00011101 01000000 00001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  131. 131. Two 1-5 Byte Ints 11110000 00111101 10101101 10100000 00011101 01000000 00001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  132. 132. Two 1-5 Byte Ints 11110000 00111101 10101101 10100000 00011101 01000000 00001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  133. 133. Two 1-5 Byte Ints 11110000 00111101 10101101 10111101 01011101 01000000 00001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  134. 134. Two 1-5 Byte Ints 11110000 00111101 10101101 10111101 01011101 01000000 00001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  135. 135. Two 1-5 Byte Ints 11110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Bitwise OR pre-shifted and shifted registers 00110000 00001101 10100000 00011101 01000000 00001110 10000000 00000000
  136. 136. Two 1-5 Byte Ints 11110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  137. 137. Two 1-5 Byte Ints 11110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  138. 138. Two 1-5 Byte Ints 11110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  139. 139. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  140. 140. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  141. 141. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  142. 142. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  143. 143. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  144. 144. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  145. 145. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 Extract result from every other byte
  146. 146. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 10000000 OR in low 7 bits of least significant byte (remember that we stored it in most significant byte position originally)
  147. 147. Two 1-5 Byte Ints 00110000 00111101 10101101 10111101 01011101 01001110 10001110 11101110 OR in low 7 bits of least significant byte (remember that we stored it in most significant byte position originally)
  148. 148. Two 1-5 Byte Ints 00111101 10111101 01001110 11101110 Final result!
  149. 149. Two 1-5 Byte Ints 00111101 10111101 01001110 11101110 Final result! Checking my work against initial varint: 11101110 10011101 11110101 11101101 00000011
  150. 150. Two 1-5 Byte Ints 00111101 10111101 01001110 11101110 Final result! Checking my work against initial varint: 11101110 10011101 11110101 11101101 00000011
  151. 151. Two 1-5 Byte Ints 00111101 10111101 01001110 11101110 Final result! Checking my work against initial varint: 11101110 10011101 11110101 11101101 00000011
  152. 152. Two 1-5 Byte Ints 00111101 10111101 01001110 11101110 Final result! Checking my work against initial varint: 11101110 10011101 11110101 11101101 00000011
  153. 153. Two 1-5 Byte Ints 00111101 10111101 01001110 11101110 Final result! Checking my work against initial varint: 11101110 10011101 11110101 11101101 00000011
  154. 154. Two 1-5 Byte Ints 00111101 10111101 01001110 11101110 Final result! Checking my work against initial varint: 11101110 10011101 11110101 11101101 00000011
  155. 155. Results
  156. 156. Results
  157. 157. Q&A

×