X86opti 05 s5yata
Remove Branches in BitVector Select Operations - marisa 0.2.2 -

Transcript

  • 1. Remove Branches in BitVector Select Operations - marisa 0.2.2 -
    Susumu Yata (@s5yata), Brazil, Inc., 30 March 2013
  • 2. Who I AmJob Brazil, Inc. (groonga developer) We need R&D software engineers.Personal research & development Tries darts-clone, marisa-trie, etc. Corpus Nihongo Web Corpus 2010 (NWC 2010) 230 March 2013 Brazil, Inc.
  • 3. BitVector and Marisa: relationships between BitVector and Marisa.
  • 4. BitVectorWhat‟s BitVector? A sequence of bitsOperations BitVector::get(i) BitVector::rank(i) BitVector::select(i) 430 March 2013 Brazil, Inc.
  • 5. BitVector – Get Operations
    Interface: BitVector::get(i)
    Description: the i-th bit ("0" or "1")
    Position: 0  1  2  …  i-1  i  i+1  …  n-2  n-1
    Bit:      0  0  1  …  0    1  1    …  0    0
    Get! (the bit at position i)
  • 6. BitVector – Rank Operations
    Interface: BitVector::rank(i)
    Description: the number of "1"s up to the i-th bit
    Position: 0  1  2  …  i-1  i  i+1  …  n-2  n-1
    Bit:      0  0  1  …  0    1  1    …  0    0
    How many "1"s?
  • 7. BitVector – Select Operations
    Interface: BitVector::select(i)
    Description: the position of the i-th "1"
    Position: 0  1  2  …  …  …  …  …  n-2  n-1
    Bit:      0  0  1  …  …  …  …  …  0    0
    Where is the i-th "1"?
  • 8. Marisa Who‟s Marisa? An ordinary human magician What‟s Marisa? A static and space-efficient dictionary Data structure Recursive LOUDS-based Patricia tries Site http://code.google.com/p/marisa-trie 830 March 2013 Brazil, Inc.
  • 9. Marisa – PatriciaPatricia is a labeled tree. Keys = Tree + Labels Node Label ID Key 1 “Ar” 4 0 “Argentina” 1 2 “Brazil” 1 “Armenia” 5 3 „C‟ 0 2 2 “Brazil” 4 “gentina” 6 3 “Canada” 3 5 “menia” 4 “Cyprus” 7 6 “anada” 7 “yprus” 930 March 2013 Brazil, Inc.
  • 10. Marisa – RecursivenessUnfortunately, this margin is too small… Keys = Tree + Labels Labels = Tree + Labels Labels = Tree + Labels <– Reasonable Labels = Tree + Labels Labels = Tree + Labels Labels = Tree + Labels Labels = Tree + Labels … 1030 March 2013 Brazil, Inc.
  • 11. Marisa – BitVector UsageLOUDS Level-Order Unary Degree SequenceTerminal flags A node is terminal (“1”) or not (“0”).Link flags A node has a link to its multi-byte label (“1”) or has a built-in single-byte label (“0”). 1130 March 2013 Brazil, Inc.
  • 12. Marisa – BitVector UsageLOUDS BitVector::get(), select()Terminal flags BitVector::get(), rank(), select()Link flags BitVector::get(), rank() 1230 March 2013 Brazil, Inc.
  • 13. Implementations: how to implement rank/select operations.
  • 14. Rank DictionaryIndex structures r_idx[x].abs = rank(512・x) x = 0, 1, 2, … r_idx[x].rel[y] = rank(512・x + 64・y) – rank(512・x) Y = 1, 2, 3, … , 7Calculation abs + rel + popcnt() 1430 March 2013 Brazil, Inc.
  • 15. Rank OperationsTime complexity = O(1) 512 512 512 512 512 r_idx.abs 64 64 64 64 64 64 64 64 r_idx.rel 64 popcnt() 1530 March 2013 Brazil, Inc.
  • 16. Select DictionaryIndex structure s_idx[x] = select(512・x) i = 0, 1, 2, …Calculation Limit the range by using s_idx. Limit the range by using r_idx[x].abs. Limit the range by using r_idx[x].rel[y]. Find the i-th “1” in the range. 1630 March 2013 Brazil, Inc.
  • 17. Select Operations
    [Figure: s_idx narrows the search to a run of 512-bit blocks, r_idx.abs to one block, r_idx.rel to one 64-bit word, and the final round searches inside that word.]
  • 18. Select Final RoundBinary search & table lookup Three-level branches if if if if if if if 8 8 8 8 8 8 8 8 Table lookup 1830 March 2013 Brazil, Inc.
  • 19. Improvements: how to remove the branches in the final round.
  • 20. Original
    // x is the final 64-bit block (uint64_t).
    x = x - ((x >> 1) & MASK_55);
    x = (x & MASK_33) + ((x >> 2) & MASK_33);
    x = (x + (x >> 4)) & MASK_0F;
    x *= MASK_01;  // Tricky popcount
    if (i < ((x >> 24) & 0xFF)) {      // The first-level branch
      if (i < ((x >> 8) & 0xFF)) {     // The second-level branch
        if (i < (x & 0xFF)) {          // The third-level branch
          // The first byte contains the i-th "1".
        } else {
          // The second byte contains the i-th "1".
    …
  • 21. Tips – Tricky PopCount (one byte: 0 1 1 1 0 0 1 0)
    x = x - ((x >> 1) & MASK_55);               1 2 0 1   ("1"s per 2 bits)
    x = (x & MASK_33) + ((x >> 2) & MASK_33);   3 1       ("1"s per 4 bits)
    x = (x + (x >> 4)) & MASK_0F;               4         ("1"s per 8 bits)
  • 22. Tips – Tricky PopCount
    // MASK_01 = 0x0101010101010101ULL;
    // x = x + (x << 8) + (x << 16) + (x << 24) + …;
    x *= MASK_01;
    Before (per-byte counts, high byte to low byte):   4  1  3  5  2  6  3  4
    After (cumulative counts):                        28 24 23 20 15 13  7  4
  • 23. + SSE2 (After PopCount)
    // y[0 … 7] = i + 1;
    __m128i y = _mm_cvtsi64_si128((i + 1) * MASK_01);
    __m128i z = _mm_cvtsi64_si128(x);
    // Compare the 16 8-bit signed integers in y and z.
    // y[k] = (y[k] > z[k]) ? 0xFF : 0x00;
    y = _mm_cmpgt_epi8(y, z);  // PCMPGTB
    // The j-th byte contains the i-th "1".
    // TABLE is a 128-byte pre-computed table.
    uint8_t j = TABLE[_mm_movemask_epi8(y)];
  • 24. Tips – PCMPGTB (example with i = 19)
    y = _mm_cvtsi64_si128((i + 1) * MASK_01);   20 20 20 20 20 20 20 20
    z = _mm_cvtsi64_si128(x);                   28 24 23 20 15 13  7  4
    // y[k] = (y[k] > z[k]) ? 0xFF : 0x00;
    y = _mm_cmpgt_epi8(y, z);                   0x00 0x00 0x00 0x00 0xFF 0xFF 0xFF 0xFF
    The four low bytes are 0xFF, so j = 4: the i-th "1" lies in byte 4 (counting from the low byte).
  • 25. + Tricks (After Comparison)
    uint64_t j = _mm_cvtsi128_si64(y);
    // Any one of the following three calculations turns j into the byte index.
    // Calculation without TABLE
    j = ((j & MASK_01) * MASK_01) >> 56;
    // Calculation with BSR
    j = (63 - __builtin_clzll(j + 1)) / 8;
    // Calculation with popcnt (SSE4.2 or SSE4a)
    j = __builtin_popcountll(j) / 8;
  • 26. – SSE2 (Simple and Fast)
    // x is the final 64-bit block (uint64_t).
    x = x - ((x >> 1) & MASK_55);
    x = (x & MASK_33) + ((x >> 2) & MASK_33);
    x = (x + (x >> 4)) & MASK_0F;
    x *= MASK_01;  // Tricky popcount
    uint64_t y = (i + 1) * MASK_01;
    uint64_t z = x | MASK_80;
    // Compare the 8 7-bit unsigned integers in y and z.
    z = (z - y) & MASK_80;
    uint8_t j = __builtin_ctzll(z) / 8;
  • 27. Tips – Comparison
    uint64_t y = (i + 1) * MASK_01;   0x14 0x14 0x14 0x14 0x14 0x14 0x14 0x14
    uint64_t z = x | MASK_80;         0x9C 0x98 0x97 0x94 0x8F 0x8D 0x87 0x84
    // Compare the 8 7-bit unsigned integers in y and z.
    z = (z - y) & MASK_80;            0x80 0x80 0x80 0x80 0x00 0x00 0x00 0x00
  • 28. + SSSE3 (For PopCount)
    // Get lower nibbles and upper nibbles of x.
    __m128i lower = _mm_cvtsi64_si128(x & MASK_0F);
    __m128i upper = _mm_cvtsi64_si128(x & MASK_F0);
    upper = _mm_srli_epi32(upper, 4);
    // Use PSHUFB for counting "1"s in each nibble.
    __m128i table = _mm_set_epi8(4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0);
    lower = _mm_shuffle_epi8(table, lower);
    upper = _mm_shuffle_epi8(table, upper);
    // Merge the counts to get the number of "1"s in each byte.
    x = _mm_cvtsi128_si64(_mm_add_epi8(lower, upper));
    x *= MASK_01;
  • 29. Tips – PSHUFB
    lower = _mm_cvtsi64_si128(x & MASK_0F);                       12  8  7  4 15 13  7  4
    table = _mm_set_epi8(4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, …);   4 3 3 2 3 2 2 1 3 2 2 1 2 1 1 0
    // Perform a parallel 16-way lookup.
    lower = _mm_shuffle_epi8(table, lower);                        2  1  3  1  4  3  3  1
  • 30. Evaluation: how effective the improvements are.
  • 31. EnvironmentOS Mac OSX 10.8.3 (64-bit)CPU Core i7 3720QM – Ivy Bridge 2.6GHz – up to 3.6GHzCompiler Apple LLVM version 4.2 (clang-425.0.24) (based on LLVM 3.2svn) 3130 March 2013 Brazil, Inc.
  • 32. DataSource Japanese Wikipedia page titles gzip –cd jawiki-20130328-all-titles-in- ns0.gz | LC_ALL=C sort –R > dataDetails Number of keys: 1,367,750 Average length: 21.14 bytes Total length: 28,919,893 bytes 3230 March 2013 Brazil, Inc.
  • 33. Binariesmarisa 0.2.1 ./configure CXX=clang++ --enable-popcnt make tools/marisa-benchmark < datamarisa 0.2.2 ./configure CXX=clang++ --enable-sse4 make tools/marisa-benchmark < data 3330 March 2013 Brazil, Inc.
  • 34. Results – marisa 0.2.1Without improvements #Tries Size Build Lookup Reverse Prefix Predict [KB] [Kqps] [Kqps] [Kqps] [Kqps] [Kqps] 1 11,811 724 1,105 1,223 1,038 711 2 8,639 632 790 877 753 453 3 8,001 621 750 816 708 406 4 7,788 591 723 791 687 391 5 7,701 590 712 781 680 384 Baseline 3430 March 2013 Brazil, Inc.
  • 35. Results – marisa 0.2.2With improvements #Tries Size Build Lookup Reverse Prefix Predict [KB] [Kqps] [Kqps] [Kqps] [Kqps] [Kqps] 1 11,811 757 1,198 1,359 1,115 772 2 8,639 657 873 1,000 820 503 3 8,001 621 817 924 770 453 4 7,788 613 797 900 752 438 5 7,701 610 787 884 737 427 Same size Faster operations 3530 March 2013 Brazil, Inc.
  • 36. Results – ImprovementsImprovement ratios #Tries Size Build Lookup Reverse Prefix Predict [%] [%] [%] [%] [%] [%] 1 0.00 +4.56 +8.42 +11.12 +7.42 +8.58 2 0.00 +3.96 +10.52 +14.03 +8.90 +11.04 3 0.00 0.00 +8.93 +13.24 +8.76 +11.58 4 0.00 +3.72 +10.24 +13.78 +9.46 +12.02 5 0.00 +3.39 +10.53 +13.19 +8.38 +11.20 Same size Faster operations 3630 March 2013 Brazil, Inc.
  • 37. Conclusion
    "Any sufficiently advanced technology is indistinguishable from magic."
    "Any sufficiently advanced technique is indistinguishable from magic."
    "You are a magician."
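Slides 4 to 7 define get, rank and select by example only. A minimal, unindexed C++ sketch of the three operations follows; it is handy as a reference when testing faster versions. The class name `NaiveBitVector` is hypothetical, and the sketch assumes 0-indexed positions, rank(i) counting the "1"s in positions 0 to i-1, and select(i) returning the position of the i-th "1" counting from 0.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical reference model of BitVector::get/rank/select (not marisa's
// API). Assumed conventions: positions start at 0, rank(i) counts the "1"s in
// [0, i), select(i) is 0-indexed. Each call scans the bits, i.e. O(n); the
// index structures on slides 14 to 17 exist to avoid these scans.
class NaiveBitVector {
 public:
  explicit NaiveBitVector(std::vector<bool> bits) : bits_(std::move(bits)) {}

  // get(i): the i-th bit.
  bool get(std::size_t i) const { return bits_[i]; }

  // rank(i): number of "1"s before position i.
  std::size_t rank(std::size_t i) const {
    std::size_t count = 0;
    for (std::size_t k = 0; k < i; ++k) count += bits_[k] ? 1 : 0;
    return count;
  }

  // select(i): position of the i-th "1", or size() if there is no such "1".
  std::size_t select(std::size_t i) const {
    for (std::size_t k = 0; k < bits_.size(); ++k) {
      if (bits_[k]) {
        if (i == 0) return k;
        --i;
      }
    }
    return bits_.size();
  }

  std::size_t size() const { return bits_.size(); }

 private:
  std::vector<bool> bits_;
};
```

With these conventions, rank(select(i)) == i holds for every i below the number of "1"s, which is a convenient property to assert against an indexed implementation.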
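Slide 12 says LOUDS needs only get() and select(). As an illustration under assumed conventions (a "10" super-root prefix, then one "1" per child followed by a "0" for each node in level order, with nodes numbered from 0 in level order), the sketch below encodes the tree shape of slide 9 and finds a node's children with select0 and get. The `Louds` struct is hypothetical and not marisa's internal layout, and select0 is a naive scan here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy LOUDS, not marisa's layout. Assumed encoding: a "10"
// super-root, then for each node in level order one "1" per child followed
// by a "0". Nodes are numbered 0, 1, 2, ... in level order (root = 0), so the
// j-th "1" of the sequence (counting from 0) stands for node j.
struct Louds {
  std::vector<bool> bits;

  // Builds the sequence from the number of children of each node, in level order.
  explicit Louds(const std::vector<std::size_t>& degrees) : bits{true, false} {
    for (std::size_t d : degrees) {
      bits.insert(bits.end(), d, true);
      bits.push_back(false);
    }
  }

  // select0(i): position of the i-th "0" (0-indexed), by a naive scan.
  std::size_t select0(std::size_t i) const {
    for (std::size_t k = 0; k < bits.size(); ++k) {
      if (!bits[k]) {
        if (i == 0) return k;
        --i;
      }
    }
    return bits.size();
  }

  // Children of node v. Its run of "1"s starts right after the v-th "0";
  // a "1" at position p in that run has v + 1 "0"s before it, hence
  // p - (v + 1) "1"s before it, so it stands for node p - v - 1.
  std::vector<std::size_t> children(std::size_t v) const {
    std::vector<std::size_t> result;
    for (std::size_t p = select0(v) + 1; p < bits.size() && bits[p]; ++p) {
      result.push_back(p - v - 1);  // bits[p] plays the role of get(p)
    }
    return result;
  }
};
```

For slide 9's tree the degrees in level order are 3, 2, 0, 2, 0, 0, 0, 0, and `children(1)` then yields nodes 4 ("gentina") and 5 ("menia").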
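To see how abs, rel and s_idx fit together, here is a compact C++ sketch of the rank dictionary (slides 14 and 15) and the select dictionary (slides 16 and 17). It assumes bit k of word w is position 64·w + k, rank(i) counts the "1"s in positions [0, i), select(i) is 0-indexed, and rel is a plain array of seven 16-bit values; marisa's actual layout may differ. The class name is hypothetical, the walk over r_idx[x].abs is linear (a binary search is an alternative), the final round is a plain bit loop standing in for the techniques on slides 18 to 27, and `__builtin_popcountll` is a GCC/Clang builtin.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical rank/select dictionary in the spirit of slides 14 to 17.
class RankSelect {
 public:
  explicit RankSelect(std::vector<std::uint64_t> words) : words_(std::move(words)) {
    words_.resize((words_.size() + 7) / 8 * 8, 0);  // whole 512-bit blocks
    std::uint64_t ones = 0;                          // "1"s seen so far
    for (std::size_t w = 0; w < words_.size(); ++w) {
      if (w % 8 == 0) {
        r_idx_.push_back(RankIndex{ones, {}});  // abs = rank(512 * x)
      } else {
        // rel[y - 1] = rank(512 * x + 64 * y) - rank(512 * x), y = 1..7
        r_idx_.back().rel[w % 8 - 1] =
            static_cast<std::uint16_t>(ones - r_idx_.back().abs);
      }
      for (unsigned k = 0; k < 64; ++k) {
        if ((words_[w] >> k) & 1) {
          if (ones % 512 == 0) s_idx_.push_back(64 * w + k);  // select(512 * x)
          ++ones;
        }
      }
    }
    num_ones_ = ones;
  }

  // rank(i) = abs + rel + popcnt() of the bits below i in its 64-bit word.
  std::uint64_t rank(std::uint64_t i) const {
    if (i >= 64 * words_.size()) return num_ones_;
    const RankIndex& r = r_idx_[i / 512];
    const std::uint64_t y = (i / 64) % 8;
    const std::uint64_t rel = (y == 0) ? 0 : r.rel[y - 1];
    const std::uint64_t below = words_[i / 64] & ((1ULL << (i % 64)) - 1);
    return r.abs + rel + __builtin_popcountll(below);
  }

  // select(i): position of the i-th "1"; requires i < num_ones().
  std::uint64_t select(std::uint64_t i) const {
    // 1. s_idx: the (512 * (i / 512))-th "1" bounds the starting block.
    std::size_t x = s_idx_[i / 512] / 512;
    // 2. r_idx[x].abs: move to the last block whose abs <= i.
    while (x + 1 < r_idx_.size() && r_idx_[x + 1].abs <= i) ++x;
    // 3. r_idx[x].rel[y]: pick the 64-bit word inside the block.
    std::uint64_t rest = i - r_idx_[x].abs;
    std::size_t y = 0;
    while (y < 7 && r_idx_[x].rel[y] <= rest) ++y;
    if (y > 0) rest -= r_idx_[x].rel[y - 1];
    // 4. Final round: a plain bit loop in this sketch.
    const std::uint64_t word = words_[8 * x + y];
    for (unsigned k = 0;; ++k) {
      if ((word >> k) & 1) {
        if (rest == 0) return 64 * (8 * x + y) + k;
        --rest;
      }
    }
  }

  std::uint64_t num_ones() const { return num_ones_; }

 private:
  struct RankIndex {
    std::uint64_t abs;
    std::uint16_t rel[7];
  };
  std::vector<std::uint64_t> words_;
  std::vector<RankIndex> r_idx_;
  std::vector<std::uint64_t> s_idx_;
  std::uint64_t num_ones_ = 0;
};
```

Checking every position against a naive scan, as in rank(pos) == number of "1"s before pos and select of that count == pos whenever pos holds a "1", is a cheap way to validate such an index.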
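The tricky popcount on slides 20 to 22 can be checked in isolation. The helpers below (hypothetical names `byte_counts` and `cumulative_counts`) split it into the per-byte count and the multiply by MASK_01. The test word 0x0F01071F033F070F is made up so that its per-byte counts are the 4 1 3 5 2 6 3 4 of slide 22.

```cpp
#include <cstdint>

constexpr std::uint64_t MASK_55 = 0x5555555555555555ULL;
constexpr std::uint64_t MASK_33 = 0x3333333333333333ULL;
constexpr std::uint64_t MASK_0F = 0x0F0F0F0F0F0F0F0FULL;
constexpr std::uint64_t MASK_01 = 0x0101010101010101ULL;

// Byte k of the result = number of "1"s in byte k of x.
std::uint64_t byte_counts(std::uint64_t x) {
  x = x - ((x >> 1) & MASK_55);              // 2-bit fields, values 0..2
  x = (x & MASK_33) + ((x >> 2) & MASK_33);  // 4-bit fields, values 0..4
  return (x + (x >> 4)) & MASK_0F;           // 8-bit fields, values 0..8
}

// Multiplying by MASK_01 adds x, x << 8, x << 16, ..., so byte k of the
// result is the sum of bytes 0..k. The sums never exceed 64, so no byte
// carries into its neighbor, and the top byte is the popcount of the word.
std::uint64_t cumulative_counts(std::uint64_t x) {
  return byte_counts(x) * MASK_01;
}
```

For that test word the cumulative counts come out as 28 24 23 20 15 13 7 4 from the high byte down, the same row that slides 22 and 24 use.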
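Slide 20 shows only the first branches of the original final round. One way to complete the three-level tree is sketched below, assuming the same conventions (i is 0-indexed and smaller than the popcount of the word, and x holds cumulative byte counts after the tricky popcount). The function name is hypothetical, the third level uses `?:` for brevity (a compiler may emit either a branch or a conditional move for it), and a bit loop replaces the table lookup of slide 18.

```cpp
#include <cstdint>

// Hypothetical completion of the branchy final round on slide 20.
// w is the 64-bit word; requires i < popcount(w).
unsigned select_in_word_branchy(std::uint64_t w, std::uint64_t i) {
  constexpr std::uint64_t MASK_55 = 0x5555555555555555ULL;
  constexpr std::uint64_t MASK_33 = 0x3333333333333333ULL;
  constexpr std::uint64_t MASK_0F = 0x0F0F0F0F0F0F0F0FULL;
  constexpr std::uint64_t MASK_01 = 0x0101010101010101ULL;
  std::uint64_t x = w - ((w >> 1) & MASK_55);
  x = (x & MASK_33) + ((x >> 2) & MASK_33);
  x = (x + (x >> 4)) & MASK_0F;
  x *= MASK_01;  // tricky popcount: byte k = "1"s in bytes 0..k

  unsigned j;  // byte that contains the i-th "1"
  if (i < ((x >> 24) & 0xFF)) {                // level 1: bytes 0-3?
    if (i < ((x >> 8) & 0xFF)) {               // level 2: bytes 0-1?
      j = (i < (x & 0xFF)) ? 0 : 1;            // level 3
    } else {
      j = (i < ((x >> 16) & 0xFF)) ? 2 : 3;    // level 3
    }
  } else {
    if (i < ((x >> 40) & 0xFF)) {              // level 2: bytes 4-5?
      j = (i < ((x >> 32) & 0xFF)) ? 4 : 5;    // level 3
    } else {
      j = (i < ((x >> 48) & 0xFF)) ? 6 : 7;    // level 3
    }
  }

  // Subtract the "1"s in bytes 0..j-1, then find the remaining one in byte j.
  std::uint64_t rest = (j == 0) ? i : i - ((x >> (8 * (j - 1))) & 0xFF);
  const unsigned byte = static_cast<unsigned>((w >> (8 * j)) & 0xFF);
  for (unsigned k = 0; k < 8; ++k) {
    if ((byte >> k) & 1) {
      if (rest == 0) return 8 * j + k;
      --rest;
    }
  }
  return 64;  // unreachable when i < popcount(w)
}
```

Every query walks three data-dependent comparisons before it reaches the byte, which is what the improvements on slides 23 to 29 set out to avoid.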
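Slides 23 and 24 describe TABLE only by its size. Because the comparison marks exactly the low j bytes, the movemask result is always 2^j - 1 with j ≤ 7, so a 128-entry table holding the number of low-order "1" bits is enough. The sketch below assumes that construction (the function name `byte_index_sse2` is hypothetical), requires i < (x >> 56), and falls back to a scalar loop where SSE2 on x86-64 is unavailable. Signed byte comparison is safe here because every value involved is at most 64.

```cpp
#include <array>
#include <cstdint>
#if defined(__SSE2__) && defined(__x86_64__)
#include <emmintrin.h>
#endif

// Hypothetical helper: from cumulative byte counts x (byte k = "1"s in bytes
// 0..k) and a 0-indexed i, return the byte j that holds the i-th "1".
unsigned byte_index_sse2(std::uint64_t x, std::uint64_t i) {
  constexpr std::uint64_t MASK_01 = 0x0101010101010101ULL;
  // Assumed TABLE contents: TABLE[m] = number of low-order "1" bits in m.
  // Only the masks 0, 1, 3, 7, ..., 127 actually occur.
  static const std::array<std::uint8_t, 128> TABLE = [] {
    std::array<std::uint8_t, 128> t{};
    for (unsigned m = 0; m < 128; ++m) {
      unsigned k = 0;
      while ((m >> k) & 1) ++k;
      t[m] = static_cast<std::uint8_t>(k);
    }
    return t;
  }();
#if defined(__SSE2__) && defined(__x86_64__)
  __m128i y = _mm_cvtsi64_si128(static_cast<long long>((i + 1) * MASK_01));
  const __m128i z = _mm_cvtsi64_si128(static_cast<long long>(x));
  y = _mm_cmpgt_epi8(y, z);  // PCMPGTB: y[k] = (y[k] > z[k]) ? 0xFF : 0x00
  return TABLE[_mm_movemask_epi8(y)];  // PMOVMSKB: one bit per byte
#else
  // Scalar stand-in for PCMPGTB + PMOVMSKB on other targets.
  unsigned mask = 0;
  for (unsigned k = 0; k < 8; ++k) {
    if (i + 1 > ((x >> (8 * k)) & 0xFF)) mask |= 1u << k;
  }
  return TABLE[mask];
#endif
}
```

With the cumulative counts of slide 24 (x = 0x1C1817140F0D0704) and i = 19, the mask is 0b1111 and the result is byte 4.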
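Slide 25 replaces the table with arithmetic. After the comparison, the low j bytes of m = _mm_cvtsi128_si64(y) are 0xFF and the rest are 0x00, so m = 2^(8j) - 1 with j ≤ 7, and each of the three lines on the slide recovers j. The sketch wraps them as separate functions (hypothetical names) so they can be checked against each other; the builtins are GCC/Clang extensions.

```cpp
#include <cstdint>

constexpr std::uint64_t MASK_01 = 0x0101010101010101ULL;

// Keep one bit per 0xFF byte, then sum those bits into the top byte.
unsigned index_by_multiply(std::uint64_t m) {
  return static_cast<unsigned>(((m & MASK_01) * MASK_01) >> 56);
}

// m + 1 == 2^(8j), whose highest set bit is bit 8j.
unsigned index_by_bsr(std::uint64_t m) {
  return static_cast<unsigned>((63 - __builtin_clzll(m + 1)) / 8);
}

// m has exactly 8j set bits.
unsigned index_by_popcount(std::uint64_t m) {
  return static_cast<unsigned>(__builtin_popcountll(m) / 8);
}
```

The BSR variant relies on j ≤ 7: for j = 8, m + 1 would wrap to 0 and `__builtin_clzll(0)` is undefined, which cannot happen when the i-th "1" is inside the word.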
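Slides 26 and 27 do the byte comparison with plain 64-bit arithmetic. The sketch below wraps it into a complete in-word select, assuming i is 0-indexed and i < popcount(w). The function name, the shift trick for reading the preceding byte count without a branch, and the final bit loop (in place of a table lookup) are illustrative choices, not taken from marisa; `__builtin_ctzll` is a GCC/Clang builtin.

```cpp
#include <cstdint>

constexpr std::uint64_t MASK_55 = 0x5555555555555555ULL;
constexpr std::uint64_t MASK_33 = 0x3333333333333333ULL;
constexpr std::uint64_t MASK_0F = 0x0F0F0F0F0F0F0F0FULL;
constexpr std::uint64_t MASK_01 = 0x0101010101010101ULL;
constexpr std::uint64_t MASK_80 = 0x8080808080808080ULL;

// Hypothetical in-word select built on slide 26; requires i < popcount(w).
unsigned select_in_word(std::uint64_t w, std::uint64_t i) {
  // Tricky popcount: byte k of x = number of "1"s in bytes 0..k of w.
  std::uint64_t x = w - ((w >> 1) & MASK_55);
  x = (x & MASK_33) + ((x >> 2) & MASK_33);
  x = (x + (x >> 4)) & MASK_0F;
  x *= MASK_01;
  // Every byte of x is <= 64, so setting bit 7 and subtracting i + 1 (<= 64)
  // never borrows from the neighboring byte. Bit 7 survives exactly where
  // x[k] >= i + 1, i.e. where bytes 0..k already contain the i-th "1".
  const std::uint64_t y = (i + 1) * MASK_01;
  const std::uint64_t z = ((x | MASK_80) - y) & MASK_80;
  // The lowest surviving byte is the one that holds the i-th "1".
  const unsigned j = static_cast<unsigned>(__builtin_ctzll(z)) / 8;
  // "1"s in bytes 0..j-1: byte j of (x << 8), which is 0 when j == 0.
  const std::uint64_t before = ((x << 8) >> (8 * j)) & 0xFF;
  // In-byte step: a bit loop here instead of a table lookup.
  std::uint64_t rest = i - before;
  const unsigned byte = static_cast<unsigned>((w >> (8 * j)) & 0xFF);
  for (unsigned k = 0; k < 8; ++k) {
    if ((byte >> k) & 1) {
      if (rest == 0) return 8 * j + k;
      --rest;
    }
  }
  return 64;  // unreachable when i < popcount(w)
}
```

For the hypothetical word 0x0F01071F033F070F and i = 19, the comparison yields the 0x80 0x80 0x80 0x80 0x00 0x00 0x00 0x00 pattern of slide 27, so j = 4 and the result is bit 36.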
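Slides 28 and 29 count bits with PSHUFB, which performs sixteen 4-bit table lookups at once. The portable sketch below models the same computation one byte at a time so it runs on any target; it is a scalar stand-in, not the SSSE3 code, and the function name is hypothetical. Its output should equal the per-byte counts of the tricky popcount, after which x *= MASK_01 proceeds as before.

```cpp
#include <cstdint>

// Scalar model of the PSHUFB nibble lookup: byte k of the result is the
// number of "1"s in byte k of x.
std::uint64_t byte_counts_by_nibble(std::uint64_t x) {
  // table[n] = number of "1"s in the 4-bit value n. These are the values
  // _mm_set_epi8(4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0) holds,
  // since that intrinsic lists its bytes from 15 down to 0.
  static constexpr std::uint8_t table[16] = {0, 1, 1, 2, 1, 2, 2, 3,
                                             1, 2, 2, 3, 2, 3, 3, 4};
  std::uint64_t counts = 0;
  for (unsigned k = 0; k < 8; ++k) {
    const unsigned byte = static_cast<unsigned>((x >> (8 * k)) & 0xFF);
    const unsigned lower = byte & 0x0F;  // nibble kept by x & MASK_0F
    const unsigned upper = byte >> 4;    // nibble kept by (x & MASK_F0) >> 4
    counts |= static_cast<std::uint64_t>(table[lower] + table[upper]) << (8 * k);
  }
  return counts;
}
```

With x = 0x0C0807040F0D0704, whose low nibbles are the 12 8 7 4 15 13 7 4 of slide 29 and whose high nibbles are 0, the result holds the slide's 2 1 3 1 4 3 3 1.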
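The percentages on slide 36 are consistent with the ratio (new / old - 1) × 100 applied to the Kqps columns of slides 34 and 35; for example, Lookup with one trie gives (1,198 / 1,105 - 1) × 100 ≈ 8.42. The helper below (hypothetical name) encodes that assumed formula.

```cpp
// Hypothetical helper: improvement ratio in percent, assuming slide 36 uses
// (new throughput / old throughput - 1) * 100.
double improvement_percent(double old_kqps, double new_kqps) {
  return (new_kqps / old_kqps - 1.0) * 100.0;
}
```

Applied to the rounded throughputs shown on slides 34 and 35, it reproduces every entry of slide 36 to within 0.02 percentage points.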
