Roaring Bitmaps (January 2016)

Better bitmap performance with Roaring bitmaps

Bitmaps are used to implement fast set operations in software. They are frequently found in databases and search engines.
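
As a minimal sketch of how a bitmap implements set operations, a Python integer can stand in for the bit array. The helper names `to_bitmap` and `from_bitmap` are illustrative assumptions, not part of any Roaring library.

```python
def to_bitmap(values):
    """Pack non-negative integers into one Python int used as a bitmap."""
    bitmap = 0
    for v in values:
        bitmap |= 1 << v  # set bit number v
    return bitmap


def from_bitmap(bitmap):
    """List the positions of the set bits, in increasing order."""
    values = []
    position = 0
    while bitmap:
        if bitmap & 1:
            values.append(position)
        bitmap >>= 1
        position += 1
    return values


a = to_bitmap([0, 1, 3, 4])  # 0b11011, i.e. 27
b = to_bitmap([1, 2, 4, 7])

intersection = from_bitmap(a & b)  # a single AND processes all bits together
union = from_bitmap(a | b)         # a single OR does the same for the union

print(bin(a), a)     # 0b11011 27
print(intersection)  # [1, 4]
print(union)         # [0, 1, 2, 3, 4, 7]
```

The bitwise AND and OR are where the bit-level parallelism comes from: one machine instruction combines a whole word of set members at a time.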

Without compression, bitmaps scale poorly, so they are often compressed. Many bitmap compression techniques have been proposed, almost all relying primarily on run-length encoding (RLE). For example, Oracle relies on BBC bitmap compression while the version control system Git and Apache Hive rely on EWAH compression.
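
To make the run-length idea concrete, here is a minimal sketch of a generic scheme that stores a bitmap as (number of zeros, number of ones) pairs. It is an illustration only: it is not the BBC, WAH, Concise or EWAH format (those are byte-aligned or word-aligned and more elaborate), and the function names are hypothetical.

```python
def rle_encode(sorted_values):
    """Encode sorted, distinct set-bit positions as (zeros, ones) run pairs."""
    runs = []
    next_position = 0  # first bit position not yet covered by a run
    i = 0
    while i < len(sorted_values):
        start = sorted_values[i]
        j = i
        # extend the run of ones while the values are consecutive
        while j + 1 < len(sorted_values) and sorted_values[j + 1] == sorted_values[j] + 1:
            j += 1
        ones = j - i + 1
        runs.append((start - next_position, ones))
        next_position = start + ones
        i = j + 1
    return runs


def rle_decode(runs):
    """Rebuild the set-bit positions; note that decoding must scan from the start."""
    values = []
    position = 0
    for zeros, ones in runs:
        position += zeros
        values.extend(range(position, position + ones))
        position += ones
    return values


sparse = [1, 32000, 64000]
uncompressed_bits = max(sparse) + 1  # 64001 bits for only three values
runs = rle_encode(sparse)            # three short pairs instead

print(uncompressed_bits)  # 64001
print(runs)               # [(1, 1), (31998, 1), (31999, 1)]
```

The decoder also shows the weakness of the approach: to test whether a given value is present, the runs have to be walked from the beginning.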

We can get superior performance with a hybrid compression technique that uses both uncompressed bitmaps and packed arrays inside a two-level tree. An instance of this technique, Roaring, has been adopted by several production platforms (e.g., Apache Lucene/Solr/Elastic, Apache Spark, eBay's Apache Kylin and Metamarkets' Druid).
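
The two-level hybrid layout can be sketched as follows. This is a simplified illustration under stated assumptions, not the Roaring library's API: the class name `HybridSet` is hypothetical, a Python `dict` stands in for the top level of the tree, and a Python integer stands in for the 65536-bit uncompressed bitmap. The switch-over point of 4096 values is the break-even size, since 4096 values of 16 bits each occupy as much space as a full 65536-bit bitmap.

```python
from bisect import bisect_left

ARRAY_LIMIT = 4096  # 4096 values * 16 bits = 65536 bits, the size of one bitmap


class HybridSet:
    """Hypothetical two-level set of 32-bit integers (illustration only)."""

    def __init__(self):
        # high 16 bits -> container: sorted list (array) or int (bitmap)
        self.containers = {}

    def add(self, value):
        high, low = value >> 16, value & 0xFFFF
        container = self.containers.get(high)
        if container is None:
            self.containers[high] = [low]  # sparse chunks start as arrays
        elif isinstance(container, int):
            self.containers[high] = container | (1 << low)
        else:
            i = bisect_left(container, low)
            if i == len(container) or container[i] != low:
                container.insert(i, low)
                if len(container) > ARRAY_LIMIT:
                    # too dense for an array: convert the chunk to a bitmap
                    bitmap = 0
                    for v in container:
                        bitmap |= 1 << v
                    self.containers[high] = bitmap

    def __contains__(self, value):
        high, low = value >> 16, value & 0xFFFF
        container = self.containers.get(high)
        if container is None:
            return False
        if isinstance(container, int):
            return (container >> low) & 1 == 1
        i = bisect_left(container, low)
        return i < len(container) and container[i] == low

    def kind(self, high):
        """Report which container type currently holds the given chunk."""
        return "bitmap" if isinstance(self.containers[high], int) else "array"


s = HybridSet()
for v in (1, 32000, 64000):            # sparse chunk 0
    s.add(v)
for v in range(1 << 16, (1 << 16) + 5000):  # dense chunk 1
    s.add(v)

print(s.kind(0), s.kind(1))  # array bitmap
print(64000 in s, 64001 in s)  # True False
```

Membership tests touch exactly one container, found through the high 16 bits, which is what gives this layout fast random access compared with scanning a run-length encoded stream.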

Overall, our implementation of Roaring can be several times faster (up to two orders of magnitude) than the implementations of traditional RLE-based alternatives (WAH, Concise, EWAH) while compressing better. We review the design choices and optimizations that make these good results possible.


  1. Roaring Bitmaps. Consistently faster and smaller compressed bitmaps with Roaring (in preparation). Better bitmap performance with Roaring bitmaps, Software: Practice and Experience (to appear). With funding from: #26143. Academic affiliations: LICEF, TELUQ; LATECE, UQAM; CSAS, UNB.
  2. https://github.com/RoaringBitmap Roaring Bitmaps. Consistently faster and smaller compressed bitmaps with Roaring (in preparation). Better bitmap performance with Roaring bitmaps, Software: Practice and Experience (to appear). With funding from: #26143. Academic affiliations: LICEF, TELUQ; LATECE, UQAM; CSAS, UNB.
  3. Why care about Roaring bitmaps? Used by: - Lucene, Solr, Elastic - Apache Spark - Whoosh - Apache Kylin - Druid
  5. int x = 0b11111100; (= 252 = 0xFC) Supported in C/C++, Python, Java 7+, Swift, JavaScript, D, PHP, Rust, Clojure, C# (soon), Ruby, Perl...
  6. What is a bitmap? Efficient way to represent a "set". E.g., 0, 1, 3, 4 becomes 0b11011 or "27". Also called a "bitset" or a "bit array".
  7. Earliest implementation? IBM MODEL 204 database engine, commercialized in 1972 (Patrick O'Neil). https://en.wikipedia.org/wiki/Patrick_O%27Neil
  10. Problem: If density is too low, bitmaps lose their edge. E.g., it is wasteful to represent {1,32000,64000} with a bitmap!
  11. Bitmap compression - Keep the main benefits of bitmaps (e.g., bit-level parallelism) - Support varied sets. Many solutions have been proposed!
  12. Most popular solution: run-length encoding (RLE). Popularized by Oracle (BBC). Many variations: WAH, EWAH, Concise...
  13. RLE-compressed bitmaps are used in many production systems: - Apache Hive - Oracle - Git...
  14. Main idea behind RLE: 0b0000....0000000011111.....11111 Then code it as <number of zeros> <number of ones>
  15. Benefit of RLE: - Good compression - Pretty fast decoding (though limited by branch mispredictions?)
  16. Variation on RLE: hierarchical bitmaps. [Figure: the compressed representation of a 4 x 4 x 4-bit bitmap containing many zeroes.] Implementations: SparseBitSet, BitMagic.
  17. Downside of RLE: difficult to do random access! Must start from the start always? Downside of hierarchical: pointer overhead? Latency from tree traversal? Keep implementation in mind: - Optimize for modern processors (superscalar execution, SIMD) - Fast random access - New instructions (POPCNT) (helps with some intersections) - Cache friendly - Few pointers - Decent compression over wide range of cases
  19. Optimize for modern processors: - superscalar execution - SIMD - new instructions (POPCNT) - Cache friendly
  20. When in doubt, prefer a trivial data structure... The simpler the data structure, the easier it is to optimize the code! Especially on fat processors (SIMD)
  21. Decompose 32-bit space into 16-bit spaces (chunk). For each chunk, use best container: - bitset (01001011) - array ({1,20,144}) - runs ([0,10], [15,20])
  22. Downside to this model: You must choose container type dynamically. You must maintain/guess the cardinality of each container: expensive?
  23. Hardware progress: Today, most processors have a fast instruction (e.g., throughput of 1 per cycle) to compute the Hamming weight (number of 1s) of a 64-bit word.
  24. New hardware + well chosen algorithms make the hybrid model fast!
  25. Instead of making everything fit in one model (bitset or array), we mix them. Many optimization opportunities!
  27. Array vs. Array: intersection is always an array. If one array is much larger, we use O(n log m) galloping. Union: if the sum of the cardinalities is large, guess that the result should be a bitmap. Do it (fast). If we got it wrong, convert back to an array (slow). T_T
  28. Array vs. Bitmap: intersection is always an array. Very fast: iterate over the array and check bits in the bitmap. Union is always a bitmap: just set the corresponding bits from the array in the bitmap. Super fast.
  29. To sum it up... Array vs. Bitmap: super fast (also common). Bitmap vs. Bitmap: union is always fast, unlucky intersection slower. Array vs. Array: can be slow. Run containers: for another talk...
  30. Example (Census1881)

                                          Concise   EWAH   Roaring   WAH
      size (bits/int)                       2.9      3.3     2.7     2.9
      intersections (relative timings)      460      150     1.0     370
      unions (relative timings)              43       43     1.0      38

      See https://github.com/RoaringBitmap/RoaringBitmap/tree/master/jmh
  31. Several ports (available NOW): Go (complete, supported by us): github.com/RoaringBitmap/roaring. Also C++, C, Python, Cython, Rust, Haskell, C#. See http://roaringbitmap.org/
  32. Future/ongoing work: - Official C port (highly optimized!) - C#/LINQ - Support for 64-bit integers - Parallelization - GPUs - More in-depth comparisons
  33. Roaring provides fast bitmap indexes optimized for modern-day CPUs. Enables fast random access: select, rank... Actively developed. Support over a wide range of platforms and languages.
  34. https://github.com/RoaringBitmap Roaring Bitmaps. Consistently faster and smaller compressed bitmaps with Roaring (in preparation). Better bitmap performance with Roaring bitmaps, Software: Practice and Experience (to appear). With funding from: #26143. Academic affiliations: LICEF, TELUQ; LATECE, UQAM; CSAS, UNB.
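
The container operations described in slides 23, 27 and 28 can be sketched in a few lines. This is an illustration under stated assumptions rather than the library's code: a bitmap container is modelled as a list of 1024 integers used as 64-bit words, `bin(w).count("1")` stands in for the hardware Hamming-weight instruction, and all function names are hypothetical.

```python
from bisect import bisect_left


def cardinality(words):
    """Cardinality of a bitmap container: sum of the words' Hamming weights."""
    return sum(bin(w).count("1") for w in words)


def array_and_bitmap(array, words):
    """Array vs. bitmap intersection: keep the array values whose bit is set."""
    return [v for v in array if (words[v >> 6] >> (v & 63)) & 1]


def array_or_bitmap(array, words):
    """Array vs. bitmap union: copy the bitmap, then set one bit per value."""
    result = list(words)
    for v in array:
        result[v >> 6] |= 1 << (v & 63)
    return result


def galloping_intersect(small, large):
    """Array vs. array intersection for a short array and a much longer one."""
    result = []
    low = 0  # every element of large before this index is already too small
    for v in small:
        # gallop: double the step until large[low + step] >= v or we run out
        step = 1
        while low + step < len(large) and large[low + step] < v:
            step *= 2
        high = min(low + step + 1, len(large))
        low = bisect_left(large, v, low, high)  # binary search in the window
        if low == len(large):
            break  # nothing left in large can match
        if large[low] == v:
            result.append(v)
    return result


# Bitmap holding the even numbers below 128 (two words of alternating bits).
words = [0x5555555555555555, 0x5555555555555555] + [0] * 1022
array = [2, 3, 64, 65, 130]

print(cardinality(words))                          # 64
print(array_and_bitmap(array, words))              # [2, 64]
print(cardinality(array_or_bitmap(array, words)))  # 67

multiples_of_5 = list(range(0, 4096, 5))
print(galloping_intersect([3, 50, 1000, 4095], multiples_of_5))  # [50, 1000, 4095]
```

The array versus bitmap cases do a constant amount of work per array element, which is why the slides call them super fast; galloping keeps the array versus array case reasonable when the two sizes differ greatly.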
