Hayden Marchant, Software Engineer
haydenm@wix.com
Bloom Filters
Agenda
A Bloom what?
Why would I want one?
Practical examples
How it works
Next step
A Bloom what?
Just a Data Structure
‘Is an element in a set or not?’
Probabilistic
● Definitely not in set
OR
● Probably in set
Why would I want one?
Why not just store all used keys?
SIMPLE MATHS
Key size per element: 100 bytes
# elements: 10 million
Memory required: 100 bytes * 10M = 1 GB
Bloom Filters: Tiny footprint
~10 bits per element
>98% reduction in space
SIMPLE MATHS (PART 2)
Space per element: 10 bits
# elements: 10 million
Memory required: 10 bits * 10M = 100M bits ~= 12 MB
Footprints:
Traditional = 1 GB
Bloom Filter ~= 12 MB
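The two footprints can be checked with a few lines of arithmetic. This is only a sketch of the sums on the slides; the constant names are made up for illustration, and the result is printed in decimal megabytes (12.5 MB, which is just under 12 MiB).

```python
KEY_SIZE_BYTES = 100        # size of one stored key, from the slide
NUM_ELEMENTS = 10_000_000   # 10 million elements, from the slide
BITS_PER_ELEMENT = 10       # approximate Bloom filter cost per element

# Storing every key in full.
traditional_bytes = KEY_SIZE_BYTES * NUM_ELEMENTS

# Storing roughly 10 bits per element instead (8 bits per byte).
bloom_bytes = BITS_PER_ELEMENT * NUM_ELEMENTS / 8

reduction = 1 - bloom_bytes / traditional_bytes

print(f"Traditional:  {traditional_bytes / 1e9:.1f} GB")
print(f"Bloom filter: {bloom_bytes / 1e6:.1f} MB")
print(f"Reduction:    {reduction:.2%}")
```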
Practical Examples
Akamai
Prevent caching of One-Hit-Wonders
About 75% of web objects are only ever requested once
- Called One-Hit-Wonders
Use Bloom Filters to prevent one-hit-wonders from being stored in the disk cache
1. Saves disk access for these 75% of objects
2. Reduces workload
Medium
Avoid recommending articles a user has already read
- Add each recommended article to the Bloom Filter, keyed as $userid-$articleid
- Before recommending, check the Bloom Filter for the (user, article) key
- If it is not there, recommend the article and then add the key to the Bloom Filter
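The check-then-add flow above could be sketched roughly as follows. `TinyBloomFilter` and `recommend_once` are hypothetical names made up for this illustration, and the filter size and salted SHA-256 hashing are assumptions, not what Medium actually runs.

```python
import hashlib


class TinyBloomFilter:
    """Minimal Bloom filter, only here to illustrate the flow (not production code)."""

    def __init__(self, num_bits=1 << 16, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


seen = TinyBloomFilter()


def recommend_once(user_id, article_id):
    """Return True if the article should be recommended to this user now."""
    key = f"{user_id}-{article_id}"   # the $userid-$articleid key from the slide
    if seen.might_contain(key):
        return False                  # probably recommended before: skip it
    seen.add(key)                     # remember it so it is not recommended again
    return True


print(recommend_once("user1", "article1"))  # first time for this pair
print(recommend_once("user1", "article1"))  # same pair again
```

A false positive here only means that an article is occasionally skipped for a user who has not seen it, which is a cheap mistake for a recommender.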
Cassandra
Reduce disk lookups for non-existent rows in file storage
- Each shard is responsible for a range of data
- Each shard is stored in immutable shard-files
- Create a Bloom Filter on the server for each shard-file
- Do not access a shard-file if its Bloom Filter returns false for the key
Chrome
Identifying Malicious URLs
Store malicious URLs in a Bloom Filter
Run an in-depth check on a positive response
How it works
Basics of Bloom Filter
● Array of m bits
○ Initially all set to 0
● Hash functions
○ k hash functions defined
○ Each maps (hashes) a set element to one of the m array positions
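The two ingredients above (an array of m bits and k hash functions) are enough for a minimal sketch. The class below is an illustration, not a library API: the names are made up, and salting SHA-256 with the hash index is just one assumed way to get k hash functions.

```python
import hashlib


class BloomFilter:
    def __init__(self, m, k):
        self.m = m                # number of bits in the array
        self.k = k                # number of hash functions
        self.bits = [0] * m       # array of m bits, initially all 0

    def _positions(self, element):
        # Simulate k hash functions by salting one hash with the index i.
        # Each one maps the element to one of the m array positions.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{element}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, element):
        for pos in self._positions(element):
            self.bits[pos] = 1

    def might_contain(self, element):
        # False means "definitely not in set"; True means "probably in set".
        return all(self.bits[pos] == 1 for pos in self._positions(element))


bf = BloomFilter(m=1000, k=3)
bf.add("cat")
print(bf.might_contain("cat"))   # an added element is always reported as present
```

A plain Python list is used for readability; a real implementation would pack the bits into a byte array or machine words.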
Bloom Filter in slow motion

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
{ }  m = 20, k = 3

0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0
{ cat }  m = 20, k = 3

0 0 1 0 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 1
{ cat, dog }  m = 20, k = 3

0 1 1 0 1 0 0 0 1 0 1 1 0 0 0 0 1 1 0 1
{ cat, dog, mouse }  m = 20, k = 3

0 1 1 0 1 0 0 0 1 0 1 1 0 0 0 0 1 1 0 1
{ cat, dog, mouse }  m = 20, k = 3
Query: owl
owl is definitely NOT in set

0 1 1 0 1 0 0 0 1 0 1 1 0 0 0 0 1 1 0 1
{ cat, dog, mouse }  m = 20, k = 3
Query: rat
rat might be in set
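The walkthrough can be replayed in code. The slot positions for cat, dog and mouse below are read off the bit arrays on the slides (0-indexed). The positions for owl and rat are not shown on the slides, so the ones used here are assumptions, chosen only so that owl hits an unset slot and rat hits three set slots.

```python
M = 20  # number of bits, as on the slides (k = 3 slots per element)

SLOTS = {
    # Positions read off the slides' bit arrays.
    "cat": (2, 8, 17),
    "dog": (4, 10, 19),
    "mouse": (1, 11, 16),
    # Assumed positions for the two queries (not given on the slides).
    "owl": (3, 8, 17),
    "rat": (1, 10, 17),
}

bits = [0] * M


def add(element):
    for pos in SLOTS[element]:
        bits[pos] = 1


def might_contain(element):
    return all(bits[pos] == 1 for pos in SLOTS[element])


for animal in ("cat", "dog", "mouse"):
    add(animal)
    print(" ".join(map(str, bits)), " after adding", animal)

print("owl might be in set:", might_contain("owl"))
print("rat might be in set:", might_contain("rat"))
```

With these assumed positions, rat is a false positive: its three slots were set by mouse, dog and cat respectively, even though rat itself was never added.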
Constant memory space
Extremely fast processing
Low error rate
The maths
‘Under-the-hood’
PROBABILITIES 101
1. Probability that a given slot is not set by a single hash function
2. Probability that the slot is not set by any of the k hashes of one element
3. Probability that the slot is not set by any of the k hashes of n elements
4. Probability that the slot is set after inserting n elements (k hashes each)
5. Probability that all k slots of a lookup are set after inserting n elements === False positive
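The five steps above correspond to the standard textbook derivation, sketched below with m bits, k hash functions and n inserted elements. It assumes each hash picks a slot uniformly at random and treats the slots as independent, which makes step 5 an approximation.

```latex
\begin{align*}
\text{1.}\quad & P(\text{slot not set by one hash}) = 1 - \frac{1}{m} \\
\text{2.}\quad & P(\text{slot not set by } k \text{ hashes}) = \left(1 - \frac{1}{m}\right)^{k} \\
\text{3.}\quad & P(\text{slot not set after } n \text{ elements}) = \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m} \\
\text{4.}\quad & P(\text{slot set after } n \text{ elements}) = 1 - \left(1 - \frac{1}{m}\right)^{kn} \\
\text{5.}\quad & P(\text{false positive}) \approx \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k} \approx \left(1 - e^{-kn/m}\right)^{k}
\end{align*}
```

For the slow-motion example (m = 20, k = 3, n = 3), step 5 gives (1 - 0.95^9)^3, roughly 5%.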
GETTING A LITTLE TRICKIER
If you want to read more on the maths, take a look at
https://en.wikipedia.org/wiki/Bloom_filter
A Simple Calculator
See: https://hur.st/bloomfilter
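The sizing such a calculator does can be approximated with the standard formulas m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2. The sketch below uses those textbook formulas; the function names are made up, and it is not claimed to match the linked calculator exactly.

```python
import math


def optimal_parameters(n, p):
    """Return (m, k): bits and hash count for n elements at target false positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round(m / n * math.log(2)))
    return m, k


def false_positive_rate(m, k, n):
    """Approximate false positive rate for m bits, k hashes and n inserted elements."""
    return (1 - math.exp(-k * n / m)) ** k


n = 10_000_000                        # 10 million elements, as in the earlier slides
m, k = optimal_parameters(n, p=0.01)  # aim for about 1% false positives
print(f"bits per element: {m / n:.2f}, hash functions: {k}")
print(f"estimated false positive rate: {false_positive_rate(m, k, n):.4f}")
```

A 1% target comes out at a little under 10 bits per element, which is where the "~10 bits per element" rule of thumb comes from.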
Next step
Bloom Filters in Code
● Libraries exist in most languages
○ https://github.com/alexandrnikitin/bloom-filter-scala
○ https://www.npmjs.com/package/bloom-filters
● Distributed Bloom Filters in Redis
○ Perfect for multiple instances of a service using a single Bloom filter
Now, go think about how using Bloom Filters in your services
could reduce payload and speed things up
Other probabilistic structures
● Counting Bloom Filters
○ Allow deletes
● TopK
○ Keep track of the top K elements by count
● Count-Min Sketch
○ Frequency table of elements
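As an illustration of the first item, a Counting Bloom Filter swaps each bit for a small counter so that elements can be removed again. The sketch below is a minimal version under assumed names, with salted SHA-256 hashing; it is not taken from any particular library.

```python
import hashlib


class CountingBloomFilter:
    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.counters = [0] * m   # counters instead of single bits

    def _positions(self, element):
        # k positions from one hash salted with the index i (an assumed scheme).
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{element}".encode()).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, element):
        for pos in self._positions(element):
            self.counters[pos] += 1

    def might_contain(self, element):
        return all(self.counters[pos] > 0 for pos in self._positions(element))

    def remove(self, element):
        # Only decrement when the element looks present, so counters never go negative.
        # Removing an element that was never added can still corrupt the filter.
        if not self.might_contain(element):
            return False
        for pos in self._positions(element):
            self.counters[pos] -= 1
        return True


cbf = CountingBloomFilter(m=1000, k=3)
cbf.add("cat")
cbf.add("dog")
cbf.remove("cat")
print(cbf.might_contain("dog"))   # dog was added and never removed
```

The price of supporting deletes is memory: each slot needs several bits for its counter instead of one.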
Q&A

Introduction to Bloom Filters