Rolling and Extendable Hashes
December 18, 2013

John Graham-Cumming

www.cloudflare.com
CRC

Cyclic Redundancy Check
•  Widely used as a hash function
   •  e.g. it is common to hash keys with a CRC for load balancing
   •  memcached, nginx, ...
   •  crc(key) % #servers
•  Not designed as a hash function!
•  Choice of polynomial can change the collision rate in the real world
•  CRCs have no internal state and they are fast
   •  Shifts, XOR, small table lookup
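As a rough illustration of `crc(key) % #servers`, the sketch below uses a bitwise CRC-32 with the common IEEE polynomial. The choice of CRC-32 and the helper name `pick_server` are assumptions made for this example; the slides do not say which CRC memcached or nginx actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
uint32_t crc32_bytes(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical helper: map a key to a server index in [0, nservers). */
unsigned pick_server(const void *key, size_t len, unsigned nservers)
{
    assert(nservers > 0);
    return crc32_bytes(key, len) % nservers;
}
```

For example, `pick_server("user:42", 7, 5)` returns the same index in the range 0 to 4 every time it is called with that key.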

Extending a CRC
Compute the hash (a CRC) by extending the string to the right.

[Figure: four steps in which h( ) covers a progressively longer prefix of the string S]

A typical CRC allows this because, when a bit, byte, or word is added, the new CRC is calculated from the previous CRC and the added data.
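A minimal sketch of that property, assuming a bit-at-a-time CRC-64 with an initial value of 0 and the reflected form of the polynomial x^64 + x^4 + x^3 + x + 1. The function name `crc64_extend` and the choice of polynomial are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected (bit-reversed) form of x^64 + x^4 + x^3 + x + 1. */
#define CRC64_POLY_REFLECTED 0xD800000000000000ULL

/* Extend a running CRC with len more bytes; start from crc = 0. */
uint64_t crc64_extend(uint64_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ CRC64_POLY_REFLECTED : crc >> 1;
    }
    return crc;
}
```

Feeding "rolling" and then " hash" through two calls gives the same value as one call over "rolling hash", because the only state carried between calls is the CRC itself.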

CRC64
•  Typical implementation (t is the lookup table)

    BIT64 crc = 0;                /* BIT64: unsigned 64-bit integer */

    while (*s) {
        BYTE c = *s++;            /* BYTE: unsigned 8-bit integer */
        crc = (crc >> 8) ^ t[(crc & 0xff) ^ c];
    }

•  Some common polynomials:
   •  CRC64-ISO (0x800000000000000D): $x^{64} + x^4 + x^3 + x + 1$
   •  CRC64-ECMA (0xA17870F5D4F51B49): $x^{64} + x^{62} + x^{57} + x^{55} + x^{54} + x^{53} + x^{52} + x^{47} + x^{46} + x^{45} + x^{40} + x^{39} + x^{38} + x^{37} + x^{35} + x^{33} + x^{32} + x^{31} + x^{29} + x^{27} + x^{24} + x^{23} + x^{22} + x^{21} + x^{19} + x^{17} + x^{13} + x^{12} + x^{10} + x^9 + x^7 + x^4 + x + 1$
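The lookup table t is not shown on the slide. One way it could be built, assuming the CRC64-ISO polynomial in its reflected form (0xD800000000000000, which is the bit order the right-shifting loop expects), is sketched below. The function names are hypothetical, and the byte loop takes an explicit length rather than stopping at a NUL byte.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC64_ISO_REFLECTED 0xD800000000000000ULL

static uint64_t t[256];

/* Fill t so that t[i] is the CRC of the single byte i. */
void crc64_init_table(void)
{
    for (unsigned i = 0; i < 256; i++) {
        uint64_t c = i;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 1) ? (c >> 1) ^ CRC64_ISO_REFLECTED : c >> 1;
        t[i] = c;
    }
}

/* Same update step as on the slide, driven by a length instead of a NUL test. */
uint64_t crc64(const void *data, size_t len)
{
    const unsigned char *s = data;
    uint64_t crc = 0;

    while (len--) {
        unsigned char c = *s++;
        crc = (crc >> 8) ^ t[(crc & 0xff) ^ c];
    }
    return crc;
}
```

`crc64_init_table()` has to run once before the first call to `crc64()`; after that each input byte costs one shift, two XORs and one table lookup.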
One use: spam filter tokens
•  Common to see in open source spam filters

    From: Prince of Nigeria
    Subject: I beg pardon for interrupting your fine day

    [...]

•  Generates tokens:

    from:prince from:nigeria subject:pardon subject:interrupting
    subject:fine subject:day

•  And from the message body:

    body:html:font:red body:html:image:size:0

•  Repeated fixed parts (from:, subject:, html:image:size:) can be CRCed once
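The saving from hashing a fixed prefix once can be sketched as follows. This assumes a CRC-64 that starts from 0 and uses the reflected CRC64-ISO polynomial; `crc64_extend_str` and `hash_tokens` are hypothetical names, not functions from any particular spam filter.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC64_POLY_REFLECTED 0xD800000000000000ULL

/* Extend a running CRC with a NUL-terminated string. */
uint64_t crc64_extend_str(uint64_t crc, const char *s)
{
    for (; *s; s++) {
        crc ^= (unsigned char)*s;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ CRC64_POLY_REFLECTED : crc >> 1;
    }
    return crc;
}

/* Hash every "prefix + word" token, computing the prefix CRC only once. */
void hash_tokens(const char *prefix, const char *const words[], size_t n,
                 uint64_t out[])
{
    uint64_t prefix_crc = crc64_extend_str(0, prefix);  /* done once */

    for (size_t i = 0; i < n; i++)
        out[i] = crc64_extend_str(prefix_crc, words[i]);
}
```

With the prefix "subject:" and the words from the example header, each output equals the CRC of the full token such as "subject:pardon", but the eight prefix bytes are processed only one time.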


RABIN/KARP

Rabin/Karp String Searching
“Efficient randomized pattern-matching algorithms”
Rabin/Karp, IBM Journal of Research and Development
March 1987


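[Figure: h(P) is the hash of the pattern P; h( ) is computed over a window of the string S that has the same length as P]

Compute and compare len(S)-len(P)+2 hashes: one for P, plus one for each of the len(S)-len(P)+1 windows of S.
When a hash matches, verify that the substrings actually match.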
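The compare-then-verify structure can be sketched as below. For clarity this version recomputes every window hash from scratch, so it shows the shape of the search rather than its speed; the rolling update that makes each hash cheap is a separate step. The function names, the base 257 and the modulus 4294967291 are choices made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RK_BASE 257u
#define RK_MOD  4294967291ull   /* largest prime below 2^32 */

/* Polynomial hash of len bytes, evaluated with Horner's rule. */
static uint64_t rk_hash(const unsigned char *s, size_t len)
{
    uint64_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = (h * RK_BASE + s[i]) % RK_MOD;
    return h;
}

/* Return the index of the first occurrence of pat in text, or -1. */
long rk_search(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat);
    if (m > n)
        return -1;

    const unsigned char *s = (const unsigned char *)text;
    uint64_t hp = rk_hash((const unsigned char *)pat, m);

    for (size_t k = 0; k + m <= n; k++) {
        /* Compare hashes first; confirm with memcmp to rule out collisions. */
        if (rk_hash(s + k, m) == hp && memcmp(s + k, pat, m) == 0)
            return (long)k;
    }
    return -1;
}
```

The `memcmp` call is the "verify" step: it only runs when the two hashes agree, so with few collisions it is rarely executed on non-matching windows.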

Two Expensive Operations
•  The h() function itself
   •  Need a very fast hash algorithm
   •  Average complexity is O(len(S))
•  Substring comparison on hash matches
   •  Need minimal hash collisions
   •  Worst case complexity is O(len(S) * len(P))
•  Trivial to make work with multiple patterns
   •  Just compute their hashes
   •  Use a Bloom filter (or a hash table) to look up patterns for each hash calculated
   •  Average complexity is O(len(S) + #patterns)

Rabin/Karp Rolling Hash
•  Based on arithmetic in the ring $\mathbb{Z}_p$, where p is prime
•  Treats a substring $s_0 \ldots s_n$ as a number in a chosen base b:

$$h(s_0 \ldots s_n) = \sum_{i=0}^{n} s_i \, b^{\,n-i} \bmod p$$

•  This hash can be updated when sliding across a string
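As a small worked instance of the definition, using toy values chosen here for illustration rather than taken from the slides: let b = 10, p = 7 and the symbols be $s_0 s_1 s_2 = 3, 1, 4$, so n = 2. The second line evaluates the same sum with Horner's rule, reducing mod p after each step so the intermediate values stay small.

```latex
\begin{aligned}
h(s_0 s_1 s_2) &= (3 \cdot 10^2 + 1 \cdot 10^1 + 4 \cdot 10^0) \bmod 7 = 314 \bmod 7 = 6 \\
\text{Horner:}\quad (3 \cdot 10 + 1) \bmod 7 &= 31 \bmod 7 = 3, \qquad (3 \cdot 10 + 4) \bmod 7 = 34 \bmod 7 = 6
\end{aligned}
```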

Rabin/Karp Efficient Update
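$$
\begin{aligned}
h(s_1 \ldots s_{n+1}) &= \sum_{i=1}^{n+1} s_i \, b^{\,n+1-i} \bmod p \\
&= b \sum_{i=1}^{n+1} s_i \, b^{\,n-i} \bmod p \\
&= \Bigl( b \sum_{i=0}^{n} s_i \, b^{\,n-i} - s_0 \, b^{\,n+1} + s_{n+1} \Bigr) \bmod p \\
&= \bigl( b \, h(s_0 \ldots s_n) - s_0 \, b^{\,n+1} + s_{n+1} \bigr) \bmod p \\
&= \bigl( b \, ( h(s_0 \ldots s_n) - s_0 \, b^{\,n} ) + s_{n+1} \bigr) \bmod p
\end{aligned}
$$

Here $h(s_0 \ldots s_n)$ is the previous hash and $b^n \bmod p$ is precomputed.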
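A sketch of the update as code, assuming b = 257 and p = 4294967291 and a window of `len` bytes (so n = len - 1 in the notation above). The names `rk_hash`, `rk_top_power` and `rk_roll` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define RK_BASE 257u
#define RK_MOD  4294967291ull   /* prime just below 2^32 */

/* Direct hash of s[0..len-1]: sum of s[i] * b^(len-1-i), mod p. */
uint64_t rk_hash(const unsigned char *s, size_t len)
{
    uint64_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = (h * RK_BASE + s[i]) % RK_MOD;
    return h;
}

/* b^(len-1) mod p: the weight of the first byte of a len-byte window. */
uint64_t rk_top_power(size_t len)
{
    uint64_t pw = 1;
    for (size_t i = 1; i < len; i++)
        pw = (pw * RK_BASE) % RK_MOD;
    return pw;
}

/* Slide the window one byte: drop `out`, append `in`. */
uint64_t rk_roll(uint64_t h, uint64_t top, unsigned char out, unsigned char in)
{
    uint64_t drop = (out * top) % RK_MOD;   /* s0 * b^n mod p            */
    h = (h + RK_MOD - drop) % RK_MOD;       /* h - s0 * b^n, kept >= 0   */
    return (h * RK_BASE + in) % RK_MOD;     /* b * (...) + incoming byte */
}
```

Adding `RK_MOD` before subtracting keeps the intermediate value non-negative in unsigned arithmetic. All products stay below 2^42, so 64-bit integers are wide enough for these constants.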
Rabin/Karp base and modulus values
•  Use a prime base
   •  When operating on bytes a suitable b is 257
•  Choose a prime modulus closest to the desired hash output size
   •  e.g. for a 32-bit hash value use 4294967291
•  Various tricks can speed up the calculation
   •  Don't calculate $b^n$ (the power of b that multiplies the outgoing byte); calculate $b^n \bmod p$ instead
   •  Choose p such that $b \cdot p$ fits in a convenient bit size
   •  Memoize $x \cdot (b^n \bmod p)$ for all byte values x
   •  Choosing p as a power of two (instead of a prime) means & can be used instead of %
•  Result: update requires one *, one +, one -, one &
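The last two tricks can be combined as in the following sketch, which assumes p = 2^32, b = 257 and a fixed window of w bytes. The struct and function names are hypothetical; the table `drop[x]` holds the memoized product for each possible outgoing byte.

```c
#include <stddef.h>
#include <stdint.h>

#define BASE 257u
#define MASK 0xFFFFFFFFull      /* p = 2^32, so "mod p" is "& MASK" */

typedef struct {
    uint64_t hash;
    uint64_t drop[256];         /* drop[x] = x * b^(w-1) mod 2^32 */
} rolling_t;

/* Hash the first w bytes and memoize the outgoing-byte table. */
void rolling_init(rolling_t *r, const unsigned char *s, size_t w)
{
    uint64_t top = 1;
    for (size_t i = 1; i < w; i++)
        top = (top * BASE) & MASK;
    for (unsigned x = 0; x < 256; x++)
        r->drop[x] = (x * top) & MASK;

    r->hash = 0;
    for (size_t i = 0; i < w; i++)
        r->hash = (r->hash * BASE + s[i]) & MASK;
}

/* One -, one *, one +, one &; unsigned wrap-around makes the subtraction safe. */
void rolling_update(rolling_t *r, unsigned char out, unsigned char in)
{
    r->hash = ((r->hash - r->drop[out]) * BASE + in) & MASK;
}
```

The subtraction may wrap around in 64-bit unsigned arithmetic, but because 2^64 is a multiple of 2^32 the masked result is still the correct value mod p.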
Bentley/McIlroy Compression
"Data Compression Using Long Common Strings"
Jon Bentley, Douglas McIlroy
Proceedings of the IEEE Data Compression Conference, 1999

"Dictionary" to compress against (could be self)
Broken into blocks, each block hashed

[Figure: the dictionary split into blocks with hashes h0 ... h5; h( ) is computed over a sliding window of the string S]

Slide the Rabin/Karp hash across the string to compress
Look for matches in the dictionary
Extend matches backwards by up to one block size, forwards indefinitely
O(len(S))
Never gonna give you up
We're no strangers to love
You know the rules and so do I
A full commitment's what I'm thinking of
You wouldn't get this from any other guy
I just wanna tell you how I'm feeling
Gotta make you understand
Never gonna give you up<204,13>let you down<204,13>run around<45,5>desert you<204,13><184,9>cry<204,13>say goodbye<204,13>tell a lie<45,5>hu<285,7>
We've<30,5>n each<129,7>for so long
Your heart's been aching but
You're too shy to say it
Inside we both<30,6>wha<422,9>going on
We<30,10>game<45,5>we're<210,7>play it
And if you ask me<161,17>Don't tell m<220,5>'re too blind to see
<340,13><217,161><229,12><634,161>(Ooh,<633,12>)<967,24>)<340,13>give, n<230,11>
give
(G<635,10><1004,57><377,11><389,160><139,239><229,12><634,333>

•  Open source implementation from CloudFlare in Go
•  Bentley/McIlroy followed by gzip is very effective
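To make the `<offset,length>` notation in the compressed text concrete, here is a sketch of a decoder. It assumes that each offset is a 0-based position in the output decoded so far and that a literal '<' never starts something that looks like a reference; a real format needs an escape for that case. The function name `bm_decode` is hypothetical and this is not the CloudFlare implementation.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Decode text containing <offset,length> references into out.
 * Assumption: offsets are 0-based positions in the decoded output.
 * Returns the decoded length, or (size_t)-1 on a bad reference or overflow.
 */
size_t bm_decode(const char *in, char *out, size_t cap)
{
    size_t n = 0;

    while (*in) {
        unsigned long off, len;
        int used = 0;

        if (*in == '<' &&
            sscanf(in, "<%lu,%lu>%n", &off, &len, &used) == 2 && used > 0) {
            if (off >= n || len > cap - n)
                return (size_t)-1;
            /* Copy byte by byte: a reference may overlap its own output. */
            for (unsigned long i = 0; i < len; i++, n++)
                out[n] = out[off + i];
            in += used;
        } else {
            if (n >= cap)
                return (size_t)-1;
            out[n++] = *in++;
        }
    }
    return n;
}
```

For instance, the input `abc-<0,3>-<1,2>` decodes to the ten bytes `abc-abc-bc`. The output is not NUL-terminated; the caller uses the returned length.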

RSYNC PROTOCOL

RSYNC hashing
"Efficient Algorithms for Sorting and Synchronization"
Andrew Tridgell
Doctoral Thesis, February 1999

File on remote machine
Broken into blocks, each block hashed twice (rolling hash plus MD5)
Hashes sent to local machine

[Figure: the remote file split into blocks with hashes h0 ... h5; h( ) is computed over a sliding window of the source file S]

Slide the rolling hash across the source file looking for a match
Check the MD5 hash to eliminate rolling hash collisions
Send bytes from S to the remote host, or a token saying which hash matched
The RSYNC rolling hash
•  Block size is b, offset into file is k, the file bytes are a, M is an arbitrary modulus

$$
\begin{aligned}
r_1(k, b) &= \Bigl( \sum_{i=0}^{b-1} a_{i+k} \Bigr) \bmod M \\
r_2(k, b) &= \Bigl( \sum_{i=0}^{b-1} (b - i)\, a_{i+k} \Bigr) \bmod M \\
r(k, b) &= r_1(k, b) + M\, r_2(k, b)
\end{aligned}
$$

•  In practice, $M = 2^{16}$
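Computed directly from these definitions, the checksum might look like the sketch below, with M = 2^16 so that r1 and r2 pack into one 32-bit value. The function name `rsync_sum` is hypothetical and this is not taken from the rsync source.

```c
#include <stddef.h>
#include <stdint.h>

#define RSYNC_M 65536u          /* M = 2^16 */

/* Weak checksum of the b bytes starting at offset k of a[]. */
uint32_t rsync_sum(const unsigned char *a, size_t k, size_t b)
{
    uint32_t r1 = 0, r2 = 0;

    for (size_t i = 0; i < b; i++) {
        r1 = (r1 + a[k + i]) % RSYNC_M;
        r2 = (r2 + (uint32_t)(b - i) * a[k + i]) % RSYNC_M;
    }
    return r1 + RSYNC_M * r2;   /* r1 in the low 16 bits, r2 in the high 16 */
}
```

For the three bytes "abc" (97, 98, 99) this gives r1 = 294 and r2 = 3*97 + 2*98 + 1*99 = 586.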

Hash updates are fast
$$
\begin{aligned}
r_1(k+1, b) &= \bigl( r_1(k, b) - a_k + a_{k+b} \bigr) \bmod M \\
r_2(k+1, b) &= \bigl( r_2(k, b) - b\, a_k + r_1(k+1, b) \bigr) \bmod M
\end{aligned}
$$

(* proof is left as an exercise for the reader)
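The two update rules can be checked against a direct recomputation with a sketch like this one, again assuming M = 2^16. The type `rsync_state` and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define RSYNC_M 65536u

typedef struct { uint32_t r1, r2; } rsync_state;

/* Compute r1(k, b) and r2(k, b) directly from the definitions. */
rsync_state rsync_init(const unsigned char *a, size_t k, size_t b)
{
    rsync_state st = {0, 0};
    for (size_t i = 0; i < b; i++) {
        st.r1 = (st.r1 + a[k + i]) % RSYNC_M;
        st.r2 = (st.r2 + (uint32_t)(b - i) * a[k + i]) % RSYNC_M;
    }
    return st;
}

/* Advance from offset k to k + 1 using only a[k] and a[k + b]. */
rsync_state rsync_roll(rsync_state st, const unsigned char *a, size_t k, size_t b)
{
    /* Unsigned arithmetic wraps mod 2^32, a multiple of M, so "- x" is safe. */
    st.r1 = (st.r1 - (uint32_t)a[k] + (uint32_t)a[k + b]) % RSYNC_M;
    st.r2 = (st.r2 - (uint32_t)b * a[k] + st.r1) % RSYNC_M;
    return st;
}
```

Each roll touches two input bytes regardless of the block size, which is what makes it practical to slide the checksum across every offset of the source file.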
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 

Recently uploaded (20)

Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 

Cloud flare jgc bigo meetup rolling hashes

  • 9. Two Expensive Operations
      •  The h() function itself
          •  Need a very fast hash algorithm
          •  Average complexity is O(len(S))
      •  Substring comparison on hash matches
          •  Need minimal hash collisions
          •  Worst case complexity is O(len(S) * len(P))
      •  Trivial to make work with multiple patterns
          •  Just compute their hashes
          •  Use a Bloom filter (or other hash table) to look up patterns for each hash calculated
          •  Average complexity is O(len(S) + #patterns)
  • 10. Rabin/Karp Rolling Hash
      •  Based on arithmetic in the ring Z_p where p is prime
      •  Treats a substring s_0 .. s_n as a number in a chosen base b:
         $h(s_0 \ldots s_n) = \sum_{i=0}^{n} s_i b^{n-i} \bmod p$
      •  This hash can be updated when sliding across a string
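  • 11. Rabin/Karp Efficient Update
      $h(s_1 \ldots s_{n+1}) = \sum_{i=1}^{n+1} s_i b^{n+1-i} \bmod p$
      $= \left( b \sum_{i=1}^{n} s_i b^{n-i} + s_{n+1} \right) \bmod p$
      $= \left( b \left( \sum_{i=0}^{n} s_i b^{n-i} - s_0 b^{n} \right) + s_{n+1} \right) \bmod p$
      $= \left( b \left( h(s_0 \ldots s_n) - s_0 b^{n} \right) + s_{n+1} \right) \bmod p$
      Previous hash: $h(s_0 \ldots s_n)$. Precomputed: $b^{n} \bmod p$ (the window s_0 .. s_n holds n+1 symbols, so the outgoing symbol s_0 carries weight b^n).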
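The update rule can be tried out with a short C sketch. It is an illustration under stated assumptions rather than the code behind the talk: the names `rk_hash` and `rk_search` are hypothetical, and the base 257 and modulus 4294967291 are assumed values chosen so that every intermediate product fits in 64 bits. For a pattern of m bytes the precomputed factor is b^(m-1) mod p.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed parameters for this sketch: a prime base and a prime modulus below 2^32. */
#define RK_BASE 257u
#define RK_MOD  4294967291u

/* Hash of len bytes, read as digits of a base-RK_BASE number modulo RK_MOD. */
static uint64_t rk_hash(const unsigned char *s, size_t len)
{
    uint64_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = (h * RK_BASE + s[i]) % RK_MOD;
    return h;
}

/* Returns the index of the first occurrence of pat in text, or -1 if absent. */
long rk_search(const char *text, const char *pat)
{
    assert(text != NULL && pat != NULL);
    size_t n = strlen(text), m = strlen(pat);
    if (m == 0) return 0;
    if (m > n) return -1;

    const unsigned char *t = (const unsigned char *)text;
    const unsigned char *p = (const unsigned char *)pat;

    /* top = b^(m-1) mod p: the weight of the byte that leaves the window. */
    uint64_t top = 1;
    for (size_t i = 1; i < m; i++)
        top = (top * RK_BASE) % RK_MOD;

    uint64_t hp = rk_hash(p, m);   /* pattern hash, computed once */
    uint64_t hs = rk_hash(t, m);   /* hash of the first window */

    for (size_t k = 0; ; k++) {
        /* A hash match is only a candidate: verify the bytes. */
        if (hs == hp && memcmp(t + k, p, m) == 0)
            return (long)k;
        if (k + m >= n)
            break;                 /* that was the last window */
        /* Roll: remove t[k], shift by one digit, append t[k + m]. */
        uint64_t out = (t[k] * top) % RK_MOD;
        hs = (hs + RK_MOD - out) % RK_MOD;
        hs = (hs * RK_BASE + t[k + m]) % RK_MOD;
    }
    return -1;
}
```

With these definitions, `rk_search("hello world", "world")` returns 6 and `rk_search("aaaa", "b")` returns -1. Adding `RK_MOD` before subtracting keeps the unsigned intermediate value from wrapping below zero.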
  • 12. Rabin/Karp base and modulus values
      •  Use a prime base
          •  When operating on bytes a suitable b is 257
      •  Choose a prime modulus closest to the desired hash output size
          •  e.g. for a 32-bit hash value use 4294967291
      •  Various tricks can speed up the calculation
          •  Don't calculate b^{n-1} (the weight of the outgoing byte in an n-byte window); calculate b^{n-1} mod p instead
          •  Choose a p such that bp fits in a convenient bit size
          •  Memoize x (b^{n-1} mod p) for all byte values x
          •  If p is a power of two (instead of a prime), & can be used instead of %
      •  Result: update requires one *, one +, one -, one &
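The memoization and power-of-two tricks above can be sketched in C as follows. This is a hedged illustration, not the speaker's code: `roll_t`, `roll_init` and `roll_update` are hypothetical names, and the sketch assumes p = 2^32 so that reduction is a single mask. Using a power of two instead of a prime trades some hash quality for speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROLL_BASE 257u          /* assumed prime base for byte input */
#define ROLL_MASK 0xffffffffu   /* assumed p = 2^32: "mod p" becomes "& ROLL_MASK" */

typedef struct {
    uint32_t out[256];  /* memoized out[x] = x * b^(window-1) mod 2^32 */
    uint32_t hash;      /* hash of the current window */
} roll_t;

/* Builds the lookup table and hashes the first `window` bytes of s. */
void roll_init(roll_t *r, const unsigned char *s, size_t window)
{
    assert(window > 0);
    uint64_t top = 1;   /* b^(window-1) mod 2^32 */
    for (size_t i = 1; i < window; i++)
        top = (top * ROLL_BASE) & ROLL_MASK;
    for (unsigned x = 0; x < 256; x++)
        r->out[x] = (uint32_t)((x * top) & ROLL_MASK);

    uint64_t h = 0;
    for (size_t i = 0; i < window; i++)
        h = (h * ROLL_BASE + s[i]) & ROLL_MASK;
    r->hash = (uint32_t)h;
}

/* Slides the window one byte: one -, one *, one +, one &. */
uint32_t roll_update(roll_t *r, unsigned char leaving, unsigned char entering)
{
    uint64_t h = r->hash;
    /* The subtraction may wrap modulo 2^64; since 2^32 divides 2^64,
       the masked result is still correct modulo 2^32. */
    h = ((h - r->out[leaving]) * ROLL_BASE + entering) & ROLL_MASK;
    r->hash = (uint32_t)h;
    return r->hash;
}
```

After `roll_init` on a 3-byte window over "abc", `hash` is 6432038, which is (97 * 257 + 98) * 257 + 99. Each later call to `roll_update` costs the same regardless of the window length.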
  • 13. Bentley/McIlroy Compression
      "Data Compression Using Long Common Strings"
      Jon Bentley, Douglas McIlroy
      Proceedings of the IEEE Data Compression Conference, 1999
      "Dictionary" to compress against (could be self)
      Broken into blocks, each block hashed
      [Figure: dictionary split into blocks with hashes h0 .. h5; window hash h( ) sliding over S]
      Slide Rabin/Karp hash across string to compress
      Look for matches in dictionary
      Extend matches backwards by up to block size bytes, forwards indefinitely
      O(len(S))
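The matching step described on this slide might look like the following C sketch. It is deliberately simplified and is not the CloudFlare implementation: `bm_match` and `bm_first_match` are hypothetical names, the hash is taken modulo 2^32 through `uint32_t` wraparound, only the first match is reported, and the block hashes are scanned linearly where a real compressor would use a hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BM_BASE 257u  /* assumed base; modulus is 2^32 via uint32_t wraparound */

typedef struct { size_t dict_pos, src_pos, len; } bm_match;

/* Polynomial hash of len bytes modulo 2^32. */
static uint32_t bm_hash(const unsigned char *s, size_t len)
{
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = h * BM_BASE + s[i];
    return h;
}

/* Finds the first window of src that equals a whole dictionary block, then
   extends the match in both directions. Returns 1 and fills *m, or 0. */
int bm_first_match(const unsigned char *dict, size_t dlen,
                   const unsigned char *src, size_t slen,
                   size_t block, bm_match *m)
{
    assert(block > 0);
    size_t nblocks = dlen / block;   /* a trailing partial block is ignored */
    if (nblocks == 0 || slen < block) return 0;

    /* Hash every non-overlapping dictionary block once. */
    uint32_t *bh = malloc(nblocks * sizeof *bh);
    if (bh == NULL) return 0;
    for (size_t j = 0; j < nblocks; j++)
        bh[j] = bm_hash(dict + j * block, block);

    uint32_t top = 1;                /* BM_BASE^(block-1) mod 2^32 */
    for (size_t i = 1; i < block; i++)
        top *= BM_BASE;

    uint32_t h = bm_hash(src, block);
    int found = 0;
    for (size_t k = 0; k + block <= slen && !found; k++) {
        for (size_t j = 0; j < nblocks && !found; j++) {
            size_t d = j * block;
            if (bh[j] != h || memcmp(dict + d, src + k, block) != 0)
                continue;            /* no match, or a hash collision */
            size_t back = 0, fwd = block;
            while (back < d && back < k &&
                   dict[d - back - 1] == src[k - back - 1])
                back++;              /* extend backwards */
            while (d + fwd < dlen && k + fwd < slen &&
                   dict[d + fwd] == src[k + fwd])
                fwd++;               /* extend forwards */
            m->dict_pos = d - back;
            m->src_pos = k - back;
            m->len = back + fwd;
            found = 1;
        }
        if (!found && k + block < slen)   /* roll the window by one byte */
            h = (h - src[k] * top) * BM_BASE + src[k + block];
    }
    free(bh);
    return found;
}
```

With the dictionary "the quick brown fox jumps", the source "a quick brown dog" and a block size of 4, the window "quic" hits a dictionary block and the match grows to the 13 bytes " quick brown " (dictionary offset 3, source offset 1). A compressor would then emit an offset and length pair in place of those bytes.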
  • 14. Never gonna give you up
      We're no strangers to love
      You know the rules and so do I
      A full commitment's what I'm thinking of
      You wouldn't get this from any other guy
      I just wanna tell you how I'm feeling
      Gotta make you understand
      Never gonna give you up<204,13>let you down<204,13>run around<45,5>desert you<204,13><184,9>cry<204,13>say goodbye<204,13>tell a lie<45,5>hu<285,7>
      We've<30,5>n each<129,7>for so long
      Your heart's been aching but
      You're too shy to say it
      Inside we both<30,6>wha<422,9>going on
      We<30,10>game<45,5>we're<210,7>play it
      And if you ask me<161,17>Don't tell m<220,5>'re too blind to see
      <340,13><217,161><229,12><634,161>(Ooh,<633,12>)<967,24>)<340,13>give, n<230,11>
      give
      (G<635,10><1004,57><377,11><389,160><139,239><229,12><634,333>
      •  Open source implementation from CloudFlare in Go
      •  Bentley/McIlroy followed by gzip is very effective
  • 16. RSYNC hashing
      "Efficient Algorithms for Sorting and Synchronization"
      Andrew Tridgell
      Doctoral Thesis, February 1999
      File on remote machine
      Broken into blocks, each block hashed twice (rolling hash plus MD5)
      Hashes sent to local machine
      [Figure: remote file split into blocks with hashes h0 .. h5; window hash h( ) sliding over S]
      Slide rolling hash across source file looking for a match
      Check MD5 hash to eliminate rolling hash collisions
      Send bytes from S to remote host, or a token saying which hash matched
  • 17. The RSYNC rolling hash
      •  Block size is b, offset into file is k, the file bytes are a, M is an arbitrary modulus
         $r_1(k, b) = \left( \sum_{i=0}^{b-1} a_{i+k} \right) \bmod M$
         $r_2(k, b) = \left( \sum_{i=0}^{b-1} (b - i)\, a_{i+k} \right) \bmod M$
         $r(k, b) = r_1(k, b) + M\, r_2(k, b)$
      •  In practice, M = 2^16
  • 18. Hash updates are fast
      $r_1(k+1, b) = \left( r_1(k, b) - a_k + a_{k+b} \right) \bmod M$
      $r_2(k+1, b) = \left( r_2(k, b) - b\, a_k + r_1(k+1, b) \right) \bmod M$
      (* proof is left as an exercise for the reader)
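The two sums and their one-step updates can be checked with a small C sketch. It follows the formulas as written on the slides with M = 2^16; the type `rsync_sum` and the functions `rsync_init`, `rsync_roll` and `rsync_digest` are hypothetical names for illustration and are not taken from the rsync source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RS_M 65536u  /* M = 2^16, as on the slide */

typedef struct {
    uint32_t r1, r2;  /* both kept reduced modulo RS_M */
    size_t b;         /* block size */
} rsync_sum;

/* Computes r1(k, b) and r2(k, b) directly for the block starting at a. */
void rsync_init(rsync_sum *s, const unsigned char *a, size_t b)
{
    assert(b > 0);
    uint32_t r1 = 0, r2 = 0;
    for (size_t i = 0; i < b; i++) {
        r1 = (r1 + a[i]) % RS_M;
        r2 = (r2 + (uint32_t)((b - i) % RS_M) * a[i]) % RS_M;
    }
    s->r1 = r1;
    s->r2 = r2;
    s->b = b;
}

/* Slides the block one byte: `out` is a_k, `in` is a_(k+b). */
void rsync_roll(rsync_sum *s, unsigned char out, unsigned char in)
{
    /* r1(k+1) = r1(k) - a_k + a_(k+b); RS_M is added first to stay non-negative. */
    s->r1 = (s->r1 + RS_M - out + in) % RS_M;
    /* r2(k+1) = r2(k) - b * a_k + r1(k+1) */
    uint32_t drop = (uint32_t)((s->b % RS_M) * out) % RS_M;
    s->r2 = (s->r2 + RS_M - drop + s->r1) % RS_M;
}

/* r(k, b) = r1 + M * r2: the two 16-bit sums packed into 32 bits. */
uint32_t rsync_digest(const rsync_sum *s)
{
    return s->r1 + RS_M * s->r2;
}
```

For the block "abcd", r1 is 97 + 98 + 99 + 100 = 394 and r2 is 4*97 + 3*98 + 2*99 + 1*100 = 980. Rolling one byte at a time and comparing against a fresh `rsync_init` at each offset is a quick way to convince yourself of the exercise left to the reader.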