Failing Fast with Redis 
backed BloomFilters 
• Christopher Curtin 
• Head of Technical 
Research 
• @ChrisCurtin
About Me 
 25+ years in technology 
 Head of Technical Research at Silverpop, an IBM Company (14+ years at 
Silverpop) 
 Built a SaaS platform before the term ‘SaaS’ was being used 
 Prior to Silverpop: real-time control systems, factory automation and 
warehouse management 
 Always looking for technologies and algorithms to help with our 
challenges
Silverpop Open Positions 
 Technical Lead 
 Senior Engineer 
 Architect 
 Automation Engineers
Agenda 
 Redis 
 Bloom Filters 
 Failing Fast
Agenda 
 Redis 
 What it is 
 Why we started looking at using it 
 Basics 
 Concurrency 
 Operational Considerations 
 Challenges
Redis – What is it? 
From redis.io: 
"Redis is an open source, BSD licensed, advanced key-value cache and store. 
It is often referred to as a data structure server since keys can contain strings, 
hashes, lists, sets, sorted sets, bitmaps and hyperloglogs."
Hyper-what-what? 
HyperLogLog 
Approximation technique for counting distinct entries in a set. 
Very small memory footprint for rough approximations (Redis uses up to 12 KB 
per counter, with a standard error of about 0.81%) 
Nice – but too much loss for what we need
Features 
• Unlike typical key-value stores, you can send commands to edit the 
value on the server vs. reading back to the client, updating and 
pushing to the server 
• pub/sub 
• TTL on keys 
• Clustering and automatic fail-over 
• Lua scripting 
• Client libraries for just about any language you can think of
So Why did we start looking at NoSQL? 
“For the cost of an Oracle Enterprise license I can give you 64 cores 
and 3 TB of memory”
Redis Basics 
 In Memory-only key-value store 
 Single Threaded. Yes, Single Threaded 
 No Paging, no reading from disk 
 CS 101 data structures and operations 
 10's of millions of keys isn't a big deal 
 How much RAM defines how big the store can get
Basic DataTypes 
 String 
 Hashes 
 Lists 
 Sets and Sorted Sets 
CS 101 ...
Hashes 
- collection of key-value pairs with a single name 
- useful for storing data under a common name 
- values can only be strings or numeric. No hash of lists 
http://redis.io/commands/hget
Sets and Sorted Sets 
 Buckets of values with very fast membership look-up 
 No duplicates allowed 
 Sorted Sets have scores to make them sortable 
– Automatically keeps them in order for fast 'top x' look ups 
http://redis.io/commands/zadd 
http://redis.io/commands/zrange
Lists 
 Most interesting due to how operations are applied to the remote 
store 
 Unbounded (except by memory) 
 Atomic operations between lists (pop from one, push to another) 
 CS 101: lpush, rpush, lpop, range etc. 
 Advanced: blocking pops 
http://redis.io/commands/rpush 
http://redis.io/commands/rpoplpush
Concurrency 
 Single threaded 
 Each operation can work on one or two keys, atomically 
 Pipelines send several commands in a single server request and run them in 
order (wrap them in MULTI/EXEC if no other client's commands may interleave) 
 Pipelines do not allow for logic between commands 
 Lua scripts allow for logic between commands 
 BE CAREFUL with Lua: a running script blocks all other clients!
Pipeline Java Example 
 BloomFilterRedis.java line 43
Lua Example 
 Lua-scripts example
Operational Information 
 Persistence can be 'none', journal (AOF) or point in time (RDB) 
 Optional Master/Slave replication 
 Home-grown HA platform (Sentinel) 
 Common deployment model is lots of instances per machine 
 Millions of keys get hard to manage – build 'directory' hashes to 
make it easier for operations to find keys to look at
Challenges with Redis 
 Key Explosion – single name space 
 Lua scripts can block all other users 
 Pipelines can block all other users 
 No nested data types (I want a hash of lists!) 
 Without name spaces be cautious of how you define key names
Concurrency Demo – JMS replacement 
 Client submits a request to the queue (LPUSH) 
 Consumer application polls for work when worker is available 
(RPOPLPUSH) 
 Worker executes the task assigned to it 
 When worker is done, its list is removed 
 Lather, Rinse, Repeat 
 (We provide a hash of workers for Operations to query for 
monitoring)
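The queue flow above can be sketched without a server. This is a minimal illustration, not the presenter's code: `TinyListStore` is a hypothetical in-memory stand-in that mimics the semantics of the Redis LPUSH and RPOPLPUSH commands, and the key names (`work:pending`, `work:worker-7`) are invented for the example.

```python
from collections import defaultdict, deque


class TinyListStore:
    """Hypothetical in-memory stand-in for the Redis list commands on the slide."""

    def __init__(self):
        self.lists = defaultdict(deque)

    def lpush(self, key, value):
        self.lists[key].appendleft(value)      # new items enter at the head
        return len(self.lists[key])

    def rpoplpush(self, source, destination):
        if not self.lists[source]:
            return None                        # nothing is waiting
        value = self.lists[source].pop()       # oldest item leaves from the tail
        self.lists[destination].appendleft(value)
        return value

    def delete(self, key):
        self.lists.pop(key, None)


store = TinyListStore()
store.lpush("work:pending", "send-mailing-1")  # client submits requests
store.lpush("work:pending", "send-mailing-2")

# A free worker claims the oldest request into its own in-progress list.
task = store.rpoplpush("work:pending", "work:worker-7")
# ... the worker executes the task here ...
store.delete("work:worker-7")                  # done: the worker's list is removed
```

Because the claim moves the item in one step, a crashed worker leaves its task visible in its own list instead of losing it.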
Agenda 
 Bloom Filters 
 What they are 
 Why we started looking at using them 
 Basics 
 False Positives 
 Example Uses 
 Why not do this in a database?
Bloom Filters 
From Wikipedia (don't tell my kid's teacher!) 
"A Bloom filter is a space-efficient probabilistic data structure, 
conceived by Burton Howard Bloom in 1970, that is used to test 
whether an element is a member of a set. False positive matches are 
possible, but false negatives are not, thus a Bloom filter has a 100% 
recall rate"
Hashing 
 Apply 'x' hash functions to the key to be stored/queried 
 Each function returns a bit to set in the bitset 
 Mathematical equations to determine how big to make the bitset, 
how many functions to use and your acceptable error level 
 http://hur.st/bloomfilter?n=4&p=1.0E-20
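The sizing equations mentioned above are the standard Bloom filter formulas: for n expected keys and an acceptable false positive rate p, use about m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. A small sketch follows; the function name `bloom_parameters` is made up for the example.

```python
import math


def bloom_parameters(n_items, false_positive_rate):
    """Return (bits, hash_count) for a standard Bloom filter."""
    bits = math.ceil(
        -n_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    )
    hashes = max(1, round(bits / n_items * math.log(2)))
    return bits, hashes


bits, hashes = bloom_parameters(1_000_000, 0.01)
```

For one million keys at a 1% error rate this works out to roughly 9.6 million bits (about 1.2 MB) and 7 hash functions.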
Example
False Positives 
 Perfect hash functions aren't worth the cost to develop 
 Sometimes existing bits for a key are set by many other keys 
 Make sure you understand the business impact of a false positive 
 Remember, never a false negative
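A minimal in-memory Bloom filter makes the "never a false negative" point concrete: every bit set by `add` is still set when the same key is queried. This is a sketch under assumptions, not any particular library's implementation; the 'x' hash functions are approximated by salting SHA-256 with an index.

```python
import hashlib


class BloomFilter:
    def __init__(self, bits, hashes):
        self.bits = bits
        self.hashes = hashes
        self.bitset = bytearray((bits + 7) // 8)

    def _positions(self, key):
        # Derive several hash functions by salting one digest with an index.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key):
        for pos in self._positions(key):
            self.bitset[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bitset[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


bf = BloomFilter(bits=10_000, hashes=7)
for user in ("alice", "bob", "carol"):
    bf.add(user)
```

A key that was never added returns True only when all of its bits happen to have been set by other keys, which is the false positive case described above.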
Creation 
 Libraries are available for every language I looked up (even 
JavaScript) 
 Some are built in memory, for a single process/JVM to use 
 Read-only (ad networks) are built using Hadoop and loaded into 
memory 
 In memory is great for lots of reads, single process/JVM etc. 
 But ...
Updates 
 Updating a 16 MB structure in memory and persisting to disk is 
expensive 
 8 bits change and you write 16 MB!!!!!! (DBAs will love you …)
Deletes 
 Not possible in a regular Bloom Filter – how would you know what bits 
are used by other keys? 
 Counting Bloom Filters keep a small counter (3-4 bits) for each position in 
the bitmap. 'delete' decrements the counters for the key 
 Not as space friendly any more … 
 Instead, consider having bloom filters based around the lifetime of the 
data to be queried 
– For a filter 'visited in the last 4 hours' have 4 filters and age the oldest 
out (TTL in Redis maybe ...)
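The counting variant can be sketched as follows. For readability the counters are plain Python integers, where a real implementation would pack 3-4 bits per position; the class and method names are illustrative only.

```python
import hashlib


class CountingBloomFilter:
    def __init__(self, slots, hashes):
        self.slots = slots
        self.hashes = hashes
        self.counters = [0] * slots   # one small counter per position

    def _positions(self, key):
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big"
            ) % self.slots
            for i in range(self.hashes)
        ]

    def add(self, key):
        for pos in self._positions(key):
            self.counters[pos] += 1

    def might_contain(self, key):
        return all(self.counters[pos] > 0 for pos in self._positions(key))

    def remove(self, key):
        # Only safe for keys that were really added; removing a false
        # positive would corrupt the counters of other keys.
        if not self.might_contain(key):
            return False
        for pos in self._positions(key):
            self.counters[pos] -= 1
        return True


cbf = CountingBloomFilter(slots=1_000, hashes=4)
cbf.add("user-1")
cbf.add("user-2")
cbf.remove("user-1")
```

The memory cost is several times that of a plain bitset, which is why rotating time-boxed filters is often the simpler choice.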
Issue: Persistence 
 Load a 16 MB filter from database to check 6 bits? 
 Worse: update 6 bits in a 16 MB filter 
 DBAs will not be happy 
– Undo/redo 
– SGA misses, page faults 
– Backups, replication traffic etc.
Why were we interested in Bloom Filters? 
 Found a lot of places where we went to the database only to find that the 
data didn't exist 
 Found lots of places where we want to know if a user DIDN'T do 
something
Persistent Bloom Filters 
 We needed persistent Bloom Filters for lots of user stories 
 Found Orestes-BloomFilter on GitHub that used Redis as a store 
and enhanced it 
 Added population filters 
 Fixed a few bugs 
 Did a pull request and it was accepted!
Benefits 
 Filters are stored in Redis 
• Only SETBIT/GETBIT calls to the server 
 Reads and updates of the filter from set of application servers 
 Persistence has a cost, but a fraction of the RDBMS costs 
 Can load a BF created offline and begin using it
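A rough sketch of the idea, not the Orestes-BloomFilter API: the filter object holds no bits itself and only issues per-bit calls against a shared store. `InMemoryBits` is a hypothetical stand-in so the example runs without a server; the redis-py client exposes `setbit(name, offset, value)` and `getbit(name, offset)` with the same shape. The key name `bf:logins` is invented.

```python
import hashlib


class InMemoryBits:
    """Hypothetical stand-in for a Redis client's SETBIT/GETBIT calls."""

    def __init__(self):
        self.bits = {}

    def setbit(self, name, offset, value):
        previous = self.bits.get((name, offset), 0)
        self.bits[(name, offset)] = value
        return previous                # SETBIT returns the old bit value

    def getbit(self, name, offset):
        return self.bits.get((name, offset), 0)


class RemoteBloomFilter:
    def __init__(self, client, name, bits, hashes):
        self.client, self.name = client, name
        self.bits, self.hashes = bits, hashes

    def _positions(self, key):
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big"
            ) % self.bits
            for i in range(self.hashes)
        ]

    def add(self, key):
        for pos in self._positions(key):
            self.client.setbit(self.name, pos, 1)

    def might_contain(self, key):
        return all(
            self.client.getbit(self.name, pos) for pos in self._positions(key)
        )


shared = InMemoryBits()
app_server_1 = RemoteBloomFilter(shared, "bf:logins", bits=10_000, hashes=5)
app_server_2 = RemoteBloomFilter(shared, "bf:logins", bits=10_000, hashes=5)
app_server_1.add("user-42")            # written by one application server
```

Any application server that uses the same key name, size and hash count sees the update immediately, since only a handful of bits travel over the wire.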
Remember “For the cost of an Oracle License” 
 Thousands of filters 
 Dozens of Redis instances 
 TTL on a Redis key makes cleanup of old filters trivial
Population Bloom Filters 
 Unique need we had 
 Users access the system frequently, but I really only need to count 
them once per month for billing 
 10's of Thousands of clients, Finance wants monthly report in 
seconds 
 Logic is simple: if any bits weren't set for the key (user id), 
increment the counter 
 Note: there are mathematical methods of estimating a BF 
population but we needed better error rate
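The counting rule can be sketched like this. The example keeps the bits in memory; `_set_bit` returns the previous bit value the way the Redis SETBIT command does, which is what makes the "was any bit not set?" test possible in a single pass. Names and sizes are illustrative assumptions.

```python
import hashlib


class PopulationBloomFilter:
    def __init__(self, bits, hashes):
        self.bits, self.hashes = bits, hashes
        self.bitset = bytearray((bits + 7) // 8)
        self.population = 0

    def _positions(self, key):
        return [
            int.from_bytes(
                hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big"
            ) % self.bits
            for i in range(self.hashes)
        ]

    def _set_bit(self, pos):
        """Set a bit and return its previous value, as Redis SETBIT does."""
        byte, mask = pos // 8, 1 << (pos % 8)
        previous = bool(self.bitset[byte] & mask)
        self.bitset[byte] |= mask
        return previous

    def add(self, key):
        was_set = [self._set_bit(pos) for pos in self._positions(key)]
        is_new = not all(was_set)      # any unset bit means a first visit
        if is_new:
            self.population += 1
        return is_new


monthly = PopulationBloomFilter(bits=100_000, hashes=5)
for user_id in ["u1", "u2", "u1", "u3", "u2"]:
    monthly.add(user_id)
# monthly.population now approximates the number of distinct users.
```

A false positive makes the count slightly low, never high, so the filter size controls how far the billing number can drift.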
Example Uses of Bloom Filters 
 Webcache – what URLs are already in the cache on another 
server? 
 P2P networks – what node contains which part of the file? 
 Databases 
– Do keys exist in this page? If not, don't load the page 
– HBase uses them to detect which blocks do not have the data (HDFS is write-once) 
– Many RDBMS use them internally to 'fail fast' and not load pages into memory 
– Sadly, no RDBMS or NoSQL I know of offers them as user data types
Example Uses of Bloom Filters 
 Ad networks (old way ...) 
– Big Hadoop job hourly/nightly to determine which ads to show 
based on prior behavior 
– Load the filter into a common storage (disk usually) 
– Ad servers load all the filters into memory and query for your 
cookie id to see what to show you
Examples of Redis-backed BloomFilters 
 Has the user been here this month? If not, show them a message. A false 
positive doesn't matter 
 White vs. Black list for IP 
– Known bad IP in the filter 
– Upon login check the filter. Not found, login. Found – check DB to 
validate bad IP. 
– False Positive will lead to query that returns false, but should be rare 
• Ad Networks (real time BF updates based on what you searched on)
Client side Joins 
 Most NoSQL don't support joins 
 Architecture may have data across multiple stores 
 Keep a Population Bloom Filter by day of unique users in a data 
source 
 When needing to join, load smallest data source as the driver and 
query other sources in order of size 
 If queries are time based and filters are available for the time, 
looking up key matches can be very fast
Agenda 
 Fail Fast 
 What it is 
 Redis-backed BloomFilters 
 Examples
Fail Fast 
 The ability to quickly know to NOT do something expensive 
 Example: Black-list of IPs 
 Think about ways to NOT do some work 
 Cost of Redis servers is much less than an RDBMS license or the 
cost of a good DB server with storage!
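The black-list example might look like the sketch below. The Python set named `database` stands in for the expensive lookup, the filter is held in memory, and `db_queries` is a hypothetical counter added only to show how many expensive calls were made; the IP addresses come from documentation ranges.

```python
import hashlib


def bit_positions(key, bits=8_192, hashes=5):
    return [
        int.from_bytes(
            hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big"
        ) % bits
        for i in range(hashes)
    ]


class BlacklistGate:
    def __init__(self, confirmed_bad_ips):
        self.database = set(confirmed_bad_ips)   # stand-in for the real DB
        self.filter_bits = set()
        for ip in self.database:
            self.filter_bits.update(bit_positions(ip))
        self.db_queries = 0

    def is_blocked(self, ip):
        if not all(p in self.filter_bits for p in bit_positions(ip)):
            return False                 # fail fast: definitely not listed
        self.db_queries += 1             # possible hit: confirm in the DB
        return ip in self.database


gate = BlacklistGate({"203.0.113.9"})
blocked = gate.is_blocked("203.0.113.9")
allowed = not gate.is_blocked("198.51.100.4")
```

Nearly all logins take the first branch and never touch the database; only listed addresses and the rare false positive pay for the confirming query.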
Hammer Time
Be careful 
 Sometimes the cost of building and maintaining the structures 
outweighs the benefit 
 Convoluted designs to avoid the database 
 Collect metrics on 'hits' to see if there is any benefit (CodaHale)
Example (naive) 
 Build a BF for ads shown to a user (hash on user id and ad id) 
 When the user visits, hash their user id and the top ad to display 
this hour and set the bits in the BF 
 If any were not set, the Population count is incremented and you 
display the ad 
 If already set, move to the next most important ad. 
 Now know total unique views by ad by hour 
 Can do total gross with a Redis Hash too!
Example – smarter 
 Hash the top 10 ad ids to the user id and parallel request (Pipeline) 
 Check the return to see which ones aren't set, submit an update 
request and set the population 
 2 round trips to check 10 ads. 
 (Can also do this in Lua in 1 round trip)
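The two-round-trip pattern can be sketched against a pipeline-style client. `FakeRedis` and `FakePipeline` are hypothetical stand-ins that only count round trips; they mimic the shape of redis-py's `pipeline()`, `getbit`, `setbit` and `execute()` calls. The key name `bf:ads` and the `user:ad` hashing scheme are assumptions, and the population counter update is left out for brevity.

```python
import hashlib


class FakePipeline:
    def __init__(self, server):
        self.server, self.commands = server, []

    def getbit(self, name, offset):
        self.commands.append(("getbit", name, offset))

    def setbit(self, name, offset, value):
        self.commands.append(("setbit", name, offset, value))

    def execute(self):
        self.server.round_trips += 1     # the whole batch is one request
        results = [getattr(self.server, cmd)(*args)
                   for cmd, *args in self.commands]
        self.commands = []
        return results


class FakeRedis:
    def __init__(self):
        self.bits, self.round_trips = set(), 0

    def getbit(self, name, offset):
        return int((name, offset) in self.bits)

    def setbit(self, name, offset, value):
        previous = int((name, offset) in self.bits)
        if value:
            self.bits.add((name, offset))
        else:
            self.bits.discard((name, offset))
        return previous

    def pipeline(self):
        return FakePipeline(self)


def positions(key, bits, hashes):
    return [int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8],
                           "big") % bits for i in range(hashes)]


def pick_ad(client, user_id, ad_ids, name="bf:ads", bits=100_000, hashes=4):
    keys = [f"{user_id}:{ad}" for ad in ad_ids]
    pipe = client.pipeline()
    for key in keys:
        for pos in positions(key, bits, hashes):
            pipe.getbit(name, pos)
    flags = pipe.execute()                        # round trip 1: read all ads
    for i, ad in enumerate(ad_ids):
        if not all(flags[i * hashes:(i + 1) * hashes]):
            pipe = client.pipeline()
            for pos in positions(keys[i], bits, hashes):
                pipe.setbit(name, pos, 1)
            pipe.execute()                        # round trip 2: mark as shown
            return ad
    return None                                   # every candidate was seen


client = FakeRedis()
ads = [f"ad-{n}" for n in range(10)]
first = pick_ad(client, "u1", ads)
```

Ten ads with four hash positions each means forty GETBIT commands, but they share a single network request.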
Example – part 2 
 Same idea as before, but build the bloom filter for each hour 
 When user visits, query last 6 filters in parallel (pipeline!) to see if 
they've seen the ad(s). 
 Redis TTL on the hourly filter will drop it automatically when it 
becomes too old
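One way to picture the hourly filters: keep one filter per hour, check the most recent six, and let old ones disappear. In this sketch the filters live in a dictionary and the cleanup loop stands in for a Redis TTL; hour numbers, class names and sizes are illustrative assumptions.

```python
import hashlib

HOURS_KEPT = 6


def positions(key, bits=50_000, hashes=4):
    return [
        int.from_bytes(
            hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big"
        ) % bits
        for i in range(hashes)
    ]


class HourlyFilters:
    def __init__(self):
        self.filters = {}              # hour number -> set of bit positions

    def record(self, key, hour):
        self.filters.setdefault(hour, set()).update(positions(key))
        # Stand-in for a Redis TTL: drop filters that fell out of the window.
        for old in [h for h in self.filters if h <= hour - HOURS_KEPT]:
            del self.filters[old]

    def seen_recently(self, key, hour):
        wanted = positions(key)
        return any(
            all(p in self.filters.get(h, ()) for p in wanted)
            for h in range(hour - HOURS_KEPT + 1, hour + 1)
        )


history = HourlyFilters()
history.record("u1:ad-9", hour=10)
seen_at_noon = history.seen_recently("u1:ad-9", hour=12)
```

With Redis, the six lookups would go out in one pipeline and expiry would need no application code at all.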
Example 3 
 Collect lots of data about users (such as virtual cows, farm land, 
chickens etc.) 
 Run a predictive model on the data and identify which special 
offers to show when the user visits again. Store user ids in a Bloom Filter 
 Load the BF into Redis 
 Query each time the user logs in and display appropriate offer 
 No massive database insert/updates to flag who should see it 
 False positive isn't too bad
Example 4 – Query optimization 
 Client-side joins 
 Ask the Bloom Filter if the user has performed the action (filters 
for hour, day, week of year etc.) 
 If not, don't even call the data source 
 May need to read some extra data due to 'in the last 11 days' but 
asking the BF and being told 'no' prevents ANY data source 
resources from being used 
 What if the BF is lost? Rebuild it from the base events (Hadoop!)
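As a sketch of the query-optimization idea, the helper below asks one filter per day and returns only the days worth sending to the data source. The dates, user ids and function names are invented for the example, and the filters are plain in-memory sets of bit positions.

```python
import hashlib


def positions(key, bits=50_000, hashes=4):
    return [
        int.from_bytes(
            hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big"
        ) % bits
        for i in range(hashes)
    ]


def build_daily_filter(user_ids):
    bitset = set()
    for user_id in user_ids:
        bitset.update(positions(user_id))
    return bitset


def days_worth_querying(filters_by_day, user_id):
    wanted = positions(user_id)
    return [
        day
        for day, bitset in sorted(filters_by_day.items())
        if all(p in bitset for p in wanted)    # "maybe" -> go ask the source
    ]


filters_by_day = {
    "2014-09-01": build_daily_filter(["u1", "u2"]),
    "2014-09-02": build_daily_filter([]),
    "2014-09-03": build_daily_filter(["u1"]),
}
days = days_worth_querying(filters_by_day, "u1")
```

An empty result means the data source is never called for that user, which is the whole point of failing fast.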
Conclusion 
 Redis is a very fast, very simple and very powerful name-value 
store (a “data structure server”) 
 Bloom Filters have lots of applications when you want to quickly 
look up if one of millions of 'things' happened 
 Redis-backed BloomFilters make updatable bloom filters trivial to 
use 
 Think about what you need to know to NOT do an expensive 
operation 
 Fail fast
References 
 Redis.io 
 http://en.wikipedia.org/wiki/Bloom_filter 
 http://hur.st/bloomfilter?n=4&p=1.0E-20 
 https://github.com/Baqend/Orestes-Bloomfilter 
 http://www.slideshare.net/chriscurtin 
 @ChrisCurtin on twitter 
 Github.com/chriscurtin
Questions?

 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 

Redis and Bloom Filters - Atlanta Java Users Group 9/2014

  • 1. Failing Fast with Redis backed BloomFilters • Christopher Curtin • Head of Technical Research • @ChrisCurtin
  • 2. About Me  25+ years in technology  Head of Technical Research at Silverpop, an IBM Company (14 + years at Silverpop)  Built a SaaS platform before the term ‘SaaS’ was being used  Prior to Silverpop: real-time control systems, factory automation and warehouse management  Always looking for technologies and algorithms to help with our challenges
  • 3. Silverpop Open Positions  Technical Lead  Senior Engineer  Architect  Automation Engineers
  • 4. Agenda  Redis  Bloom Filters  Failing Fast
  • 5. Agenda  Redis  What it is  Why we started looking at using it  Basics  Concurrency  Operational Considerations  Challenges
  • 6. Redis – What is it? From redis.io: "Redis is an open source, BSD licensed, advanced key-value cache and store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets, sorted sets, bitmaps and hyperloglogs."
  • 7. Hyper-what-what? HyperLogLog: an approximation technique for counting distinct entries in a set. Very small memory footprint for rough approximations (about 12 KB per key in Redis, with a standard error under 1%). Nice – but too much loss for what we need
  • 8. Features • Unlike typical key-value stores, you can send commands to edit the value on the server instead of reading it back to the client, updating it and pushing it to the server • Pub/sub • TTL on keys • Clustering and automatic fail-over • Lua scripting • Client libraries for just about any language you can think of
  • 9. So Why did we start looking at NoSQL? “For the cost of an Oracle Enterprise license I can give you 64 cores and 3 TB of memory”
  • 10.
  • 11. Redis Basics  In Memory-only key-value store  Single Threaded. Yes, Single Threaded  No Paging, no reading from disk  CS 101 data structures and operations  10's of millions of keys isn't a big deal  How much RAM defines how big the store can get
  • 12. Basic DataTypes  String  Hashes  Lists  Sets and Sorted Sets CS 101 ...
  • 13. Hashes Hashes - collection of key-value pairs with a single name - useful for storing data under a common name - values can only be strings or numeric. No hash of lists http://redis.io/commands/hget
  • 14. Sets and Sorted Sets  Buckets of values with very fast membership look-up  No duplicates allowed  Sorted Sets have scores to make them sortable – Automatically keeps them in order for fast 'top x' look ups http://redis.io/commands/zadd http://redis.io/commands/zrange
  • 15. Lists • Most interesting due to how operations are applied to the remote store • Unbounded (except by memory) • Atomic operations between lists (pop from one, push to another) • CS 101: lpush, rpush, lpop, lrange etc. • Advanced: blocking pops http://redis.io/commands/rpush http://redis.io/commands/rpoplpush
  • 16. Concurrency • Single threaded • Each operation can work on one or two keys, atomically • Pipelines send a batch of commands in a single server request and Redis runs them in sequence (a plain pipeline is not a transaction; wrap it in MULTI/EXEC if other clients' commands must not interleave) • Pipelines do not allow for logic between commands • Lua scripts allow for logic between commands • BE CAREFUL with Lua, scripts block all clients!
  • 17. Pipeline Java Example  BloomFilterRedis.java line 43
  • 18. Lua Example  Lua-scripts example
  • 19. Operational Information • Persistence can be 'none', journal (AOF) or point in time (RDB) • Optional Master/Slave replication • Home-grown HA platform (Sentinel) • Common deployment model is lots of instances per machine • Millions of keys get hard to manage – build 'directory' hashes to make it easier for operations to find keys to look at
  • 20. Challenges with Redis • Key Explosion – single name space • Lua scripts can block all other users • Pipelines can block all other users • No nested data types (I want a hash of lists!) • Without name spaces be cautious of how you define key names
  • 21. Concurrency Demo – JMS replacement  Client submits a request to the queue (LPUSH)  Consumer application polls for work when worker is available (RPOPLPUSH)  Worker executes the task assigned to it  When worker is done, its list is removed  Lather, Rinse, Repeat  (We provide a hash of workers for Operations to query for monitoring)
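The queue flow on this slide can be modeled without a Redis server. The sketch below is an in-memory stand-in, not the demo code from the talk: the hypothetical `MiniListStore` class only mimics the list semantics of LPUSH, RPOPLPUSH and DEL, and the key names `queue` and `worker:1` are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the Redis list commands the slide relies on.
// Hypothetical class for illustration; a real deployment would issue the
// same commands against Redis through a client library.
class MiniListStore {
    private final Map<String, Deque<String>> lists = new HashMap<>();

    // LPUSH: add a value at the head (left) of the list, return the new length.
    synchronized long lpush(String key, String value) {
        Deque<String> list = lists.computeIfAbsent(key, k -> new ArrayDeque<>());
        list.addFirst(value);
        return list.size();
    }

    // RPOPLPUSH: move the tail of src to the head of dst in one step.
    // Returns null when src is empty, where Redis would return nil.
    synchronized String rpoplpush(String src, String dst) {
        Deque<String> source = lists.get(src);
        if (source == null || source.isEmpty()) {
            return null;
        }
        String value = source.pollLast();
        if (source.isEmpty()) {
            lists.remove(src); // Redis drops a list key once it is empty
        }
        lists.computeIfAbsent(dst, k -> new ArrayDeque<>()).addFirst(value);
        return value;
    }

    // DEL: remove the whole list, return how many keys were removed.
    synchronized long del(String key) {
        return lists.remove(key) == null ? 0 : 1;
    }

    // LLEN: length of the list, 0 when the key does not exist.
    synchronized long llen(String key) {
        Deque<String> list = lists.get(key);
        return list == null ? 0 : list.size();
    }
}

class QueueDemo {
    public static void main(String[] args) {
        MiniListStore store = new MiniListStore();
        store.lpush("queue", "job-1");                      // client submits work
        store.lpush("queue", "job-2");
        String job = store.rpoplpush("queue", "worker:1");  // worker claims the oldest job
        System.out.println("worker:1 runs " + job);
        store.del("worker:1");                              // done: remove the worker's list
    }
}
```

Because the claimed job sits in the worker's own list until the worker finishes, a monitoring process can see what each worker is holding, which is the point of using RPOPLPUSH rather than a plain pop.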
  • 22. Agenda  Bloomfilters  What they are  Why we started looking at using them  Basics  False Positives  Example Uses  Why not do this in a database?
  • 23. Bloom Filters From WikiPedia (Don't tell my kid's teacher!) "A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not, thus a Bloom filter has a 100% recall rate"
  • 24. Hashing  Apply 'x' hash functions to the key to be stored/queried  Each function returns a bit to set in the bitset  Mathematical equations to determine how big to make the bitset, how many functions to use and your acceptable error level  http://hur.st/bloomfilter?n=4&p=1.0E-20
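The "mathematical equations" mentioned on this slide are the standard Bloom filter sizing formulas: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, for n expected keys and false-positive rate p. A minimal sketch follows; the class name `BloomSizing` and the example values for n and p are assumptions for illustration, not numbers from the talk.

```java
// Standard Bloom filter sizing formulas (not specific to any library):
//   m = -n * ln(p) / (ln 2)^2   bits in the bitset
//   k = (m / n) * ln 2          hash functions
class BloomSizing {
    // Number of bits needed for n expected keys at false-positive rate p.
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Number of hash functions that minimizes false positives for m bits and n keys.
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000; // expected keys (example value)
        double p = 0.01;    // acceptable false-positive rate (example value)
        long m = optimalBits(n, p);
        int k = optimalHashes(n, m);
        System.out.printf("bits=%d (~%.1f MB), hashes=%d%n", m, m / 8.0 / 1_000_000, k);
    }
}
```

For one million keys at a 1% error rate this works out to roughly 9.6 million bits (about 1.2 MB) and 7 hash functions.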
  • 26. False Positives  Perfect hash functions aren't worth the cost to develop  Sometimes existing bits for a key are set by many other keys  Make sure you understand the business impact of a false positive  Remember, never a false negative
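To make "never a false negative" concrete, here is a minimal in-memory Bloom filter. It is a sketch under stated assumptions, not the Orestes-BloomFilter code: the class name `SimpleBloomFilter` is hypothetical, and the bit positions are derived by combining `String.hashCode()` with a CRC32 checksum (a simple double-hashing scheme chosen for brevity, not for hash quality).

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.zip.CRC32;

class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;   // m: number of bits
    private final int hashes; // k: bit positions per key

    SimpleBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // i-th bit position for a key: h1 + i * h2, reduced into [0, size).
    private int position(String key, int i) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(key.hashCode() + i * (int) crc.getValue(), size);
    }

    // Set every bit position for the key.
    void add(String key) {
        for (int i = 0; i < hashes; i++) {
            bits.set(position(key, i));
        }
    }

    // false means "definitely not added"; true means "probably added".
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 4);
        for (int i = 0; i < 1000; i++) {
            filter.add("user-" + i);
        }
        // Keys that were never added may still report true: those are the false positives.
        int falsePositives = 0;
        for (int i = 1000; i < 2000; i++) {
            if (filter.mightContain("user-" + i)) {
                falsePositives++;
            }
        }
        System.out.println("false positives out of 1000 unseen keys: " + falsePositives);
    }
}
```

A key that was added always has all of its bits set, so `mightContain` can never return false for it. The reverse does not hold, because other keys may have set the same bits.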
  • 27. Creation  Libraries are available for every language I looked up (even JavaScript)  Some are built in memory, for a single process/JVM to use  Read-only (ad networks) are built using Hadoop and loaded into memory  In memory is great for lots of reads, single process/JVM etc.  But ...
  • 28. Updates  Updating a 16 MB structure in memory and persisting to disk is expensive  8 bits change and you write 16 MB!!!!!! (DBAs will love you …)
  • 29. Deletes • Not possible in a regular Bloom Filter – how would you know what bits are used by other keys? • Counting BloomFilters keep a few bits (3-4) per bit in the bitmap as a counter. 'delete' decrements the counters for the key • Not as space friendly any more … • Instead, consider having bloom filters based around the lifetime of the data to be queried – For a filter 'visited in the last 4 hours' have 4 filters and age the oldest out (TTL in Redis maybe ...)
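A counting Bloom filter can be sketched in a few lines. This is an illustration only: the class name `CountingBloomFilter` is hypothetical, it spends a whole byte per counter instead of the 3-4 bits mentioned on the slide, and it reuses a simple `hashCode()` plus CRC32 position scheme that is an assumption of this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class CountingBloomFilter {
    private final byte[] counters; // one small counter per slot instead of one bit
    private final int hashes;

    CountingBloomFilter(int size, int hashes) {
        this.counters = new byte[size];
        this.hashes = hashes;
    }

    // i-th slot for a key: h1 + i * h2, reduced into [0, size).
    private int position(String key, int i) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(key.hashCode() + i * (int) crc.getValue(), counters.length);
    }

    // Increment each slot for the key (saturating at Byte.MAX_VALUE).
    void add(String key) {
        for (int i = 0; i < hashes; i++) {
            int p = position(key, i);
            if (counters[p] < Byte.MAX_VALUE) {
                counters[p]++;
            }
        }
    }

    // Decrement each slot for the key. Only call this for keys that were
    // actually added, otherwise other keys' counters get corrupted.
    void remove(String key) {
        for (int i = 0; i < hashes; i++) {
            int p = position(key, i);
            if (counters[p] > 0) {
                counters[p]--;
            }
        }
    }

    // false means "definitely not present"; true means "probably present".
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (counters[position(key, i)] == 0) {
                return false;
            }
        }
        return true;
    }
}
```

The space cost is visible in the types: a `byte[]` is eight times larger than the equivalent bitset, which is why the slide suggests time-bucketed filters with a TTL as the cheaper alternative.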
  • 30. Issue: Persistence  Load a 16 MB filter from database to check 6 bits?  Worse: update 6 bits in a 16 MB filter  DBAs will not be happy – Undo/redo – SGA misses, page faults – Backups, replication traffic etc.
  • 31. Why were we interested in Bloom Filters? • Found a lot of places where we went to the database only to find the data didn't exist • Found lots of places where we want to know if a user DIDN'T do something
  • 32. Persistent Bloom Filters  We needed persistent Bloom Filters for lots of user stories  Found Orestes-BloomFilter on GitHub that used Redis as a store and enhanced it  Added population filters  Fixed a few bugs  Did a pull request and it was accepted!
  • 33. Benefits • Filters are stored in Redis • Only SETBIT/GETBIT calls to the server • Reads and updates of the filter from a set of application servers • Persistence has a cost, but a fraction of the RDBMS costs • Can load a BF created offline and begin using it
  • 34. Remember “For the cost of an Oracle License”  Thousands of filters  Dozens of Redis instances  TTL on a Redis key makes cleanup of old filters trivial
  • 35. Population Bloom Filters • Unique need we had • Users access the system frequently, but I really only need to count them once per month for billing • Tens of thousands of clients, Finance wants the monthly report in seconds • Logic is simple: if any bits weren't set for the key (user id), increment the counter • Note: there are mathematical methods of estimating a BF population but we needed a better error rate
  • 36. Example Uses of Bloom Filters • Webcache – what URLs are already in the cache on another server? • P2P networks – what node contains which part of the file? • Databases – Do keys exist in this page? If not, don't load the page – HBase uses them to detect which blocks do not have the data (HDFS is write-once) – Many RDBMSs use them internally to 'fail fast' and not load pages into memory – Sadly, no RDBMS or NoSQL store I know of offers them as user data types
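The counting rule on this slide ("if any bits weren't set for the key, increment the counter") can be sketched as below. This is an in-memory illustration, not the population filter contributed to Orestes-BloomFilter: the class name `PopulationBloomFilter` and the hashing scheme are assumptions. Against Redis the same check falls out naturally, because SETBIT returns the bit's previous value.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.zip.CRC32;

class PopulationBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;
    private long population; // keys that set at least one new bit

    PopulationBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // i-th bit position for a key: h1 + i * h2, reduced into [0, size).
    private int position(String key, int i) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(key.hashCode() + i * (int) crc.getValue(), size);
    }

    // Sets the key's bits; returns true (and counts the key) only if at
    // least one of those bits was not set before.
    boolean add(String key) {
        boolean sawNewBit = false;
        for (int i = 0; i < hashes; i++) {
            int p = position(key, i);
            if (!bits.get(p)) {
                bits.set(p);
                sawNewBit = true;
            }
        }
        if (sawNewBit) {
            population++;
        }
        return sawNewBit;
    }

    long population() {
        return population;
    }
}
```

A false positive makes `add` return false for a key that is actually new, so the count can only undershoot, and the size of that undershoot is governed by the filter's configured error rate.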
  • 37. Example Uses of Bloom Filters  Ad networks (old way ...) – Big Hadoop job hourly/nightly to determine which ads to show based on prior behavior – Load the filter into a common storage (disk usually) – Ad servers load all the filters into memory and query for your cookie id to see what to show you
  • 38. Examples of Redis-backed BloomFilters • Has the user been here this month? If not, show them a message. False positive doesn't matter • White vs. Black list for IP – Known bad IP in the filter – Upon login check the filter. Not found: login. Found: check DB to validate bad IP – False Positive will lead to a query that returns false, but should be rare • Ad Networks (real time BF updates based on what you searched on)
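The black-list login check can be sketched as follows. Everything here is a stand-in chosen for illustration: the class `IpGate` is hypothetical, a `HashSet` plays the role of the authoritative database table, and an in-memory `BitSet` plays the role of the Redis-backed filter. The `dbLookups` counter is only there to show how often the expensive path is taken.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.CRC32;

class IpGate {
    private static final int SIZE = 1 << 16; // bits in the filter (example value)
    private static final int HASHES = 4;     // bit positions per key (example value)

    private final BitSet filter = new BitSet(SIZE);
    private final Set<String> database = new HashSet<>(); // stand-in for the real DB table
    int dbLookups = 0;                                     // how often the slow path ran

    // i-th bit position for a key: h1 + i * h2, reduced into [0, SIZE).
    private static int position(String key, int i) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(key.hashCode() + i * (int) crc.getValue(), SIZE);
    }

    // Record a known bad IP in both the database and the filter.
    void blockIp(String ip) {
        database.add(ip);
        for (int i = 0; i < HASHES; i++) {
            filter.set(position(ip, i));
        }
    }

    private boolean mightBeBlocked(String ip) {
        for (int i = 0; i < HASHES; i++) {
            if (!filter.get(position(ip, i))) {
                return false;
            }
        }
        return true;
    }

    boolean isAllowed(String ip) {
        if (!mightBeBlocked(ip)) {
            return true; // fail fast: definitely not on the black list, no DB call
        }
        dbLookups++;     // filter hit: confirm against the database
        return !database.contains(ip);
    }
}
```

A false positive costs one extra database query that comes back "not blocked"; it never lets a bad IP in and never locks a good one out.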
  • 39. Client side Joins  Most NoSQL don't support joins  Architecture may have data across multiple stores  Keep a Population Bloom Filter by day of unique users in a data source  When needing to join, load smallest data source as the driver and query other sources in order of size  If queries are time based and filters are available for the time, looking up key matches can be very fast
  • 40. Agenda  Fail Fast  What it is  Redis-backed BloomFilters  Examples
  • 41. Fail Fast  The ability to quickly know to NOT do something expensive  Example: Black-list of IPs  Think about ways to NOT do some work  Cost of Redis servers is much less than an RDBMS license or the cost of a good DB server with storage!
  • 43. Be careful • Sometimes the cost of building and maintaining the structures outweighs the benefit • Convoluted designs to avoid the database • Collect metrics on 'hits' to see if there is any benefit (Coda Hale Metrics)
  • 44. Example (naive)  Build a BF for ads shown to a user (hash on user id and ad id)  When the user visits, hash their user id and the top ad to display this hour and set the bits in the BF  If any were not set, the Population count is incremented and you display the ad  If already set, move to the next most important ad.  Now know total unique views by ad by hour  Can do total gross with a Redis Hash too!
  • 45. Example – smarter • Hash the top 10 ad ids to the user id and parallel request (Pipeline) • Check the return to see which ones aren't set, submit an update request and set the population • 2 round trips to check 10 ads. • (Can also do this in Lua in 1 round trip)
  • 46. Example – part 2  Same idea as before, but build the bloom filter for each hour  When user visits, query last 6 filters in parallel (pipeline!) to see if they've seen the ad(s).  Redis TTL on the hourly filter will drop it automatically when it becomes too old
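The hourly-filter idea can be sketched in memory as a short queue of filters. This is an illustration under assumptions, not the talk's code: the class `RollingFilters` is hypothetical, `rotate()` stands in for both the start of a new hour and the Redis TTL that would expire the oldest filter, and the loop over windows stands in for the pipelined query.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;
import java.util.zip.CRC32;

class RollingFilters {
    private final Deque<BitSet> windows = new ArrayDeque<>(); // newest first
    private final int maxWindows;
    private final int size;
    private final int hashes;

    RollingFilters(int maxWindows, int size, int hashes) {
        this.maxWindows = maxWindows;
        this.size = size;
        this.hashes = hashes;
        windows.addFirst(new BitSet(size));
    }

    // i-th bit position for a key: h1 + i * h2, reduced into [0, size).
    private int position(String key, int i) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(key.hashCode() + i * (int) crc.getValue(), size);
    }

    // Start a new time window and drop the oldest one once the limit is
    // exceeded (the job a TTL on the Redis key would do).
    void rotate() {
        windows.addFirst(new BitSet(size));
        if (windows.size() > maxWindows) {
            windows.removeLast();
        }
    }

    // Record the key in the current (newest) window only.
    void markSeen(String key) {
        BitSet current = windows.peekFirst();
        for (int i = 0; i < hashes; i++) {
            current.set(position(key, i));
        }
    }

    // True if any retained window probably contains the key.
    boolean seenRecently(String key) {
        for (BitSet window : windows) {
            boolean allSet = true;
            for (int i = 0; i < hashes && allSet; i++) {
                allSet = window.get(position(key, i));
            }
            if (allSet) {
                return true;
            }
        }
        return false;
    }
}
```

With `maxWindows` set to 6 and one rotation per hour, a key such as a user id combined with an ad id stops matching about six hours after it was last recorded, with no explicit delete.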
  • 47. Example 3 • Collect lots of data about users (such as virtual cows, farm land, chickens etc.) • Run a predictive model on the data and identify which special offers to show when the user visits again. Store user ids in a Bloom Filter • Load the BF into Redis • Query each time the user logs in and display the appropriate offer • No massive database insert/updates to flag who should see it • False positive isn't too bad
  • 48. Example 4 – Query optimization • Client-side joins • Ask the Bloom Filter if the user has performed the action (filters for hour, day, week of year etc.) • If not, don't even call the data source • May need to read some extra data due to 'in the last 11 days' but asking the BF and being told 'no' prevents ANY data source resources from being used • What if the BF is lost? Rebuild it from the base events (Hadoop!)
  • 49. Conclusion  Redis is a very fast, very simple and very powerful name value store “Data structure server”  Bloom Filters have lots of applications when you want to quickly look up if one of millions of 'things' happened  Redis-backed BloomFilters make updatable bloom filters trivial to use  Think about what you need to know to NOT do an expensive operation  Fail fast
  • 50. References  Redis.io  http://en.wikipedia.org/wiki/Bloom_filter  http://hur.st/bloomfilter?n=4&p=1.0E-20  https://github.com/Baqend/Orestes-Bloomfilter  http://www.slideshare.net/chriscurtin  @ChrisCurtin on twitter  Github.com/chriscurtin