Footalks#1 Bloom Filters


Introduction to fooTalks and Bloom Filter Data structure.

Published in: Education, Technology, Business

  1. 1. fooTalks <ul><li>Increasing your knowledge through sharing </li></ul><ul><li>Tell what you know, hear what others know </li></ul><ul><li>Expect it to happen periodically, every week </li></ul><ul><li>Expects only a limited audience </li></ul><ul><li>Expects a contributing audience </li></ul><ul><li>As interest level rises, we can bring in experts </li></ul><ul><li>Needs volunteer speakers </li></ul>
  2. 2. FooTalks #1 <ul><li>Agenda </li></ul><ul><li>Introduction </li></ul><ul><li>Explanation </li></ul><ul><li>Demo </li></ul><ul><li>Algorithms </li></ul><ul><li>Q&A </li></ul>
  3. 3. Agenda Bloom Filters
  4. 4. Introduction <ul><li>Bloom filters are compact data structures for probabilistic representation of a set in order to support membership queries (i.e. queries that ask: “Is element X in set Y?”). This compact representation is the payoff for allowing a small rate of false positives in membership queries; that is, queries might incorrectly recognize an element as a member of the set. </li></ul>
  5. 5. In a simple way.. <ul><li>A data structure to represent the presence of an element in a set </li></ul><ul><li>With Bloom filters you can check whether an element is present in the set or not </li></ul><ul><li>Allows false positives: may say yes even when the answer is no </li></ul><ul><li>Never allows false negatives: will never say no when the answer is yes </li></ul><ul><li>E.g., for the set {a,b,c,d}, a check for 'z' might give yes, but a check for 'a' will never give no </li></ul>
  6. 6. What else do you need? <ul><li>f("hello") = 9 </li></ul><ul><li>f("fooTalks") = 23 </li></ul><ul><li>Hash function(s): given an input, each returns a numerical output </li></ul>
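A minimal Python sketch of what such a hash function could look like. The name `f` comes from the slides; building it on SHA-256 is an assumption made here for illustration, so the outputs will not be the 9 and 23 shown above.

```python
import hashlib

def f(text: str) -> int:
    """Map a string to a non-negative integer (illustrative choice, not the talk's)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Keep only the first 4 bytes so the numbers stay readable.
    return int.from_bytes(digest[:4], "big")

for word in ("hello", "fooTalks"):
    print(word, "->", f(word))
```

The built-in `hash()` is avoided here because Python randomizes string hashes between runs by default, which would make the example non-repeatable.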
  7. 7. How it's represented <ul><li>A bit vector </li></ul>Indices: 0 1 2 3 4 5
  8. 8. And how does it work? <ul><li>f("hello") = 5 -> map this to our bit vector </li></ul><ul><li>I.e., set the bit at index 5 in the bit vector </li></ul>Indices: 0 1 2 3 4 5
  9. 9. If the index exceeds the vector size <ul><li>Use mod! What else did you learn it for? </li></ul><ul><li>f('fooTalks') = 9, and 9 % 6 = 3 (six slots, indices 0 to 5, so mod 6) </li></ul>Indices: 0 1 2 3 4 5
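The mod step can be sketched in Python as follows. It assumes the six-slot vector (indices 0 to 5) drawn in the slides, so the modulus is the vector length 6. The helper name `slot` and the SHA-256 based `f` are hypothetical stand-ins, not the code from the talk.

```python
import hashlib

M = 6  # assumed size: six slots, indices 0..5

def f(text: str) -> int:
    """Illustrative hash function: string in, non-negative integer out."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def slot(text: str, m: int = M) -> int:
    """Fold an arbitrarily large hash value into a valid index 0..m-1."""
    return f(text) % m

bit_vector = [0] * M
for word in ("hello", "fooTalks"):
    bit_vector[slot(word)] = 1  # set the bit for this word

print(bit_vector)
```

Taking the remainder by the vector length is what guarantees that every index lands inside the vector, whatever the hash returns.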
  10. 10. And what else? <ul><li>Different elements can hash to the same index </li></ul><ul><li>f("hello") = 5 and f("foo") = 5 </li></ul><ul><li>This leads to false positives </li></ul>
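A small Python sketch of a collision turning into a false positive. The hash here, `toy_hash`, is a deliberately weak function invented for this example (it sums character codes, so anagrams always collide); the words and the helper names `add` and `might_contain` are likewise assumptions, not part of the talk.

```python
M = 6  # assumed vector size, indices 0..5

def toy_hash(text: str) -> int:
    # Deliberately weak: anagrams always get the same value.
    return sum(ord(ch) for ch in text)

bit_vector = [0] * M

def add(word: str) -> None:
    bit_vector[toy_hash(word) % M] = 1

def might_contain(word: str) -> bool:
    return bit_vector[toy_hash(word) % M] == 1

add("listen")
print(might_contain("listen"))  # True: it was added
print(might_contain("silent"))  # True: never added, same bit, a false positive
```

The filter cannot tell the two words apart because all it stores is the one bit they share.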
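  11. 11. And now you know why there are no false negatives <ul><li>If not, get out of here... </li></ul><ul><li>Hint: bits are only ever set, never cleared, so an added element always finds all of its bits set </li></ul>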
  12. 12. Usage <ul><li>Simple spell checker </li></ul><ul><li>Or I could complicate things: also used in networks, bioinformatics, blah, blah, blah... </li></ul>
  13. 13. Enough with the talking.... <ul><li>Demo </li></ul><ul><li>Code available at: </li></ul>
  14. 14. Algorithm: Setting up BF <ul><li>Procedure BloomFilter(set A, hash_functions, integer m) </li></ul><ul><li>returns filter </li></ul><ul><li>filter = allocate m bits initialized to 0 </li></ul><ul><li>foreach a_i in A : </li></ul><ul><li>foreach hash function h_j : </li></ul><ul><li>filter[ h_j(a_i) ] = 1 </li></ul><ul><li>end foreach </li></ul><ul><li>end foreach </li></ul><ul><li>return filter </li></ul>
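The setup procedure above might translate to Python roughly like this. It is a sketch under assumptions: `make_hash` is a hypothetical helper that derives several hash functions from SHA-256 by prefixing a seed, and the `% m` is added here because the pseudocode indexes the filter directly, which only works if each hash already returns a value below m.

```python
import hashlib
from typing import Callable, Iterable, List

def make_hash(seed: int) -> Callable[[str], int]:
    """Hypothetical helper: one hash function per seed."""
    def h(item: str) -> int:
        data = f"{seed}:{item}".encode("utf-8")
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
    return h

def bloom_filter(items: Iterable[str],
                 hash_functions: List[Callable[[str], int]],
                 m: int) -> List[int]:
    filt = [0] * m                      # allocate m bits initialized to 0
    for a in items:                     # foreach a_i in A
        for h in hash_functions:        # foreach hash function h_j
            filt[h(a) % m] = 1          # set the bit (mod m keeps it in range)
    return filt

hashes = [make_hash(seed) for seed in range(3)]
filt = bloom_filter(["a", "b", "c", "d"], hashes, 64)
print(sum(filt), "of", len(filt), "bits set")
```

A plain list of 0/1 integers keeps the sketch readable; a real implementation would normally pack the bits.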
  15. 15. Algorithm: Membership Test <ul><li>Procedure MembershipTest (elm, filter, hash_functions) </li></ul><ul><li>returns yes/no </li></ul><ul><li>foreach hash function h_j : </li></ul><ul><li>if filter[ h_j(elm) ] != 1 return No </li></ul><ul><li>end foreach </li></ul><ul><li>return Yes </li></ul>
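A Python sketch of the membership test, bundled with a small setup step so it runs on its own. As before, `make_hash` and the SHA-256 based hashing are assumptions for illustration, and `% m` is added to keep indices in range; the set {a, b, c, d} is the example from the earlier slide.

```python
import hashlib
from typing import Callable, Iterable, List

def make_hash(seed: int) -> Callable[[str], int]:
    """Hypothetical helper: one hash function per seed."""
    def h(item: str) -> int:
        data = f"{seed}:{item}".encode("utf-8")
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
    return h

def bloom_filter(items: Iterable[str], hash_functions, m: int) -> List[int]:
    filt = [0] * m
    for a in items:
        for h in hash_functions:
            filt[h(a) % m] = 1
    return filt

def membership_test(elm: str, filt: List[int], hash_functions) -> bool:
    m = len(filt)
    for h in hash_functions:            # foreach hash function h_j
        if filt[h(elm) % m] != 1:       # one unset bit is enough to say No
            return False
    return True                         # all bits set: Yes (possibly a false positive)

hashes = [make_hash(seed) for seed in range(3)]
members = ["a", "b", "c", "d"]
filt = bloom_filter(members, hashes, 64)

for word in members:
    print(word, membership_test(word, filt, hashes))  # always True for added elements
print("z", membership_test("z", filt, hashes))        # usually False, but True is possible
```

A "No" is always trustworthy, while a "Yes" only means "probably", which is exactly the false positive trade-off described earlier.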
  16. 16. Q&A <ul><li>I hate this part </li></ul><ul><li>Well, you can ask; I can try answering... </li></ul>
  17. 17. fooTalks Ends <ul><li>May the force be with you </li></ul>