# Footalks#1 Bloom Filters

Introduction to fooTalks and the Bloom filter data structure.

### Footalks#1 Bloom Filters

1. fooTalks
   - Increasing your knowledge through sharing
   - Tell what you know, hear what others know
   - Expected to happen periodically, every week
   - Expects only a limited audience
   - Expects a contributing audience
   - As interest rises, experts can be brought in
   - Needs volunteer speakers
2. FooTalks #1
   - Agenda
   - Introduction
   - Explanation
   - Demo
   - Algorithms
   - Q&A
3. Agenda: Bloom Filters
4. Introduction
   - Bloom filters are compact data structures for the probabilistic representation of a set, built to support membership queries (i.e. queries that ask: "Is element X in set Y?"). This compact representation is the payoff for allowing a small rate of false positives in membership queries; that is, a query might incorrectly recognize an element as a member of the set.
5. In a simple way...
   - A data structure that represents the presence of elements in a set
   - With Bloom filters you can check whether an element is present in the set or not
   - Allows false positives: may say yes even when the answer is no
   - Never allows false negatives: will never say no when the answer is yes
   - E.g., for the set {a, b, c, d}, a check for 'z' might give yes, but a check for 'a' will never give no
6. What else do you need?
   - f("hello") = 9
   - f("fooTalks") = 23
   - Hash function(s): given an input, produce a numerical output
7. How it's represented
   - A bit vector
   - Indices: 0 1 2 3 4 5
8. And how does it work?
   - Suppose f("hello") = 5; map this to our bit vector
   - I.e., set index 5 in the bit vector
   - Indices: 0 1 2 3 4 5
9. If the index exceeds the vector size
   - Use mod! Why else did you learn it?
   - Suppose f('fooTalks') = 9; the vector has 6 bits (indices 0 to 5), so the index is 9 % 6 = 3
   - Indices: 0 1 2 3 4 5
10. And what else?
    - Different elements can hash to the same index
    - E.g., f("hello") = 5 and f("foo") = 5
    - This leads to false positives
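
The hashing steps on slides 6 to 10 can be sketched in Python. This is a minimal illustration, not the code from the talk: `toy_hash` (a sum of character codes), `index_for`, and `might_contain` are hypothetical names chosen here, and the 6-bit vector mirrors the indices 0 to 5 shown on the slides.

```python
M = 6  # number of bits in the vector (indices 0..5, as on the slides)


def toy_hash(word: str) -> int:
    """Hypothetical hash: sum of character codes. Not the talk's f()."""
    return sum(ord(ch) for ch in word)


def index_for(word: str) -> int:
    """Fold the hash value into the vector with mod."""
    return toy_hash(word) % M


bits = [0] * M
bits[index_for("ab")] = 1  # add "ab" to the set


def might_contain(word: str) -> bool:
    """Single-hash membership check against the bit vector."""
    return bits[index_for(word)] == 1


print(index_for("ab"))      # 3
print(might_contain("ab"))  # True: it was added
print(might_contain("ba"))  # True: same character sum, so a false positive
print(might_contain("d"))   # False: index 4 was never set
```

With a single hash function, any two words that land on the same index are indistinguishable, which is why practical filters use several hash functions and a much larger vector.
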
11. And now you know why there are no false negatives
    - Bits are only ever set, never cleared, so every bit belonging to an added element stays 1
    - If not, get out of here...
12. Usage
    - Simple spell checker
    - Or, to complicate things: networking, bioinformatics, and so on
13. Enough with the talking...
    - Demo
    - Code available at: github.com/jesly.varghese
14. Algorithm: Setting up the Bloom filter
    - Procedure BloomFilter(set A, hash_functions, integer m) returns filter
      - filter = allocate m bits initialized to 0
      - foreach a_i in A:
        - foreach hash function h_j:
          - filter[h_j(a_i) mod m] = 1
        - end foreach
      - end foreach
      - return filter
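
The setup procedure can be written as a short Python sketch. The talk does not say which hash functions it uses, so `make_hash_functions` below is an assumption: it derives k functions by salting SHA-256 with a seed. The name `bloom_filter` and the list-of-ints bit vector are also illustrative choices.

```python
import hashlib
from typing import Callable, Iterable, List

HashFn = Callable[[str], int]


def make_hash_functions(k: int) -> List[HashFn]:
    """Assumed helper: k hash functions built by salting SHA-256 with a seed."""
    def make(seed: int) -> HashFn:
        def h(item: str) -> int:
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            return int.from_bytes(digest[:8], "big")
        return h
    return [make(seed) for seed in range(k)]


def bloom_filter(items: Iterable[str], hash_functions: List[HashFn], m: int) -> List[int]:
    """Allocate m bits set to 0, then set one bit per (item, hash function) pair."""
    filter_bits = [0] * m
    for item in items:
        for h in hash_functions:
            filter_bits[h(item) % m] = 1  # mod keeps the index inside the vector
    return filter_bits


bits = bloom_filter(["a", "b", "c", "d"], make_hash_functions(3), m=64)
print(len(bits))  # 64
```

A list of Python ints is wasteful compared with a real bit array; it is used here only to keep the sketch close to the pseudocode.
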
15. Algorithm: Membership Test
    - Procedure MembershipTest(elm, filter, hash_functions) returns yes/no
      - foreach hash function h_j:
        - if filter[h_j(elm) mod m] != 1 return No (m is the number of bits in filter)
      - end foreach
      - return Yes
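
A matching Python sketch of the membership test follows. As before, the hash functions are an assumption (`salted_hash` is a hypothetical helper based on SHA-256), and the filter is populated inline so the example runs on its own.

```python
import hashlib
from typing import Callable, List

HashFn = Callable[[str], int]


def salted_hash(seed: int) -> HashFn:
    """Hypothetical helper: one SHA-256-based hash function per seed."""
    def h(item: str) -> int:
        digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
    return h


def membership_test(elm: str, filter_bits: List[int], hash_functions: List[HashFn]) -> bool:
    """Return False if any bit for elm is unset, True otherwise."""
    m = len(filter_bits)
    for h in hash_functions:
        if filter_bits[h(elm) % m] != 1:
            return False  # definitely not in the set
    return True  # possibly in the set (false positives can happen)


hashes = [salted_hash(seed) for seed in range(3)]
bits = [0] * 64
for word in ["a", "b", "c", "d"]:
    for h in hashes:
        bits[h(word) % 64] = 1

# Every added element is reported as present: no false negatives.
print(all(membership_test(w, bits, hashes) for w in ["a", "b", "c", "d"]))  # True
```

A "yes" only means every bit for the element is set, possibly by other elements; a "no" is always correct because bits are never cleared.
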
16. Q&A
    - I hate this part
    - Well, you can ask, and I can try answering...
17. fooTalks Ends
    - May the force be with you