A lecture about hashing and hash tables.

- 1. Introduction Hashing Techniques Applications HASHING Muhammad Adil Raja
- 2. OUTLINE: 1. Introduction. 2. Hashing Techniques. 3. Applications.
- 5. HASHING: The idea of hashing is to distribute the entries of a dataset across an array of buckets. Given a key, the algorithm computes an index that suggests where an entry can be found: index = f(key, array_size). Often this is done in two steps: hash = hashfunc(key), then index = hash % array_size.
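The two-step computation can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the slide does not say which `hashfunc` is meant, so 32-bit FNV-1a is used here as an assumed stand-in, and `bucket_index` is a hypothetical helper name.

```python
def fnv1a_32(key: str) -> int:
    """32-bit FNV-1a hash of a string (assumed stand-in for hashfunc)."""
    h = 0x811C9DC5  # FNV offset basis
    for byte in key.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF  # multiply by the FNV prime, keep 32 bits
    return h


def bucket_index(key: str, array_size: int) -> int:
    """Step 1: hash the key. Step 2: reduce the hash to a valid array index."""
    hash_value = fnv1a_32(key)
    return hash_value % array_size


if __name__ == "__main__":
    for key in ("apple", "banana", "cherry"):
        print(key, "->", bucket_index(key, 16))
```

Separating the two steps keeps the hash function independent of the table size, so the same hash values can be reused when the array is resized.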
- 6. WHAT IS HASHING: A hash table is a data structure that implements an associative array, that is, a structure that maps keys to values. It uses a hash function to compute an index into an array of buckets or slots, from which the correct value can be found.
- 7. HASHING TECHNIQUES: Separate Chaining. Open Addressing. Coalesced Hashing. ....
- 8. HASH FUNCTION: A good hash function is crucial for hash table performance, but can be difficult to achieve. A basic expectation is that the function provides a uniform distribution of hash values. A non-uniform distribution increases the number of collisions and the cost of resolving them.
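The effect of a non-uniform hash function can be seen with a small experiment. The sketch below is illustrative only: `length_hash` is a deliberately poor function invented for this example, FNV-1a is an assumed choice of a better one, and the key set and bucket count are arbitrary.

```python
from collections import Counter


def length_hash(key: str) -> int:
    """Deliberately poor: many different keys share the same length."""
    return len(key)


def fnv1a_32(key: str) -> int:
    """32-bit FNV-1a, used here as an example of a better-mixing function."""
    h = 0x811C9DC5
    for byte in key.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def bucket_sizes(keys, hashfunc, num_buckets):
    """Count how many keys land in each bucket (empty buckets are omitted)."""
    return Counter(hashfunc(k) % num_buckets for k in keys)


keys = [f"key{i}" for i in range(1000)]
poor = bucket_sizes(keys, length_hash, 128)
better = bucket_sizes(keys, fnv1a_32, 128)

if __name__ == "__main__":
    for name, sizes in (("length_hash", poor), ("fnv1a_32", better)):
        print(name, "buckets used:", len(sizes), "longest chain:", max(sizes.values()))
```

With `length_hash`, the 1000 keys have only three distinct lengths, so they pile into three buckets and every lookup degrades towards a linear scan.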
- 9. COLLISION RESOLUTION: Collisions are practically unavoidable when many keys are hashed into a limited number of buckets. The birthday problem shows that they appear even when the table is far from full.
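The birthday-problem argument can be checked numerically. The sketch below assumes an idealised hash function that places each key in a uniformly random bucket, independently of the others; `collision_probability` is a hypothetical helper name.

```python
def collision_probability(num_keys: int, num_buckets: int) -> float:
    """Probability that at least two of num_keys uniformly hashed keys share a bucket."""
    if num_keys > num_buckets:
        return 1.0  # pigeonhole principle: a collision is certain
    p_no_collision = 1.0
    for i in range(num_keys):
        # The (i+1)-th key must avoid the i buckets that are already occupied.
        p_no_collision *= (num_buckets - i) / num_buckets
    return 1.0 - p_no_collision


if __name__ == "__main__":
    print(collision_probability(23, 365))          # the classic birthday setting
    print(collision_probability(2450, 1_000_000))  # a table that is less than 0.25% full
```

In the classic setting of 23 keys and 365 buckets the probability is already slightly above one half, which is why a hash table needs a collision resolution strategy regardless of how good its hash function is.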
- 10. SEPARATE CHAINING: Every bucket is independent and maintains a list of entries with the same index. The time for hash table operations is the time to find the bucket (constant) plus the time for the list operations. The technique is also called open hashing or closed addressing. In a good hash table every bucket has very few entries.
- 11. SEPARATE CHAINING [Figure]
- 12. SEPARATE CHAINING WITH LINKED LISTS: Chained hash tables with linked lists are popular because they require only basic data structures with simple algorithms. They can use simple hash functions that are unsuitable for other methods. The cost of a table operation depends on the size of the bucket selected for the desired key. The worst case is when all the entries are inserted into the same bucket.
- 13. SEPARATE CHAINING WITH OTHER DATA STRUCTURES: AVL trees. BSTs. Dynamic arrays.
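A minimal separate chaining table might look like the sketch below. It is an illustration under stated assumptions rather than a reference implementation: each bucket is a Python list (a dynamic array) instead of a linked list, Python's built-in `hash` stands in for the hash function, and the class and method names are hypothetical. The table never resizes.

```python
class ChainedHashTable:
    """Hash table with separate chaining; each bucket is a list of (key, value) pairs."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]
        self.count = 0

    def _bucket(self, key):
        # Finding the bucket takes constant time.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))
        self.count += 1

    def get(self, key, default=None):
        # The cost is proportional to the length of this one chain.
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default

    def remove(self, key):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                self.count -= 1
                return True
        return False
```

Constructing the table with `num_buckets=1` reproduces the worst case described above: every entry lands in the same chain and each operation becomes a linear scan.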
- 14. TIME COMPLEXITY MEASURES. TABLE: Time Complexity Measures

  | Implementation | Guarantee: Search | Guarantee: Insert | Guarantee: Delete | Average case: Search | Average case: Insert | Average case: Delete |
  | --- | --- | --- | --- | --- | --- | --- |
  | Unordered Array | N | N | N | N/2 | N/2 | N/2 |
  | Ordered Array | lg N | N | N | lg N | N/2 | N/2 |
  | Unordered List | N | N | N | N/2 | N | N/2 |
  | Ordered List | N | N | N | N/2 | N/2 | N/2 |
  | BST | N | N | N | 1.39 lg N | 1.39 lg N | ? |
  | Randomized BST | 7 lg N | 7 lg N | 7 lg N | 1.39 lg N | 1.39 lg N | 1.39 lg N |
- 15. OPEN ADDRESSING (CLOSED HASHING): All entry records are stored in the bucket array itself. Insertion of a new entry: the buckets are examined, starting from the hashed-to slot and proceeding in some probe sequence, until an unoccupied slot is found. Searching: the buckets are scanned in the same sequence until the target entry is found, or until an unused slot is found, which indicates that there is no such key in the table. Open addressing: refers to the fact that the location (address) of an entry is not determined by its hash value alone. Closed hashing: another name for open addressing, not to be confused with open hashing or closed addressing, which are names reserved for separate chaining.
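The insertion and search procedures can be sketched with the simplest probe sequence, a step of one slot at a time. This is a minimal illustration with assumptions: Python's built-in `hash` stands in for the hash function, the class name is hypothetical, the table has a fixed capacity, and deletion is left out because it needs extra care (such as tombstone markers) in open addressing.

```python
class LinearProbingTable:
    """Open addressing: entries live directly in the slot array."""

    def __init__(self, capacity=8):
        self.slots = [None] * capacity  # each slot is None or a (key, value) pair

    def _probe(self, key):
        """Yield slot indices starting at the hashed-to slot, wrapping around once."""
        start = hash(key) % len(self.slots)
        for i in range(len(self.slots)):
            yield (start + i) % len(self.slots)

    def put(self, key, value):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None or slot[0] == key:
                self.slots[index] = (key, value)
                return
        raise OverflowError("table is full")

    def get(self, key, default=None):
        for index in self._probe(key):
            slot = self.slots[index]
            if slot is None:
                return default  # an unused slot means the key is not in the table
            if slot[0] == key:
                return slot[1]
        return default  # every slot was examined without finding the key
```

Because search stops at the first unused slot, it must follow exactly the same probe sequence that insertion used.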
- 16. PROBE SEQUENCES: Linear probing – a fixed interval between probes (usually 1). Quadratic probing – the interval between probes is increased by adding the successive outputs of a quadratic polynomial to the starting value given by the original hash computation. Double hashing – the interval between probes is computed by another hash function. Drawback: the number of stored entries cannot exceed the number of slots in the bucket array.
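The three probe sequences differ only in how the offset from the starting slot grows. The sketch below lists the slots each strategy would visit; the function names are hypothetical, the quadratic polynomial is assumed to be i², and the double hashing step is assumed to be `1 + second_hash % (size - 1)`, one common choice that keeps the step non-zero.

```python
def linear_probes(start, size, step=1):
    """Fixed interval between probes."""
    return [(start + i * step) % size for i in range(size)]


def quadratic_probes(start, size):
    """Offsets are successive values of the polynomial i*i."""
    return [(start + i * i) % size for i in range(size)]


def double_hash_probes(first_hash, second_hash, size):
    """The interval comes from a second hash value of the same key."""
    start = first_hash % size
    step = 1 + second_hash % (size - 1)  # never zero, so the probe always advances
    return [(start + i * step) % size for i in range(size)]


if __name__ == "__main__":
    print(linear_probes(3, 8))
    print(quadratic_probes(3, 8))
    print(double_hash_probes(10, 8, 7))
```

Linear probing always visits every slot. The other two visit every slot only for suitable table sizes; for double hashing, a prime table size guarantees it.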
- 17. OPEN ADDRESSING
- 18. LOAD FACTOR – A KEY STATISTIC: The load factor is the number of entries divided by the number of buckets, n/k. If it grows too large, the hash table becomes slow. The variance of the number of entries per bucket is also important. Consider two tables that each have 1000 entries and 1000 buckets: one has one entry in each bucket and the other has all the entries in one bucket. Both have the same load factor, but hashing is not working in the second table. A low load factor is not especially beneficial. As the load factor approaches 0, the proportion of unused areas in the hash table increases. This does not necessarily reduce the search cost, and it results in wasted memory.
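The two-table comparison can be made concrete by computing both statistics from the bucket sizes. This is a small sketch; the helper names and the use of population variance are choices made for the illustration.

```python
def load_factor(bucket_sizes):
    """n / k: number of entries divided by number of buckets."""
    return sum(bucket_sizes) / len(bucket_sizes)


def bucket_variance(bucket_sizes):
    """Population variance of the number of entries per bucket."""
    mean = load_factor(bucket_sizes)
    return sum((size - mean) ** 2 for size in bucket_sizes) / len(bucket_sizes)


even_table = [1] * 1000               # one entry in each of 1000 buckets
skewed_table = [1000] + [0] * 999     # all 1000 entries in a single bucket

if __name__ == "__main__":
    for name, sizes in (("even", even_table), ("skewed", skewed_table)):
        print(name, "load factor:", load_factor(sizes), "variance:", bucket_variance(sizes))
```

Both tables report a load factor of 1.0, so the load factor alone cannot tell them apart; the variance is 0 for the even table and 999 for the skewed one.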
- 19. HOW DROPBOX KNOWS YOU ARE SHARING COPYRIGHTED STUFF: Dropbox checks the hash of a shared file against a banned list, and blocks the share if there is a match. With a properly implemented hash function, running the exact same file through the algorithm twice will return the same identifier both times, but changing a file even slightly completely changes the hash. This identifier can be used to tell you if a file is exactly the same as another file, but it is a one-way street. The hash cannot tell you what the original file is unless you already know it or have a copy of the file to compare it to.
- 20. DROPBOX [Figure]
- 21. DROPBOX: When you upload a file to Dropbox, two things happen to it: a hash is generated, and then the file is encrypted so that any unauthorized user (be it a hacker or a Dropbox employee) who somehow stumbles upon it sitting on Dropbox’s servers cannot easily open it. After a DMCA complaint is verified by Dropbox’s legal team, Dropbox adds that file’s hash to a big blacklist of hashes corresponding to files that it cannot legally allow to be shared. When you share a link to a file, Dropbox checks that file’s hash against the blacklist. If the file you are sharing is the exact same file that a copyright holder complained about, it is blocked from being shared with others. If it is something else, such as a new file or even a modified version of the same file, a hash-based anti-infringement system should not have any idea what it is looking at.
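The hash-and-compare idea can be sketched with Python's standard `hashlib` module. This only illustrates the general technique: the slides do not say which hash function Dropbox uses, so SHA-256 is an assumption, and `blocked_hashes`, `report_file`, and `can_share` are hypothetical names.

```python
import hashlib

blocked_hashes = set()  # hypothetical blacklist of hex digests


def file_hash(data: bytes) -> str:
    """Identifier of the file contents (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def report_file(data: bytes) -> None:
    """Add a file's hash to the blacklist, e.g. after a verified complaint."""
    blocked_hashes.add(file_hash(data))


def can_share(data: bytes) -> bool:
    """A share is allowed unless the file's hash is on the blacklist."""
    return file_hash(data) not in blocked_hashes


if __name__ == "__main__":
    original = b"contents of a reported file"
    report_file(original)
    print(can_share(original))         # exact same bytes: blocked
    print(can_share(original + b"!"))  # slightly modified: not recognised
```

Only digests are stored in the blacklist, so the list itself does not reveal the contents of the reported files.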
- 22. SUBTREE CACHING (IN SYMBOLIC REGRESSION) [Figure: expression trees of parent functions built from log, tan, + and * over x, y and z, including (tan y + z) and log (x + yz), with subtrees selected randomly for crossover.]
- 23. SUBTREE CACHING: Every subtree is evaluated and cached along with its evaluation. When a new tree arrives, its subtrees are evaluated recursively. Before each evaluation, the cache is checked for an evaluation of a matching subtree. If one is found, the cached evaluation is reused. If not, the new subtree is evaluated and its evaluation is stored in the cache. This improves performance by saving time on unnecessary evaluations.
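The caching scheme can be sketched with a Python dictionary, which is itself a hash table. Everything in this example is an assumption made for illustration: trees are represented as nested tuples such as `("+", "x", ("*", "y", "z"))`, the operator set is limited to `+`, `*`, `log` and `tan`, and the cache is valid only for one fixed assignment of the variables.

```python
import math

OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "log": math.log,
    "tan": math.tan,
}


def evaluate(tree, env, cache, stats):
    """Evaluate an expression tree, reusing cached results for repeated subtrees."""
    if isinstance(tree, str):
        return env[tree]                 # a variable such as "x"
    if tree in cache:                    # tuples are hashable, so a subtree can be a key
        stats["hits"] += 1
        return cache[tree]
    op, *args = tree
    value = OPS[op](*(evaluate(arg, env, cache, stats) for arg in args))
    cache[tree] = value                  # store the evaluation for later trees
    stats["misses"] += 1
    return value


env = {"x": 2.0, "y": 3.0, "z": 4.0}
cache, stats = {}, {"hits": 0, "misses": 0}

parent1 = ("*", "x", ("+", ("tan", "y"), "z"))
parent2 = ("*", ("log", ("+", "x", ("*", "y", "z"))), "x")
# A child produced by crossover: a subtree of parent1 replaced by one from parent2.
child = ("*", "x", ("log", ("+", "x", ("*", "y", "z"))))

for tree in (parent1, parent2, child):
    evaluate(tree, env, cache, stats)
```

When the child is evaluated, its `log` subtree is found in the cache from the evaluation of `parent2`, so only the new root node is computed.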
- 24. THANK YOU
