Understanding Collisions in
Data Structures
This presentation explores collisions in data structures. Collisions occur
when distinct keys map to the same location. Understanding collisions
is key for efficient data storage. Like two people assigned the same seat.
by Mosiuoa Wesi
The Pigeonhole Principle
Pigeonhole Principle
The Pigeonhole Principle states that if you have more items
than containers, at least one container must have 2 items.
This principle is crucial for understanding collisions.
Hashing and Collisions
In hashing, having more keys than available slots
guarantees collisions. For example, storing 11 elements in
a hash table of size 10 will always result in at least one
collision.
Separate Chaining
1 Definition
Separate Chaining (open
hashing) resolves collisions.
Each hash table slot points
to a linked list of key-value
pairs. This method allows
multiple elements to be
stored at the same index.
2 Operations
Insertion, search, and
deletion operations occur
within the chain. The
average case performance
is O(1+n/k). Here, k
represents the number of
slots in the table.
3 Python Example
Lists can implement chaining in Python, to handle collisions. It
involves appending colliding elements to the same list.
Open Addressing Techniques
Linear Probing
Check the next
available slot. It
wraps around, but is
prone to clustering.
Example: insert keys
18, 89, 26, 19 into a
table of size 10, using
hash function k mod
10
Quadratic
Probing
Check slots i + k^2.
This approach
reduces primary
clustering. For
example, insert keys
50, 70, 76, 85, 93 into
a table of size 11,
using hash function k
mod 11.
Double Hashing
Use a second hash
function to determine
the step size. This
avoids clustering.
Example: hash1(key)
= key % tableSize,
hash2(key) = prime -
(key % prime), where
prime < tableSize
Impact on Performance
1
Time Complexity
Collisions impact search and insertion time complexity. In
the worst-case, all keys collide. This results in O(n) search
time. 2 Load Factor
A good hash function yields O(1) search and insertion time
on average. The load factor affects performance. The load
factor is the number of elements divided by the table size.
3
Rehashing
Rehashing is dynamic resizing to maintain performance. By
growing the size of the hashtable you reduce the load
factor.
Choosing a Good Hash Function
Minimize Collisions
Selecting a suitable hash function
minimizes collisions. A good hash
function is crucial for performance.
Desirable Properties
Uniformity and efficiency are desirable
properties of good hash functions. A
hash function should distribute keys
evenly.
Common Pitfalls
Avoid common pitfalls when
designing hash functions. For
instance, do not use only part of the
key.
Real-World Applications
Databases
Hash tables are used in databases. They are useful for indexing and searching
data records. Examples include MySQL and MongoDB.
Caching
Caching systems use hash tables to store frequently accessed data. This allows for
quick retrieval. Examples are Redis and Memcached.
Compilers
In compilers, hash tables are used as symbol tables. Symbol tables provides
efficient variable lookup.
Cryptography
Cryptography uses hash functions for data integrity and security. SHA-256 is an
example of a hash function in cryptography.
Summary and Conclusion
1
Summary
Recap of key concepts: collisions, resolution, and performance.
2
Importance
Understanding collisions is key for design.
3
Future
New research and strategies are underway.
Collisions impact data structure performance. Addressing audience questions can clarify doubts. Future research will
improve hashing.

Understanding-Collisions-in-Data-Structures.pptx

  • 1.
    Understanding Collisions in DataStructures This presentation explores collisions in data structures. Collisions occur when distinct keys map to the same location. Understanding collisions is key for efficient data storage. Like two people assigned the same seat. by Mosiuoa Wesi
  • 2.
    The Pigeonhole Principle PigeonholePrinciple The Pigeonhole Principle states that if you have more items than containers, at least one container must have 2 items. This principle is crucial for understanding collisions. Hashing and Collisions In hashing, having more keys than available slots guarantees collisions. For example, storing 11 elements in a hash table of size 10 will always result in at least one collision.
  • 3.
    Separate Chaining 1 Definition SeparateChaining (open hashing) resolves collisions. Each hash table slot points to a linked list of key-value pairs. This method allows multiple elements to be stored at the same index. 2 Operations Insertion, search, and deletion operations occur within the chain. The average case performance is O(1+n/k). Here, k represents the number of slots in the table. 3 Python Example Lists can implement chaining in Python, to handle collisions. It involves appending colliding elements to the same list.
  • 4.
    Open Addressing Techniques LinearProbing Check the next available slot. It wraps around, but is prone to clustering. Example: insert keys 18, 89, 26, 19 into a table of size 10, using hash function k mod 10 Quadratic Probing Check slots i + k^2. This approach reduces primary clustering. For example, insert keys 50, 70, 76, 85, 93 into a table of size 11, using hash function k mod 11. Double Hashing Use a second hash function to determine the step size. This avoids clustering. Example: hash1(key) = key % tableSize, hash2(key) = prime - (key % prime), where prime < tableSize
  • 5.
    Impact on Performance 1 TimeComplexity Collisions impact search and insertion time complexity. In the worst-case, all keys collide. This results in O(n) search time. 2 Load Factor A good hash function yields O(1) search and insertion time on average. The load factor affects performance. The load factor is the number of elements divided by the table size. 3 Rehashing Rehashing is dynamic resizing to maintain performance. By growing the size of the hashtable you reduce the load factor.
  • 6.
    Choosing a GoodHash Function Minimize Collisions Selecting a suitable hash function minimizes collisions. A good hash function is crucial for performance. Desirable Properties Uniformity and efficiency are desirable properties of good hash functions. A hash function should distribute keys evenly. Common Pitfalls Avoid common pitfalls when designing hash functions. For instance, do not use only part of the key.
  • 7.
    Real-World Applications Databases Hash tablesare used in databases. They are useful for indexing and searching data records. Examples include MySQL and MongoDB. Caching Caching systems use hash tables to store frequently accessed data. This allows for quick retrieval. Examples are Redis and Memcached. Compilers In compilers, hash tables are used as symbol tables. Symbol tables provides efficient variable lookup. Cryptography Cryptography uses hash functions for data integrity and security. SHA-256 is an example of a hash function in cryptography.
  • 8.
    Summary and Conclusion 1 Summary Recapof key concepts: collisions, resolution, and performance. 2 Importance Understanding collisions is key for design. 3 Future New research and strategies are underway. Collisions impact data structure performance. Addressing audience questions can clarify doubts. Future research will improve hashing.