Evolving Universal Hash Function using Genetic Algorithms

Slides presented at the 2009 International Conference on Future Computer and Communication in Kuala Lumpur, Malaysia. They cover the early work done in the project "Evolving Universal Hash Functions using Genetic Algorithms". A revised version of this project was presented at GECCO 2009.

  1. 1. Evolving Universal Hash Functions Using Genetic Algorithms Ramprasad Joshi, Mustafa Safdari 2009 INTERNATIONAL CONFERENCE ON FUTURE COMPUTER AND COMMUNICATION BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI GOA CAMPUS
  2. 2. Outline <ul><li>Introduction </li></ul><ul><li>Implementation of Genetic Algorithms </li></ul><ul><li>Simulation and Result </li></ul><ul><li>Conclusion and future work </li></ul>
  3. 3. Introduction <ul><li>Universal Hash Functions </li></ul><ul><li>Selecting h randomly </li></ul>
  4. 4. Universal Hash Functions <ul><li>Hash functions map integer keys in the range [0, M-1] to buckets in [0, N-1] </li></ul><ul><li>A set H of hash functions is universal if, for any two distinct keys j and k and a hash function h chosen uniformly at random from H, Pr[h(j) = h(k)] ≤ 1/N </li></ul><ul><li>The expected number of collisions for any key is then at most n/N, where n is the number of keys stored </li></ul>
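The n/N figure follows from the universal property by linearity of expectation. A short derivation, assuming the usual definition Pr[h(j) = h(k)] ≤ 1/N for distinct keys, that n counts the stored keys, and writing X_j for the indicator that key j collides with a fixed key k (notation introduced here, not taken from the slide):

```latex
\[
\mathbb{E}\Bigl[\sum_{j \ne k} X_j\Bigr]
  = \sum_{j \ne k} \Pr[h(j) = h(k)]
  \le \frac{n-1}{N} < \frac{n}{N}.
\]
```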
  5. 5. Selecting h randomly <ul><li>One such family of hash functions, in its standard form: h(k) = ((a·k + b) mod p) mod N </li></ul><ul><li>p is a prime number </li></ul><ul><li>a, b are any two random integers </li></ul><ul><li>How do we select a, b, p? </li></ul><ul><li>Goal: minimize collisions as much as possible </li></ul>
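A minimal sketch of random selection, assuming the family takes the standard Carter-Wegman form h(k) = ((a*k + b) mod p) mod N, which matches the parameters listed on this slide (prime p, integers a and b). The function names and the sampling ranges for a and b are assumptions for illustration, not taken from the slides.

```python
import random


def make_hash(a, b, p, N):
    """Return h(k) = ((a*k + b) mod p) mod N as a callable."""
    def h(k):
        return ((a * k + b) % p) % N
    return h


def random_hash(p, N, rng=random):
    """Pick a and b at random and return the hash function with its parameters."""
    # Assumption: a is drawn from [1, p-1] and b from [0, p-1],
    # as in the textbook construction of this family.
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return make_hash(a, b, p, N), (a, b)
```

As a quick check, the parameters reported in row 1 of Table I (p = 11, a = 3, b = 2, N = 10) send the keys 0 through 9 to ten distinct buckets under this form.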
  6. 6. Implementation of GA <ul><li>Chromosome, Fitness Function, Crossover, Mutation </li></ul><ul><li>p_values, p_Array </li></ul>
  7. 7. Elements of the GA <ul><li>Chromosome: the pair (a, b) </li></ul><ul><li>Fitness function: </li></ul><ul><li>Crossover types: single point, two point, midway and random </li></ul><ul><li>Mutation: single point, multi point </li></ul><ul><li>Selection: roulette wheel </li></ul>
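The operators named on this slide can be sketched as follows. The slides do not preserve the chromosome encoding, so this sketch assumes a and b are concatenated as fixed-width bit strings; `BITS`, `encode`, `decode` and the other names are hypothetical. Only single-point crossover, point mutation and roulette wheel selection are shown.

```python
import random

BITS = 17  # assumed width per parameter; covers values below 2M = 100000


def encode(a, b, bits=BITS):
    """Concatenate a and b as fixed-width binary strings."""
    return format(a, f"0{bits}b") + format(b, f"0{bits}b")


def decode(chrom, bits=BITS):
    """Recover (a, b) from the bit string."""
    return int(chrom[:bits], 2), int(chrom[bits:], 2)


def single_point_crossover(x, y, rng=random):
    """Swap the tails of two parents at one random cut point."""
    cut = rng.randrange(1, len(x))
    return x[:cut] + y[cut:], y[:cut] + x[cut:]


def mutate(chrom, points=1, rng=random):
    """Flip `points` distinct, randomly chosen bits."""
    bits = list(chrom)
    for i in rng.sample(range(len(bits)), points):
        bits[i] = "1" if bits[i] == "0" else "0"
    return "".join(bits)


def roulette_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]
```

Roulette wheel selection needs non-negative weights where larger is better, so a collision count would have to be inverted before being used as a fitness value.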
  8. 8. p_values, p_Array <ul><li>p is any prime number such that M ≤ p < 2M. An array called p_values keeps track of the allowable values of p so that it can be used in the above steps. p_values can be constructed and populated using any sieve algorithm (from primality testing) that finds the prime numbers within a range. Our implementation uses the Sieve of Eratosthenes. </li></ul>
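Building p_values with the Sieve of Eratosthenes might look like the sketch below. The function name `build_p_values` is an assumption; only the range M ≤ p < 2M and the choice of sieve come from the slide.

```python
import math


def build_p_values(M):
    """Return all primes p with M <= p < 2M, in increasing order."""
    limit = 2 * M
    if limit < 3:
        return []  # no prime lies in [M, 2M) for M < 2
    is_prime = [True] * limit
    is_prime[0] = is_prime[1] = False
    # Cross out multiples of each prime up to sqrt(limit).
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            for j in range(i * i, limit, i):
                is_prime[j] = False
    return [p for p in range(M, limit) if is_prime[p]]
```

By Bertrand's postulate there is always a prime strictly between M and 2M for M > 1, so the list is non-empty for any realistic key range.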
  9. 9. p_Array <ul><li>For every chromosome (a, b) there is an associated value of p that satisfies condition (1) </li></ul><ul><li>To store this information with the chromosome, we create a separate array called p_Array, which stores, for each chromosome, the index of its prime number in p_values. For example, if a chromosome in the population has a=9, b=7, p=4, the value of p assigned to this chromosome is the one found in p_values at index 4. </li></ul><ul><li>The index values of p do not undergo crossover or mutation; only a and b do. After each such operation, however, a suitable p is found for the resulting (a, b) pair if the one associated with the parent chromosome does not satisfy (1). </li></ul>
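The repair step after crossover or mutation could be sketched as below. Condition (1) is not preserved in this transcript, so the sketch assumes it requires a < p and b < p; that assumption, along with the name `repair_p_index`, is illustrative only and may differ from the authors' condition.

```python
import random


def repair_p_index(a, b, p_index, p_values, rng=random):
    """Keep the parent's index into p_values if it still fits, else pick a new one."""
    # ASSUMPTION: condition (1) is taken to be a < p and b < p.
    if max(a, b) < p_values[p_index]:
        return p_index
    # Otherwise choose at random among the primes that are large enough.
    candidates = [i for i, p in enumerate(p_values) if max(a, b) < p]
    if not candidates:
        raise ValueError("no prime in p_values fits this (a, b) pair")
    return rng.choice(candidates)
```

Storing an index rather than the prime itself keeps the stored value small and guarantees that p always comes from the allowed set.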
  10. 10. Simulations and Results <ul><li>Simulation settings </li></ul><ul><li>Results </li></ul>
  11. 11. Simulation Settings <ul><li>No. of generations = 30 </li></ul><ul><li>Population size = 50 </li></ul><ul><li>Crossover probability p_c = 0.8, mutation probability p_m = 0.01 </li></ul><ul><li>Input set of keys N </li></ul><ul><ul><li>Uniformly randomly generated in (0, 50000) </li></ul></ul><ul><ul><li>Different sets of size 10, 100, 1000, 10000 </li></ul></ul><ul><ul><li>Taking N as prime gives better results </li></ul></ul>
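  12. 12. Results TABLE I. RESULTS OF RUNNING THE ALGORITHM FOR RANDOM INPUT DISTRIBUTIONS
<table>
<tr><th>Sr. No.</th><th>Range of input</th><th>Crossover type *</th><th>Mutation type *</th><th>No. of keys n</th><th>No. of buckets N</th><th>No. of initial collisions</th><th>n collisions</th><th>n filled</th><th>p</th><th>a</th><th>b</th></tr>
<tr><td>1</td><td>0-10</td><td>1</td><td>2</td><td>10</td><td>10</td><td>0</td><td>0</td><td>10</td><td>11</td><td>3</td><td>2</td></tr>
<tr><td>2</td><td>0-500</td><td>1</td><td>2</td><td>10</td><td>11</td><td>1</td><td>4</td><td>6</td><td>701</td><td>67</td><td>452</td></tr>
<tr><td>3</td><td>0-600</td><td>1</td><td>2</td><td>20</td><td>23</td><td>2</td><td>2</td><td>18</td><td>1013</td><td>626</td><td>635</td></tr>
<tr><td>4</td><td>0-100</td><td>1</td><td>1</td><td>100</td><td>100</td><td>0</td><td>0</td><td>100</td><td>179</td><td>109</td><td>114</td></tr>
<tr><td>5</td><td>0-50000</td><td>1</td><td>2</td><td>100</td><td>101</td><td>8</td><td>21</td><td>79</td><td>98869</td><td>54339</td><td>35059</td></tr>
<tr><td>6</td><td>0-1000</td><td>1</td><td>2</td><td>500</td><td>499</td><td>0</td><td>1</td><td>499</td><td>1823</td><td>747</td><td>581</td></tr>
<tr><td>7</td><td>0-50000</td><td>1</td><td>2</td><td>500</td><td>499</td><td>37</td><td>108</td><td>392</td><td>69313</td><td>46631</td><td>9950</td></tr>
<tr><td>8</td><td></td><td>1</td><td>2</td><td>10000</td><td>10000</td><td>0</td><td>0</td><td>10000</td><td>14153</td><td>9347</td><td>517</td></tr>
<tr><td>9</td><td></td><td>1</td><td>2</td><td>10000</td><td>10000</td><td>0</td><td>0</td><td>10000</td><td>57203</td><td>25869</td><td>37769</td></tr>
<tr><td>10</td><td>0-50000</td><td>1</td><td>2</td><td>10000</td><td>10000</td><td>911</td><td>2397</td><td>6692</td><td>79063</td><td>33068</td><td>31178</td></tr>
</table>
* Indices from the crossover and mutation type as mentioned in the previous section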
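The collision and filled-bucket counts in Table I can be reproduced in spirit with a sketch like the one below. It assumes the hash takes the form ((a*k + b) mod p) mod N and counts collisions as the number of keys minus the number of occupied buckets; that counting rule is an assumption, chosen because "n collisions" plus "n filled" equals n in rows 1 through 9 of the table. The fitness mapping is likewise hypothetical, since the slide's fitness function is not preserved.

```python
def count_collisions(keys, a, b, p, N):
    """Return (collisions, filled) for the given keys and hash parameters."""
    occupied = {((a * k + b) % p) % N for k in keys}
    filled = len(occupied)
    # Assumption: every key beyond the first in a bucket counts as one collision.
    return len(keys) - filled, filled


def fitness(keys, a, b, p, N):
    """Hypothetical fitness: higher is better, 1.0 means no collisions."""
    collisions, _ = count_collisions(keys, a, b, p, N)
    return 1.0 / (1.0 + collisions)
```

For example, `count_collisions(range(10), 3, 2, 11, 10)` uses the row 1 parameters on the keys 0 through 9 and reports no collisions with all ten buckets filled.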
  13. 13. Case 1 <ul><li>Two-point mutation gave much better results in fewer generations than single-point mutation or mutation at more than two points. Single-point random crossover was also found to produce much better results. </li></ul><ul><li>In every case the algorithm converged within 7-8 generations, even in the worst case. </li></ul><ul><li>In some cases, where the range of the key distribution was much larger than [0, N-1] and did not coincide with it, the number of collisions was relatively high. However, this number dropped drastically when N was taken to be a nearby prime number. </li></ul>
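  14. 14. Case 2 (Comparative Runs) <ul><li>In the next type of simulation, the algorithm was compared against random selection of h. </li></ul><ul><li>The algorithm gave fewer collisions than random selection on all ten input files. </li></ul>Table 2. Results of Comparative Run 1
<table>
<tr><th>Input file</th><th>n collisions by random selection</th><th>n collisions by GA generated function</th></tr>
<tr><td>1</td><td>286</td><td>251</td></tr>
<tr><td>2</td><td>273</td><td>256</td></tr>
<tr><td>3</td><td>267</td><td>245</td></tr>
<tr><td>4</td><td>285</td><td>244</td></tr>
<tr><td>5</td><td>285</td><td>255</td></tr>
<tr><td>6</td><td>285</td><td>262</td></tr>
<tr><td>7</td><td>281</td><td>259</td></tr>
<tr><td>8</td><td>273</td><td>255</td></tr>
<tr><td>9</td><td>273</td><td>258</td></tr>
<tr><td>10</td><td>304</td><td>259</td></tr>
</table>
Setting for GA: P=100, N=1423, p_c = 0.75 (1), p_m = 0.01 (1)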
  15. 15. In the End… <ul><li>Conclusion </li></ul><ul><li>Future Work </li></ul><ul><li>Acknowledgement </li></ul>
  16. 16. Conclusion <ul><li>The proposed algorithm produces an efficient universal hash function for a given distribution of keys, resulting in relatively few collisions. </li></ul><ul><li>The problem of clustering is avoided by generating the hash function with a metaheuristic, in this case a genetic algorithm. </li></ul><ul><li>It performs better than random selection of h. </li></ul><ul><li>The algorithm is well suited to scenarios where the input distribution to be hashed changes frequently and the hash function must be changed dynamically to rehash the input. </li></ul>
  17. 17. Future Work <ul><li>The scope for future work on this algorithm includes </li></ul><ul><ul><li>selection of an efficient sieve algorithm </li></ul></ul><ul><ul><li>an efficient encoding of the chromosome </li></ul></ul><ul><ul><li>understanding the effect of the various types of crossover and mutation on the result </li></ul></ul><ul><ul><li>a better design of the fitness function so that the few exceptional cases are also handled </li></ul></ul><ul><ul><li>testing the algorithm against some standard hash functions </li></ul></ul>
  18. 18. Acknowledgment <ul><li>My sincere thanks to Mr. Ramprasad Joshi, my mentor and guide for this project. </li></ul><ul><li>I also thank my colleague Miss Joanna Mary Oommen for assistance with the paper and presentation. </li></ul>
  19. 19. Thank You! <ul><li>Any Questions? </li></ul>
