Hashing is a technique for mapping data to array indices to allow for fast insertion and search operations in O(1) time on average. It works by applying a hash function to a key to obtain an array index, which may cause collisions that require resolution techniques like separate chaining or open addressing. Open addressing resolves collisions by probing alternative indices using functions like linear probing, quadratic probing, or double hashing to find the next available empty slot.
Hashing is a technique used to store and retrieve information quickly by mapping keys to values in a hash table using a hash function. Common hash functions include division, mid-square, and folding methods. Collision resolution techniques like chaining, linear probing, quadratic probing, and double hashing are used to handle collisions in the hash table. Hashing provides constant-time lookup and is widely used in applications like databases, dictionaries, and encryption.
The document discusses different hashing techniques used to store and retrieve data in hash tables. It begins by motivating the need for hashing through the limitations of linear and binary search. It then defines hashing as a process to map keys of arbitrary size to fixed size values. Popular hash functions discussed include division, folding, and mid-square methods. The document also covers collision resolution techniques for hash tables, including open addressing methods like linear probing, quadratic probing and double hashing as well as separate chaining using linked lists.
The document discusses hashing and hash tables. It defines hashing as a technique where the location of an element in a collection is determined by a hashing function of the element's value. Collisions can occur if multiple elements map to the same location. Common techniques for resolving collisions include chaining and open addressing. The Java Collections API provides several implementations of hash tables like HashMap and HashSet.
This document discusses disk structures and file handling. It describes different types of disks including floppy disks and hard disks. It explains disk structures such as tracks, sectors, cylinders and partitions. It also covers file allocation using file allocation tables, how DOS stores and reads files, file handles, errors and operations such as open, write, read, close and delete. File pointers are used to locate positions in a file.
Hashing and Hashtable, application of hashing, advantages of hashing, disadva...NaveenPeter8
Hashing is a process that converts a key into a hash value using a hash function and mathematical algorithm. A good hash function uses a one-way hashing algorithm so the hash cannot be converted back to the original key. Hash tables store data in an array format using the hash value as an index, allowing very fast access if the index is known. Hashing provides constant-time search, insert, and delete operations on average and is widely used for applications like message digests, passwords, and data structures.
1. The document discusses searching and hashing algorithms. It describes linear and binary searching techniques. Linear search has O(n) time complexity, while binary search has O(log n) time complexity for sorted arrays.
2. Hashing is described as a technique to allow O(1) access time by mapping keys to table indexes via a hash function. Separate chaining and open addressing are two common techniques for resolving collisions when different keys hash to the same index. Separate chaining uses linked lists at each table entry while open addressing probes for the next open slot.
Hashing is a technique for mapping data to array indices to allow for fast insertion and search operations in O(1) time on average. It works by applying a hash function to a key to obtain an array index, which may cause collisions that require resolution techniques like separate chaining or open addressing. Open addressing resolves collisions by probing alternative indices using functions like linear probing, quadratic probing, or double hashing to find the next available empty slot.
Hashing is a technique used to store and retrieve information quickly by mapping keys to values in a hash table using a hash function. Common hash functions include division, mid-square, and folding methods. Collision resolution techniques like chaining, linear probing, quadratic probing, and double hashing are used to handle collisions in the hash table. Hashing provides constant-time lookup and is widely used in applications like databases, dictionaries, and encryption.
The document discusses different hashing techniques used to store and retrieve data in hash tables. It begins by motivating the need for hashing through the limitations of linear and binary search. It then defines hashing as a process to map keys of arbitrary size to fixed size values. Popular hash functions discussed include division, folding, and mid-square methods. The document also covers collision resolution techniques for hash tables, including open addressing methods like linear probing, quadratic probing and double hashing as well as separate chaining using linked lists.
The document discusses hashing and hash tables. It defines hashing as a technique where the location of an element in a collection is determined by a hashing function of the element's value. Collisions can occur if multiple elements map to the same location. Common techniques for resolving collisions include chaining and open addressing. The Java Collections API provides several implementations of hash tables like HashMap and HashSet.
This document discusses disk structures and file handling. It describes different types of disks including floppy disks and hard disks. It explains disk structures such as tracks, sectors, cylinders and partitions. It also covers file allocation using file allocation tables, how DOS stores and reads files, file handles, errors and operations such as open, write, read, close and delete. File pointers are used to locate positions in a file.
Hashing and Hashtable, application of hashing, advantages of hashing, disadva...NaveenPeter8
Hashing is a process that converts a key into a hash value using a hash function and mathematical algorithm. A good hash function uses a one-way hashing algorithm so the hash cannot be converted back to the original key. Hash tables store data in an array format using the hash value as an index, allowing very fast access if the index is known. Hashing provides constant-time search, insert, and delete operations on average and is widely used for applications like message digests, passwords, and data structures.
1. The document discusses searching and hashing algorithms. It describes linear and binary searching techniques. Linear search has O(n) time complexity, while binary search has O(log n) time complexity for sorted arrays.
2. Hashing is described as a technique to allow O(1) access time by mapping keys to table indexes via a hash function. Separate chaining and open addressing are two common techniques for resolving collisions when different keys hash to the same index. Separate chaining uses linked lists at each table entry while open addressing probes for the next open slot.
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle TreesLorenzo Alberton
The first part of a series of talks about modern algorithms and data structures, used by nosql databases like HBase and Cassandra. An explanation of Bloom Filters and several derivates, and Merkle Trees.
This document discusses hashing techniques for storing data in a hash table. It describes hash collisions that can occur when multiple keys map to the same hash value. Two primary techniques for dealing with collisions are chaining and open addressing. Open addressing resolves collisions by probing to subsequent table indices, but this can cause clustering issues. The document proposes various rehashing functions that incorporate secondary hash values or quadratic probing to reduce clustering in open addressing schemes.
O documento descreve a estrutura de dados Skip List, incluindo seus conceitos, operações como pesquisa, inserção e remoção, e espaço necessário. Skip List pode ser usada como alternativa às árvores balanceadas com complexidade de tempo comparável para a maioria das operações.
The document discusses and compares linear and binary search algorithms. Linear search sequentially checks each element of an unsorted array to find a target value, while binary search works on a sorted array by repeatedly calculating the midpoint and comparing the target to the value there to narrow the search range. It provides steps for performing a binary search, including sorting the array, calculating the midpoint, and updating the search range based on whether the target is less than, greater than, or equal to the midpoint value.
An introduction to elasticsearch with a short demonstration on Kibana to present the search API. The slide covers:
- Quick overview of the Elastic stack
- indexation
- Analysers
- Relevance score
- One use case of elasticsearch
The query used for the Kibana demonstration can be found here:
https://github.com/melvynator/elasticsearch_presentation
Auditing System Password Using L0phtcrackVishal Kumar
The objective of this presentation is to help peoples to learn how to use L0htCrack tool to attain and crack the user password from any Windows Machine.
At the beginning, the number of elements in a set of numbers to be stored in a computer system used to be not so large or having a wide range. Then, using a
simple table T [0, 1, ..., m − 1]called, direct-address table, could be used to store those numbers. As the situation became more and more complex, and a new idea came to be:
Definition
An associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of tuples {(key, value)}
This can bee seen in the example of dictionaries in any spoken language. The problem became more complex when the range of the possible values for the
keys at the tuples became unbounded. Therefore a new type of data structure is needed to avoid the sparsity problem in the data, the hash table.
Python lists allow storing heterogeneous data elements and are mutable. Lists use square brackets and store elements by index starting from 0. Common list operations include accessing elements, slicing, concatenation, replication, updating and deleting elements. Functions like min(), max() and len() operate on lists while methods such as append(), insert(), pop(), sort() modify lists.
Linear probing is a collision resolution technique for hash tables that uses open addressing. When a collision occurs by inserting a key-value pair, linear probing searches through consecutive table indices to find the next empty slot. This provides constant expected time for search, insertion, and deletion when using a random hash function. However, linear probing can cause clustering where many elements are stored near each other, degrading performance.
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle TreesLorenzo Alberton
The first part of a series of talks about modern algorithms and data structures, used by nosql databases like HBase and Cassandra. An explanation of Bloom Filters and several derivates, and Merkle Trees.
This document discusses hashing techniques for storing data in a hash table. It describes hash collisions that can occur when multiple keys map to the same hash value. Two primary techniques for dealing with collisions are chaining and open addressing. Open addressing resolves collisions by probing to subsequent table indices, but this can cause clustering issues. The document proposes various rehashing functions that incorporate secondary hash values or quadratic probing to reduce clustering in open addressing schemes.
O documento descreve a estrutura de dados Skip List, incluindo seus conceitos, operações como pesquisa, inserção e remoção, e espaço necessário. Skip List pode ser usada como alternativa às árvores balanceadas com complexidade de tempo comparável para a maioria das operações.
The document discusses and compares linear and binary search algorithms. Linear search sequentially checks each element of an unsorted array to find a target value, while binary search works on a sorted array by repeatedly calculating the midpoint and comparing the target to the value there to narrow the search range. It provides steps for performing a binary search, including sorting the array, calculating the midpoint, and updating the search range based on whether the target is less than, greater than, or equal to the midpoint value.
An introduction to elasticsearch with a short demonstration on Kibana to present the search API. The slide covers:
- Quick overview of the Elastic stack
- indexation
- Analysers
- Relevance score
- One use case of elasticsearch
The query used for the Kibana demonstration can be found here:
https://github.com/melvynator/elasticsearch_presentation
Auditing System Password Using L0phtcrackVishal Kumar
The objective of this presentation is to help peoples to learn how to use L0htCrack tool to attain and crack the user password from any Windows Machine.
At the beginning, the number of elements in a set of numbers to be stored in a computer system used to be not so large or having a wide range. Then, using a
simple table T [0, 1, ..., m − 1]called, direct-address table, could be used to store those numbers. As the situation became more and more complex, and a new idea came to be:
Definition
An associative array, map, symbol table, or dictionary is an abstract data type composed of a collection of tuples {(key, value)}
This can bee seen in the example of dictionaries in any spoken language. The problem became more complex when the range of the possible values for the
keys at the tuples became unbounded. Therefore a new type of data structure is needed to avoid the sparsity problem in the data, the hash table.
Python lists allow storing heterogeneous data elements and are mutable. Lists use square brackets and store elements by index starting from 0. Common list operations include accessing elements, slicing, concatenation, replication, updating and deleting elements. Functions like min(), max() and len() operate on lists while methods such as append(), insert(), pop(), sort() modify lists.
Linear probing is a collision resolution technique for hash tables that uses open addressing. When a collision occurs by inserting a key-value pair, linear probing searches through consecutive table indices to find the next empty slot. This provides constant expected time for search, insertion, and deletion when using a random hash function. However, linear probing can cause clustering where many elements are stored near each other, degrading performance.
Семинар 5. Многопоточное программирование на OpenMP (часть 5)
Лекция 6: Хеш-таблицы
1. Лекция 6
Хеш-таблицы
Курносов Михаил Георгиевич
к.т.н. доцент Кафедры вычислительных систем
Сибирский государственный университет
телекоммуникаций и информатики
http://www.mkurnosov.net/teaching
2. АТД “Словарь” (Dictionary)
2
Словарь (ассоциативный массив, associative array, map,
dictionary) – структура (контейнер) данных для хранения
пар вида “ключ – значение” (key – value)
Реализации словарей отличаются вычислительной
сложностью операций добавления (Add), поиска (Lookup)
и удаления элементов (Delete)
Наибольшее распространение получили следующие
реализации:
1. Деревья поиска (Search trees)
2. Хэш-таблицы (Hash tables)
3. Связные списки
4. Массивы
3. Хеш-таблицы (Hash tables)
3
Хеш-таблица (Hash table) – это структура данных
для хранения пар “ключ – значение”
Доступ к элементам осуществляется по ключу (key)
Ключи могут быть строками, числами, указателями
Хеш-таблицы позволяют в среднем за время О(1)
выполнять добавление, поиски и удаление элементов
4. Основная идея (мотивация)
4
Чем хороши статические массивы int v[100]?
Очень быстрый (за O(1)) доступ к элементу массива по его
ключу: v[17] = 45
Ограничение – ключи (индексы) это целые числа
Можно ли как-то использовать типы float, double,
строки (char []) в качестве индексов в массиве?
Пример: массив анкет пользователей Twitter (словарь),
ключ – login, значение – анкета (профиль с данными
пользователя)
Массив структур:
struct twitter_user users[MAX_USERS];
5. Хеш-таблицы (Hash tables)
5
Ключ
(Key)
Value:
160
“Антилопа Гну”
Хеш-таблица
(Hash table)
Хеш-функция
(Hash function)
hash(key) -> int
0
1
2
…
h – 1
Хеш-функция – отображает (преобразует)
ключ (key) в номер элемента (index) массива
(в целое число от 0 до h – 1)
Время вычисления хеш-функции зависит
от длины ключа и не зависит от количества
элементов в массиве
Ячейки массива называются buckets, slots
index
6. Хеш-таблицы (Hash tables)
6
Хеш-таблица
(Hash table)
0
1
2
…
h – 1
На практике, обычно известна информация
о диапазоне значений ключей
На основе этого выбирается размер h хеш-таблицы
и выбирается хеш-функция
Коэффициент заполнения хеш-таблицы
(load factor, fill factor) – отношение числа n
хранимых элементов в хеш-таблицы к размеру h
массива (среднее число элементов на одну ячейку)
α =
𝑛
ℎ
Пример: h = 128, в хеш-таблицу добавили 50
элементов,тогда = 50 / 128 0.39
От этого коэффициента зависит среднее время
выполнения операций добавления, поиска
и удаления элементов
7. 7
Хеш-функции (Hash function)
Хеш-функция (Hash function) – это функция преобразующая
значения ключа (например: строки, числа, файла) в целое число
Значение возвращаемое хеш-функцией называется хеш-кодом
(hash code), контрольной суммой (hash sum) или хешем (hash)
#define HASH_MUL 31
#define HASH_SIZE 128
// Хеш-функция для строк [KP, “Practice of Programming”]
unsigned int hash(char *s)
{
unsigned int h = 0;
char *p;
for (p = s; *p != '0'; p++)
h = h * HASH_MUL + (unsigned int)*p;
return h % HASH_SIZE;
} Thash = O(|s|)
8. Хеш-функции (Hash function)
Хеш-функция (Hash function) – это функция преобразующая
значения ключа (например: строки, числа, файла) в целое число
Значение возвращаемое хеш-функцией называется хеш-кодом
(hash code), контрольной суммой (hash sum) или хешем (hash)
#define HASH_MUL 31
#define HASH_SIZE 128
int main()
{
unsigned int h = hash(“ivanov”);
}
8
i v a n o v
105 118 97 110 111 118
9. Хеш-функции (Hash function)
#define HASH_MUL 31
#define HASH_SIZE 128
unsigned int hash(char *s) {
unsigned int h = 0;
char *p;
for (p = s; *p != '0'; p++)
h = h * HASH_MUL + (unsigned int)*p;
return h % HASH_SIZE;
}
9
h = 0 * HASH_MUL + 105
h = 105 * HASH_MUL + 118
h = 3373 * HASH_MUL + 97
h = 104660 * HASH_MUL + 110
h = 3244570 * HASH_MUL + 111
h = 100581781 * HASH_MUL + 118
return 3118035329 % HASH_SIZE // hash(“ivanov”) = 1
i v a n o v
105 118 97 110 111 118
10. Понятие коллизии (Collision)
10
Коллизия (Collision) – это совпадение значений хеш-функции
для двух разных ключей
Keys HashesHash function
Жираф
Слон
Муха Цеце
Волк
127
...
05
04
03
02
01
00
Collision!
hash(“Волк”) = 2 hash(“Жираф”) = 2
Существуют хеш-функции без коллизий –
совершенные хеш-функции (perfect hash function)
11. Разрешение коллизий (Collision resolution)
11
Метод цепочек (Chaining) – закрытая адресация
Элементы с одинаковым значением хеш-функции объединяются в
связный список. Указатель на список хранится в советующей ячейке хеш-
таблицы.
При коллизии элемент добавляется в начало списка.
Поиск и удаление элемента требуют просмотра всего списка.
Keys HashesHash function
Жираф
Слон
Муха Цеце
Волк
127
...
05
04
03
02
01
00
Муха Цеце NULL
Слон NULL
Жираф Волк
NULL
12. Разрешение коллизий (Collision resolution)
12
Открытая адресация (Open addressing)
В каждой ячейке хеш-таблицы хранится не указатель на связный
список, а один элемент (ключ, значение)
Если ячейка с индексом hash(key) занята, то осуществляется поиск
свободной ячейки в следующих позициях таблицы Hash Элемент
0 B
1
2
3 A
4 C
5 D
6
7
Линейное хеширование (linear probing) – проверяются
позиции:
hash(key) + 1, hash(key) + 2, …, (hash(key) + i) mod h, …
Если свободных ячеек нет, то таблица заполнена
Пример:
hash(D) = 3, но ячейка с иднексом 3 занята.
Присматриваем ячейки: 4 – занята, 5 – свободна.
13. Требования к хеш-функциям
13
Быстрое вычисление хэш-кода по значению ключа
Сложность вычисления хэш-кода не должна зависеть
от количества n элементов в хеш-таблице
Детерминированность – для заданного значения ключа
хэш-функция всегда должна возвращать одно и то же
значение
unsigned int hash(char *s) {
unsigned int h = 0;
char *p;
for (p = s; *p != '0'; p++)
h = h * HASH_MUL + (unsigned int)*p;
return h % HASH_SIZE;
}
14. Требования к хеш-функциям
14
Равномерность (uniform distribution) –
хеш-функция должна равномерно заполнять
индексы массива возвращаемыми номерами
Желательно, чтобы все хэш-коды формировались с
одинаковой равномерной распределенной вероятностью
Hash Элемент
0 C, F, G
1
2
3
4 B, E
5
6 A, D, H
7
A
B
C
D
E
F
G
H
Неравномерное
распределение
15. Требования к хеш-функциям
15
Равномерность (uniform distribution) –
хеш-функция должна равномерно заполнять
индексы массива возвращаемыми номерами
Желательно, чтобы все хэш-коды формировались с
одинаковой равномерной распределенной вероятностью
Hash Элемент
0 C
1 A
2 D
3 H
4 B
5 G
6 E
7 F
A
B
C
D
E
F
G
H
Равномерное
распределение
16. Эффективность хеш-таблиц
16
Хеш-таблица требует предварительной инициализации
ячеек значениями NULL – трудоемкость O(h)
Операция
Вычислительная
сложность
в среднем случае
Вычислительная
сложность
в худшем случае
Add(key, value) O(1) O(1)
Lookup(key) O(1 + n/h) O(n)
Delete(key) O(1 + n/h) O(n)
Min() O(n + h) O(n + h)
Max() O(n + h) O(n + h)
17. Пример хэш-функции для строк
17
unsigned int ELFHash(char *key, unsigned int mod)
{
unsigned int h = 0, g;
while (*key) {
h = (h << 4) + *key++;
g = h & 0xf0000000L;
if (g)
h ^= g >> 24;
h &= ~g;
}
return h % mod;
}
Используются только поразрядные операции
(для эффективности)
19. Пример хэш-функции для чисел
19
Ключи: размер файла – число.
Значение, хранимое в словаре: название файла.
Требуется разработать хеш-функцию
hash(filesize) [0..1023]
function hash(int filesize)
return filesize mod 1024
end function
20. Пример хэш-функции для строк
20
hash 𝑠 =
𝑖=0
𝐿−1
ℎ 𝐿−(𝑖+1) ∙ 𝑠 𝑖 =
= 𝑠[0]ℎ 𝐿−1 + 𝑠[1]ℎ 𝐿−2 + ⋯ + 𝑠 𝐿 − 2 ℎ + 𝑠 𝐿 − 1 ,
где s – ключ (строка), L – длина строки, s[i] – код символа i
21. Хеш-таблицы (Hash table)
21
Длину h хеш-таблицы выбирают как простое число
Для такой таблицы модульная хеш-функция дает
равномерное распределение значений ключей
hash(key) = key mod h
22. Хеш-таблицы vs. Бинарное дерево поиска
22
Операция
Hash table
(unordered map)
Binary search tree
(ordered map)
Add(key, value) O(m) O(nm)
Lookup(key) O(m + nm) O(nm)
Delete(key) O(m + nm) O(nm)
Min() O(m(n + h)) O(n)
Max() O(m(n + h)) O(n)
Эффективность реализации словаря хеш-таблицей
(метод цепочек) и бинарным деревом поиска
Ключ – это строка из m символов
Оценка сложности для худшего случая (worst case)
23. Хеш-таблицы vs. Бинарное дерево поиска
23
Операция
Hash table
(unordered map)
Binary search tree
(ordered map)
Add(key, value) O(m) O(mlogn)
Lookup(key) O(m + mn/h) O(mlogn)
Delete(key) O(m + mn/h) O(mlogn)
Min() O(m(n + h)) O(logn)
Max() O(m(n + h)) O(logn)
Эффективность реализации словаря хеш-таблицей
(метод цепочек) и бинарным деревом поиска
Ключ – это строка из m символов
Оценка сложности для среднего случая (average case)
32. Домашнее чтение
32
Прочитать в [Sedgewick2001, С. 575] про хеш-функции для
вещественных чисел
Прочитать в “Практике программирования” [KP]
информацию о хеш-таблицах
Прочитать “Глава 11. Хеш-таблицы” [CLRS, С. 282]
Perfect hash function (Совершенная хеш-функция)