Kukushkino hashing

An example of cuckoo hashing. Arrows indicate the alternate key position. The new value that is inserted into cell A, pushing A into the alternative cell occupied by key B, the value B is transferred to an alternative place that is not currently occupied. Inserting a new value into cell H fails — H enters the loop (along with W), so the element just inserted must be pushed out.

Kukushkino hashing is a programming scheme for solving collisions of hash values in a table with a constant sampling time in the . The name comes from the behavior of some cuckoo species, when the cuckoo chick pushes eggs or other chicks out of the nest immediately after it hatches. The same happens in the cuckoo hash, when inserting a new key into a table can push an old key to another place in the table.

Content

History

Kukushkino hashing was first described by Rasmus Page and Fleming Frisch Rodler in 2001 ^[1] .

Operations

Kukushkino hashing is a form of , in which each non-empty hash table cell contains a key or . The hash function is used to determine the location for each key, and its presence in the table (or the value associated with it) can be found by checking this cell in the table. However, public addressing suffers from collisions that occur when more than one key falls into a single cell. The basic idea of cuckoo hashing is to resolve collisions by using two hash functions instead of one. This provides two possible positions in the hash table for each key. In one of the usual variants of the algorithm, the hash table is divided into two smaller tables of smaller size and each hash function gives an index to one of these two tables. It is also possible to provide indexing within one table for both hash functions.

The sample requires viewing only two places in the hash table, which requires constant time at worst ( see “O” large and “o” small ). This contrasts with many other hash-table algorithms that do not provide constant sampling time in the worst case. Deletion can also be done by clearing a cell containing a key in constant time in the worst case, which is easier than in other schemes, such as linear sounding .

When a new key is inserted and one of the two cells is empty, the key can be placed in this cell. In the case when both cells are occupied, it is necessary to move other keys to other places (or, on the contrary, to their previous places) in order to make room for the new key. A greedy algorithm is used — the key is placed in one of the possible positions, “pushing out” any key that was in that position. The ejected key is then placed in its alternate position, again pushing out any key that might have been there. The process continues until there is an empty position. However, it is possible that the insertion process fails, falling into an infinite loop, or when a chain is too long (longer than a predetermined threshold, depending logarithmically on the length of the table). In this case, the hash table is rebuilt with new hash functions :

There is no need to place a new table for re-hashing - we can simply browse the table to delete and re-insert all keys that are not in the position in which they should have been. ^[one]

Pagh & Rodler, "Cuckoo Hashing"

Theory

The expected insertion time is constant ^[1] , even if we take into account the possible need for restructuring the table, while the number of keys is less than half the capacity of the hash table, i.e. less than 50%.

To ensure this, a random graph theory is used — a non-oriented graph called a “cuckoo graph” can be formed, in which the vertices are cells of the hash table, and the edges for each hash are joined by two possible positions (cells of the hash table). Then the greedy algorithm for inserting a set of values into a cuckoo hash table succeeds if and only if the cuckoo graph for this set of values is a pseudo-tree , a graph with a maximum of one cycle in each connected component . Any vertex generated subgraph with the number of edges greater than the number of vertices corresponds to the set of keys for which the number of slots in the hash table is not enough. If the hash function is chosen randomly, the kukushkin graph will be a random graph in . With a high degree of probability, for a random graph in which the ratio of the number of edges to the number of vertices is bounded above 1/2, the graph is pseudo-forest and the cuckoo hashing algorithm successfully locates all the keys. Moreover, the same theory proves that the expected size of the components of the cuckoo graph's connectivity is small, which ensures a constant expected insertion time ^[2] .

Example

Let the following two hash functions be given:

$h\left(k\right)=k\mod 11$ ${\ displaystyle h \ left (k \ right) = k \ mod 11}$
$h'\left(k\right)=\left\lfloor {\frac {k}{11}}\right\rfloor \mod 11$ ${\ displaystyle h '\ left (k \ right) = \ left \ lfloor {\ frac {k} {11}} \ right \ rfloor \ mod 11}$

k	h (k)	h '(k)
20	9	one
50	6	four
53	9	four
75	9	6
100	one	9
67	one	6
105	6	9
3	3	0
36	3	3
39	6	3

The columns in the following two tables show the state of the hash table after inserting the elements.

1. table for h (k)
	20	50	53	75	100	67	105	3	36	39
0
one					100	67	67	67	67	100
2
3								3	36	36
four
five
6		50	50	50	50	50	105	105	105	50
7
eight
9	20	20	53	75	75	75	53	53	53	75
ten

2. table for h '(k)
	20	50	53	75	100	67	105	3	36	39
0									3	3
one			20	20	20	20	20	20	20	20
2
3										39
four				53	53	53	50	50	50	53
five
6							75	75	75	67
7
eight
9						100	100	100	100	105
ten

Cycles

If you want to insert element 6, you will get an infinite loop. In the last row of the table we find the same initial situation as in the beginning.

$h\left(6\right)=6\mod 11=6$ ${\ displaystyle h \ left (6 \ right) = 6 \ mod 11 = 6}$
$h'\left(6\right)=\left\lfloor {\frac {6}{11}}\right\rfloor \mod 11=0$ ${\ displaystyle h '\ left (6 \ right) = \ left \ lfloor {\ frac {6} {11}} \ right \ rfloor \ mod 11 = 0}$

key	table 1		table 2
	the old value	new value	the old value	new value
6	50	6	53	50
53	75	53	67	75
67	100	67	105	100
105	6	105	3	6
3	36	3	39	36
39	105	39	100	105
100	67	100	75	67
75	53	75	50	53
50	39	50	36	39
36	3	36	6	3
6	50	6	53	50

Variations

Some variations of cuckoo hashing were studied, mainly with the aim of improving the use of space by increasing . In these embodiments, a loading threshold of more than 50% can be achieved. Some of these methods can be used to significantly reduce the number of data structure changes required.

From a generalization of cuckoo hashing that uses more than two hash functions, one can expect a better use of the hash table, sacrificing some speed of sampling and insertion. The use of three hash functions increases the load factor to 91% ^[3] . Another generalization of cuckoo hashing, called block cuck hashing , contains more than one key per cell. Using two keys per cell allows you to increase the load above 80% ^[4] .

Another variant that was studied is a cuckoo hashing with a margin . A “stock” is an array of keys of constant length that is used to store keys that cannot be successfully inserted into the master hash table. This modification reduces the number of failures to a reverse polynomial function with a degree that can be arbitrarily large by increasing the size of the stock. However, a large stock means a slower search for a key that is not in the main table, or if it is in stock. The stock can be used in combination with more than two hash functions or with block cuckoo hashing to obtain both a high load level and a small number of insertion failures ^[5] . The analysis of cuckoo hashing with a reserve has spread to practical hash functions, not only random models of hash functions used in theoretical analysis of hashing ^[6] .

Some researchers suggest using a simplified generalization of cuckoo hashing, called asymmetric associative cache , in some processor caches . ^[7]

Comparison with similar structures

There are other algorithms that use several hash functions, such as the Bloom filter , a memory efficient data structure for fuzzy sets. An alternative data structure for tasks with the same fuzzy sets, based on cuckoo hashing, called a cuckoo filter, uses even less memory and (unlike classical Bloom filters) allows element deletion, not just insertion and existence checking. However, the theoretical analysis of these methods was carried out significantly weaker than the analysis of Bloom filters ^[8] .

Studies conducted by Zhukovsky, Heman and Bonz ^[9] showed that cuckoo hashing is significantly faster than the chain method for small hash tables that are in the cache of modern processors. Kenneth Ross ^[10] showed a block version of cuckoo hashing (the block contains more than one key), which works faster than the usual methods for large hash tables in the case of a high load factor. The speed of the block version of the cuckoo hash table was later investigated by Askitis ^[11] compared to other caching schemes.

Mutzemacher's review ^[3] presents open problems with cuckoo hashing.

Notes

↑ ¹ ² ³ Pagh, Rodler, 2001 , p. 121–133.
↑ Kutzelnigg, 2006 , p. 403–406.
↑ ¹ ² Mitzenmacher, 2009 .
↑ Dietzfelbinger, Weidling, 2007 , p. 47–68.
↑ Kirsch, Mitzenmacher, Wieder, 2010 , p. 1543–1561.
↑ Aumüller, Dietzfelbinger, Woelfel, 2014 , p. 428–456.
↑ "Micro-Architecture" .
↑ Fan, Kaminsky, Andersen, 2013 , p. 36–40.
↑ Zukowski, Heman, Boncz, 2006 .
↑ Ross, 2006 .
↑ Askitis, 2009 , p. 113-122.

Literature

Rasmus Pagh, Flemming Friche Rodler. Cuckoo Hashing // Algorithms - ESA 2001. - 2001. - Vol. 2161. - P. 121–133. - (Lecture Notes in Computer Science). - ISBN 978-3-540-42493-2 . - DOI : 10.1007 / 3-540-44676-1_10 .
Reinhard Kutzelnigg. Fourth Colloquium on Mathematics and Computer Science. - 2006. - T. AG. - p. 403–406. - (Discrete Mathematics and Theoretical Computer Science).
M. Mitzenmacher. Proceedings of the 17th Annual European Symposium on Algorithms (ESA). - 2009.
Martin Dietzfelbinger, Christoph Weidling. Balanced allocation and dictionaries with tightly packed constant size bins // Theoret. Comput. Sci .. - 2007. - T. 380 , no. 1-2 . - p . 47–68 . - DOI : 10.1016 / j.tcs.2007.02.02.054 .
Adam Kirsch, Michael D. Mitzenmacher, Udi Wieder. More robust hashing: cuckoo hashing with a stash // SIAM J. Comput .. - 2010. - V. 39 , no. 4 - p . 1543–1561 . - DOI : 10.1137 / 080728743 .
Martin Aumüller, Martin Dietzfelbinger, Philipp Woelfel. Explicit and efficient hash strings for cuckoo hashing with a stash // Algorithmica. - 2014. - T. 70 , no. 3 - p . 428–456 . - DOI : 10.1007 / s00453-013-9840-x .
Bin Fan, Michael Kaminsky, David Andersen. Cuckoo Filter: Better Than Bloom //; login :. - USENIX, 2013. - V. 38 , no. 4 - p . 36–40 .
Marcin Zukowski, Sandor Heman, Peter Boncz. Architecture-Conscious Hashing. - Proceedings of International Workshop on Data Management on New Hardware (DaMoN), 2006.
Kenneth Ross. Efficient Hash Probes on Modern Processors. - IBM Research Report RC24100, 2006.
Nikolas Askitis. Fast and Compact Hash Tables for Integer Keys. - Proceedings of the 32nd Australas Computer Science Conference (ACSC 2009). - 2009. - T. 91. - p. 113–122. - ISBN 978-1-920682-72-9 .

Links

U. Erlingsson, M. Manasse, F. Mcsherry, 2006.
Cuckoo Hashing for Undergraduates, 2006 , R. Pagh, 2006.
Cuckoo Hashing, Theory and Practice (Part 1, Part 2 and Part 3 ), Michael Mitzenmacher, 2007.
Moni Naor, Gil Segev, Udi Wieder. International Colloquium on Automata, Languages and Programming (ICALP). - Reykjavik, Iceland, 2008.
Cuckoo Hashing Algorithmic Improvements for Fast Concurrent , X. Li, D. Andersen, M. Kaminsky, M. Freedman. EuroSys 2014.

Examples

[_427266838b550b3c-1] ¹ ² ³ Pagh, Rodler, 2001 , p. 121–133.

[_489279d25a075f4d-2] Kutzelnigg, 2006 , p. 403–406.

[_79ab71ecc71efd1f-3] ¹ ² Mitzenmacher, 2009 .

[_623fe03d854a8bbb-4] Dietzfelbinger, Weidling, 2007 , p. 47–68.

[_3e425c9b92244afa-5] Kirsch, Mitzenmacher, Wieder, 2010 , p. 1543–1561.

[_9c24c81ecb5f7a0d-6] Aumüller, Dietzfelbinger, Woelfel, 2014 , p. 428–456.

[7] "Micro-Architecture" .

[_5887f09cc5d371ff-8] Fan, Kaminsky, Andersen, 2013 , p. 36–40.

[_0b69433823b1d481-9] Zukowski, Heman, Boncz, 2006 .

[_262327bd7f0b45b4-10] Ross, 2006 .

[_b0a71b3d7a12278f-11] Askitis, 2009 , p. 113-122.