hashing

A hash table contains a set of key-value pairs. The access functions for hash tables accept a key and retrieve the corresponding values. Here are the details.

The keys and values are stored in the hash table. This means that the function hash is applied to each key to produce an integer in a deterministic way, and this integer is used to determine which bucket the key and value will be stored in. It is essential that the hash code of the key never change, for otherwise the next attempt to find the key in the hash table have an unpredictable result - the new hash code of the key may or may not lead to the same bucket, and the value may or may not be located.

Some hash tables and lists mutable, i.e., their contents can be altered by assignment statements. As explained above, the hash code of a mutable thing is not permitted to change when the contents of the thing change. Hence, the algorithm for computing the hash code may not refer to the contents of a mutable thing.

The strong comparison operator === is provided to parrot the equality testing that occurs implicitly when accessing a key in a hash table. Hence, since the hash codes for mutable things cannot depend on the contents the things, neither can this strong test for equality. We may call such things enclosed, because the contents are hidden from the equality testing function. For mutable things, the test for equality must be the same as equality of the hash codes. A reasonable choice for such a hash code would thus be the address in memory of the thing. But this address can change depending on environmental factors not under the control of the interpreter, so we use something equivalent: a counter with initial value 1000000 is incremented each time a mutable thing is created, and its value is taken as the hash code of the thing and stored within it.

It is essential to have some hash tables for which equality amounts to equality of the contents. This cannot be achieved for mutable hash tables, but we do achieve it for non-mutable hash tables -- the hash code is computed directly from the contents of the thing in a deterministic way. This allows us to implement the notion of polynomial, say, as a hash table -- the keys can be the monomials (necessarily non-mutable) and the values can be the coefficients. The notion of monomial can be implemented as a hash table where the keys are the variables and the values are the corresponding exponents.

One further comforting remark: the routines that compute hash codes or strong equality do not get into infinite loops, despite the existence of circular structures: any circular structure must come into being by means of changing something, and so the circular loop in the structure necessarily involves a mutable thing, whose contents are not examined by the routines.

See also HashTable.

Go to main index.

Go to concepts index.