Hash Tables
We are implementing the Dictionary ADT – specifically, the insert,
search, and delete operations. Assume that the keys in the
dictionary are drawn from the set U.
Recall that a hash table consists of the following.
• An underlying array A of length r. Let R = {0,1,...,r −1},
the set of all valid indices for entries in A.
• A hash function h : U → R.
• A choice of collision management strategy.
For today, we will consider collision management using separate
chaining. Each entry A in A is a linked list. If the key x ∈ U is
i
added to the dictionary, and h(x) = i, then this key is inserted at
the front of linked list A .
i
1/17
Dictionaries – Expected Run Times
In COMP 2140, you may have heard a statement like:
“The insert, search and delete operations can be implemented in
O(1) time using a hash table.”
And then that statement might have been qualified with something
like:
“...on average? If the hash function is good/the hash table is big
enough/maybe other conditions? Probably? But you can just call
it O(1)!”
Let’s see if we can be a bit more precise today.
2/17
Dictionaries – Expected Run Times
What properties make a hash function h into a “good” hash
function?
• Patterns in the keys do not give rise to patterns in the hash
values. h randomizes its inputs (or something close to it).
• For all x ∈ U, h(x) can be computed quickly.
For the moment, assume that we have somehow created an
independent, uniform hash function.
• For any x ∈ U, h(x) is equally likely to be any one of
0,1,...,r −1.
• For any distinct x ,x ∈ U, the values of h(x ) and h(x ) are
1 2 1 2
independent.
• For any x ∈ U, h(x) can be computed in O(1) time.
3/17
Dictionaries – Expected Run Times
Assume that there are currently n keys in the dictionary. Label
these keys
x ,x ,...,x ,x
1 2 n−1 n
in the order in which they were inserted.
Define α = n, and note that α could be greater than 1. α is called
r
the load factor of the hash table.
Under all of these assumptions, what are the time complexities of
the insert, search and delete operations?
• In the worst case, insert(x) has to compute h(x) and insert
key x at the head of linked list A . Both of these are O(1)
h(x)
operations, so insert is guaranteed to run in O(1) time in the
worst case.
• Search and delete, on the other hand, might have to traverse
all or part of one of the linked lists.
4/17
Dictionaries – Expected Run Times
QUICK EXERCISE 1. Fix an arbitrary j ∈ R. Under all of the
stated assumptions, what is the expected value of the length of
linked list A ?
j
Solution (the quick way). Treat the insertions of keys x ,...,x
1 n
as n trials, where a trial is successful if x is added to linked list A .
i j
From our assumptions about h, the trials are independent, and the
probability of success is 1.
r
The number of successes follows a binomial distribution whose
expected value is n× 1 = α.
r
5/17
Dictionaries – Expected Run Times
Expected run times:
• An unsuccessful search has to evaluate h and traverse an
entire linked list. The expected run time is Θ(1+α).
• The run time of a successful search clearly bounded above by
that of an unsuccessful search. Can we establish a lower
bound?
6/17
Dictionaries – Expected Run Times
QUICK EXERCISE 2. Let i be fixed but arbitrary, 1 ≤ i ≤ n,
and consider a search for x . The number of entries in linked list
i
A that need to be checked before x is found equals the
h(xi) i
number of keys from among x ,...,x that also hashed to h(x ).
i+1 n i
For all j > i, define
(cid:26)
1, h(x ) = h(x ),
X = j i
ij 0, otherwise.
Let N be the number of entries in linked list A that precede x .
i h(xi) i
Find E(X ), then use that result to find E(N ). Finally, find the
ij i
expected running time of a successful search by averaging over i
and adding any additional steps. Assume that each value of i is
equally likely.
7/17
Dictionaries – Expected Run Times
Solution
From the uniform random nature of h, we see that
P(X = 1) = 1, and therefore
ij r
1
E(X ) = .
ij
r
The number of entries in linked list A that precede x is
h(xi) i
n
(cid:88)
N = X .
i ij
j=i+1
By the linearity of expected value, we get
n
(cid:88) n−i
E(N ) = E(X ) = .
i ij
r
j=i+1
8/17
Dictionaries – Expected Run Times
Now we average E(N ) over i, assuming each value of i occurs
i
with probability 1:
n
n n
1 (cid:88) 1 (cid:88) n−i
E(N ) =
i
n n r
i=1 i=1
1 n(n−1)
= ×
nr 2
n−1 1 1
= = α− α.
2r 2 2n
The above calculation shows that the expected number of entries
to check before finding the target of the search is Θ(α). A
successful search also has to evaluate h and locate the target value
itself, both of which are constant-time operations.
We conclude that the expected run time of a successful search is
Θ(1+α).
9/17
Dictionaries – Expected Run Times
The analysis of delete is identical to that of search. Thus two of
our three operations have expected run times in Θ(1+α).
Recall that we wanted O(1) expected run times. This is achievable
if α ∈ O(1), or in other words, if n ∈ Θ(r).
When collisions are managed using separate chaining, n can exceed
the size r of the underlying array, but it must be bounded above by
some constant multiple of r. Hash tables that get too full should
be resized in order to maintain this condition.
10/17
Dictionaries – Hash Functions In Practice
Here is a hash function that you may have seen in COMP 2140.
Convert a key x ∈ U to an integer if necessary, then define
h(x) = x mod r,
where the array size r is a prime number. This method for hashing
keys is called the division method.
In light of the preceding discussion, what are some drawbacks to
this method?
One problem is the condition that the table size has to be a prime
number. To maintain O(1) expected run times for our operations,
we might have to resize the table after a long sequence of
insertions.
But we can’t just double the table size: we have to find a new,
larger prime number! That might get complicated.
11/17
Dictionaries – Hash Functions In Practice
Here is an alternative hash function, called the multiplication
method. Choose an irrational constant 0 < a < 1. Again, convert
x ∈ U to an integer if needed, and define
h(x) = ⌊r(ax mod 1)⌋.
A favored choice for a is the golden ratio:
√
5−1
a = ≈ 0.618033....
2
With this approach, there is no longer any motivation to choose r
to be prime. In fact, if r is an integer power of 2, then the
multiplication by r is equivalent to a bit shift and can be done
quite quickly.
12/17
Dictionaries – Hash Functions In Practice
However, there is another drawback that I can’t fix by changing
my hash function. If an adversary knows what hash function h I
am using, they can choose x ,...,x that all hash to the same
1 n
value. My hash table will turn into a linked list, and my expected
run times for search and delete will go up to Θ(n).
The solution is to use a family H of hash functions, all of which
have good properties. Each time I create a new hash table, I select
one function from that family randomly.
13/17
Dictionaries – Hash Functions In Practice
Suppose h ∈ H is selected uniformly randomly. Some terminology:
• H is called uniform if for all x ∈ U and all y ∈ R,
P(h(x) = y) = 1.
r
• H is called universal if for all distinct x ,x ∈ U,
1 2
P(h(x ) = h(x )) ≤ 1.
1 2 r
• As defined in the paper that we will discuss on Friday: H is
called (c,k)-universal if for all distinct x ,...,x ∈ U and for
1 k
all y ,...,y ∈ R, not necessarily distinct,
1 k
c
P(h(x ) = y ∧...∧h(x ) = y ) ≤ .
1 1 k k rk
These properties aren’t quite as good as a uniformly random hash
function, but – as we will see on Friday – they can be good enough
to preserve the desired properties of a hash table.
(Also–unliketheuniformlyrandomhashfunction–wecanactuallycreatetheminpractice.)
14/17
Dictionaries – Hash Functions In Practice
QUICK EXERCISE 3. Suppose H is (c,k)-universal for some c
and some k ≤ |U|. Show that H is also (c,k −1)-universal.
Solution
Let x ,...,x ∈ U be distinct. Let y ,...,y ∈ R be arbitrary.
1 k−1 1 k−1
Choose x ∈ U that is distinct from all of x ,...,x . This is
k 1 k−1
possible since k ≤ |U|.
Let h ∈ H be chosen uniformly randomly. h(x ) must be one of
k
0,1,...,r −1, and these are mutually exclusive cases. Thus
P(h(x )=y , j =1,...,k −1)
j j
= P((h(x )=y , j =1,...,k −1)∧h(x )=0)
j j k
+P((h(x )=y , j =1,...,k −1)∧h(x )=1)
j j k
+ ...
+P((h(x )=y , j =1,...,k −1)∧h(x )=r −1)
j j k
15/17
Dictionaries – Hash Functions In Practice
There are r terms in the sum on the right-hand side. By the
assumption that H is (c,k)-universal, each one is ≤ c .
rk
Consequently,
c c
P(h(x ) = y , j = 1,...,k −1) ≤ r × = ,
j j rk rk−1
which proves that H is (c,k −1)-universal.
By induction, it is immediate that if H is (c,k)-universal, then H
is also (c,v)-universal for any v < k.
JUST FOR FUN EXERCISE. Why does (c,k)-universality imply
universality for smaller k, but not for larger k?
Let U = R, and make one of the functions in H the identity:
h (x) = x for all x ∈ U. Can you complete the definition of H so
0
that H is (1,1)-universal but not (1,2)-universal?
16/17
Upcoming: Let’s Read A Paper Friday
Before class on Fri Jan 10, please read sections 1 and 2 of the
paper “Cuckoo hashing” by Pagh and Rodler, which is posted on
UM Learn under Content - Papers. Sections 1 and 2 are located on
pages 1 – 9 of the pdf document (pages 122 – 130 of the journal).
Pagh also published a simplified description of cuckoo hashing,
which you might find to be a useful reference. This document can
be found on UM Learn in the same location.
You don’t have to follow every detail of the analysis. Just
familiarize yourself with the overall idea. We will discuss some of
the results in class on Friday.
17/17