Cuckoo Hashing
Source. Pagh and Rodler. Cuckoo hashing. J. Alg. 51(2):122-144.
2004.
Notation.
• Keys are drawn from some universe U.
• The two tables, T and T , both have length r. The total
1 2
capacity of the hash table is 2r.
• The hash table contains n entries, where r ≥ (1+ϵ)n for
some constant ϵ > 0.
• Hash functions h and h have been chosen for tables T and
1 2 1
T , respectively (properties to be discussed later).
2
1/13
Cuckoo Hashing
QUICK EXERCISE 1. Start from an empty cuckoo hash table
with r = 6, and insert the keys 5, 10, 15, 20, 25, 30 in that order.
Use the hash values shown below. Draw the final states of T and
1
T .
2
x 5 10 15 20 25 30
h (x) 0 1 0 1 0 1
1
h (x) 5 4 4 3 3 2
2
Now insert the key 35, where h (35) = 0 and h (35) = 2.
1 2
Compare the process with the discussion at the top of page 6 in
the pdf/page 127 in the journal. At what point do you know that
you are in a loop?
2/13
Cuckoo Hashing
Answer
Here is what I get after inserting the first six keys.
T 15 20
1
0 1 2 3 4 5
T 30 25 10 5
2
0 1 2 3 4 5
And here is the sequence of steps due to inserting 35:
nestless key 35 15 10 20 25 35 30
removed from new T T T T T T
1,0 2,4 1,1 2,3 1,0 2,2
moved to T T T T T T T
1,0 2,4 1,1 2,3 1,0 2,2 1,1
paper notation i j ℓ
A loop is guaranteed the second time a cell is revisited.
3/13
Cuckoo Hashing
More notation. Let M be the maximum number of iterations
allowed before a rehash with new hash functions must be
performed. This quantity is denoted by MaxLoop in the paper.
Section 2.3 of the paper is concerned with estimating the
probability that the insertion algorithm performs more than t
iterations, for some arbitrary t.
Suppose an execution of the insert algorithm results in a sequence
x ,x ,x ... of nestless keys.
1 2 3
Scenarios in which the tth iteration occur are divided into three
cases.
(1) The hash functions behave badly.
(2) The insert algorithm goes into a loop.
(3) The insert algorithm does not go into a loop.
4/13
Cuckoo Hashing
Case 1. By a result drawn from another paper, it is possible to
construct a family of hash functions such that the probability of
this case is O( 1 ).
n2
I could look up this other paper if I had to, but for the moment I’ll
take their word for it.
Remaining cases. Assume Case 1 does not occur. Then –
according to the quoted result – the family of hash functions is (or
behaves like it is) (1,M)-universal on any set of r2 keys. That is,
for all distinct x ,...,x ∈ U, all y ,...,y ∈ R not necessarily
1 M 1 M
distinct, and uniformly randomly selected h,
1
P(h(x ) = y ∧...∧h(x ) = y ) ≤ .
1 1 M M rM
Pause and remind yourself of why a (1,M)-universal family is also
(1,v)-universal for any v < M.
5/13
Cuckoo Hashing
Case 2. Suppose that there is a sequence of v distinct keys that
form a closed loop. The paper claims that the number of such
sequences is bounded above by the expression
v3rv−1nv−1.
QUICK EXERCISE 2. In your own words, justify this upper
bound by explaining what each factor represents.
The paper then claims that the probability of any such sequence
occurring is bounded above by 1 . Explain this upper bound.
r2v
6/13
Cuckoo Hashing
Answer
• A loop consists of four phases:
• keys x ,...,x, in which the algorithm never revisits a cell
1 i
• keys x ,...,x =x, where key x is the first key to revisit
i+1 j i j−1
a cell, namely that of x
i
• keys x =x ,...,x =x , in which the first phase of
j+1 i−1 i+j−1 1
the insert routine undoes itself
• keys x ,...,x , where x is the next key to revisit a cell.
i+j ℓ ℓ
Roughly speaking, we can choose the lengths of phases 1, 2
and 4; there are fewer than v ×v ×v = v3 ways to do so.
• The v entries in the loop must share only v −1 locations in
the tables. Since each table has length r, there are fewer than
rv−1 ways to select these locations.
To think about: why rv−1 and not (2r)v−1?
7/13
Cuckoo Hashing
• Key x has to take part in the loop. There are fewer than
1
nv−1 to choose the remaining v −1 keys.
The product of all of these approximations gives the stated
expression, v3rv−1nv−1.
Recall that we uniformly randomly select two hash functions, h
1
and h . To produce a particular loop, both h and h have to send
2 1 2
each of the v keys to specific values in R. Since both h and h are
1 2
(1,v)-universal by assumption, the probabililty of this occurring is
1 1 1
× = .
rv rv r2v
(cid:124)(cid:123)(cid:122)(cid:125) (cid:124)(cid:123)(cid:122)(cid:125)
forh1 forh2
8/13
Cuckoo Hashing
QUICK EXERCISE 3. Using the above analysis of Case 2, the
paper concludes that the probability of any closed loop occurring is
bounded above by
∞ ∞ (cid:18) (cid:19)
(cid:88) 1 (cid:88) (cid:16)n(cid:17)v 1
v3rv−1nv−1r−2v ≤ v3 ∈ O .
rn r n2
v=3 v=3
Justify this asymptotic estimate. Hint: recall what ϵ is.
Here is a trick: if 0 < p < 1 and m is a nonnegative integer, then
to leading order,
∞
(cid:88) 1
vmpv ∼ .
(1−p)m+1
v=0
9/13
Cuckoo Hashing
Answer
Recall that r ≥ (1+ϵ)n, which implies that n ≤ 1 . We find that
r 1+ϵ
(cid:88)∞
(cid:16)n(cid:17)v
(cid:88)∞ (cid:18)
1
(cid:19)v (cid:32) (cid:18) 1(cid:19)4(cid:33)
v3 ≤ v3 ∈ O 1+ ,
r 1+ϵ ϵ
v=3 v=0
which is constant with respect to n so long as we resize the table
as needed to maintain the appropriate load factor.
Further,
(cid:18) (cid:19)
1 1 1
≤ ∈ O .
rn n2(1+ϵ) n2
Combining these two results shows that
∞ (cid:18) (cid:19)
(cid:88) 1
v3rv−1nv−1r−2v ∈ O .
n2
v=3
10/13
Cuckoo Hashing
More to think about!
• Case 3. This calculation is a simpler version of the one in
Case 2.
• Read Lemma 1 in page 6 of the pdf/127 of the journal.
Persuade yourself that you agree with it.
In the case p ≥i +j in the proof, why do both of the
candidate sequances start with an occurrence of the key x ?
1
• Apply Lemma 1 to the analysis of Case 3, and justify the claim
that the probability of Case 3 is bounded above by
2(1+ϵ)−(2t−1)/3+1.
11/13
Cuckoo Hashing
• This section of the paper concludes by calculating the
expected value of the number of iterations in one execution of
the insert algorithm.
Let X be a discrete random variable that can take on the
values 1,2,3..., each with a certain probability. The usual
formula for the expected value of X is
∞
(cid:88)
E(X) = x ·P(X = x).
x=1
The calculation in equation (3) of the paper (page 9 of the
pdf/129 in the journal) is clearly not applying this formula.
What formula is being used instead? Why does it work?
12/13
Cuckoo Hashing
(cid:6) (cid:7)
• In your own words, justify the choice of M = 3log r as
1+ϵ
given at the bottom of page 9 of the pdf/129 of the journal.
• The last few paragraphs in this section talk through an
amortized analysis argument for the overall amortized cost of
the insert algorithm, including those cases where the table
must be rehashed. Read this argument and make sure you are
following the key points.
13/13