程序辅导案例 > Program >

代写辅导接单-Cuckoo Hashing

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

Cuckoo Hashing

Source. Pagh and Rodler. Cuckoo hashing. J. Alg. 51(2):122-144.

2004.

Notation.

• Keys are drawn from some universe U.

• The two tables, T and T , both have length r. The total

1 2

capacity of the hash table is 2r.

• The hash table contains n entries, where r ≥ (1+ϵ)n for

some constant ϵ > 0.

• Hash functions h and h have been chosen for tables T and

1 2 1

T , respectively (properties to be discussed later).

1/13

Cuckoo Hashing

QUICK EXERCISE 1. Start from an empty cuckoo hash table

with r = 6, and insert the keys 5, 10, 15, 20, 25, 30 in that order.

Use the hash values shown below. Draw the final states of T and

T .

x 5 10 15 20 25 30

h (x) 0 1 0 1 0 1

h (x) 5 4 4 3 3 2

Now insert the key 35, where h (35) = 0 and h (35) = 2.

1 2

Compare the process with the discussion at the top of page 6 in

the pdf/page 127 in the journal. At what point do you know that

you are in a loop?

2/13

Cuckoo Hashing

Answer

Here is what I get after inserting the first six keys.

T 15 20

0 1 2 3 4 5

T 30 25 10 5

0 1 2 3 4 5

And here is the sequence of steps due to inserting 35:

nestless key 35 15 10 20 25 35 30

removed from new T T T T T T

1,0 2,4 1,1 2,3 1,0 2,2

moved to T T T T T T T

1,0 2,4 1,1 2,3 1,0 2,2 1,1

paper notation i j ℓ

A loop is guaranteed the second time a cell is revisited.

3/13

Cuckoo Hashing

More notation. Let M be the maximum number of iterations

allowed before a rehash with new hash functions must be

performed. This quantity is denoted by MaxLoop in the paper.

Section 2.3 of the paper is concerned with estimating the

probability that the insertion algorithm performs more than t

iterations, for some arbitrary t.

Suppose an execution of the insert algorithm results in a sequence

x ,x ,x ... of nestless keys.

1 2 3

Scenarios in which the tth iteration occur are divided into three

cases.

(1) The hash functions behave badly.

(2) The insert algorithm goes into a loop.

(3) The insert algorithm does not go into a loop.

4/13

Cuckoo Hashing

Case 1. By a result drawn from another paper, it is possible to

construct a family of hash functions such that the probability of

this case is O( 1 ).

I could look up this other paper if I had to, but for the moment I’ll

take their word for it.

Remaining cases. Assume Case 1 does not occur. Then –

according to the quoted result – the family of hash functions is (or

behaves like it is) (1,M)-universal on any set of r2 keys. That is,

for all distinct x ,...,x ∈ U, all y ,...,y ∈ R not necessarily

1 M 1 M

distinct, and uniformly randomly selected h,

P(h(x ) = y ∧...∧h(x ) = y ) ≤ .

1 1 M M rM

Pause and remind yourself of why a (1,M)-universal family is also

(1,v)-universal for any v < M.

5/13

Cuckoo Hashing

Case 2. Suppose that there is a sequence of v distinct keys that

form a closed loop. The paper claims that the number of such

sequences is bounded above by the expression

v3rv−1nv−1.

QUICK EXERCISE 2. In your own words, justify this upper

bound by explaining what each factor represents.

The paper then claims that the probability of any such sequence

occurring is bounded above by 1 . Explain this upper bound.

r2v

6/13

Cuckoo Hashing

Answer

• A loop consists of four phases:

• keys x ,...,x, in which the algorithm never revisits a cell

1 i

• keys x ,...,x =x, where key x is the first key to revisit

i+1 j i j−1

a cell, namely that of x

• keys x =x ,...,x =x , in which the first phase of

j+1 i−1 i+j−1 1

the insert routine undoes itself

• keys x ,...,x , where x is the next key to revisit a cell.

i+j ℓ ℓ

Roughly speaking, we can choose the lengths of phases 1, 2

and 4; there are fewer than v ×v ×v = v3 ways to do so.

• The v entries in the loop must share only v −1 locations in

the tables. Since each table has length r, there are fewer than

rv−1 ways to select these locations.

To think about: why rv−1 and not (2r)v−1?

7/13

Cuckoo Hashing

• Key x has to take part in the loop. There are fewer than

nv−1 to choose the remaining v −1 keys.

The product of all of these approximations gives the stated

expression, v3rv−1nv−1.

Recall that we uniformly randomly select two hash functions, h

and h . To produce a particular loop, both h and h have to send

2 1 2

each of the v keys to specific values in R. Since both h and h are

1 2

(1,v)-universal by assumption, the probabililty of this occurring is

1 1 1

× = .

rv rv r2v

(cid:124)(cid:123)(cid:122)(cid:125) (cid:124)(cid:123)(cid:122)(cid:125)

forh1 forh2

8/13

Cuckoo Hashing

QUICK EXERCISE 3. Using the above analysis of Case 2, the

paper concludes that the probability of any closed loop occurring is

bounded above by

∞ ∞ (cid:18) (cid:19)

(cid:88) 1 (cid:88) (cid:16)n(cid:17)v 1

v3rv−1nv−1r−2v ≤ v3 ∈ O .

rn r n2

v=3 v=3

Justify this asymptotic estimate. Hint: recall what ϵ is.

Here is a trick: if 0 < p < 1 and m is a nonnegative integer, then

to leading order,

∞

(cid:88) 1

vmpv ∼ .

(1−p)m+1

v=0

9/13

Cuckoo Hashing

Answer

Recall that r ≥ (1+ϵ)n, which implies that n ≤ 1 . We find that

r 1+ϵ

(cid:88)∞

(cid:16)n(cid:17)v

(cid:88)∞ (cid:18)

(cid:19)v (cid:32) (cid:18) 1(cid:19)4(cid:33)

v3 ≤ v3 ∈ O 1+ ,

r 1+ϵ ϵ

v=3 v=0

which is constant with respect to n so long as we resize the table

as needed to maintain the appropriate load factor.

Further,

(cid:18) (cid:19)

1 1 1

≤ ∈ O .

rn n2(1+ϵ) n2

Combining these two results shows that

∞ (cid:18) (cid:19)

(cid:88) 1

v3rv−1nv−1r−2v ∈ O .

v=3

10/13

Cuckoo Hashing

More to think about!

• Case 3. This calculation is a simpler version of the one in

Case 2.

• Read Lemma 1 in page 6 of the pdf/127 of the journal.

Persuade yourself that you agree with it.

In the case p ≥i +j in the proof, why do both of the

candidate sequances start with an occurrence of the key x ?

• Apply Lemma 1 to the analysis of Case 3, and justify the claim

that the probability of Case 3 is bounded above by

2(1+ϵ)−(2t−1)/3+1.

11/13

Cuckoo Hashing

• This section of the paper concludes by calculating the

expected value of the number of iterations in one execution of

the insert algorithm.

Let X be a discrete random variable that can take on the

values 1,2,3..., each with a certain probability. The usual

formula for the expected value of X is

∞

(cid:88)

E(X) = x ·P(X = x).

x=1

The calculation in equation (3) of the paper (page 9 of the

pdf/129 in the journal) is clearly not applying this formula.

What formula is being used instead? Why does it work?

12/13

Cuckoo Hashing

(cid:6) (cid:7)

• In your own words, justify the choice of M = 3log r as

1+ϵ

given at the bottom of page 9 of the pdf/129 of the journal.

• The last few paragraphs in this section talk through an

amortized analysis argument for the overall amortized cost of

the insert algorithm, including those cases where the table

must be rehashed. Read this argument and make sure you are

following the key points.

13/13