代写辅导接单-Hash Tables

欢迎使用51辅导,51作业君孵化低价透明的学长辅导平台,服务保持优质,平均费用压低50%以上! 51fudao.top

Hash Tables

We are implementing the Dictionary ADT – specifically, the insert,

search, and delete operations. Assume that the keys in the

dictionary are drawn from the set U.

Recall that a hash table consists of the following.

• An underlying array A of length r. Let R = {0,1,...,r −1},

the set of all valid indices for entries in A.

• A hash function h : U → R.

• A choice of collision management strategy.

For today, we will consider collision management using separate

chaining. Each entry A in A is a linked list. If the key x ∈ U is

i

added to the dictionary, and h(x) = i, then this key is inserted at

the front of linked list A .

i

1/17

Dictionaries – Expected Run Times

In COMP 2140, you may have heard a statement like:

“The insert, search and delete operations can be implemented in

O(1) time using a hash table.”

And then that statement might have been qualified with something

like:

“...on average? If the hash function is good/the hash table is big

enough/maybe other conditions? Probably? But you can just call

it O(1)!”

Let’s see if we can be a bit more precise today.

2/17

Dictionaries – Expected Run Times

What properties make a hash function h into a “good” hash

function?

• Patterns in the keys do not give rise to patterns in the hash

values. h randomizes its inputs (or something close to it).

• For all x ∈ U, h(x) can be computed quickly.

For the moment, assume that we have somehow created an

independent, uniform hash function.

• For any x ∈ U, h(x) is equally likely to be any one of

0,1,...,r −1.

• For any distinct x ,x ∈ U, the values of h(x ) and h(x ) are

1 2 1 2

independent.

• For any x ∈ U, h(x) can be computed in O(1) time.

3/17

Dictionaries – Expected Run Times

Assume that there are currently n keys in the dictionary. Label

these keys

x ,x ,...,x ,x

1 2 n−1 n

in the order in which they were inserted.

Define α = n, and note that α could be greater than 1. α is called

r

the load factor of the hash table.

Under all of these assumptions, what are the time complexities of

the insert, search and delete operations?

• In the worst case, insert(x) has to compute h(x) and insert

key x at the head of linked list A . Both of these are O(1)

h(x)

operations, so insert is guaranteed to run in O(1) time in the

worst case.

• Search and delete, on the other hand, might have to traverse

all or part of one of the linked lists.

4/17

Dictionaries – Expected Run Times

QUICK EXERCISE 1. Fix an arbitrary j ∈ R. Under all of the

stated assumptions, what is the expected value of the length of

linked list A ?

j

Solution (the quick way). Treat the insertions of keys x ,...,x

1 n

as n trials, where a trial is successful if x is added to linked list A .

i j

From our assumptions about h, the trials are independent, and the

probability of success is 1.

r

The number of successes follows a binomial distribution whose

expected value is n× 1 = α.

r

5/17

Dictionaries – Expected Run Times

Expected run times:

• An unsuccessful search has to evaluate h and traverse an

entire linked list. The expected run time is Θ(1+α).

• The run time of a successful search clearly bounded above by

that of an unsuccessful search. Can we establish a lower

bound?

6/17

Dictionaries – Expected Run Times

QUICK EXERCISE 2. Let i be fixed but arbitrary, 1 ≤ i ≤ n,

and consider a search for x . The number of entries in linked list

i

A that need to be checked before x is found equals the

h(xi) i

number of keys from among x ,...,x that also hashed to h(x ).

i+1 n i

For all j > i, define

(cid:26)

1, h(x ) = h(x ),

X = j i

ij 0, otherwise.

Let N be the number of entries in linked list A that precede x .

i h(xi) i

Find E(X ), then use that result to find E(N ). Finally, find the

ij i

expected running time of a successful search by averaging over i

and adding any additional steps. Assume that each value of i is

equally likely.

7/17

Dictionaries – Expected Run Times

Solution

From the uniform random nature of h, we see that

P(X = 1) = 1, and therefore

ij r

1

E(X ) = .

ij

r

The number of entries in linked list A that precede x is

h(xi) i

n

(cid:88)

N = X .

i ij

j=i+1

By the linearity of expected value, we get

n

(cid:88) n−i

E(N ) = E(X ) = .

i ij

r

j=i+1

8/17

Dictionaries – Expected Run Times

Now we average E(N ) over i, assuming each value of i occurs

i

with probability 1:

n

n n

1 (cid:88) 1 (cid:88) n−i

E(N ) =

i

n n r

i=1 i=1

1 n(n−1)

= ×

nr 2

n−1 1 1

= = α− α.

2r 2 2n

The above calculation shows that the expected number of entries

to check before finding the target of the search is Θ(α). A

successful search also has to evaluate h and locate the target value

itself, both of which are constant-time operations.

We conclude that the expected run time of a successful search is

Θ(1+α).

9/17

Dictionaries – Expected Run Times

The analysis of delete is identical to that of search. Thus two of

our three operations have expected run times in Θ(1+α).

Recall that we wanted O(1) expected run times. This is achievable

if α ∈ O(1), or in other words, if n ∈ Θ(r).

When collisions are managed using separate chaining, n can exceed

the size r of the underlying array, but it must be bounded above by

some constant multiple of r. Hash tables that get too full should

be resized in order to maintain this condition.

10/17

Dictionaries – Hash Functions In Practice

Here is a hash function that you may have seen in COMP 2140.

Convert a key x ∈ U to an integer if necessary, then define

h(x) = x mod r,

where the array size r is a prime number. This method for hashing

keys is called the division method.

In light of the preceding discussion, what are some drawbacks to

this method?

One problem is the condition that the table size has to be a prime

number. To maintain O(1) expected run times for our operations,

we might have to resize the table after a long sequence of

insertions.

But we can’t just double the table size: we have to find a new,

larger prime number! That might get complicated.

11/17

Dictionaries – Hash Functions In Practice

Here is an alternative hash function, called the multiplication

method. Choose an irrational constant 0 < a < 1. Again, convert

x ∈ U to an integer if needed, and define

h(x) = ⌊r(ax mod 1)⌋.

A favored choice for a is the golden ratio:

5−1

a = ≈ 0.618033....

2

With this approach, there is no longer any motivation to choose r

to be prime. In fact, if r is an integer power of 2, then the

multiplication by r is equivalent to a bit shift and can be done

quite quickly.

12/17

Dictionaries – Hash Functions In Practice

However, there is another drawback that I can’t fix by changing

my hash function. If an adversary knows what hash function h I

am using, they can choose x ,...,x that all hash to the same

1 n

value. My hash table will turn into a linked list, and my expected

run times for search and delete will go up to Θ(n).

The solution is to use a family H of hash functions, all of which

have good properties. Each time I create a new hash table, I select

one function from that family randomly.

13/17

Dictionaries – Hash Functions In Practice

Suppose h ∈ H is selected uniformly randomly. Some terminology:

• H is called uniform if for all x ∈ U and all y ∈ R,

P(h(x) = y) = 1.

r

• H is called universal if for all distinct x ,x ∈ U,

1 2

P(h(x ) = h(x )) ≤ 1.

1 2 r

• As defined in the paper that we will discuss on Friday: H is

called (c,k)-universal if for all distinct x ,...,x ∈ U and for

1 k

all y ,...,y ∈ R, not necessarily distinct,

1 k

c

P(h(x ) = y ∧...∧h(x ) = y ) ≤ .

1 1 k k rk

These properties aren’t quite as good as a uniformly random hash

function, but – as we will see on Friday – they can be good enough

to preserve the desired properties of a hash table.

(Also–unliketheuniformlyrandomhashfunction–wecanactuallycreatetheminpractice.)

14/17

Dictionaries – Hash Functions In Practice

QUICK EXERCISE 3. Suppose H is (c,k)-universal for some c

and some k ≤ |U|. Show that H is also (c,k −1)-universal.

Solution

Let x ,...,x ∈ U be distinct. Let y ,...,y ∈ R be arbitrary.

1 k−1 1 k−1

Choose x ∈ U that is distinct from all of x ,...,x . This is

k 1 k−1

possible since k ≤ |U|.

Let h ∈ H be chosen uniformly randomly. h(x ) must be one of

k

0,1,...,r −1, and these are mutually exclusive cases. Thus

P(h(x )=y , j =1,...,k −1)

j j

= P((h(x )=y , j =1,...,k −1)∧h(x )=0)

j j k

+P((h(x )=y , j =1,...,k −1)∧h(x )=1)

j j k

+ ...

+P((h(x )=y , j =1,...,k −1)∧h(x )=r −1)

j j k

15/17

Dictionaries – Hash Functions In Practice

There are r terms in the sum on the right-hand side. By the

assumption that H is (c,k)-universal, each one is ≤ c .

rk

Consequently,

c c

P(h(x ) = y , j = 1,...,k −1) ≤ r × = ,

j j rk rk−1

which proves that H is (c,k −1)-universal.

By induction, it is immediate that if H is (c,k)-universal, then H

is also (c,v)-universal for any v < k.

JUST FOR FUN EXERCISE. Why does (c,k)-universality imply

universality for smaller k, but not for larger k?

Let U = R, and make one of the functions in H the identity:

h (x) = x for all x ∈ U. Can you complete the definition of H so

0

that H is (1,1)-universal but not (1,2)-universal?

16/17

Upcoming: Let’s Read A Paper Friday

Before class on Fri Jan 10, please read sections 1 and 2 of the

paper “Cuckoo hashing” by Pagh and Rodler, which is posted on

UM Learn under Content - Papers. Sections 1 and 2 are located on

pages 1 – 9 of the pdf document (pages 122 – 130 of the journal).

Pagh also published a simplified description of cuckoo hashing,

which you might find to be a useful reference. This document can

be found on UM Learn in the same location.

You don’t have to follow every detail of the analysis. Just

familiarize yourself with the overall idea. We will discuss some of

the results in class on Friday.

17/17

51作业君

Email:51zuoyejun

@gmail.com

添加客服微信: Fudaojun0228