Assignment 1 Digression #1
Recall the formula for the sum of a finite geometric series. For any
constant p > 0,
(cid:88)n pn+1−1
pj = .
p−1
j=0
n
(cid:88)
What if we need jpj?
j=0
There are various ways to derive the formula, but the quickest that
I’m aware of is to take d of both sides of the above equation:
dp
d (cid:88)n d pn+1−1
pj = .
dp dp p−1
j=0
On the left-hand side, the summand will be jpj−1. Fix this by
multiplying both sides by p.
1/12
Assignment 1 Digression #2
Let N be the discrete random variable in the Pagh and Rodler
paper that represents the number of iterations in one occurrence of
the insert algorithm.
The authors estimate P(N > t) for an arbitrary number of
iterations t. Then they calculate the expected number of iterations
using the equation
∞
(cid:88)
E(N) = P(N > t).
t=0
Question 3 on the assignment is asking you to show why this
formula is valid. The details have nothing to do with cuckoo
hashing or calculations in the paper. Start from the standard
∞
(cid:88)
formula E(N) = t ·P(N = t) and derive the equation above.
t=1
2/12
Back To Bit Arrays
We continue to assume that the universe U takes the form
U = {0,1,2,...,u−1}.
Continued standard notation: S ⊆ U is the collection of keys in a
dictionary, and |S| = n.
Operations of interest:
• The standard dictionary operations: insert, search and delete.
• The predecessor operation (by which we mean we want both
predecessor and successor, but it suffices to show the details
for one of them).
• The minimum operation (by which we mean both minimum
and maximum, but it suffices to show the details for one of
them).
3/12
Back To Bit Arrays
A very simple way to represent the set S is with a bit array. Here is
an example with u = 16 and S = {0,1,5,6,7,12,14,15}.
1 1 0 0 0 1 1 1 0 0 0 0 1 0 1 1
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
We have already discussed the following properties.
• Θ(u) memory (originally bits, at worst words)
• O(1) worst-case search, insert and delete
• O(u) worst-case predecessor and minimum
Moreover, we already know that these results are not particularly
good.
4/12
Back To Bit Arrays
Let us augment a bit array with a binary tree as follows.
1
1 1
1 1 0 1
1 0 1 1 0 0 1 1
1 1 0 0 0 1 1 1 0 0 0 0 1 0 1 1
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
The row of leaves corresponds to the bit array. Each internal node
contains a 1 if any descendant of it contains a 1.
In other words, the value in node a is the logical OR of the values
of all of its descendants.
5/12
Back To Bit Arrays
Let us augment a bit array with a binary tree as follows.
1
1 1
1 1 0 1
1 0 1 1 0 0 1 1
1 1 0 0 0 1 1 1 0 0 0 0 1 0 1 1
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
QUICK EXERCISE. How do the operations of interest work on
this data structure? What are the time complexities?
Note that the bit array still exists in addition to the tree structure.
6/12
Back To Bit Arrays
Answers
• Search can still be performed in O(1) time, since we still have
an array. The tree structure is irrelevant.
• Note that the tree structure is essentially identical to a bitwise
trie: the only difference is that instead of omitting a node, we
include the node but give it the value 0.
Consequently, insert, search, delete, predecessor and minimum
all run in worst-case O(log(u)) time.
Is this any good?
Compared with a bit array, predecessor and minimum can be done
faster, but insert and delete have gotten slower.
7/12
Van Emde Boas Trees: A First Attempt
What’s taking so long? How can we fix it?
The problem is the height of the tree. Operations that traverse a
branch of the tree run in O(log(u)) time.
• An x-fast trie addressed this by introducing the hash tables at
each level.
• This time, we are going to make the tree shorter.
Simplifying assumption. Assume that u is an even power of 2.
√
Then u is an integer and a power of 2.
8/12
Van Emde Boas Trees: A First Attempt
√
Take the universe of size u and split it up into u clusters, each
√
of size u.
1 1 0 0 0 1 1 1 0 0 0 0 1 0 1 1
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
For each cluster, the keys in S that lie in that cluster can be
√
represented by a bit array of length u.
√
Define one more bit array of length u, called a summary. Bit i
in the summary is 1 if cluster i contains at least one nonzero entry,
and bit i is 0 otherwise.
In other words, bit i is the logical OR of the entries in cluster i.
I will (idiosyncratically) name the following object a proto-van
Emde Boas tree, or pVEB tree.
9/12
Van Emde Boas Trees: A First Attempt
cluster
summary 1 1 0 1
0 1 2 3
0 1
0 1 1 1
1 1
4 5 6 7
2 0
0 0 0 0
3 1
8 9 10 11
1 0 1 1
12 13 14 15
√
The u summary bits are stored in the array summary.
√ √ √
The u clusters are stored in the u× u array cluster.
Convention: cluster[i][j] refers to entry j in the cluster
corresponding to summary bit i.
10/12
Van Emde Boas Trees: A First Attempt
cluster
summary 1 1 0 1
0 1 2 3
0 1
0 1 1 1
1 1
4 5 6 7
2 0
0 0 0 0
3 1
8 9 10 11
1 0 1 1
12 13 14 15
√
QUICK EXERCISE. Given indices i and j, 0 ≤ i,j < u, what
number x ∈ U is represented by cluster[i][j]?
In the above example, cluster[0][3] represents 3 (0011 in
binary). cluster[3][2] represents 14 (1110 in binary).
11/12
Van Emde Boas Trees: A First Attempt
Answer
In binary, the index i gives the first 1 lg(u) digits of the
2
corresponding number, and index j gives the second 1 lg(u). Thus
2
√
x = i u+j.
Conversely, the number x ∈ U is represented by cluster[i][j],
where
√ √
i = x/ u, j = x mod u,
using integer arithmetic (or better yet, bit operations).
For any x ∈ U, define the operations
√
H(x) = x/ u = i,
√
L(x) = x mod u = j.
H is for “high” and L is for “low”. Assume that H(x) and L(x)
can be computed in O(1) time.
12/12