Van Emde Boas Trees: A First Attempt
Recall from Wednesday: we augmented a bit array with a binary
tree.
Except for search, our operations of interest ran in O(log(u)) time
on the resulting data structure due to the height of the tree.
Proposed strategy: make the tree shorter.
Simplifying assumption. Assume that u is an even power of 2.
√
Then u is an integer and a power of 2.
The binary representation of any x ∈ U requires lg(u) digits. The
√
binary representation of any x < u requires 1 lg(u) digits.
2
1/13
Van Emde Boas Trees: A First Attempt
Here is a proto-van Emde Boas tree, or pVEB tree.
cluster
summary 1 1 0 1
0 1 2 3
0 1
0 1 1 1
1 1
4 5 6 7
2 0
0 0 0 0
3 1
8 9 10 11
1 0 1 1
12 13 14 15
√
summary is an array of length u, and cluster is a 2D array of
√ √
dimensions u× u.
√
Summary bit summary[i] is the logical OR of the u bits in
cluster[i].
2/13
Van Emde Boas Trees: A First Attempt
Given indices i and j, the entry cluster[i][j] represents the
value
√
x = i u+j ∈ U.
Conversely, given x ∈ U, the corresponding cluster indices are
√ √
i = x/ u, j = x mod u.
For any x ∈ U, define the operations
√
H(x) = x/ u = i,
√
L(x) = x mod u = j.
H is for “high” and L is for “low”. Assume that H(x) and L(x)
have been implemented and require O(1) time.
3/13
Van Emde Boas Trees: A First Attempt
QUICK EXERCISE. How do all of our operations of interest work
on a proto-van Emde Boas tree? What are the time complexities?
We are looking for:
• search, insert, delete
• predecessor
• minimum
Answer
Start with the easy ones.
• To search for x, check if cluster[H(x)][L(x)] == 1.
• To insert x, set cluster[H(x)][L(x)] and summary[H(x)]
to 1.
These are O(1) operations.
4/13
Van Emde Boas Trees: A First Attempt
For the data structure as described, deleting x requires two steps.
• Set cluster[H(x)][L(x)] to 0.
• Check if cluster[H(x)] contains any 1’s. If not, set
summary[H(x)] to 0.
√
The latter step runs in O( u) time.
We could bring this down to O(1) by replacing each summary bit
with a summary counter that tracks the number of entries present
in the corresponding cluster.
5/13
Van Emde Boas Trees: A First Attempt
√
The remaining operations run in O( u) time.
• To find the minimum entry, find the leftmost 1 (or nonzero
counter) in the summary array, then find the leftmost 1 in the
corresponding cluster.
• To find the predecessor of x:
• If summary[H(x)] is 1, find the rightmost 1 in the subarray
cluster[H(x)][0:L(x)], if one exists.
• Otherwise, find the rightmost 1 in the subarray
summary[0:H(x)-1], then find the rightmost 1 in the
corresponding cluster.
6/13
Van Emde Boas Trees
Let’s slightly rephrase our operations.
• To search for x, search cluster[H(x)] for L(x).
• To insert x, insert H(x) into summary and insert L(x) into
cluster[H(x)].
• To delete x, delete L(x) from cluster[H(x)]. If the
resulting cluster is empty, delete H(x) from summary.
• To find the minimum value, find the minimum of summary,
then find the minimum in the corresponding cluster.
• To find the predecessor of x, find the predecessor of H(x) in
summary, then find the predecessor of L(x) in the
corresponding cluster.
7/13
Van Emde Boas Trees
Each operation on a universe of size u can be constructed from
√
identical operations on a universe of size u. This suggests a
recursive approach.
A van Emde Boas (vEB) tree representing a universe of size u
√
consists of a summary van Emde Boas tree and an array of u
cluster van Emde Boas trees, each representing a universe of size
√
u.
(Base case? We’ll get there.)
Notation. Let V be a vEB tree with universe size u.
• V.summary is the summary vEB tree
• V.cluster is the array of cluster vEB trees
• V.u is the size of the universe
8/13
Van Emde Boas Trees
Proposition. If T(u) satisfies a recurrence relation of the form
√
(cid:26)
T( u)+O(1), u > c
T(u) =
1, else,
then T(u) ∈ O(loglog(u)).
Proof. Various arguments are possible. For example, make the
change of variables u = 2y, define S(y) = T(2y), then apply the
Master Theorem.
The consequence for us is: if we want our recursively defined
operations to run in O(loglog(u)) time, then we are limited to one
recursive call, plus constant-time local costs.
9/13
Van Emde Boas Trees
The search operation
• To search for x, search cluster[H(x)] for L(x).
This one is fine!
The insert operation – original version
• To insert x, insert H(x) into summary and insert L(x) into
cluster[H(x)].
This one, as written, is not fine. We need to ensure that we make
√
at most one recursive call on a universe of size u.
10/13
Van Emde Boas Trees
Comments.
• The insertion of H(x) into summary is only required if L(x) is
the first entry to be put into cluster[H(x)].
• Starting with a search doesn’t help us, though. It would be
nice if we could check if a cluster is empty in O(1) time.
• If the insertion of H(x) into summary is necessary, then L(x)
will become the only entry in cluster[H(x)]. We don’t have
time to run a second insert operation. Can we just hold L(x)
until later?
New idea. Add the field V.min to the vEB tree V. V.min is null
iff V is empty. Otherwise, V.min holds the smallest value currently
in V (and this value is not duplicated in an entry in V.cluster).
Thus V.min acts as both an empty/nonempty flag and a buffer.
11/13
Van Emde Boas Trees
V.min obviously facilitates the minimum operation. To similarly
facilitate the maximum operation, introduce the V.max field.
If V contains more than one key, then V.max is the largest of the
values contained in V, and this value is also stored in the
appropriate cluster.
With these new fields, we can now define:
Base case. A vEB tree V with universe size u = 2 consists only of
V.min and V.max.
• If n = 0, then V.min and V.max are null.
• If n = 1, the single key x is stored in both V.min and V.max.
• If n = 2, the two keys are stored in V.min and V.max in the
appropriate order.
12/13
Van Emde Boas Trees
QUICK EXERCISE. Give a sketch in pseudocode of the insert
routine, making use of V.min and V.max. Include the base case(s).
Ensure that the time complexity of the result satisfies
√
T(u) = T( u)+O(1).
Answer (short, risky – do you agree with this?)
void insert(x)
if (min != null)
if (x < min) swap x and min
if (x > max) max = x
if (x > min && u > 2)
if (cluster[H(x)].min == null)
summary.insert(H(x))
cluster[H(x)].insert(L(x))
else
min = max = x
13/13