Van Emde Boas Trees
Summary to date
• van Emde Boas trees achieve O(loglog(u)) time complexity
on all operations of interest.
This is an improvement over y-fast tries, which had a
worst-case O(log(u)) time for the delete operation.
• However, van Emde Boas trees require ω(n) memory, whereas
we were able to attain a Θ(n) bound with y-fast tries.
This trade-off between space and time occurs frequently.
• The O(loglog(u)) bound is only an improvement on the
standard O(log(n)) performance of balanced search trees if
n > log(u). Neither of these structures makes sense for small
dictionaries.
1/13
Fusion Trees
Let’s see one more approach...
You have probably seen 2-3 trees, which are a special case/minor
variant of a more general structure, the B-tree.
11,22
2,8 18 26
1 4,5 10 13,15 19,21 25 27,30
2/13
Fusion Trees
A B-tree of minimum degree t has the following properties.
• Every node except the root node must have at least t −1
entries. (The root node might have fewer.)
• Every node, including the root node, must have no more than
2t −1 entries.
• An internal node with k entries has k +1 child nodes. All
entries are arranged in search tree order.
• All leaves have the same depth, namely the height of the tree.
3/13
Fusion Trees
QUICK EXERCISE. Show that if a B-tree of minimum degree t
contains n entries, then its height h satisfies h ∈ O(log (n)).
t
Answer
There is one node in row 0, namely the root. Then there are at
least 2 nodes in row 1, at least 2t nodes in row 2, at least 2t2
nodes in row 3, and so on.
Other than the root, each node contains at least t −1 entries.
Thus the total number of entries satisfies
h
(cid:88)
n ≥ 1+(t −1) 2tj−1 = 2th −1,
j=1
which rearranges to
(cid:18) (cid:19)
n+1
h ≤ log ∈ O(log (n)).
t 2 t
4/13
Fusion Trees
Continued standard assumption: the universe U takes the form
U = {0,1,2,...,u−1}.
For simplicity, assume that u = 2w for some integer w. That is,
w = lg(u).
For even more simplicity, assume that w is the word size in our
computational model. That is, it takes exactly one word to store
any one key from our universe.
Let S ⊆ U be a collection of keys to be stored in a dictionary,
where |S| = n.
New idea. Store the keys in S in a B-tree of whose minimum
degree has the form w1/6 = lg1/6(u), where 0 < f < 1.
5/13
Fusion Trees
The height of such a B-tree satisfies
(cid:18) (cid:19) (cid:18) (cid:19)
log(n) log(n)
h ∈ O(log (n)) = O = O .
w1/6 log(w1/6) loglog(u)
Consequently, IF we can implement our operations of interest in
O(h) time, then we can attain time complexities in o(log(n)).
But this will turn out to be difficult!
Suppose we are searching for the key x. At a certain stage of the
search algorithm, we are looking at an internal node with keys
y ,...,y , where k ∈ Θ(log1/6(u)).
1 k
To determine the next node to visit, we need to find the the rank
of x within the list y ,...,y . And if we want an O(h) overall
1 k
algorithm, we need to find this rank in O(1) time with respect to
both n and u.
6/13
Fusion Trees
Fun With Binary
The core idea is: we will compress the k keys in a node so that
they fit into a single word of w bits. By performing arithmetic
operations on that word, we will essentially be testing all k keys in
parallel.
Example. Let y = 4, y = 13, y = 74, y = 77. In binary, these
1 1 3 4
numbers are
y =00000100, y =00001101, y =01001010, y =01001101
1 2 3 4
Let Y = {y ,y ,y ,y }.
1 2 3 4
Convention. Number the bits in a word 1 to 8 from left to right
(i.e., from most significant to least significant). Then the bit in the
27 position is bit 1, ..., the bit in the 1’s position is bit 8.
7/13
Fusion Trees
y =00000100, y =00001101, y =01001010, y =01001101
1 2 3 4
These four numbers can be distinguished by examining no more
than three bits.
QUICK EXERCISE. Find b < b < b , the three most significant
1 2 3
bits that suffice to distingush between the four numbers in Y.
For all j, let y be the three-digit binary number that results from
(cid:98)j
writing only bits b ,b ,b of y . Find y ,y ,y ,y .
1 2 3 j (cid:98)1 (cid:98)2 (cid:98)3 (cid:98)4
8/13
Fusion Trees
Answer
The most significant bit where two members of Y differ is b = 2.
1
y and y tie at bit 2, and can be distinguished at b = 5.
1 2 2
y and y tie at bits 2 and 5, and can be distinguished at b = 6.
3 4 3
The numbers consisting only of bits 2,5,6 are
y =001, y =011, y =110, y =111
(cid:98)1 (cid:98)2 (cid:98)3 (cid:98)4
9/13
Fusion Trees
y =00000100, y =00001101, y =01001010, y =01001101
1 2 3 4
y =001, y =011, y =110, y =111
(cid:98)1 (cid:98)2 (cid:98)3 (cid:98)4
Let Y(cid:98) = {y (cid:98)1,y (cid:98)2,y (cid:98)3,y (cid:98)4}.
QUICK EXERCISE. Consider the numbers x = 01000100 and
1
x = 01100000. For each j, let x be the binary number that
2 (cid:98)j
results from writing only bits b ,b ,b of x .
1 2 3 j
Find x
(cid:98)1
and x (cid:98)2. Then find the ranks of x
(cid:98)1
and x
(cid:98)2
in Y(cid:98). Do your
answers match the ranks of x and x in Y?
1 2
10/13
Fusion Trees
y =00000100, y =00001101, y =01001010, y =01001101
1 2 3 4
y =001, y =011, y =110, y =111
(cid:98)1 (cid:98)2 (cid:98)3 (cid:98)4
Answer
x = 101, x = 100.
(cid:98)1 (cid:98)2
Both x
(cid:98)1
and x
(cid:98)2
are larger than two of the entries in Y(cid:98). The ranks
of both of these numbers in Y(cid:98) are 2.
The number x is larger than two of the entries in Y. Thus x also
1 1
has rank 2 in Y.
However, the number x is larger than all four of the entries in Y.
1
The ranks do not agree.
11/13
Fusion Trees
y =00000100, y =00001101, y =01001010, y =01001101
1 2 3 4
y =001, y =011, y =110, y =111
(cid:98)1 (cid:98)2 (cid:98)3 (cid:98)4
x =01000100, x =101, x =01100000, x =100
1 (cid:98)1 2 (cid:98)2
What went wrong with x ? Why didn’t it go wrong with x ?
2 1
We found that y < x < y , and also that y < x < y . This
(cid:98)2 (cid:98)1 (cid:98)3 2 1 3
second fact shows that the rank of x in Y is 2.
1
We found that y < x < y . However, when we go back to the
(cid:98)2 (cid:98)2 (cid:98)3
original numbers, we see that x > y . The most significant
2 3
difference occurs at bit 3, which is not one of the bits identified in
our set B. Our compressed value skipped the bit that mattered.
12/13
Fusion Trees
Luckily, this is fixable!
• The Fredman-Willard paper solves the problem with a look-up
table.
• More concretely, once we see that x > y and that the most
2 3
significant bit that distinguishes them is bit 3, we know that
x should win any comparison that occurs at or after bit 3.
2
Replace bits 3 – 8 in x
2
with 1’s, and find the rank in Y(cid:98) again
using the resulting new value for x . This will give the correct
(cid:98)2
rank of x in Y.
2
13/13