Bitwise Tries – Recall From Monday
Example. Let u = 8, and consider the set of keys
S = {0,1,5,6,7}.
Write each key using its 3-digit binary representation:
S = {000,001,101,110,111}.
The corresponding bitwise trie is:
0 1
00 10 11
000 001 101 110 111
1/16
Bitwise Tries
Comments
• A key in S is always located in a leaf of the trie. If |S| = n,
there are more than n nodes in the trie.
• The lowest common ancestor of any two nodes contains the
longest prefix common to those two sequences.
• The height of a nonempty trie is always log(u), independent
of n.
The insert, search and delete algorithms run very similarly to the
corresponding operations on a binary search tree. Minor
differences: insert(x) might add more than one node to a branch,
and delete(x) might remove more than one node from a branch.
These three operations run in O(log(u)) time.
2/16
Bitwise Tries
QUICK EXERCISE. What is the result of calling
predecessor(111) on this trie? What about predecessor(100)?
In general, how does the predecessor operation work, and what is
its time complexity?
0 1
00 01 10 11
000 001 011 101 110
3/16
Bitwise Tries
Answer. In the example trie, the predecessor of 111 is 110. The
predecessor of 100 is 011.
predecessor(x) starts by finding the node, say a, that contains the
longest common prefix between x and a node in the trie.
• If node a contains x itself, we are in the annoying convention
case and should just return x. Otherwise...
• If node a has a left child, then the predecessor is the largest
leaf in the left subtree of a. Otherwise...
• Travel back up the branch from child to parent until the first
time we find a node that has a left child that is not part of
the branch that we were backtracking. The predecessor is the
largest leaf in the subtree rooted at that left child.
In all cases, the run time is O(log(u)).
4/16
Bitwise Tries
Proposition. A bitwise trie with n keys contains O(nlog(u/n))
nodes.
Proof. Assignment 1.
Summary. All of our operations of interest run in O(log(u)) time.
This data structure requires O(nlog(u/n)) nodes. Counting
pointers (and not employing any succinctness tricks), this is
equivalent to O(nlog(u/n)) words.
Is this any good?
...Not really. log(u) > log(n), so the run times are all worse than
what we could attain on a balanced tree. nlog(u/n) > n, so the
space required is also worse than a balanced tree.
5/16
X-Fast Tries
What’s taking so long? How can we fix it?
• All of our operations on some input x ∈ U start by finding the
node that contains the longest common prefix between x and
a node in the trie.
We always have to start at the root and travel down the trie,
one row/one digit at a time.
• For predecessor, the above operation might send us to entirely
the wrong branch, in which case we have to backtrack up and
then go down again.
6/16
X-Fast Tries
Improvement #1. Associate each row with a hash table that
contains all of the entries that are present in that row.
Assume it’s a good hash table (e.g., cuckoo hashing) that allows
worst case O(1) search times.
We can now choose any row and, in O(1) time, ask whether a
node with a particular subsequence occurs in that row.
QUICK EXERCISE. How would you rewrite the search, insert
and delete operations? Which run times – if any – have improved?
7/16
X-Fast Tries
Answers
The search operation itself is now O(1): simply look up the search
key in the hash table corresponding to the bottom row.
Insert and delete begin with a search for the longest common
prefix. Instead of constructing the prefix digit by digit (i.e., linear
search), now we can employ a binary search.
If the input x is a d-digit binary number, take the first d/2 digits
and ask whether that sequence occurs in the appropriate row.
• If yes, append the next d/4 digits from x and repeat in a
lower row.
• If no, remove the last d/4 digits from the prefix and repeat in
a higher row.
Continue until the longest common prefix has been found.
8/16
X-Fast Tries
There are log(u) rows in the trie, and we are now searching them
using binary search.
Consequently, the longest common prefix can be found in
O(loglog(u)) time! This is as much of an improvement over
O(log(u)) time as O(log(n)) is over O(n).
However, insert and delete might make alterations to O(log(u))
rows in the trie, so the overall run times of these operations are
still O(logu).
9/16
X-Fast Tries
Improvement #2. To facilitate the predecessor operation, add
pointers to nodes in the trie as follows.
• Equip the lowest row (that is, the row of leaves) with the
structure of a doubly linked list.
• If an internal node a has only one child, equip a with a pointer
to its closest leaf, in the following sense.
• If node a has a left child, a.leaf points to the largest leaf in
its left subtree.
• If node a has a right child, a.leaf points to the smallest leaf
in its right subtree.
The data structure that results from applying Improvements #1
and #2 to a bitwise trie is called an x-fast trie.
10/16
X-Fast Tries
QUICK EXERCISE. Here is a bitwise trie. Draw all of the
pointers that Improvement #2 assigns to nodes in this trie.
0 1
00 10 11
000 001 100 110 111
0000 0010 0011 1001 1100 1101 1111
11/16
X-Fast Tries
QUICK EXERCISE. Here is a bitwise trie. Draw all of the
pointers that Improvement #2 assigns to nodes in this trie.
0 1
00 10 11
000 001 100 110 111
0000 0010 0011 1001 1100 1101 1111
Here is the rather cluttered result.
12/16
X-Fast Tries
QUICK EXERCISE. What is the outcome of calling
predecessor(0101) on the example trie? What about
predecessor(1000)? How would you rewrite the predecessor
operation on an x-fast trie? What is its run time?
0 1
00 10 11
000 001 100 110 111
0000 0010 0011 1001 1100 1101 1111
13/16
X-Fast Tries
Answer
The predecessor of 0101 is 0011. The predecessor of 1000 is also
0011.
To evaluate predecessor(x), start by finding the longest common
prefix as before. This can now be done in O(loglog(u)) time.
• If the longest common prefix is x itself, return x.
• Otherwise, the node containing the longest common prefix
must have exactly one child node, and therefore it also has a
leaf pointer. Follow the leaf pointer down to the bottom row.
If the entry in this leaf is smaller than x, return it; otherwise,
walk back one step in the doubly linked list and return the
result.
In all cases, the run time is O(loglog(u))+O(1) = O(loglog(u)).
14/16
X-Fast Tries
Summary
In an x-fast trie...
• Search runs in O(1) time, and predecessor runs in O(loglogu)
time. These are improvements over bitwise tries.
• The x-fast trie structure adds no more than O(1) pointers per
node, so the space required is still O(nlog(u/n)) words.
• Insert and delete run in O(log(u)) time, which is worse than a
balanced search tree.
We have improved some but not all of our operations of interest.
Can we reduce all of the run times to O(loglog(u))? Can we
reduce the space to O(n)? Stay tuned!
15/16
Upcoming
Before class on Fri Jan 17, please read sections 1 and 2 of the
Willard 1983 paper on x-fast and y-fast tries.
Regarding x-fast tries, be prepared to answer the following
questions.
• If a is a node in an x-fast trie, what is id(a)? What is
descendant(a)?
• If x is a key in the universe U, what is bottom(x)?
• How does Willard arrive at an upper bound of NlogM for the
number of nodes in a trie?
Regarding y-fast tries, here is an example to consider. In the
notation of the paper, let
S = {0,1,2,3,4,6,8,9,10,11,13,14,15}
and let L = 4. What is the set S ? What is the set S∗?
L 2
16/16