X-Fast Tries – Recall From Wednesday
An x-fast trie is a bitwise trie that has been augmented in two
ways.
• Each row in the trie has a corresponding hash table that
contains all of the entries that are present in that row.
• The leaves of the trie are equipped with a doubly linked list
structure, and any internal node that has only one child is
equipped with a pointer to its closest descendant leaf.
1/15
X-Fast Tries
Here is the example from Wednesday with all of the pointers
shown, along with my row numbering convention.
0
0 1 1
00 10 11 2
000 001 100 110 111 3
0000 0010 0011 1001 1100 1101 1111 4
2/15
X-Fast Tries
Good results
• The search operation can be run in O(1) time using the hash
table for the leaf row.
• The longest common prefix operation, which is how most of
our operations of interest begin, can make use of a binary
search pattern to run in O(loglog(u)) time.
• Consequently, the predecessor operation runs in O(loglog(u))
time.
3/15
X-Fast Tries
Mediocre results
• The insert and delete operations still require O(log(u)) time.
Can we reduce this to o(log(n))?
• Per a claim that you will verify on Assignment 1, a bitwise trie
with n keys contains O(nlog(u/n)) nodes.
• We assume that the size of hash table j is proportional to the
number of entries in row j, and therefore the total size due to
the hash tables is O(nlog(u/n)).
• The x-fast trie structure adds O(1) pointers per node, for a
total of O(nlog(u/n)) words.
Thus, in terms of space, an x-fast trie is asymptotically no
worse than a bitwise trie, but obviously no better either. Can
we reduce this to Θ(n)?
4/15
X-Fast Tries
Source. Willard. Log-Logarithmic Worst-Case Range Queries Are
Possible In Space Θ(n). IPL 17(2):81-84. 1983.
QUICK QUESTIONS
(1) If a is a node in an x-fast trie, what is id(a)? What is
descendant(a)?
(2) If x is a key in the universe U, what is bottom(x)?
(3) How does Willard arrive at an upper bound of NlogM for the
number of nodes in a trie?
5/15
X-Fast Tries
Answers
(1) id(a) is – or I believe it is intended to be – the binary
sequence contained in (or associated with) node a. In a leaf,
this value is a key; in an internal node, this value is a prefix of
a key.
However, the formula is a little off. I would argue that a node
at height j that contains the prefix i is an ancestor of keys
ranging from i ·2j to i ·2j +(2j −1) = (i +1)2j −1.
descendant(a) is the paper’s notation for what I called
a.leaf. For an internal node with only one child, it is the
pointer to the closest leaf among the descendents of that
node.
6/15
X-Fast Tries
(2) Given x ∈ U, bottom(x) is the node containing the longest
common prefix between x and a node in the trie.
In the paragraph that defines bottom(x), a node a with
height j is described as an ancestor of the integer K if
(cid:4) K/2j(cid:5)
=id(a). The calculation
(cid:4) K/2j(cid:5)
truncates the final j
digits from the binary representation of K. This supports the
reading of id(a) as the prefix associated with the node a.
7/15
X-Fast Tries
(3) In the notation of the paper, N is the number of keys
represented by the trie and M is the size of the universe. The
x-fast trie has height logM.
A quick way to get the NlogM upper bound is to observe
that each time a key is added to the trie, it creates no more
than logM new nodes (except for the very first key, which
creates logM +1 nodes, but let’s assume that we can absorb
that +1 into the count for some later node). Adding N keys
therefore adds no more than NlogM nodes.
8/15
Now Make It Faster! Y-Fast Tries
QUICK QUESTION. In the notation of the paper, let
S = {0,1,2,3,4,6,8,9,10,11,13,14,15}
and let L = 4. What is the set S ? What is the set S∗?
L 2
Draw a y-fast trie of order 4 representing S.
Answer. Per the definition in the paper, the 4-separator of S is
S = {0,4,10,15}. The set S∗ is the set of elements in S that are
4 2
between the second and third elements of S : that is,
4
S∗ = {4,6,8,9}.
2
9/15
Y-Fast Tries
Here is the idea.
x-fast trie
0 1
00 01 10 11
000 010 101 111
0000 0100 1010 1111
2 8 13
1 3 6 9 11 14
0 4 10
balanced search trees
10/15
Y-Fast Tries
A y-fast trie uses the above construction with L = log(u). That is:
• There are n/log(u) entries in the separator set.
• There are n/log(u) balanced search trees, each containing
log(u) values. Consequently, each balanced search tree has
height Θ(loglog(u)).
Minor variations
• It is not necessary to include the largest entry in S in the
separator set.
• It is also not necessary for entries in the separator set to be
duplicated in their corresponding search trees.
11/15
Y-Fast Tries
Here is a slightly cleaned up version.
x-fast trie
0 1
00 01 10
000 010 101
0000 0100 1010
2 8 13
1 3 6 9 11 14
15
balanced search trees
Note that keys in the dictionary are now in two distinct locations:
the search trees, and the leaf row of the trie.
12/15
Y-Fast Tries
QUICK EXERCISE. In terms of n and u, how much space is
required by a y-fast trie? Note that the construction only makes
sense if n > log(u). You can assume this property.
Answer
• Recall that an x-fast trie containing n keys requires
O(nlog(u/n)) memory.
In a y-fast trie, the x-fast trie piece contains only n/log(u)
keys. Since n > log(u) by assumption, we see that
(cid:18) (cid:19)
u
log < log(u),
n/log(u)
and
(cid:18) (cid:19)
n u n
log < log(u) = n.
log(u) n/log(u) log(u)
Thus the x-fast trie requires O(n) memory.
13/15
Y-Fast Tries
• There are Θ(n/log(u)) balanced search trees, each containing
Θ(log(u)) entries. The total memory required by the search
trees is
(cid:18) (cid:19)
n
Θ ×Θ(log(u)) = Θ(n).
log(u)
One of our goals is accomplished! The space of the data structure
has been reduced to Θ(n).
14/15
Y-Fast Tries
QUICK EXERCISE. How does predecessor(x) work on a y-fast
trie? What is its run time?
Answer
• Start by applying the predecessor algorithm to the x-fast trie.
This costs O(loglog(u)) time as before.
• If the resulting leaf in the x-fast trie contains x, return x.
Otherwise, let v be the value in this leaf.
• Search for the predecessor of x in the balanced search tree
corresponding to this leaf. Since the height of this tree is
O(loglog(u)), this operation also costs O(loglog(u)) time.
• If x has no predecessor in the search tree, return v; otherwise
return the result from the search tree.
Total run time: O(loglog(u)).
15/15