Succinct Data Structures
Recall from a very long time ago:
Definitions. Let A be a collection of data. Let opt be the
A
information theoretic lower bound on the number of bits required
to store A.
A data structure that is used to store the data A is called:
• compact if it requires O(opt ) bits;
A
• succinct if it requires opt +o(opt ) bits;
A A
• implicit if it requires opt +O(1) bits.
A
Any such data structure must use at least opt bits to store the
A
data. The extra bits may store indices, pointers, or other data
required to support operations.
1/8
Binary Rank and Select
The Scenario
We will consider a very simple universe: U = {0,1}. That is, each
individual piece of data is a single bit.
Let A be a bit array of length n. Assume that A uses n bits of
space, and that any entry A can be accessed in O(1) time. The
i
entries in A are not in sorted order.
Assume that A is static: the contents will not change.
A 0 0 1 1 1 0 1 1 0 1
0 1 2 3 4 5 6 7 8 9
We defined operations rank , rank , select , select on this array.
0 1 0 1
For the rest of the discussion, we focus on rank (i), which returns
1
the number of 1’s in entries A ,...,A .
0 i
2/8
Binary Rank and Select
Precomputing all of the answers guarantees that rank (i) can be
1
evaluated in O(1) time.
However, n precomputed entries are required, each of which is an
integer from 0 to n. The number of bits is Θ(nlog(n)).
Since only Θ(n) bits are required to store n binary digits, the data
structure in which the array A is augmented with a complete
look-up table is not compact (or succinct).
Question. Can we construct a succinct data structure that
supports the rank operation in O(1) time?
1
3/8
Succinct Binary Rank
Precomputing all of the ranks used too much space. To make the
total space smaller, we could make the answers smaller and/or
store fewer answers.
Initial attempt: break up A into blocks of length k.
• Precompute the 1-rank in A of a for those i such that
i
i mod k = 0. Such an a is the first entry in block i/k.
i
• Precompute the 1-ranks of the remaining entries in each block
relative to the first entry a .
i
QUICK EXERCISE. Use blocks of length k = 3. Write down the
precomputed 1-rank for each entry in the example array.
A 0 0 1 1 1 0 1 1 0 1
0 1 2 3 4 5 6 7 8 9
4/8
Succinct Binary Rank
Answer
block 0 block 1 block 2 bl. 3
0 0 1 2 1 1 4 1 1 6
A 0 0 1 1 1 0 1 1 0 1
0 1 2 3 4 5 6 7 8 9
Entries in red are 1-ranks in the overall list. Entries in blue are
1-ranks relative to the first entry in the block.
QUICK EXERCISE. In terms of n, how large does k have to be
to guarantee that storing the ranks of all of the a ’s requires only
i
o(n) space?
In terms of n and k, how much space is required to store one of
the remaining ranks? How about all of them?
5/8
Succinct Binary Rank
Answer
If each block has length k, then there are n blocks. The space
k
required for all of the rank (a ) values is
1 i
n
×⌈lg(n+1)⌉,
k
which is in o(n) provided
k ∈ ω(lg(n)).
There are
(cid:0)
n−
n(cid:1)
remaining entries in A. The rank of any one of
k
them relative to its block is an integer no larger than k, so any one
of these ranks requires ⌈lg(k +1)⌉ bits. Storing all of them requires
(cid:18) (cid:19)
1
n 1− ×⌈lg(k +1)⌉ = n×O(1)×Θ(lg(k))
k
space, which is in n×ω(lglg(n)) = ω(n). This is still too big.
6/8
Succinct Binary Rank
Next attempt: start from the bottom up.
QUICK EXERCISE. Consider a block of size k. This block
consists of k bits.
• How many possible blocks of size k are there in total?
• How many bits are required to store the rank of one entry
within a block of size k?
• How many possible inputs to the rank () function are there,
1
assuming we only calculate rank relative to a block of size k?
• Suppose we create a look-up table containing the answer to
every possible rank query in every possible block. How small
does k have to be to ensure that the total size of this lookup
table is o(n)?
7/8
Succinct Binary Rank
Answer
• There are 2k possible binary sequences of length k.
• A rank can take on any value from 1 to k. Storing one rank
requires ⌈lg(k +1)⌉ bits.
• Since the block has size k, there are k possible inputs to the
rank () function, namely 1,...,k.
1
• The space required by this look-up table is
2k ×k ×⌈lg(k +1)⌉.
We require a value of k such that this expression is in o(n).
8/8