MTH6108 Coding theory
Contents

1 Introduction and definitions
2 Good codes
  2.1 The main coding theory problem
  2.2 The Singleton bound
  2.3 The Hamming bound
  2.4 The Plotkin bound
3 Error probabilities and nearest-neighbour decoding
  3.1 Noisy channels and decoding processes
  3.2 Rate, capacity and Shannon's Theorem
4 Linear codes
  4.1 Revision of linear algebra
  4.2 Finite fields and linear codes
  4.3 The minimum distance of a linear code
  4.4 Bases and generator matrices
  4.5 Equivalence of linear codes
  4.6 Decoding with a linear code
5 Dual codes and parity-check matrices
  5.1 The dual code
  5.2 Syndrome decoding
6 Some examples of linear codes
  6.1 Reed–Muller codes
  6.2 Hamming codes
  6.3 The ternary Golay code
  6.4 Existence of codes and linear independence
  6.5 MDS codes
1 Introduction and definitions
We begin with the most important definitions.
Definition. An alphabet is a finite set of symbols. If an alphabet A contains exactly q symbols,
we call it a q-ary alphabet.
In fact, we usually say binary rather than 2-ary, and ternary rather than 3-ary.
Definition. If A is an alphabet, a word of length n over A is a string a1 . . . an of elements of A.
The Hamming space A^n is the set of all words of length n over A.
A code of length n over A is a subset of A^n, i.e. a set of words of length n over A.
If A is a q-ary alphabet, we say that a code over A is a q-ary code.
Definition. Given two words x = x1 . . . xn and y = y1 . . . yn of length n over an alphabet A, we define the distance d(x, y) between them to be the number of values of i for which xi ≠ yi.
Lemma 1.1. d is a metric on A^n, i.e.:
1. d(x, x) = 0 for all x ∈ A^n;
2. d(x, y) > 0 for all x ≠ y ∈ A^n;
3. d(x, y) = d(y, x) for all x, y ∈ A^n;
4. (the triangle inequality) d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ A^n.
Proof. (1), (2) and (3) are very easy, so let's do (4). Now d(x, z) is the number of values i for which xi ≠ zi. Note that if xi ≠ zi, then either xi ≠ yi or yi ≠ zi. Hence
{i | xi ≠ zi} ⊆ {i | xi ≠ yi} ∪ {i | yi ≠ zi}.
So
d(x, z) = |{i | xi ≠ zi}|
        ≤ |{i | xi ≠ yi} ∪ {i | yi ≠ zi}|
        ≤ |{i | xi ≠ yi}| + |{i | yi ≠ zi}|
        = d(x, y) + d(y, z).
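To make the definition concrete, here is a small Python sketch (an illustration, not part of the original notes) computing the distance between two words of equal length:

```python
def hamming_distance(x, y):
    """Number of positions i with x_i != y_i, i.e. the distance d(x, y)."""
    assert len(x) == len(y)
    return sum(1 for a, b in zip(x, y) if a != b)

print(hamming_distance("0110", "1100"))  # 2: the words differ in positions 1 and 3
```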
Now we can talk about error detection and correction.
Definition. Suppose C is a code of length n over an alphabet A and t > 0.
• C is t-error-detecting if d(x, y) > t for any x ≠ y in C.
• C is t-error-correcting if there do not exist distinct words x, y ∈ C and z ∈ A^n such that d(x, z) ≤ t and d(y, z) ≤ t.
Informally, a code is t-error-detecting if, whenever we take a word from the code and change
at most t of the symbols in it, we don’t reach a different word in the code. So if we send the
new word to someone, he’ll be able to tell that we’ve changed some symbols.
A code is t-error-correcting if whenever we take a word in the code and change at most t of
the symbols in it, we don’t reach a different word in the code, and we don’t even reach a word
which can be obtained from a different starting word by changing at most t of the symbols. So
if we send the new word to someone without telling him which symbols we changed, he will
be able to tell us which word from the code we started with.
In fact, rather than thinking about error detection and correction, we’ll talk about minimum
distance.
Definition. If C is a code, we define its minimum distance d(C) to be the smallest distance between distinct words in C:
d(C) = min{d(x, y) | x ≠ y ∈ C}.
By definition, C is t-error-detecting if and only if d(C) > t. We can also express error
correction in terms of minimum distance.
Lemma 1.2. A code C is t-error-correcting if and only if d(C) > 2t.
Proof. First suppose C is not t-error-correcting. This means that we can find words x, y ∈ C with x ≠ y, and another word z (not necessarily in C) such that d(x, z) ≤ t and d(y, z) ≤ t. By the triangle inequality, we get d(x, y) ≤ 2t, and therefore d(C) ≤ 2t.
Conversely, suppose d(C) ≤ 2t; we'll show that C is not t-error-correcting. Take words x ≠ y ∈ C such that d(x, y) = d ≤ 2t. Let i1, i2, . . . , id be the positions where x and y are different. Define a word z as follows: set zi = xi for i ∈ {i1, . . . , it}, set zi = yi for i ∈ {it+1, . . . , id}, and set zi = xi = yi for all other i. Then x and z differ only in positions it+1, . . . , id, so d(x, z) = d − t ≤ t; y and z differ only in positions i1, . . . , it, so d(y, z) = t. And so C is not t-error-correcting.
Key example. Suppose A is any alphabet, and n > 0. The repetition code of length n over A simply consists of all words aa . . . a, for a ∈ A. For this code, any two distinct words differ in every position, and so d(x, y) = n for all x ≠ y in C. So the code is t-error-detecting for every t ≤ n − 1, and is t-error-correcting for every t ≤ (n − 1)/2.
Terminology. A q-ary (n,M, d)-code means a code of length n over a q-ary alphabet, containing
exactly M words, and with minimum distance at least d.
Equivalent codes
Definition. If C and D are codes of length n over an alphabet A, we say that C is equivalent to D if we can get from C to D by a combination of the following operations.
Operation 1 – permutation of the positions. Choose a permutation σ of {1, . . . , n}, and for any word v = v1 . . . vn in A^n define
vσ = vσ(1) . . . vσ(n).
Now replace C with the code
Cσ = {vσ | v ∈ C}.
Operation 2 – applying a permutation of A in a fixed position. Choose a permutation f of A and an integer i ∈ {1, . . . , n}, and for v = v1 . . . vn define
vf,i = v1 . . . vi−1 f(vi) vi+1 . . . vn.
Now replace C with the code
Cf,i = {vf,i | v ∈ C}.
The point of equivalence is that equivalent codes have the same size and the same minimum
distance; we can often simplify both decoding procedures and some of our proofs by replacing
codes with equivalent codes.
Lemma 1.3. Suppose v and w are words of length n over an alphabet A, and suppose σ is a permutation of {1, . . . , n}. Define the words vσ and wσ as above. Then
d(vσ, wσ) = d(v,w).
Proof. Write v = v1 . . . vn and w = w1 . . . wn, so that vσ = vσ(1) . . . vσ(n) and wσ = wσ(1) . . . wσ(n). Now d(v,w) is the number of positions in which v and w differ, i.e.
d(v,w) = |{j | vj ≠ wj}|;
but σ is a permutation, so each j can be uniquely written in the form σ(i). Hence
d(v,w) = |{i | vσ(i) ≠ wσ(i)}|
       = |{i | (vσ)i ≠ (wσ)i}|
       = d(vσ, wσ).
Corollary 1.4. Suppose C is a code of length n over A and σ is a permutation of {1, . . . , n}. Then d(Cσ) = d(C) and |C| = |Cσ|.
Proof. First note that for words v, w ∈ C,
v ≠ w ⇐⇒ d(v,w) > 0 ⇐⇒ d(vσ, wσ) > 0 ⇐⇒ vσ ≠ wσ,  (∗)
using Lemma 1.3 for the middle equivalence. Hence
d(Cσ) = min{d(x, y) | x ≠ y ∈ Cσ}
      = min{d(vσ, wσ) | v ≠ w ∈ C}
      = min{d(v,w) | v ≠ w ∈ C}   (by Lemma 1.3)
      = d(C).
To show that |Cσ| = |C|, we consider the function
φ : C → Cσ, v ↦ vσ,
which we claim is a bijection. φ is surjective, because Cσ is defined to be the image of φ. φ is also injective, because if v ≠ w ∈ C, then by (∗) we have vσ ≠ wσ, i.e. φ(v) ≠ φ(w).
So φ is a bijection, so |C| = |Cσ|.
Now we prove the same properties for Operation 2.
Lemma 1.5. Suppose v and w are words in A^n. Suppose f is a permutation of A and i ∈ {1, . . . , n}, and define the words vf,i and wf,i as above. Then
d(vf,i, wf,i) = d(v,w).
Proof. Recall that
(vf,i)j = vj if j ≠ i, and (vf,i)i = f(vi).
We consider two cases.
• If vi = wi, then f(vi) = f(wi), so
d(v,w) = |{j ≠ i | vj ≠ wj}| = d(vf,i, wf,i).
• If vi ≠ wi, then (since f is injective) f(vi) ≠ f(wi), and so
d(v,w) = |{j ≠ i | vj ≠ wj}| + 1 = d(vf,i, wf,i).
Corollary 1.6. Suppose C is a code of length n over A, f is a permutation of A and i ∈ {1, . . . , n}. Then d(Cf,i) = d(C) and |Cf,i| = |C|.
Proof. Copy the proof of Corollary 1.4.
Corollary 1.7. Suppose C is an (n,M, d)-code over A, and D is a code equivalent to C. Then D is an (n,M, d)-code.
Proof. Obviously D is a code of length n. We get from C to D by repeatedly applying Operations 1 and/or 2, and by Corollaries 1.4 and 1.6 these operations do not change the size or the minimum distance of a code.
2 Good codes
2.1 The main coding theory problem
The most basic question we might ask about codes is: given q, n, M and d, does a q-ary
(n,M, d)-code exist? More precisely, we make the following definition.
Definition. Suppose q, n, d > 0. Aq(n, d) is defined to be the maximum M such that a q-ary (n,M, d)-code exists.
The numbers Aq(n, d) are unknown in general, and calculating them is often referred to as the 'main coding theory problem'. Here are two very special cases.
Theorem 2.1.
1. Aq(n, 1) = q^n.
2. Aq(n, n) = q.
Proof.
1. We can take C = A^n, the set of all words of length n. Any two distinct words must differ in at least one position, so the code has minimum distance at least 1. Obviously a q-ary code of length n can't be bigger than this.
2. Suppose we have a code of length n with at least q + 1 words. Then by the pigeonhole principle there must be two words with the same first symbol. These two words can therefore differ in at most n − 1 positions, and so the code has minimum distance less than n. So Aq(n, n) ≤ q. On the other hand, the repetition code described above is an (n, q, n)-code.
From the above proof we observe the following general principle, which is very useful in
proofs.
• Aq(n, d) ≥ A if and only if a q-ary (n, A, d)-code exists.
• Aq(n, d) ≤ A if and only if there does not exist a q-ary (n,M, d)-code with M > A.
2.2 The Singleton bound
Now we prove our first substantial theorem.
Theorem 2.2 (Singleton bound).
1. Suppose n, d > 1. If a q-ary (n,M, d)-code exists, then a q-ary (n − 1,M, d − 1)-code exists.
2. Suppose n, d > 1. Then Aq(n, d) ≤ Aq(n − 1, d − 1).
3. Suppose n, d > 1. Then Aq(n, d) ≤ q^{n−d+1}.
Proof.
1. Let C be a q-ary (n,M, d)-code, and for x ∈ C, let x̄ be the word obtained from x by deleting the last symbol. Let C̄ = {x̄ | x ∈ C}.
Claim. If x ≠ y ∈ C, then d(x̄, ȳ) ≥ d − 1.
Proof. We have d(x, y) ≥ d, so x and y differ in at least d positions; so there are at least d − 1 positions other than position n where x and y differ. Hence
d − 1 ≤ |{i < n | xi ≠ yi}| = d(x̄, ȳ).
The first consequence of the claim is that, since d > 1, x̄ and ȳ are distinct when x and y are. So |C̄| = M. The second consequence is that d(C̄) ≥ d − 1. So C̄ is an (n − 1,M, d − 1)-code.
2. To show that Aq(n, d) ≤ Aq(n − 1, d − 1), take an (n,M, d)-code C with M = Aq(n, d). Then by part (1) there is an (n − 1,M, d − 1)-code, which means that Aq(n − 1, d − 1) ≥ M = Aq(n, d).
3. Applying part (2) repeatedly, we have
Aq(n, d) ≤ Aq(n − 1, d − 1) ≤ Aq(n − 2, d − 2) ≤ · · · ≤ Aq(n − d + 1, 1) = q^{n−d+1},
from Theorem 2.1.
Now we prove another result which only applies to binary codes. We begin with a lemma.
Lemma 2.3. Suppose x and y are words over the alphabet {0, 1}, both containing an even number of 1s.
Then d(x, y) is even.
Proof.
d(x, y) = |{i | xi = 1 and yi = 0}| + |{i | xi = 0 and yi = 1}|
        = (|{i | xi = 1 and yi = 0}| + |{i | xi = 1 and yi = 1}|)
        + (|{i | xi = 0 and yi = 1}| + |{i | xi = 1 and yi = 1}|)
        − 2|{i | xi = 1 and yi = 1}|
        = |{i | xi = 1}| + |{i | yi = 1}| − 2|{i | xi = 1 and yi = 1}|,
which is the sum of three even numbers, so is even.
Theorem 2.4. Suppose n, d > 0 and d is even. If a binary (n − 1,M, d − 1)-code exists, then a binary (n,M, d)-code exists.
Hence if d is even, then A2(n, d) ≥ A2(n − 1, d − 1).
Proof. Suppose we have a binary (n − 1,M, d − 1)-code C. Given a word x ∈ C, we form a word x̂ of length n by adding an extra symbol, which we choose to be 0 or 1 in such a way that x̂ contains an even number of 1s.
Now for any x ≠ y ∈ C, we have d(x, y) ≥ d − 1, and clearly this gives d(x̂, ŷ) ≥ d − 1. But d − 1 is odd, and by Lemma 2.3 d(x̂, ŷ) is even, so in fact we have d(x̂, ŷ) ≥ d. So the code
Ĉ = {x̂ | x ∈ C}
is an (n,M, d)-code.
If we do this with M = A2(n − 1, d − 1), we get
A2(n, d) ≥ M = A2(n − 1, d − 1).
As a consequence of this theorem and the Singleton bound, we have A2(n, d) = A2(n − 1, d − 1) whenever d is even.
2.3 The Hamming bound
Lemma 2.5. Suppose v is a word of length n over a q-ary alphabet A, and d ≥ 0. Then the number of words w ∈ A^n such that d(v,w) = d is (q − 1)^d \binom{n}{d}.
Proof. In order to write a word w such that d(v,w) = d, we must choose d positions where v and w will be different, and then we must choose which symbols will appear in these positions in w. We can choose the positions in \binom{n}{d} ways; in each of these positions, we can choose any symbol from A other than the symbol appearing in this position in v; this leaves q − 1 choices for each position. So the total number of choices is (q − 1)^d \binom{n}{d}.
Definition. If x ∈ A^n is a word, then the sphere of radius t and centre x is
S(x, t) = {y ∈ A^n | d(x, y) ≤ t}.
Lemma 2.6. If A is a q-ary alphabet, x is a word over A of length n and t ≤ n, then the sphere S(x, t) contains exactly
\binom{n}{0} + (q − 1)\binom{n}{1} + (q − 1)^2\binom{n}{2} + · · · + (q − 1)^t\binom{n}{t}
words.
Proof. This follows from Lemma 2.5, by summing over d = 0, . . . , t.
Lemma 2.7. Suppose C is a code of length n over A. C is t-error-correcting if and only if for all x ≠ y ∈ C, the spheres S(x, t) and S(y, t) are disjoint.
Proof.
C is t-error-correcting ⇐⇒ there do not exist x ≠ y ∈ C and z ∈ A^n such that d(x, z) ≤ t and d(y, z) ≤ t
⇐⇒ there do not exist x ≠ y ∈ C and z ∈ A^n such that z ∈ S(x, t) and z ∈ S(y, t)
⇐⇒ there do not exist x ≠ y ∈ C such that S(x, t) ∩ S(y, t) ≠ ∅
⇐⇒ for all x ≠ y ∈ C, S(x, t) ∩ S(y, t) = ∅.
Theorem 2.8. Suppose A is a q-ary alphabet, and C is a code of length n over A which is t-error-correcting. Then
|C| ≤ q^n / ( \binom{n}{0} + (q − 1)\binom{n}{1} + (q − 1)^2\binom{n}{2} + · · · + (q − 1)^t\binom{n}{t} ).
Proof. Each word in C has a sphere of radius t around it, and by Lemma 2.7 these spheres are disjoint. So the total number of words in all these spheres together is
|C| × ( \binom{n}{0} + (q − 1)\binom{n}{1} + (q − 1)^2\binom{n}{2} + · · · + (q − 1)^t\binom{n}{t} ),
and this can't be bigger than the total number of words in A^n, which is q^n.
Corollary 2.9 (Hamming bound). For n, q, t > 0,
Aq(n, 2t + 1) ≤ q^n / ( \binom{n}{0} + (q − 1)\binom{n}{1} + (q − 1)^2\binom{n}{2} + · · · + (q − 1)^t\binom{n}{t} ).
Proof. If C is a q-ary (n,M, 2t + 1)-code, then by Lemma 1.2 C is t-error-correcting. So by Theorem 2.8
|C| ≤ q^n / ( \binom{n}{0} + (q − 1)\binom{n}{1} + (q − 1)^2\binom{n}{2} + · · · + (q − 1)^t\binom{n}{t} ).
Definition. Suppose C is a q-ary code of length n, and t ≥ 0. C is a perfect t-error-correcting code if d(C) = 2t + 1 and
|C| = q^n / ( \binom{n}{0} + (q − 1)\binom{n}{1} + (q − 1)^2\binom{n}{2} + · · · + (q − 1)^t\binom{n}{t} ).
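As a quick illustration (a sketch, not part of the original notes), the following Python code evaluates the sphere size of Lemma 2.6 and the resulting Hamming bound. For q = 2, n = 7, t = 1 the bound is 2^7/(1 + 7) = 16, which is attained by a binary (7, 16, 3)-code (the Hamming code of Section 6.2), so that code is perfect:

```python
from math import comb

def sphere_size(q, n, t):
    """|S(x, t)| in A^n for a q-ary alphabet (Lemma 2.6)."""
    return sum((q - 1) ** i * comb(n, i) for i in range(t + 1))

def hamming_bound(q, n, t):
    """Upper bound on |C| for a t-error-correcting q-ary code of length n;
    integer division is valid because |C| is an integer."""
    return q ** n // sphere_size(q, n, t)

print(hamming_bound(2, 7, 1))  # 16 = 2^4, attained by the [7,4] Hamming code
```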
2.4 The Plotkin bound
The Plotkin bound is more complicated, but more useful. There is a version for arbitrary q,
but we’ll address only binary codes.
First we need to recall some notation: remember that if x ∈ R, then ⌊x⌋ is the largest integer which is less than or equal to x.
Lemma 2.10. If x ∈ R, then ⌊2x⌋ ≤ 2⌊x⌋ + 1.
Proof. Let y = ⌊x⌋; then x < y + 1, so 2x < 2y + 2, and so ⌊2x⌋ < 2y + 2. But both sides of this inequality are integers, so we get ⌊2x⌋ ≤ 2y + 1.
Now we can state the Plotkin bound – there are two cases, depending on whether d is even
or odd.
Theorem 2.11 (Plotkin bound).
1. If d is even and d ≤ n < 2d, then
A2(n, d) ≤ 2⌊d/(2d − n)⌋.
2. If d is odd and d ≤ n < 2d + 1, then
A2(n, d) ≤ 2⌊(d + 1)/(2d + 1 − n)⌋.
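For instance (a sketch, not part of the notes; the parameters below are chosen arbitrarily for illustration), both cases of the theorem can be evaluated directly:

```python
def plotkin_bound(n, d):
    """Plotkin bound on A_2(n, d) (Theorem 2.11); requires d <= n < 2d
    when d is even, and d <= n < 2d + 1 when d is odd."""
    if d % 2 == 0:
        return 2 * (d // (2 * d - n))
    return 2 * ((d + 1) // (2 * d + 1 - n))

print(plotkin_bound(8, 5))  # A_2(8, 5) <= 2 * (6 // 3) = 4
```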
We now begin the proof of this bound. The proof is not examinable, but you are encouraged
to read it.
Non-examinable material
Lemma 2.12. Suppose N, M are integers with 0 ≤ N ≤ M. Then
N(M − N) ≤ M²/4 if M is even, and N(M − N) ≤ (M² − 1)/4 if M is odd.
Proof. The graph of N(M −N) is an unhappy quadratic with its turning point at N = M/2, so
to maximise it we want to make N as near as possible to this (remembering that N must be an
integer). If M is even, then we can take N = M/2, while if M is odd we take N = (M − 1)/2.
The proof is a double-counting argument. We suppose that our alphabet is {0, 1}, and if v = v1 . . . vn and w = w1 . . . wn are words in C, then we define v + w to be the word (v1 + w1)(v2 + w2) . . . (vn + wn), where we do addition modulo 2 (so 1 + 1 = 0).
Now given a binary (n,M, d)-code C, we write down a \binom{M}{2} by n array B whose rows are all the words v + w for pairs of distinct words v, w ∈ C. We're going to count the number of 1s in this array in two different ways.
A really useful feature of the addition operation we have defined on words is the following.
Lemma 2.13. Suppose v, w are binary words of length n. Then d(v,w) is the number of 1s in v + w.
Proof. By looking at the possibilities for vi and wi, we see that
(v + w)i = 0 if vi = wi, and (v + w)i = 1 if vi ≠ wi.
So
d(v,w) = |{i | vi ≠ wi}| = |{i | (v + w)i = 1}|.
Lemma 2.14. The number of 1s in B is at least d\binom{M}{2}.
Proof. We look at the number of 1s in each row. Since d(C) ≥ d, we have d(v,w) ≥ d for each v ≠ w ∈ C, so by Lemma 2.13 there are at least d 1s in each row. Summing over all the rows gives the result.
Now we count the 1s in B in a different way.
Lemma 2.15. The number of 1s in B is at most nM²/4 if M is even, and at most n(M² − 1)/4 if M is odd.
Proof. We count the number of 1s in each column. Choose j ∈ {1, . . . , n}. The word v + w has a 1 in the jth position if and only if one of v and w has a 1 in the jth position, and the other has a 0. If we let N be the number of words in C which have a 1 in the jth position, then the number of ways of choosing a pair v, w such that v + w has a 1 in the jth position is N(M − N). So by Lemma 2.12 the number of 1s in the jth column of our array is at most M²/4 if M is even, and at most (M² − 1)/4 if M is odd. This is true for every j, so by adding up we obtain the desired inequality.
Proof of the Plotkin bound.
1. Suppose we have a binary (n,M, d)-code C with M = A2(n, d), and construct the array B as above. There are two cases, according to whether M is even or odd.
Case 1: M even. By combining Lemma 2.14 and Lemma 2.15, we get
d\binom{M}{2} ≤ nM²/4  =⇒  dM(M − 1)/2 ≤ nM²/4.
Since M > 0, we get
2d(M − 1) ≤ nM  =⇒  (2d − n)M ≤ 2d.
By assumption 2d − n > 0, so we divide both sides by 2d − n to get
M ≤ 2d/(2d − n).
Since M is an integer, we can say
M ≤ ⌊2d/(2d − n)⌋ ≤ 2⌊d/(2d − n)⌋ + 1
(using Lemma 2.10). But M is even and the right-hand side is odd, so
M ≤ 2⌊d/(2d − n)⌋.
Case 2: M odd. Now we combine Lemmas 2.14 and 2.15 to get
d\binom{M}{2} ≤ n(M² − 1)/4  =⇒  dM(M − 1)/2 ≤ n(M + 1)(M − 1)/4.
Assuming M ≥ 2, we get
2dM ≤ n(M + 1)  =⇒  (2d − n)M ≤ n.
Again we have 2d − n > 0, so we get
M ≤ n/(2d − n) = 2d/(2d − n) − 1.
Since M is an integer, we get
M ≤ ⌊2d/(2d − n)⌋ − 1 ≤ 2⌊d/(2d − n)⌋ + 1 − 1 (by Lemma 2.10),
so
A2(n, d) = M ≤ 2⌊d/(2d − n)⌋.
2. Now we consider the case where d is odd; this follows from the even case using Theorem 2.4. If d is odd and n < 2d + 1, then d + 1 is even and n + 1 < 2(d + 1). So by the even case of the Plotkin bound we have
A2(n + 1, d + 1) ≤ 2⌊(d + 1)/(2(d + 1) − (n + 1))⌋ = 2⌊(d + 1)/(2d + 1 − n)⌋,
and by Theorem 2.4 this equals A2(n, d).
End of non-examinable material
3 Error probabilities and nearest-neighbour decoding
3.1 Noisy channels and decoding processes
In this section, we consider the situations in which our codes might be used, and show why
we try to get a large distance between words. The idea is that we choose a code C, choose a
word v ∈ C, and transmit v along a ‘noisy channel’, which might introduce a few errors and
produce a distorted word. The person receiving this word knows what code C we’re using,
and 'guesses' which word in C we started from.
Definition. Given a code C of length n over the alphabet A, a decoding process for C is a function from A^n to C. A nearest-neighbour decoding process for C is a function f : A^n → C such that
d(v, f(v)) ≤ d(v,w) for all v ∈ A^n, w ∈ C.
What this means is that a nearest-neighbour decoding process sends a word to the nearest possible word in C.
We make certain assumptions about our noisy channel, namely that all errors are indepen-
dent and equally likely. This means that there is a fixed p (the symbol error probability) such that
any symbol will be transmitted correctly with probability 1 − p, or incorrectly with probability
p, and that if there is an error then all the incorrect symbols are equally likely. Moreover, errors
on different symbols are independent – whether an error occurs in one symbol has no effect on
whether errors occur in later symbols.
Given a code C and a decoding process f : An → C, we consider the word error probability
for each v ∈ C: this is the probability that if we transmit the word v through our channel, we
end up with a word w such that f (w) , v.
Example. Suppose A = {0, 1}, and C = {000, 011, 110}. A nearest-neighbour decoding process for C is
f : 000 ↦ 000
    001 ↦ 011
    010 ↦ 011
    011 ↦ 011
    100 ↦ 000
    101 ↦ 110
    110 ↦ 110
    111 ↦ 110
(This is not the only possible choice of a nearest-neighbour decoding process.)
Suppose our noisy channel has symbol error probability 1/5. Let's work out the word error probability for the word 000. This is the probability that when we transmit 000, the word w that comes out has f(w) ≠ 000. This is
1 − P(f(w) = 000) = 1 − P(w = 000 or 100)
                  = 1 − (4/5)·(4/5)·(4/5) − (1/5)·(4/5)·(4/5)
                  = 9/25.
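The same calculation can be checked by brute force; here is a minimal Python sketch (not in the notes) that sums P(w) over all received words w with f(w) ≠ 000:

```python
from itertools import product

# The nearest-neighbour decoding process f from the example above
f = {"000": "000", "001": "011", "010": "011", "011": "011",
     "100": "000", "101": "110", "110": "110", "111": "110"}

def word_error_probability(sent, decode, p):
    """P(decode(w) != sent) when each symbol flips independently w.p. p."""
    total = 0.0
    for w in map("".join, product("01", repeat=len(sent))):
        errors = sum(a != b for a, b in zip(sent, w))
        if decode[w] != sent:
            total += p ** errors * (1 - p) ** (len(sent) - errors)
    return total

print(word_error_probability("000", f, 1/5))  # 0.36 = 9/25
```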
In general, word error probability depends on the symbol error probability, the code, the
decoding process and the word chosen, and we seek a decoding process which minimises the
maximum word error probability. It can be shown that (as long as p is small) the best decoding
process in this respect is always a nearest-neighbour decoding process.
3.2 Rate, capacity and Shannon’s Theorem
Definition. Given a q-ary (n,M, d)-code C, we define the rate of C to be
(log_q M)/n.
The rate of a code can be interpreted as the ratio of 'useful information' to 'total information' transmitted. For example, the q-ary repetition code of length 3 is a (3, q, 3)-code, so has rate 1/3.
The useful information in a word can be thought of as the first digit – the rest of the digits are
just redundant information included to reduce error probabilities.
Definition. The capacity of a symmetric binary channel with symbol error probability p is
1 + p log_2 p + (1 − p) log_2(1 − p).
The capacity of a channel is supposed to be a measure of how good the channel is for
communicating. Note that if p = 1/2, then the capacity is 0. This reflects the fact that it is hopeless
transmitting through such a channel – given a received word, the word sent is equally likely
to be any word in the code.
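A short sketch (not in the notes) makes it easy to tabulate the capacity for various values of p:

```python
from math import log2

def capacity(p):
    """Capacity 1 + p*log2(p) + (1-p)*log2(1-p) of a symmetric binary channel."""
    if p in (0.0, 1.0):
        return 1.0  # limiting value, since x*log2(x) -> 0 as x -> 0
    return 1 + p * log2(p) + (1 - p) * log2(1 - p)

print(capacity(0.5))  # 0.0: the hopeless channel mentioned above
print(capacity(0.1))  # about 0.531
```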
Theorem 3.1 (Shannon's Theorem). Suppose we have a noisy channel with capacity C. Suppose ε and ρ are positive real numbers with ρ < C. Then for any sufficiently large n, there exists a binary code of length n and rate at least ρ and a decoding process such that the word error probability is at most ε.
What does the theorem say? It says that as long as the rate of our code is less than the
capacity of the channel, we can make the word error probability as small as we like. The proof
of this theorem is well beyond the scope of this course.
4 Linear codes
For the rest of the course, we shall be restricting our attention to linear codes; these are
codes in which the alphabet A is a finite field, and the code itself forms a vector space over A.
These codes are of great interest because:
• they are easy to describe – we need only specify a basis for our code;
• it is easy to calculate the minimum distance of a linear code – we need only calculate the
distance of each word from the word 00 . . . 0;
• it is easy to decode an error-correcting linear code, via syndrome decoding;
• many of the best codes known are linear; in particular, every known non-trivial perfect
code has the same parameters (i.e. length, number of words and minimum distance) as
some linear code.
Non-examinable material
4.1 Revision of linear algebra
Definition. A field is a set F with distinguished elements 0 and 1 and binary operations + and
× such that:
• F is an abelian group under +, with identity element 0 (that is, we have
a + b = b + a,
(a + b) + c = a + (b + c),
a + 0 = a,
there exists an element −a of F such that −a + a = 0
for all a, b, c ∈ F);
• F \ {0} is an abelian group under ×, with identity element 1 (that is, we have
a × b = b × a,
(a × b) × c = a × (b × c),
a × 1 = a,
there exists an element a^{−1} of F such that a^{−1} × a = 1
for all a, b, c ∈ F \ {0});
• a × (b + c) = (a × b) + (a × c) for all a, b, c ∈ F.
Definition. If F is a field, a vector space over F is a set V with a distinguished element 0, a binary
operation + and a function × : (F × V)→ V (that is, a function which, given an element λ of F
and an element v of V, produces a new element λ × v of V) such that:
• V is an abelian group under + with identity 0;
• for all λ, µ ∈ F and u, v ∈ V, we have
(λ × µ) × v = λ × (µ × v),
(λ + µ) × v = (λ × v) + (µ × v),
λ × (u + v) = (λ × u) + (λ × v),
1 × v = v.
We make all the familiar notational conventions: we may write a × b as ab; we write a × b^{−1} as a/b; we write a + (−b) as a − b.
Definition. If V is a vector space over F, then a subspace is a subset of V which is also a vector space under the same operations; we write W ≤ V to mean that W is a subspace of V.
To check whether a subset W ⊆ V is a subspace, we can use the Subspace Test: W is a
subspace if and only if
• 0 ∈W,
• v + w ∈W for all v,w ∈W, and
• λv ∈W for all v ∈W, λ ∈ F.
Definition. Suppose V is a vector space over F and v1, . . . , vn ∈ V. Then we say that v1, . . . , vn are linearly independent if there do not exist λ1, . . . , λn ∈ F, not all zero, such that
λ1v1 + · · · + λnvn = 0.
A linear combination of v1, . . . , vn is an expression λ1v1 + · · · + λnvn, where λ1, . . . , λn ∈ F. The set of all linear combinations of v1, . . . , vn is called the span of v1, . . . , vn, written as 〈v1, . . . , vn〉. If 〈v1, . . . , vn〉 = V, we say that v1, . . . , vn span V.
A basis for V is a set {v1, . . . , vn} of vectors in V which are linearly independent and span V.
Fact. If {v1, . . . , vn} is a basis of a vector space V, then every v ∈ V can be uniquely written in the form λ1v1 + · · · + λnvn, for λ1, . . . , λn ∈ F.
Fact. Any vector space V has at least one basis, and all bases of V have the same size.
Definition. If V is a vector space, the dimension of V (written dim(V)) is the number of elements
in a basis of V.
Useful fact. If V is a vector space of dimension n and v1, . . . , vn are vectors in V which are
linearly independent or span V, then {v1, . . . , vn} is a basis of V.
Definition. Suppose V,W are vector spaces over F. A linear map from V to W is a function
α : V →W such that
α(λu + µv) = λα(u) + µα(v)
for all λ, µ ∈ F and u, v ∈ V. If α is a linear map, the kernel of α is the subset
ker(α) = {v ∈ V | α(v) = 0}
of V, and the image of α is the subset
Im(α) = {α(v) | v ∈ V}
of W. ker(α) is a subspace of V, and we refer to its dimension as the nullity of α. Im(α) is a
subspace of W, and we refer to its dimension as the rank of α. The Rank–nullity Theorem says
that if α is a linear map from V to W, then
nullity(α) + rank(α) = dim(V).
We shall need the following familiar property of fields.
Lemma 4.1. Let F be a field, and a, b ∈ F.
1. a0 = 0.
2. If ab = 0, then a = 0 or b = 0.
Proof.
1. We have 0 = 0 + 0, so a0 = a(0 + 0) = a0 + a0. Adding −(a0) to both sides gives 0 = a0.
2. Suppose ab = 0 but b ≠ 0. Then we have bb^{−1} = 1. So
0 = 0b^{−1} = abb^{−1} = a1 = a.
We shall only be interested in one particular type of vector space. For a non-negative integer n, we consider the set F^n, which we think of as the set of column vectors of length n over F, with operations
(x1, . . . , xn)^T + (y1, . . . , yn)^T = (x1 + y1, . . . , xn + yn)^T
and
λ(x1, . . . , xn)^T = (λx1, . . . , λxn)^T.
F^n is a vector space over F of dimension n. Sometimes we will think of the elements of F^n as row vectors rather than column vectors, or as words of length n over F.
Given m, n and an n × m matrix A over F, we can define a linear map α : F^m → F^n by
α(x1, . . . , xm)^T = A(x1, . . . , xm)^T = (A11x1 + · · · + A1mxm, . . . , An1x1 + · · · + Anmxm)^T.
Every linear map from F^m to F^n arises in this way. The rank of A is defined to be the rank of this linear map. The column rank of A is defined to be the dimension of 〈c1, . . . , cm〉, where c1, . . . , cm are the columns of A regarded as vectors in F^n, and the row rank is defined to be the dimension of 〈r1, . . . , rn〉, where r1, . . . , rn are the rows of A regarded as (row) vectors in F^m.
Fact. If A is an n × m matrix and α : F^m → F^n is the corresponding linear map, then
rank(α) = column rank(A) = row rank(A).
End of non-examinable material
4.2 Finite fields and linear codes
The examples of fields you are most familiar with are Q, R and C. But these are of no
interest in this course – we are concerned with finite fields. The classification of finite fields goes
back to Galois.
Theorem 4.2. Let q be an integer greater than 1. Then a field of order q exists if and only if q is a prime
power, and this field is unique.
If q is a prime power, then we refer to the unique field of order q as Fq. For example, if q is actually a prime, then Fq simply consists of the set {0, 1, . . . , q − 1}, with addition and multiplication mod q. If q is a prime power but not a prime, then the field Fq is not of this form – you don't
get a field if you define it like this. Actually, it’s awkward to describe what the field Fq looks
like without developing lots of theory. But this doesn’t matter – all that matters for us is that
Fq is a field.
Definition. Suppose q is a prime power. A linear code over Fq is a subspace of Fq^n.
4.3 The minimum distance of a linear code
One of the advantages of linear codes that we mentioned earlier is that it’s easy to find the
minimum distance of a linear code.
Definition. Suppose q is a prime power and w ∈ Fq^n. The weight weight(w) of w is defined to be the number of non-zero symbols in w.
Lemma 4.3. Suppose q is a prime power, w, x, y ∈ Fq^n and λ is a non-zero element of Fq. Then:
1. d(w + y, x + y) = d(w, x);
2. d(λw, λx) = d(w, x);
3. d(w, x) = weight(w − x).
Proof.
1. d(w + y, x + y) = |{i | (w + y)i ≠ (x + y)i}| = |{i | wi + yi ≠ xi + yi}|.
We have wi + yi = xi + yi if and only if wi = xi (because we can add ±yi to both sides), so
d(w + y, x + y) = |{i | wi ≠ xi}| = d(w, x).
2. d(λw, λx) = |{i | (λw)i ≠ (λx)i}| = |{i | λwi ≠ λxi}|.
We have λwi = λxi if and only if wi = xi (because we can multiply both sides by λ or λ^{−1}), so
d(λw, λx) = |{i | wi ≠ xi}| = d(w, x).
3. By definition weight(w − x) = d(w − x, 00 . . . 0), which equals d(w − x, x − x), and by part
(1) this equals d(w, x).
Corollary 4.4. Suppose C is a linear code over Fq. The minimum distance of C equals the minimum
weight of a non-zero word in C.
Proof. Let D be the minimum weight of a non-zero word in C. If we let w be a word in C of weight D, then we have D = weight(w) = d(w, 00 . . . 0). Since w and 00 . . . 0 are different words in C, we must have d(C) ≤ D.
Now let x, y be words in C such that d(x, y) = d(C). Then we have x − y ∈ C because C is closed under subtraction, and weight(x − y) = d(x, y) = d(C). So D ≤ d(C).
So we have D ≥ d(C) and D ≤ d(C), so D = d(C).
Key example. Let q be any prime power, n ≥ 2 an integer, and define the parity-check code of length n over Fq to be
C = {v ∈ Fq^n | v1 + · · · + vn = 0}.
Then C is linear (apply the Subspace Test), and C has minimum distance 2. This is because C contains a word of weight 2, namely 1(−1)00 . . . 0, but C does not contain any words of weight 1: if v ∈ Fq^n has weight 1, then there is a unique i such that vi ≠ 0. But then v1 + · · · + vn = vi ≠ 0, so v ∉ C.
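To see this example concretely, here is a Python sketch (not in the notes; it assumes q is prime, so that arithmetic mod q really is the field Fq) that builds the parity-check code and checks its minimum distance by brute force:

```python
from itertools import product

def parity_check_code(q, n):
    """All v in Fq^n with v1 + ... + vn = 0; assumes q prime so Z/qZ = Fq."""
    return [v for v in product(range(q), repeat=n) if sum(v) % q == 0]

def minimum_distance(code):
    """Smallest Hamming distance between distinct words of the code."""
    return min(sum(a != b for a, b in zip(x, y))
               for x in code for y in code if x != y)

C = parity_check_code(3, 4)
print(len(C), minimum_distance(C))  # 27 = 3^(4-1) words, minimum distance 2
```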
4.4 Bases and generator matrices
Lemma 4.5. A vector space V of dimension k over Fq contains exactly q^k elements.
Proof. Suppose {e1, . . . , ek} is a basis for V. Then any v ∈ V can be written uniquely in the form
λ1e1 + λ2e2 + · · · + λkek,
for some choice of λ1, . . . , λk ∈ Fq. So the number of vectors in V is the number of expressions of this form, i.e. the number of choices of λ1, . . . , λk. Now there are q ways to choose each λi (since there are q elements of Fq to choose from), and so the total number of choices of these scalars is q^k.
Notation. Suppose q is a prime power. A linear [n, k]-code over Fq is a linear code over Fq of
length n and dimension k. If in addition the code has minimum distance at least d, we say that
C is a linear [n, k, d]-code.
As a consequence of Lemma 4.5, we see that a linear [n, k, d]-code over Fq is a q-ary (n, q^k, d)-code.
Definition. Suppose C is a linear [n, k]-code over Fq. A generator matrix for C is a k × n matrix
with entries in Fq, whose rows form a basis for C.
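For example (a sketch, not from the notes; the matrix below is one possible generator matrix for the binary parity-check code of length 3), a generator matrix determines the code as the set of all linear combinations of its rows:

```python
from itertools import product
import numpy as np

def code_from_generator(G, q):
    """The code spanned by the rows of G over Fq (q prime):
    all Fq-linear combinations of the rows."""
    k, _ = G.shape
    return {tuple(int(x) for x in np.array(c) @ G % q)
            for c in product(range(q), repeat=k)}

G = np.array([[1, 0, 1],
              [0, 1, 1]])  # generator matrix of the length-3 binary parity-check code
print(sorted(code_from_generator(G, 2)))
# [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]: q^k = 4 words, as Lemma 4.5 predicts
```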
4.5 Equivalence of linear codes
Recall the definition of equivalent codes from earlier: codes C and D over an alphabet A are equivalent if we can get from C to D via a sequence of Operations 1 and 2.
There's a slight problem with applying this to linear codes, which is that if C is linear and D is equivalent to C, then D need not be linear. Operation 1 is OK, as we shall now show.
Lemma 4.6. Suppose C is a linear [n, k, d]-code over Fq, and σ is a permutation of {1, . . . , n}. Then the map
φ : C → Fq^n, v ↦ vσ
is linear, and Cσ is a linear [n, k, d]-code.
Proof. Suppose v, w ∈ C and λ, µ ∈ Fq. We need to show that
φ(λv + µw) = λφ(v) + µφ(w),
i.e.
(φ(λv + µw))j = (λφ(v) + µφ(w))j
for every j ∈ {1, . . . , n}. We have
(φ(λv + µw))j = ((λv + µw)σ)j
              = (λv + µw)σ(j)
              = (λv)σ(j) + (µw)σ(j)
              = λvσ(j) + µwσ(j)
              = λ(vσ)j + µ(wσ)j
              = λ(φ(v))j + µ(φ(w))j
              = (λφ(v))j + (µφ(w))j
              = (λφ(v) + µφ(w))j,
as required.
Now Cσ is by definition the image of φ, and so is a subspace of Fq^n, i.e. a linear code. We know that d(Cσ) = d(C) from before, and that |Cσ| = |C|. By Lemma 4.5 this implies that dim Cσ = dim C = k, so Cσ is an [n, k, d]-code.
Unfortunately, Operation 2 does not preserve linearity. So we define the following.
Operation 2′ Suppose C is a linear code of length n over Fq. Choose i ∈ {1, . . . , n} and a ∈ Fq \ {0}. For v = v1 . . . vn ∈ Fq^n define
va,i = v1 . . . vi−1(avi)vi+1 . . . vn.
Now replace C with the code
Ca,i = {va,i | v ∈ C}.
We want to show that Operation 2′ preserves linearity, dimension and minimum distance. We
begin by showing that it’s a special case of Operation 2.
Lemma 4.7. If F is a field and a ∈ F \ {0}, then the map
f : F → F, x ↦ ax
is a bijection, i.e. a permutation of F.
Proof. Since a ≠ 0 and F is a field, a has an inverse a^{−1}. So f has an inverse
x ↦ a^{−1}x,
and hence is a bijection.
Now we show that the operation which sends v to va,i is linear, which will mean that it
sends linear codes to linear codes.
Lemma 4.8. Suppose C is a linear [n, k, d]-code over Fq, i ∈ {1, . . . , n} and 0 ≠ a ∈ Fq. Then the map
φ : C → Fq^n, v ↦ va,i
is linear, and Ca,i is a linear [n, k, d]-code over Fq.
Proof. Take v, w ∈ C and λ, µ ∈ Fq. We must show that
(φ(λv + µw))j = (λφ(v) + µφ(w))j
for each j ∈ {1, . . . , n}. For j ≠ i we have
(φ(λv + µw))j = (λv + µw)j
              = (λv)j + (µw)j
              = λvj + µwj
              = λ(φ(v))j + µ(φ(w))j
              = (λφ(v))j + (µφ(w))j
              = (λφ(v) + µφ(w))j,
while for j = i we have
(φ(λv + µw))j = a(λv + µw)j
              = a((λv)j + (µw)j)
              = a(λvj + µwj)
              = aλvj + aµwj
              = λavj + µawj
              = λ(φ(v))j + µ(φ(w))j
              = (λφ(v))j + (µφ(w))j
              = (λφ(v) + µφ(w))j,
as required.
Now Ca,i is by definition the image of φ, and this is a subspace of Fq^n, i.e. a linear code. We know from before (since Operation 2′ is a special case of Operation 2) that d(Ca,i) = d(C) and |Ca,i| = |C|, and this gives dim Ca,i = dim C = k, so that Ca,i is a linear [n, k, d]-code.
In view of these results we re-define equivalence for linear codes.
Definition. Suppose C and D are linear codes over Fq. C and D are equivalent (as linear codes) if we can get from one to the other by applying Operations 1 and 2′ repeatedly.
As a consequence of Lemma 4.6 and Lemma 4.8, we see that if two linear codes are equiva-
lent, then they have the same dimension and minimum distance.
Now we consider the relationship between equivalence and generator matrices.
We define the following operations on matrices over Fq:
MO1. permuting the rows;
MO2. multiplying a row by a non-zero element of Fq;
MO3. adding a multiple of a row to another row.
You should recognise these as the ‘elementary row operations’ from Linear Algebra. Their
importance is as follows.
Lemma 4.9. Suppose C is a linear [n, k]-code with generator matrix G. If the matrix H can be obtained
from G by applying one of MO1–MO3, then H is also a generator matrix for C.
Proof. Since G is a generator matrix for C, we know that the rows of G are linearly independent
and span C. So G has rank k (the number of rows) and row space C. We know from linear
algebra that elementary row operations do not affect the rank or the row space of a matrix, so
H also has rank k and row space C. So the rows of H are linearly independent and span C, so
form a basis for C, i.e. H is a generator matrix for C.
Now we define two more matrix operations:
MO4. permuting the columns;
MO5. multiplying a column by a non-zero element of Fq.
Lemma 4.10. Suppose C is a linear [n, k]-code over Fq, with generator matrix G. If the matrix H is
obtained from G by applying MO4 or MO5, then H is a generator matrix for a code D equivalent to C.
Proof. Suppose G has entries gjl, for 1 ≤ j ≤ k and 1 ≤ l ≤ n. Let r1, . . . , rk be the rows of G, i.e.
rj = gj1gj2 . . . gjn.
By assumption {r1, . . . , rk} is a basis for C; in particular, the rank of G is k.
• Suppose H is obtained using MO4, applying a permutation σ to the columns. This means that
hjl = gjσ(l),
so row j of H is the word
gjσ(1)gjσ(2) . . . gjσ(n).
But this is the word (rj)σ as defined in equivalence Operation 1. So the rows of H lie in the code Cσ, which is equivalent to C.
• Now suppose instead that we obtain H by applying MO5, multiplying column i by a ∈ Fq \ {0}. This means that
hjl = gjl if l ≠ i, and hji = agji,
so that row j of H is the word
gj1gj2 . . . gj(i−1)(agji)gj(i+1) . . . gjn.
But this is the word (rj)a,i as defined in equivalence Operation 2′. So the rows of H lie in the code Ca,i, which is equivalent to C.
For either matrix operation, we have seen that the rows of H lie in a code D equivalent to C. We need to know that they form a basis for D. Since there are k rows and dim(D) = dim(C) = k, it suffices to show that the rows of H are linearly independent, i.e. to show that H has rank k. But matrix operations 4 and 5 are elementary column operations, and we know from linear algebra that these don't affect the rank of a matrix. So rank(H) = rank(G) = k.
We can summarise these results as follows.
Proposition 4.11. Suppose C is a linear [n, k]-code with a generator matrix G, and that the matrix H is obtained by applying a combination of matrix operations MO1–5. Then H is a generator matrix for a code D equivalent to C.
Proof. Each time we apply MO1–3 we get a new generator matrix for the same code, by Lemma 4.9. Each time we apply MO4 or MO5, we get a generator matrix for an equivalent code, by Lemma 4.10. So a combination of these operations will yield a generator matrix for a code equivalent to the one we started with.
Note that in the list of matrix operations 1–5, there is a sort of symmetry between rows and
columns. In fact, you might expect that you can do another operation
MO6. adding a multiple of a column to another column
but you can’t. Doing this can take you to a code with a different minimum distance. For
example, suppose q = 2, and that C is the parity-check code of length 3:
C = {000, 011, 101, 110}.
We have seen that d(C) = 2 and that C has a generator matrix
G = ( 1 0 1
      0 1 1 ).
If we applied MO6 above, adding column 1 to column 2, we’d get the matrix
H = ( 1 1 1
      0 1 1 ).
This is a generator matrix for the code
{000, 111, 011, 100},
which has minimum distance 1, so is not equivalent to C. So the difference between ‘row
operations’ and ‘column operations’ is critical, and we do not allow MO6.
Armed with these operations, we can define a standard way in which we can write generator
matrices.
Definition. Let G be a k × n matrix over Fq, with k 6 n. We say that G is in standard form if
G = (Ik|B) ,
where Ik is the k × k identity matrix, and B is some k × (n − k) matrix.
Transforming a matrix into standard form
Now we show how to transform a generator matrix into standard form, using MO1–5. So
suppose G is a k × n matrix over Fq, whose rows are linearly independent. Since the rows of
G are linearly independent, G has rank k. MO1–5 do not affect the rank of a matrix, so at any
point in the following procedure the matrix obtained will have rank k, and will therefore have
linearly independent rows.
For i = 1, . . . , k we want to transform column i into the column vector
(0, . . . , 0, 1, 0, . . . , 0)^T,
where the 1 is in position i. Suppose we have already done this for columns 1, . . . , i − 1.
Step 1. Since the rows of our matrix are linearly independent, they are non-zero, so there must
be some non-zero entry in the ith row. Furthermore, by what we know about columns
1, . . . , i − 1, this non-zero entry must occur in one of columns i, . . . ,n. So we apply MO4,
permuting columns i, . . . ,n to get a non-zero entry in the (i, i)-position of our matrix.
Step 2. Suppose the (i, i)-entry of our matrix is a ≠ 0. Then we apply MO5, multiplying column i of our matrix by a^{−1}, to get a 1 in the (i, i)-position.
Step 3. We now apply MO3, adding multiples of row i to the other rows in order to ‘kill’ the
remaining non-zero entries in column i. Note that this operation does not affect columns
1, . . . , i − 1.
By applying Steps 1–3 for i = 1, . . . , k in turn, we get a matrix in standard form. Note that it is automatic from the proof that k ≤ n.
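The procedure above is easy to mechanise; here is a Python sketch (not in the notes, and assuming q is prime so that inverses can be computed via Fermat's little theorem):

```python
import numpy as np

def standard_form(G, q):
    """Apply Steps 1-3 to put G (linearly independent rows, q prime) into
    standard form (I_k | B), using operations MO3, MO4 and MO5."""
    G = np.array(G) % q
    k, n = G.shape
    for i in range(k):
        # Step 1 (MO4): bring a non-zero entry of row i into column i.
        if G[i, i] == 0:
            j = next(c for c in range(i, n) if G[i, c] != 0)
            G[:, [i, j]] = G[:, [j, i]]
        # Step 2 (MO5): multiply column i by a^(-1) to get 1 in position (i, i).
        G[:, i] = G[:, i] * pow(int(G[i, i]), q - 2, q) % q
        # Step 3 (MO3): add multiples of row i to kill the rest of column i.
        for r in range(k):
            if r != i and G[r, i] != 0:
                G[r] = (G[r] - G[r, i] * G[i]) % q
    return G

print(standard_form([[0, 1, 1], [1, 1, 0]], 2))  # [[1 0 1], [0 1 1]]
```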
Proposition 4.12. Suppose C is a linear [n, k]-code over Fq. Then C is equivalent to a code with a
generator matrix in standard form.
Proof. Let G be a generator matrix for C. G has linearly independent rows, so using the
procedure described above we can transform G into a matrix H in standard form using MO1–5.
By Proposition 4.11, H is a generator matrix for a code equivalent to C.
4.6 Decoding with a linear code
Definition. Suppose C is a linear [n, k]-code over Fq. For v ∈ Fq^n, define
v + C = {v + w | w ∈ C}.
The set v + C is called a coset of C.
The crucial properties of cosets are as follows.
Proposition 4.13. Suppose C is an [n, k]-code over Fq. Then:
1. every coset of C contains exactly q^k words;
2. every word in Fq^n is contained in some coset of C;
3. if the word v is contained in the coset u + C, then v + C = u + C;
4. any word in Fq^n is contained in at most one coset of C;
5. there are exactly q^{n−k} cosets of C.
Proof.
1. Given a coset v + C, we define
φ : C → v + C, w ↦ v + w,
and we claim that φ is a bijection. φ is certainly surjective, since by definition v + C is the image of φ. And φ is injective, since if w, x ∈ C with φ(w) = φ(x), then v + w = v + x, which gives w = x.
So φ is a bijection, so |v + C| = |C| = q^k, by Lemma 4.5.
2. Suppose v ∈ Fq^n. Since C is linear, we have 00 . . . 0 ∈ C, and so
v = v + 00 . . . 0 ∈ v + C.
3. Since v ∈ u + C, we have v = u + x for some x ∈ C. Now for every w ∈ C, we have
v + w = u + (x + w) ∈ u + C;
so every word in v + C is in u + C, i.e. v + C ⊆ u + C. But |v + C| = |u + C| by (1), so v + C = u + C.
4. Suppose u ∈ v + C and u ∈ w + C, where u, v, w ∈ Fq^n. Then by part (3) we have
v + C = u + C = w + C.
5. There are q^n words altogether in Fq^n, and by parts (2) and (4) each of them is contained in exactly one coset of C. Each coset has size q^k, and so the number of cosets must be q^n/q^k = q^{n−k}.
Definition. Given a linear [n, k]-code C over Fq and a coset w +C, we define a leader of w +C to
be a word of minimal weight in w + C.
Now suppose we are given a linear [n, k]-code C over Fq. We define a Slepian array for C to
be an array of words, constructed as follows:
• choose one leader from each coset (note that the word 00 . . . 0 must be the leader chosen
from the coset 00 . . . 0 + C = C, since it has smaller weight than any other word);
• in the first row of the array, put all the words in C, with 00 . . . 0 at the left and the other
words in any order;
• in the first column put all the chosen coset leaders – the word 00 . . . 0 is at the top, and
the remaining leaders go in any order;
• now fill in the remaining entries by letting the entry in row i and column j be
(leader at the start of row i) + (word at the top of column j).
Lemma 4.14. Suppose C is a linear [n, k]-code over Fq, and S is a Slepian array for C. Then every word in Fq^n appears once in S.
Proof. Take v ∈ Fq^n. Then v lies in some coset D, by Proposition 4.13(2). Let y be the chosen leader for D; then y appears in column 1, in row i, say. Since y ∈ D, we have D = y + C, by Proposition 4.13(3). So v ∈ y + C, and so we can write v = y + u, where u ∈ C. u lies in row 1 of the array, in column j, say, and so v lies in row i and column j of the array.
Definition. Let C be an [n, k]-code over Fq, and let S be a Slepian array for C. We define a decoding process f : Fq^n → C as follows. For v ∈ Fq^n, we find v in the array S (which we can do, by Lemma 4.14). Now we let f(v) be the word at the top of the same column as v.
Theorem 4.15. Suppose C is a linear [n, k]-code over Fq and S is a Slepian array for C, and let f be the decoding process constructed above. Then f is a nearest-neighbour decoding process.
Proof. We need to show that for any v ∈ Fq^n and any w ∈ C,
d(v, f(v)) ≤ d(v,w).
v appears somewhere in S, and by construction f(v) is the word at the top of the same column. If we let u be the word at the start of the same row as v, then by the construction of S we have
v = u + f(v).
This gives
v − w = u + (f(v) − w) ∈ u + C.
We also have u = u + 00 . . . 0 ∈ u + C, and u was chosen to be a leader for this coset, which means that
weight(u) ≤ weight(v − w).
So
d(v, f(v)) = weight(v − f(v))
           = weight(u)
           ≤ weight(v − w)
           = d(v,w).
Quick algorithm for constructing a Slepian array
• Row 1: write the words of C in a row, with the word 00 . . . 0 at the start, and the remaining
words in any order.
• Subsequent rows: choose a word u that you haven’t written yet of minimal weight. Put u
at the start of the row, and then for the rest of the row, put
u + (word at the top of column j)
in column j.
• Stop when every word appears in the array.
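Here is a Python sketch of this quick algorithm (not from the notes; it assumes q is prime and that the code is given as a list of tuples):

```python
from itertools import product

def slepian_array(code, q, n):
    """Quick algorithm above: each new row starts with an unused word of
    minimal weight (its coset's chosen leader) added to the top row."""
    top = [(0,) * n] + sorted(w for w in code if any(w))
    rows, seen = [top], set(top)
    by_weight = sorted(product(range(q), repeat=n),
                       key=lambda w: sum(x != 0 for x in w))
    for u in by_weight:
        if u not in seen:
            row = [tuple((a + b) % q for a, b in zip(u, w)) for w in top]
            rows.append(row)
            seen.update(row)
    return rows

C = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # binary parity-check code
for row in slepian_array(C, 2, 3):
    print(row)  # 2 rows of 4 words: decoding sends v to the top of its column
```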
5 Dual codes and parity-check matrices
5.1 The dual code
Definition. Suppose q is a prime power, and n > 0. For v, w ∈ Fq^n, define the dot product
v·w = v1w1 + · · · + vnwn.
Lemma 5.1. The dot product · is symmetric and bilinear, i.e.
v·w = w·v
and
v·(λw + µx) = λ(v·w) + µ(v·x)
for all words v, w, x and all λ, µ ∈ Fq.
Proof. Exercise (non-examinable).
Definition. Suppose C is a linear [n, k]-code over Fq. The dual code is defined to be
C⊥ = {w ∈ Fq^n | v·w = 0 for all v ∈ C}.
Lemma 5.2. Suppose C is a linear [n, k]-code and G is a generator matrix for C. Then
w ∈ C⊥ ⇐⇒ Gw^T = 0.
Note that we think of the elements of Fq^n as row vectors; if w is the row vector w1 . . . wn, then w^T is the column vector (w1, . . . , wn)^T. G is a k × n matrix, so Gw^T is a column vector of length k.
Proof. Suppose G has entries gij, for 1 ≤ i ≤ k and 1 ≤ j ≤ n. Let g(i) denote the ith row of G. Then g(1), . . . , g(k) are words which form a basis for C, and
g(i) = gi1 . . . gin.
Now
(Gw^T)i = gi1w1 + · · · + ginwn = g(i)·w,
so Gw^T = 0 if and only if g(i)·w = 0 for all i.
Suppose w ∈ C⊥. Then v·w = 0 for all v ∈ C. In particular, g(i)·w = 0 for i = 1, . . . , k. So Gw^T = 0. Conversely, suppose Gw^T = 0. Given v ∈ C, we can write
v = λ1g(1) + · · · + λkg(k)
for λ1, . . . , λk ∈ Fq, since g(1), . . . , g(k) span C. So
v·w = (λ1g(1) + · · · + λkg(k))·w
    = λ1(g(1)·w) + · · · + λk(g(k)·w)   (by Lemma 5.1)
    = 0 + · · · + 0 = 0.
This applies for any v ∈ C, so w ∈ C⊥.
This lemma gives us another way to think of C⊥: it is the kernel of any generator matrix of
C.
Theorem 5.3. If C is a linear [n, k]-code over Fq, then C⊥ is a linear [n,n − k]-code.
Proof. Let G be a generator matrix of C. Then Lemma 5.2 says that C⊥ is the kernel of the linear map
φ : Fq^n → Fq^k, w ↦ Gw^T,
so it is a subspace of Fq^n, i.e. a linear code of length n. The Rank–nullity Theorem says that
rank(φ) + nullity(φ) = dim Fq^n,
so
rank(G) + dim(C⊥) = n.
G has k rows, which are linearly independent, and so rank(G) = k. Hence
dim C⊥ = n − k.
Theorem 5.4. Suppose C is a linear [n, k]-code over Fq. Then (C⊥)⊥ = C.
Proof. First note that because C⊥ = {w ∈ Fq^n | v·w = 0 for all v ∈ C}, we have v·w = 0 whenever v ∈ C and w ∈ C⊥. This means that any v ∈ C is contained in
{x ∈ Fq^n | w·x = 0 for all w ∈ C⊥}.
But this is by definition (C⊥)⊥. So C ⊆ (C⊥)⊥. On the other hand C and (C⊥)⊥ are both vector spaces, and by Theorem 5.3
dim((C⊥)⊥) = n − dim(C⊥) = n − (n − k) = k = dim C,
so C = (C⊥)⊥.
Now we make a very important definition.
Definition. Suppose C is a linear code. A parity-check matrix for C is a generator matrix for C⊥.
We will see in the rest of the course that parity-check matrices are generally more useful
than generator matrices. Here is an instance of this.
Lemma 5.5. Suppose C is a linear [n, k]-code over Fq, H is a parity-check matrix for C, and v is a word in Fq^n. Then v ∈ C if and only if Hv^T = 0.
Proof. By Theorem 5.4 we have v ∈ C if and only if v ∈ (C⊥)⊥. Now H is a generator matrix for C⊥, and so by Lemma 5.2 we have v ∈ (C⊥)⊥ if and only if Hv^T = 0.
Non-examinable material
But can we find a parity-check matrix? The following lemma provides a start.
Lemma 5.6. Suppose C is a linear [n, k]-code over Fq with generator matrix G, and H is a matrix over
Fq. Then H is a parity-check matrix for C if and only if
• H is an (n − k) × n matrix,
• the rows of H are linearly independent, and
• GH^T = 0.
In order to prove this lemma, we think about how we multiply matrices. Suppose A is an r × s matrix and B is an s × t matrix, and let b(1), . . . , b(t) denote the columns of B. Then the columns of AB are Ab(1), . . . , Ab(t).
Proof of Lemma 5.6.
(⇒) Suppose H is a parity-check matrix for C. Then it is a generator matrix for C⊥, which is a linear [n, n − k]-code, so H is an (n − k) × n matrix. If we let h(1), . . . , h(n − k) denote the rows of H, then these rows form a basis for C⊥; in particular, they are linearly independent. Now the columns of H^T are h(1)^T, . . . , h(n − k)^T, so the columns of GH^T are
Gh(1)^T, . . . , Gh(n − k)^T.
Since each h(i) lies in C⊥, we have Gh(i)^T = 0 by Lemma 5.2. So GH^T = 0.
(⇐) Suppose H is an (n − k) × n matrix with linearly independent rows, and that GH^T = 0. Again, we let h(1), . . . , h(n − k) denote the rows of H. Since GH^T = 0, we have Gh(i)^T = 0 for each i, so each h(i) lies in C⊥ by Lemma 5.2. So the rows of H are linearly independent words in C⊥. But the number of rows is n − k, which is the dimension of C⊥, so in fact the rows form a basis for C⊥, and hence H is a generator matrix for C⊥, i.e. a parity-check matrix for C.
This helps us to find a parity-check matrix if we already have a generator matrix. If the
generator matrix is in standard form, then a parity-check matrix is particularly easy to find.
First we need a lemma about multiplying block matrices.
Lemma 5.7. Suppose F is a field, and
A is an m × n matrix over F,
B is an m × s matrix over F,
C is an n × t matrix over F,
D is an s × t matrix over F.
Let
E = ( A B ),   F = ( C
                     D ).
Then
EF = AC + BD.
Dual codes and parity-check matrices 31
Proof. Omitted (non-examinable).
Now we can find a parity-check matrix from a standard-form generator matrix.
End of non-examinable material
Proposition 5.8. Suppose C is a linear [n, k]-code over Fq, and that
G = ( Ik B )
is a standard-form generator matrix for C. Then the matrix
H = ( −B^T In−k )
is a parity-check matrix for C.
Proof. (Non-examinable.) Certainly H is an (n − k) × n matrix. The last n − k columns of H are the standard basis vectors, and so are linearly independent. So H has rank at least n − k, and hence the rows of H must be linearly independent. By Lemma 5.7 we have
GH^T = ( Ik B ) ( −B
                  In−k ) = −B + B = 0,
and so H is a parity-check matrix, by Lemma 5.6.
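The construction in Proposition 5.8 is immediate to implement; here is a sketch (not in the notes; q prime, matrices as numpy arrays, with a hypothetical example matrix):

```python
import numpy as np

def parity_check_from_standard(G, q):
    """From G = (I_k | B) over Fq, return H = (-B^T | I_{n-k}) (Proposition 5.8)."""
    k, n = G.shape
    B = G[:, k:]
    return np.hstack([(-B.T) % q, np.eye(n - k, dtype=int)])

G = np.array([[1, 0, 1, 1],
              [0, 1, 1, 0]])  # a standard-form generator matrix over F_2
H = parity_check_from_standard(G, 2)
print(H)              # [[1 1 1 0], [1 0 0 1]]
print(G @ H.T % 2)    # the zero matrix, confirming GH^T = 0
```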
5.2 Syndrome decoding
Definition. Suppose C is a linear [n, k]-code over Fq, and H is a parity-check matrix for C. Then for any word w ∈ Fq^n, the syndrome of w is the vector
Hw^T ∈ Fq^{n−k}.
Note that although Hw^T is strictly speaking a column vector, we'll often write it as a row vector (i.e. word). We saw above that w ∈ C if and only if Hw^T = 0. So the syndrome of a word tells us whether it lies in our code. In fact, the syndrome tells us which coset of our code the word lies in.
Lemma 5.9. Suppose C is a linear [n, k]-code over Fq with parity-check matrix H, and v, w are words in Fq^n. Then v and w lie in the same coset of C if and only if they have the same syndrome.
Proof. The coset containing w is w + C, so
v and w lie in the same coset ⇐⇒ v ∈ w + C
⇐⇒ v = w + x, for some x ∈ C
⇐⇒ v − w ∈ C
⇐⇒ H(v^T − w^T) = 0   (by Lemma 5.5)
⇐⇒ Hv^T − Hw^T = 0
⇐⇒ Hv^T = Hw^T.
The point of this is that we can use syndromes to decode a linear code without having to
write out a Slepian array.
Definition. Suppose C is a linear code, and H is a parity-check matrix for C. We construct a
syndrome look-up table for C as follows.
1. Choose a leader from each coset.
2. Draw a table with two columns; in the first column put the chosen coset leaders, and next
to each leader put its syndrome.
Remark. In a syndrome look-up table for an [n, k]-code over Fq, there are q^{n−k} rows, because there are q^{n−k} cosets (Proposition 4.13). Each coset leader has a different syndrome, so q^{n−k} different syndromes appear. The syndromes lie in Fq^{n−k}, so there are q^{n−k} possible syndromes. So every possible syndrome must appear in the table.
Definition. Suppose C is a linear [n, k]-code over Fq, and we have a syndrome look-up table
for C. We define a decoding process g for C as follows.
• Given a word w ∈ Fq^n, calculate its syndrome.
• Find this syndrome in the syndrome look-up table; let v be the coset leader with this
syndrome.
• Define g(w) = w − v.
Note that because w and v have the same syndrome, they lie in the same coset of C. So
w − v ∈ C, so g really is a decoding process.
Lemma 5.10. Suppose C is a linear [n, k]-code over Fq, T is a syndrome look-up table for C, and S is a
Slepian array for C with the same chosen coset leaders. Then the decoding process g for C obtained
from T is the same as the decoding process f obtained from S; in particular, g is a nearest-neighbour
decoding process.
Proof. Take w ∈ Fq^n, and let v be the chosen leader with the same syndrome as w. Then
g(w) = w − v.
Since v is one of the chosen coset leaders, it appears at the start of some row of S, say row
i. Since g(w) ∈ C, it appears at the top of a column of S, say column j. So the word in row i
and column j is v + g(w) = w. Hence f (w) is the word at the top of column j, which is g(w). So
f = g.
Just as with Slepian arrays, there’s a quicker way to find a syndrome look-up table than
writing out all the cosets and choosing leaders.
Quick method for finding a syndrome look-up table
• Examine all words of weight 0, 1, 2, . . . in turn, computing the syndrome of each one.
• Each time you get a new syndrome, add a row to the table.
• Stop when all possible syndromes appear in the table.
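Here is a minimal sketch of this quick method together with the decoding process g. It assumes q is prime (so F_q can be modelled as the integers mod q), and brute-forces words in order of increasing weight, so it is only practical for small codes:

```python
# A sketch of the quick method and of the decoding process g. Assumes q prime.
from itertools import combinations, product

def syndrome(H, w, q):
    return tuple(sum(a * b for a, b in zip(row, w)) % q for row in H)

def lookup_table(H, q):
    """Examine words of weight 0, 1, 2, ... and record each new syndrome."""
    n, r = len(H[0]), len(H)
    table = {}                          # syndrome -> chosen coset leader
    for weight in range(n + 1):
        for positions in combinations(range(n), weight):
            for values in product(range(1, q), repeat=weight):
                w = [0] * n
                for i, v in zip(positions, values):
                    w[i] = v
                s = syndrome(H, w, q)
                if s not in table:      # a new syndrome: w is its leader
                    table[s] = tuple(w)
        if len(table) == q ** r:        # all q^(n-k) syndromes found: stop
            break
    return table

def decode(H, w, q, table):
    """g(w) = w - v, where v is the leader with the same syndrome as w."""
    v = table[syndrome(H, w, q)]
    return tuple((a - b) % q for a, b in zip(w, v))
```

For instance, with the binary H from the earlier sketches, decode(H, (1, 1, 1, 1, 0), 2, lookup_table(H, 2)) returns the codeword (1, 0, 1, 1, 0), at distance 1 from the received word.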
6 Some examples of linear codes
6.1 Reed–Muller codes
Now we define a family of binary linear codes which have quite large minimum distance.
Definition. Suppose v,w are words of the same length over F2. Define v||w to be the word
obtained by joining v and w together.
Definition. Suppose 0 ≤ s ≤ t. The binary Reed–Muller code R(s, t) is a code of length 2^t defined
recursively as follows:
• if s = 0, then R(s, t) is just the repetition code of length 2^t;
• if s = t, then R(s, t) consists of all words of length 2^t;
• if 0 < s < t, then
$$R(s, t) = \{\, v \| (v + w) \mid v \in R(s, t-1),\ w \in R(s-1, t-1) \,\}.$$
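The recursive definition translates directly into code. The following minimal Python sketch (our own illustration) builds R(s, t) as a set of tuples; the sets have size exponential in 2^t, so it is only sensible for small t:

```python
# A direct transcription of the recursive definition of R(s, t) over F_2.
from itertools import product

def reed_muller(s, t):
    if s == 0:
        return {(0,) * 2**t, (1,) * 2**t}           # repetition code
    if s == t:
        return set(product((0, 1), repeat=2**t))    # all words of length 2^t
    inner = reed_muller(s, t - 1)
    outer = reed_muller(s - 1, t - 1)
    # v || (v + w), with addition mod 2
    return {v + tuple((a + b) % 2 for a, b in zip(v, w))
            for v in inner for w in outer}

# R(1, 3) has dimension C(3,0) + C(3,1) = 4 (Proposition 6.4 below), so 16 words:
assert len(reed_muller(1, 3)) == 16
```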
Proposition 6.1. R(s, t) is a linear code.
Proof. We use induction on t. If s = 0, then we're done, because we know that the repetition
code is linear. If s = t, then R(s, t) = F_2^{2^t}, which is a subspace of itself, so is a linear code. So
suppose 0 < s < t. Then by induction we may assume that R(s, t − 1) and R(s − 1, t − 1) are both
linear. We apply the Subspace Test.
R(s, t) contains the zero word: By assumption R(s, t−1) and R(s−1, t−1) both contain the zero
word 00 . . . 0 of length 2^{t−1}. So R(s, t) contains the word
00 . . . 0||(00 . . . 0 + 00 . . . 0),
which is the zero word of length 2^t.
R(s, t) is closed under addition: Suppose we have two words in R(s, t). These can be written
in the form v||(v + w) and x||(x + y) for some v, x ∈ R(s, t − 1), w, y ∈ R(s − 1, t − 1). So the
sum of these two words is
(v + x)||(v + x + w + y).
By assumption R(s, t − 1) is closed under addition, so contains v + x, and R(s − 1, t − 1)
is closed under addition, so contains the word w + y. So R(s, t) contains the word
(v + x)||(v + x + w + y).
R(s, t) is closed under scalar multiplication: Since q = 2, there is nothing to check.
Next we’ll work out the dimension of R(s, t).
Lemma 6.2. Suppose 0 < s < t. Then |R(s, t)| = |R(s, t − 1)| × |R(s − 1, t − 1)|.
Proof. We know that every word in R(s, t) is of the form v||(v + w) for v ∈ R(s, t − 1) and
w ∈ R(s− 1, t− 1). The number of choices for v is |R(s, t− 1)|, and the number of choices for w is
|R(s−1, t−1)|. So as long as different choices of v,w give us different words inR(s, t), we’re done.
So suppose we have v, v′ ∈ R(s, t − 1) and w, w′ ∈ R(s − 1, t − 1) such that v||(v + w) = v′||(v′ + w′).
Looking at the first half of each of these words, we see that v = v′. And looking at the second
half, we see that v + w = v′ + w′, and hence w = w′.
Now we need to recall a couple of results about binomial coefficients.
Lemma 6.3.
1. If t ≥ 0, then
$$\binom{t}{0} + \binom{t}{1} + \cdots + \binom{t}{t} = 2^t.$$
2. If 0 < u < t, then
$$\binom{t}{u} = \binom{t-1}{u} + \binom{t-1}{u-1}.$$
Proof. Exercise (non-examinable).
Proposition 6.4. If 0 ≤ s ≤ t, then
$$\dim(R(s, t)) = \binom{t}{0} + \binom{t}{1} + \cdots + \binom{t}{s}.$$
Proof. We use induction on t. If s = 0, then R(s, t) is the repetition code, which has dimension
$1 = \binom{t}{0}$. If s = t, then R(s, t) = F_2^{2^t}, which has dimension 2^t, which by Lemma 6.3 equals
$\binom{t}{0} + \binom{t}{1} + \cdots + \binom{t}{t}$.
Now suppose 0 < s < t. By induction, we can assume that
$$\dim(R(s, t-1)) = \binom{t-1}{0} + \binom{t-1}{1} + \cdots + \binom{t-1}{s},$$
$$\dim(R(s-1, t-1)) = \binom{t-1}{0} + \binom{t-1}{1} + \cdots + \binom{t-1}{s-1}.$$
Hence by Lemma 6.2 and Lemma 4.5,
$$|R(s, t)| = |R(s, t-1)| \times |R(s-1, t-1)| = 2^{\binom{t-1}{0} + \cdots + \binom{t-1}{s}} \times 2^{\binom{t-1}{0} + \cdots + \binom{t-1}{s-1}},$$
so
$$\dim R(s, t) = \left[ \binom{t-1}{0} + \binom{t-1}{1} + \cdots + \binom{t-1}{s} \right] + \left[ \binom{t-1}{0} + \binom{t-1}{1} + \cdots + \binom{t-1}{s-1} \right] = \binom{t}{0} + \binom{t}{1} + \binom{t}{2} + \cdots + \binom{t}{s},$$
using Lemma 6.3(2) to pair up the terms.
How to construct a basis for R(s, t)
The proof of the last proposition shows that for 0 < s < t,
dim(R(s, t)) = dim(R(s, t − 1)) + dim(R(s − 1, t − 1)).
This gives us a way to construct a basis. Take a basis B forR(s, t−1), and a basis C forR(s−1, t−1);
then the set
{v||v | v ∈ B } ∪ {00 . . . 0||w | w ∈ C}
is a basis for R(s, t).
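Here is a minimal Python sketch of this basis construction; the base-case bases (the all-ones word for s = 0, the standard basis for s = t) are our own convenient choices, since any bases for the smaller codes will do:

```python
# A sketch of the recursive basis construction described above.
def rm_basis(s, t):
    """A list of basis words (tuples over F_2) for R(s, t)."""
    n = 2 ** t
    if s == 0:
        return [(1,) * n]            # basis of the repetition code
    if s == t:
        # the standard basis of F_2^(2^t)
        return [tuple(int(i == j) for i in range(n)) for j in range(n)]
    B = rm_basis(s, t - 1)
    C = rm_basis(s - 1, t - 1)
    return [v + v for v in B] + [(0,) * (n // 2) + w for w in C]

# The basis size agrees with Proposition 6.4, e.g. dim R(1, 3) = 4:
assert len(rm_basis(1, 3)) == 4
```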
Next we want to work out the minimum distance of R(s, t). Before we can do this, we need
to know the following result.
Lemma 6.5. For any t > 0,
$$R(0, t) \subset R(1, t) \subset \cdots \subset R(t, t).$$
Proof. We prove that R(s, t) ⊂ R(s + 1, t) for all 0 ≤ s < t, using induction on t.
• If s = t − 1, then R(s + 1, t) is the whole of F_2^{2^t}, so obviously R(s, t) ⊂ R(s + 1, t). So assume
s < t − 1.
• Now suppose s = 0. Then R(s, t) = {00 . . . 0, 11 . . . 1}. By induction we may assume that
R(0, t − 1) ⊂ R(1, t − 1), i.e. R(1, t − 1) contains the words 00 . . . 0 and 11 . . . 1 of length 2^{t−1}.
By construction R(1, t) contains all words v||v for v ∈ R(1, t − 1), so in particular includes
the words 00 . . . 0 and 11 . . . 1 of length 2^t.
• Now suppose 0 < s < t − 1. By induction we may assume that R(s − 1, t − 1) ⊂ R(s, t − 1) ⊂
R(s + 1, t − 1). Any word in R(s, t) has the form v||(v + w) for v ∈ R(s, t − 1) and w ∈ R(s − 1, t − 1).
By assumption, this means v ∈ R(s + 1, t − 1) and w ∈ R(s, t − 1), so v||(v + w) ∈ R(s + 1, t).
Proposition 6.6. Suppose 0 ≤ s ≤ t. Then d(R(s, t)) = 2^{t−s}.
Proof. We use induction on t.
• If s = 0, then R(s, t) is the repetition code of length 2^t, and we know this has minimum
distance 2^t.
• If s = t, then R(s, t) = F_2^{2^t}, which obviously has minimum distance 1.
• Now suppose 0 < s < t. By induction we may assume that
$$d(R(s, t-1)) = 2^{t-s-1}, \qquad d(R(s-1, t-1)) = 2^{t-s}.$$
Since R(s, t) is linear, we just need to show that the minimum weight of a non-zero word
in R(s, t) is 2^{t−s}. Since d(R(s − 1, t − 1)) = 2^{t−s}, R(s − 1, t − 1) certainly contains a word
w of weight 2^{t−s}. Then R(s, t) contains the word 00 . . . 0||w of weight 2^{t−s}. Conversely,
suppose we have a non-zero word x ∈ R(s, t), and write x = v||(v + w) for v ∈ R(s, t − 1),
w ∈ R(s − 1, t − 1). We consider three cases:
v = 00 . . . 0: In this case, x = 00 . . . 0||w, so w is non-zero. So weight(x) = weight(w) ≥
d(R(s − 1, t − 1)) = 2^{t−s}.
v = w: In this case, x = v||00 . . . 0, so v is non-zero. And v = w ∈ R(s − 1, t − 1), so
weight(x) = weight(v) ≥ d(R(s − 1, t − 1)) = 2^{t−s}.
w ≠ v ≠ 00 . . . 0: Since R(s − 1, t − 1) ⊂ R(s, t − 1), we have w ∈ R(s, t − 1), so v + w ∈ R(s, t − 1).
So weight(x) = weight(v) + weight(v + w) ≥ d(R(s, t − 1)) + d(R(s, t − 1)) = 2^{t−s−1} + 2^{t−s−1} = 2^{t−s}.
6.2 Hamming codes
We begin with binary Hamming codes, which are slightly easier to describe.
Definition. Suppose r ≥ 2, and let H_{r,2} be an r × (2^r − 1) matrix whose columns are all the
different non-zero vectors of length r over F_2, in any order. Define the binary Hamming code
Ham(r, 2) to be the binary [2^r − 1, 2^r − r − 1]-code with H_{r,2} as its parity-check matrix, i.e.
$$\mathrm{Ham}(r, 2) = \{\, v \in \mathbb{F}_2^{2^r - 1} \mid H_{r,2} v^T = 0 \,\}.$$
Note that Ham(r, 2) is not uniquely defined – it depends on the order you choose for the
columns of H_{r,2}. But choosing different orders still gives equivalent codes, so we talk about the
Hamming code Ham(r, 2).
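One convenient choice of column order, sketched below in Python, is to take column i to be the binary representation of i for i = 1, . . . , 2^r − 1; this particular ordering is our own convention, and any other ordering gives an equivalent code, as just remarked.

```python
# Build H_{r,2} with column i equal to the binary digits of i, i = 1, ..., 2^r - 1.
def hamming_pcm_binary(r):
    cols = [[(i >> b) & 1 for b in reversed(range(r))] for i in range(1, 2**r)]
    return [[col[j] for col in cols] for j in range(r)]   # transpose to rows

for row in hamming_pcm_binary(3):    # the familiar 3 x 7 matrix for Ham(3, 2)
    print(row)
```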
Now we consider q-ary Hamming codes for an arbitrary prime power q. We impose a
relation on the non-zero vectors in F_q^r by saying that v ≡ w if and only if v = λw for some
non-zero λ ∈ F_q.
Lemma 6.7. ≡ is an equivalence relation.
Proof. We need to check three things.
• ≡ is reflexive: given a non-zero vector v, we have v = 1v, and 1 is a non-zero element of
F_q, so v ≡ v.
• ≡ is symmetric: suppose v ≡ w. Then v = λw for some non-zero λ ∈ F_q. Since λ ≠ 0, it
has an inverse λ^{−1}. So w = λ^{−1}v, so w ≡ v.
• ≡ is transitive: suppose v ≡ w and w ≡ x. Then v = λw and w = µx for some non-zero
λ, µ ∈ F_q. Then v = (λµ)x, so v ≡ x.
Now we count the equivalence classes of non-zero vectors.
Lemma 6.8. Under the equivalence relation ≡, each equivalence class contains exactly q − 1 vectors,
and there are exactly $\frac{q^r - 1}{q - 1}$ equivalence classes.
Proof. Take v ∈ F_q^r \ {0}. The equivalence class containing v consists of all vectors w such that
w ≡ v, i.e. w = λv for non-zero λ ∈ F_q. There are q − 1 possible choices of λ, and these give
distinct words: if λ ≠ µ and λv = µv, then we get (λ − µ)v = 0. λ − µ is non-zero, so it has an
inverse. So we get v = 0, a contradiction.
So there are exactly q − 1 vectors in each equivalence class. The number of equivalence
classes is the number of non-zero vectors divided by the size of an equivalence class, i.e.
$\frac{q^r - 1}{q - 1}$.
Definition. Suppose q is a prime power and r > 0, and let $N = \frac{q^r - 1}{q - 1}$. Choose non-zero
vectors v_1, . . . , v_N ∈ F_q^r such that exactly one lies in each equivalence class. Define H_{r,q} to be
an r × N matrix with columns v_1, . . . , v_N, and define the q-ary Hamming code Ham(r, q) to be the
[N, N − r]-code over F_q with parity-check matrix H_{r,q}.
Again, the actual code depends on the order of v_1, . . . , v_N (and on the choice of v_1, . . . , v_N),
but different choices give equivalent codes.
It may not be obvious how to choose the vectors v_1, . . . , v_N, but there is a trick: choose all
non-zero vectors whose first non-zero entry is a 1. You might like to prove as an exercise that
this always works, but you can use this trick without justification in the exam.
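Here is a minimal Python sketch of this trick, assuming q prime so that F_q can be modelled as the integers mod q (the definition itself allows any prime power):

```python
# The trick: as class representatives, take every non-zero vector in F_q^r
# whose first non-zero entry is 1.
from itertools import product

def hamming_columns(r, q):
    cols = []
    for v in product(range(q), repeat=r):
        nonzero = [x for x in v if x != 0]
        if nonzero and nonzero[0] == 1:   # first non-zero entry is 1
            cols.append(v)
    return cols

# There should be N = (q^r - 1)/(q - 1) representatives, e.g. 13 for r = 3, q = 3:
assert len(hamming_columns(3, 3)) == (3**3 - 1) // (3 - 1)
```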
Lemma 6.9. Ham(r, q) is an [N, N − r]-code over F_q, where as above $N = \frac{q^r - 1}{q - 1}$.
Proof. H_{r,q} is an r × N matrix, and is a generator matrix for Ham(r, q)^⊥, so Ham(r, q)^⊥ is an
[N, r]-code. So Ham(r, q) is an [N, N − r]-code, by Theorem 5.3.
Now we’ll see the key property of Hamming codes.
Theorem 6.10. Ham(r, q) has minimum distance 3.
Proof. Since Ham(r, q) is linear, we want to show that the minimum weight of a non-zero word
in Ham(r, q) is 3, i.e. that there are no words of weight 1 or 2 in Ham(r, q), but there is a word of
weight 3. We have a parity-check matrix H = H_{r,q}, and by Lemma 5.5 a word w lies in Ham(r, q)
if and only if Hw^T = 0. Writing v_1, . . . , v_N for the columns of H, and writing w as w_1 . . . w_N,
we have
$$Hw^T = w_1 v_1 + \cdots + w_N v_N.$$
Ham(r, q) contains no words of weight 1: Suppose w is a word of weight 1. Letting w_i be the
unique non-zero symbol in w, we find that
$$Hw^T = w_i v_i.$$
Now w_i ≠ 0 and v_i ≠ 0, so w_i v_i ≠ 0; so w ∉ Ham(r, q).
Ham(r, q) contains no words of weight 2: Suppose w is a word of weight 2, with non-zero
symbols w_i, w_j. Then
$$Hw^T = w_i v_i + w_j v_j.$$
If this is zero, we get
$$v_i = -w_j w_i^{-1} v_j$$
(n.b. w_i^{−1} exists because w_i ≠ 0), which means that v_i ≡ v_j. But v_1, . . . , v_N were chosen to
be inequivalent; contradiction. So Hw^T is non-zero, so w ∉ Ham(r, q).
Ham(r, q) contains a word of weight 3: Let v = v_1 + v_2. Then v ≠ 0, because v_1 ≠ −v_2. Now we
claim that v ≢ v_1. If v ≡ v_1, then we have v = λv_1, and this means that v_2 = (λ − 1)v_1,
so that v_2 ≡ v_1; but this contradicts the choice of v_1, . . . , v_N. So v ≢ v_1; similarly, we find
v ≢ v_2. But v must be equivalent to some v_j, because v_1, . . . , v_N are chosen so that one lies
in each equivalence class. So for some j ≥ 3 we have v ≡ v_j; this means v = λv_j for some
λ ∈ F_q.
Now define w by
w_1 = 1, w_2 = 1, w_j = −λ, w_k = 0 for all other k.
Then w has weight 3, and we claim that w ∈ Ham(r, q): we have
$$Hw^T = 1 \cdot v_1 + 1 \cdot v_2 - \lambda v_j = v_1 + v_2 - v = 0,$$
so w ∈ Ham(r, q).
Theorem 6.11. Ham(r, q) is a perfect 1-error-correcting code.
Proof. Since Ham(r, q) has minimum distance 3, it is 1-error-correcting, by Lemma 1.2. To show
that it is perfect, we have to show that equality holds in the Hamming bound, i.e.
$$|\mathrm{Ham}(r, q)| = \frac{q^N}{\binom{N}{0} + (q-1)\binom{N}{1}},$$
where $N = \frac{q^r - 1}{q - 1}$.
Since Ham(r, q) is an [N, N − r]-code, the left-hand side equals q^{N−r} by Lemma 4.5. The
right-hand side equals
$$\frac{q^N}{1 + (q-1)N} = \frac{q^N}{1 + (q-1)\frac{q^r-1}{q-1}} = \frac{q^N}{1 + q^r - 1} = q^{N-r},$$
and the equation follows.
Non-examinable material
6.3 The ternary Golay code
In this section, we'll look at one particular very special code. We begin by taking a parity-check
matrix for the Hamming code Ham(2, 3):
$$K = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 2 \end{pmatrix}.$$
Notice that KK^T = 0, which means that the rows of K lie in Ham(2, 3) (in fact, K is a generator
matrix for Ham(2, 3)).
Here’s a lemma about Ham(2, 3).
Lemma 6.12. Suppose u is a non-zero word in Ham(2, 3) and x ∈ F_3^4. Then weight(x + u) + weight(x − u) ≥ 3.
Proof.
weight(x + u) + weight(x − u) = d(x + u, 0000) + d(x − u, 0000)
= d(x, −u) + d(x, u) by Lemma 4.3
≥ d(u, −u) by the triangle inequality.
Now u, −u are distinct words in Ham(2, 3), so d(u, −u) ≥ d(Ham(2, 3)) = 3.
Now let I denote the 4 × 4 identity matrix over F_3, and let J be the 4 × 4 matrix in which every
entry is 1. Notice that J^2 = 4J = J (since 4 = 1 in F_3), and hence (I + J)^2 = I + 2J + J^2 = I + 3J = I.
Now let H be the following 6 × 12 matrix over F_3, written in block form:
$$H = \begin{pmatrix} I & I+J & I+J \\ 0 & K & -K \end{pmatrix}.$$
Lemma 6.13. The rows of H are linearly independent.
Proof. Let r_1, . . . , r_6 be the rows, and suppose we have a linear relation
$$\lambda_1 r_1 + \cdots + \lambda_6 r_6 = 00 \ldots 0 \qquad (\lambda_1, \ldots, \lambda_6 \in \mathbb{F}_3).$$
Then for i = 1, . . . , 6 in turn we can look at the ith entry of the word on the left-hand side and
deduce that λ_i = 0.
As a consequence, H is a generator matrix for a linear code.
Definition. Define the extended ternary Golay code G12 to be the linear [12, 6]-code over F3 with
generator matrix H.
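As a sanity check, the following minimal Python sketch (our own illustration) builds H in block form and verifies HH^T = 0 over F_3, which is the key computation in Lemma 6.14 below:

```python
# Build the 6 x 12 matrix H in block form and check HH^T = 0 over F_3.
I = [[int(i == j) for j in range(4)] for i in range(4)]
J = [[1] * 4 for _ in range(4)]
IJ = [[(I[i][j] + J[i][j]) % 3 for j in range(4)] for i in range(4)]   # I + J
K = [[1, 0, 1, 1],
     [0, 1, 1, 2]]
negK = [[(-x) % 3 for x in row] for row in K]

H = ([I[i] + IJ[i] + IJ[i] for i in range(4)]        # ( I  I+J  I+J )
     + [[0] * 4 + K[i] + negK[i] for i in range(2)]) # ( 0   K   -K  )

def mat_mul_mod3(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % 3 for col in zip(*B)]
            for row in A]

HT = [list(col) for col in zip(*H)]
assert all(x == 0 for row in mat_mul_mod3(H, HT) for x in row)
```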
Lemma 6.14. G_{12}^⊥ = G_{12}.
Proof. We want to show that H is a generator matrix for G_{12}^⊥, i.e. a parity-check matrix for G_{12}.
We'll use Lemma 5.6. Certainly the rows of H are linearly independent and H is a (12 − 6) × 12
matrix, so we just need to check that HH^T = 0: multiplying matrices in block form, we have
$$HH^T = \begin{pmatrix} I & I+J & I+J \\ 0 & K & -K \end{pmatrix} \begin{pmatrix} I & 0 \\ I+J & K^T \\ I+J & -K^T \end{pmatrix} = \begin{pmatrix} I^2 + 2(I+J)^2 & (I+J)K^T - (I+J)K^T \\ K(I+J) - K(I+J) & KK^T + (-K)(-K^T) \end{pmatrix},$$
which is zero because KK^T = 0 and I^2 = (I+J)^2 = I.
So H is a generator matrix for G_{12}^⊥; since H is also a generator matrix for G_{12}, we must have
G_{12}^⊥ = G_{12}.
In particular, this result implies that v.v = 0 for any v ∈ G_{12}. This has an important
consequence.
Lemma 6.15. Suppose v ∈ F_3^n. If v.v = 0, then the weight of v is divisible by 3.
Proof. We have $v.v = \sum_{i=1}^{n} v_i^2$. Since we're working over F_3, we have v_i^2 = 0 if v_i = 0, and
v_i^2 = 1 if v_i ≠ 0. So v.v is just the weight of v mod 3.
Proposition 6.16. The minimum distance of G_{12} is 6.
Proof. Since G_{12} is linear, we just need to show that the minimum weight of a non-zero word
is 6. There are words in G_{12} of weight 6 (for instance, the fifth row of H), so by Lemma 6.15 we just need to show
that there are no words of weight 3. So suppose that v ∈ G_{12} has weight 3, and write v as
λ_1 r_1 + · · · + λ_6 r_6. We write v as a sum of two words:
v = (λ_1 r_1 + λ_2 r_2 + λ_3 r_3 + λ_4 r_4) + (λ_5 r_5 + λ_6 r_6)
= w||x||x + 0000||u||(−u)
= w||(x + u)||(x − u),
where w, x ∈ F_3^4 and u ∈ Ham(2, 3).
Suppose u ≠ 0000. Then by Lemma 6.12 we have weight(x + u) + weight(x − u) ≥ 3, so we
must have w = 0000. But w is the word λ_1 λ_2 λ_3 λ_4, so this means that λ_1 = λ_2 = λ_3 = λ_4 = 0,
and hence v = 0000||u||(−u) and so has even weight; contradiction.
So we can assume u = 0000, so that v = w||x||x. Since v has weight 3, w must have odd
weight, and hence has weight 1 or 3. If w has weight 1, then only one of λ_1, λ_2, λ_3, λ_4 is non-zero,
so v is just a scalar multiple of one of the rows r_1, r_2, r_3, r_4, so has weight 9; contradiction. So we
can assume that w has weight 3. But then we must have x = 0000. But
x = λ_1 λ_2 λ_3 λ_4 (I + J) = w(I + J),
so w(I + J) = 0000. Hence w = wI = w(I + J)^2 = 0000(I + J) = 0000; contradiction.
Now we can define the ternary Golay code.
Definition. Define the ternary Golay code G_{11} to be the code of length 11 over F_3 obtained by
deleting the last symbol of each word in G_{12}.
Theorem 6.17. G_{11} is a perfect 2-error-correcting code.
Proof. Since d(G_{12}) = 6, we have d(G_{11}) ≥ 5 by the proof of the Singleton bound. So G_{11} is
2-error-correcting. To show that G_{11} is perfect, we just need to show that
$$|G_{11}| = \frac{3^{11}}{1 + (3-1)\binom{11}{1} + (3-1)^2\binom{11}{2}}.$$
We have |G_{11}| = |G_{12}| = 3^6, by Lemma 4.5.
The right-hand side is
$$\frac{3^{11}}{1 + 2 \times 11 + 4 \times 55} = \frac{3^{11}}{243} = \frac{3^{11}}{3^5} = 3^6,$$
as required.
End of non-examinable material
6.4 Existence of codes and linear independence
The proof that the Hamming codes have distance 3 relied on the following fact about their
parity-check matrices: if we take a set consisting of at most 2 columns of H, then this set is
linearly independent. This leads to a more general theorem on the existence of linear codes.
Theorem 6.18. Suppose C is an [n, k]-code with parity-check matrix H, and d ≤ n. Then C has
minimum distance at least d if and only if every d − 1 columns of H are linearly independent.
Proof. Let c_1, . . . , c_n be the columns of H. Suppose first that there are columns c_{i_1}, . . . , c_{i_{d−1}} which
are linearly dependent. This means there are scalars λ_1, . . . , λ_{d−1} (not all zero) such that
$$\lambda_1 c_{i_1} + \cdots + \lambda_{d-1} c_{i_{d-1}} = 0.$$
Let w be the word which has λ_1, . . . , λ_{d−1} in positions i_1, . . . , i_{d−1}, and 0s elsewhere. Then
$$Hw^T = w_1 c_1 + \cdots + w_n c_n = \lambda_1 c_{i_1} + \cdots + \lambda_{d-1} c_{i_{d-1}} = 0,$$
so w ∈ C by Lemma 5.5. But w is a non-zero word of weight at most d − 1, so C has minimum
distance less than d.
Conversely, suppose C has minimum distance less than d, and take a non-zero word w ∈ C
of weight less than d. Then there are positions i_1, . . . , i_{d−1} such that w_i = 0 for i ≠ i_1, . . . , i_{d−1}.
Hence
$$0 = Hw^T = w_1 c_1 + \cdots + w_n c_n = w_{i_1} c_{i_1} + \cdots + w_{i_{d-1}} c_{i_{d-1}},$$
so c_{i_1}, . . . , c_{i_{d−1}} are linearly dependent.
Corollary 6.19. Suppose d, k ≤ n. A linear [n, k, d]-code exists if and only if there is a sequence of n
vectors in F_q^{n−k} such that every d − 1 of them are linearly independent.
Proof. Suppose C is such a code, and let H be a parity-check matrix for C. Then the columns
of H are vectors in F_q^{n−k}, and by Theorem 6.18 every d − 1 of them are linearly independent.
Conversely, suppose c_1, . . . , c_n are vectors in F_q^{n−k} such that every d − 1 of them are linearly
independent. Let H be the matrix with columns c_1, . . . , c_n, and let C be the code with parity-check
matrix H. Then C is a linear [n, k]-code, and by Theorem 6.18 C has minimum distance at least
d.
Notice that we say 'sequence' rather than 'set', since two columns of a matrix might be the
same. However, if there are two equal columns, then they are linearly dependent, and so the
minimum distance of our code would be at most 2.
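Theorem 6.18 and this corollary suggest a (very slow) brute-force test. The following minimal Python sketch checks whether every d − 1 columns of a parity-check matrix are linearly independent, computing ranks by Gaussian elimination mod a prime q; the final assertion uses the illustrative binary [5, 2]-code from the earlier sketches, whose minimum distance is exactly 3:

```python
# Brute-force test of the criterion in Theorem 6.18: minimum distance >= d
# iff every d - 1 columns of H are linearly independent. Assumes q prime.
from itertools import combinations

def rank_mod_q(vectors, q):
    rows = [list(v) for v in vectors]
    rank, col, ncols = 0, 0, (len(vectors[0]) if vectors else 0)
    while rank < len(rows) and col < ncols:
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            col += 1
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], q - 2, q)   # inverse in F_q, q prime
        rows[rank] = [(x * inv) % q for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                c = rows[i][col]
                rows[i] = [(x - c * y) % q for x, y in zip(rows[i], rows[rank])]
        rank += 1
        col += 1
    return rank

def min_distance_at_least(H, d, q):
    cols = list(zip(*H))   # the columns of H as vectors of length n - k
    return all(rank_mod_q(subset, q) == d - 1
               for subset in combinations(cols, d - 1))

H = [[1, 1, 1, 0, 0], [1, 0, 0, 1, 0], [0, 1, 0, 0, 1]]
assert min_distance_at_least(H, 3, 2) and not min_distance_at_least(H, 4, 2)
```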
6.5 MDS codes
We begin by recalling the Singleton bound, and applying it to linear codes.
Theorem 6.20 (Singleton bound for linear codes). If C is a linear [n, k, d]-code over F_q, then
d ≤ n − k + 1.
Proof. C is a q-ary (n, M, d)-code, where M = q^k by Lemma 4.5. So
$$q^k = M \le A_q(n, d) \le q^{n-d+1}$$
by Theorem 2.2(3). Hence k ≤ n + 1 − d, so d ≤ n − k + 1.
The aim of this section is to look at codes which give equality in this bound.
Definition. Suppose C is a linear [n, k]-code. The number r = n− k is called the redundancy of C.
What Theorem 6.20 says is that if C is a linear code of redundancy r, then the minimum
distance of C is at most r + 1.
Definition. A maximum distance separable code (or MDS code) of length n and redundancy r is a
linear [n,n − r, r + 1]-code.
Theorem 6.18 says that an MDS code of length n and redundancy r exists if and only if we
can find a sequence of n vectors in F_q^r of which any r are linearly independent. Clearly if we
can do this for a given value of n, then we can do it for any smaller value of n, just by deleting
some of the vectors. This implies that for each r (and q) there is a maximum n (possibly infinite)
for which an MDS code of length n and redundancy r exists.
Definition. Suppose r > 0. We define max(r, q) to be the largest n such that an MDS code over
F_q of length n and redundancy r exists.
The main question about MDS codes is to find the values of max(r, q). Note that max(r, q)
may be infinite, even though there are only finitely many vectors in F_q^r, because we are allowed
repeats in our sequence of vectors. Note, though, that if we have a repeated vector in our
sequence, then the code cannot possibly have minimum distance more than 2. Here is our
main theorem on the values max(r, q).
Theorem 6.21.
1. max(1, q) = ∞.
2. If r ≥ q, then max(r, q) = r + 1.
3. If 2 ≤ r ≤ q, then max(r, q) ≥ q + 1.
Proof of Theorem 6.21(1). We must show that for any n and q, there exists an MDS code of
length n and redundancy 1. Using Theorem 6.18, we just need to construct a 1 × n matrix such
that any one column is linearly independent. Saying that one vector is linearly independent
just means that it is non-zero, so we just need to find a 1 × n matrix in which every entry is
non-zero; we could take the matrix
$$\begin{pmatrix} 1 & 1 & \ldots & 1 \end{pmatrix}.$$
Proof of Theorem 6.21(2). To show that max(r, q) ≥ r + 1, we need to show that an MDS code
of length r + 1 and redundancy r exists. But this is an [r + 1, 1, r + 1]-code, and the repetition
code is such a code.
To show that max(r, q) ≤ r + 1, we have to show that we can't find an MDS code of length
r + 2 and redundancy r, i.e. an [r + 2, 2, r + 1]-code. Let's suppose that we can find such a code
C, and take a generator matrix. Applying MO1–5, we put this generator matrix in standard
form; this gives a generator matrix for a code D which is equivalent to C and therefore is also
an [r + 2, 2, r + 1]-code. Let's write our generator matrix as
$$G = (\, I_2 \mid B \,),$$
where B is a 2 × r matrix over F_q. The rows of G are non-zero words in D and d(D) = r + 1,
so the rows must both have weight at least r + 1; hence all the entries of B are non-zero. This
means that we can define
$$\mu_i = \frac{b_{1i}}{b_{2i}}$$
for i = 1, . . . , r, and µ_i is a non-zero element of F_q. There are only q − 1 < r non-zero elements
in F_q (since r ≥ q), so there must be i ≠ j such that µ_i = µ_j. Now define
x = (row 1 of G) − µ_i (row 2 of G).
Then x ∈ D and x ≠ 0 because the rows of G are linearly independent, so x has weight at least
r + 1. But
$$x_{i+2} = b_{1i} - \mu_i b_{2i} = b_{1i} - \frac{b_{1i}}{b_{2i}} b_{2i} = 0$$
and
$$x_{j+2} = b_{1j} - \mu_i b_{2j} = b_{1j} - \mu_j b_{2j} = 0,$$
so x contains at least two zeroes, so has weight at most r; contradiction.
For the proof of Theorem 6.21(3), we need to look at determinants. We use the following
fact: if A is an r × r matrix, then the columns of A are linearly independent if and only if
det(A) ≠ 0.
We need to look at a particular type of determinant, called a ‘Vandermonde determinant’.
Proposition 6.22. Suppose x_1, . . . , x_r are distinct elements of a field F. Then the determinant
$$\det \begin{pmatrix} 1 & 1 & \ldots & 1 \\ x_1 & x_2 & \ldots & x_r \\ x_1^2 & x_2^2 & \ldots & x_r^2 \\ \vdots & \vdots & & \vdots \\ x_1^{r-1} & x_2^{r-1} & \ldots & x_r^{r-1} \end{pmatrix}$$
is non-zero.
Proof. Not given. In fact, the determinant equals $\prod_{i<j} (x_j - x_i)$, and this is non-zero, because
each x_j − x_i is non-zero.
Now we can prove the last part of the theorem.
Proof of Theorem 6.21(3). Assume 2 ≤ r ≤ q + 1. We need to show that there is an MDS code
of length q + 1 and redundancy r, i.e. a [q + 1, q + 1 − r, r + 1]-code. Label the elements of F_q as
λ_1, . . . , λ_q in some order, and let
$$H = \begin{pmatrix} 1 & 1 & \ldots & 1 & 0 \\ \lambda_1 & \lambda_2 & \ldots & \lambda_q & 0 \\ \lambda_1^2 & \lambda_2^2 & \ldots & \lambda_q^2 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ \lambda_1^{r-2} & \lambda_2^{r-2} & \ldots & \lambda_q^{r-2} & 0 \\ \lambda_1^{r-1} & \lambda_2^{r-1} & \ldots & \lambda_q^{r-1} & 1 \end{pmatrix}.$$
Let C be the code with H as its parity-check matrix. Then C is a [q + 1, q + 1 − r]-code, and
we claim that C has minimum distance at least r + 1. Recall from Theorem 6.18 that this
happens if and only if any r columns of H are linearly independent. H has r rows, and so any r
columns together will form a square matrix, and we can check whether the columns are linearly
independent by evaluating the determinant. So choose r columns c1, . . . , cr of H, and let A be
the matrix formed by them.
If the columns c_1, . . . , c_r do not include the last column of H, then A has the form
$$\begin{pmatrix} 1 & 1 & \ldots & 1 \\ x_1 & x_2 & \ldots & x_r \\ x_1^2 & x_2^2 & \ldots & x_r^2 \\ \vdots & \vdots & & \vdots \\ x_1^{r-1} & x_2^{r-1} & \ldots & x_r^{r-1} \end{pmatrix}$$
for some distinct x_1, . . . , x_r, and so det(A) ≠ 0 by Proposition 6.22.
If c_1, . . . , c_r do include the last column of H, then we have
$$A = \begin{pmatrix} 1 & 1 & \ldots & 1 & 0 \\ x_1 & x_2 & \ldots & x_{r-1} & 0 \\ x_1^2 & x_2^2 & \ldots & x_{r-1}^2 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ x_1^{r-2} & x_2^{r-2} & \ldots & x_{r-1}^{r-2} & 0 \\ x_1^{r-1} & x_2^{r-1} & \ldots & x_{r-1}^{r-1} & 1 \end{pmatrix}$$
for some distinct x_1, . . . , x_{r−1}. Let A′ be the matrix formed by the first r − 1 rows and the first
r − 1 columns. Then det(A′) ≠ 0 by Proposition 6.22, and so the columns c′_1, . . . , c′_{r−1} of A′ are
linearly independent. To show that c_1, . . . , c_r are linearly independent, suppose we have
$$\mu_1 c_1 + \cdots + \mu_r c_r = 0, \qquad (*)$$
where µ_1, . . . , µ_r ∈ F_q; we must show that µ_1 = · · · = µ_r = 0.
If we take all the vectors in (∗) and delete the last entry from each one, we get
$$\mu_1 c'_1 + \cdots + \mu_{r-1} c'_{r-1} + \mu_r 0 = 0.$$
Since c′_1, . . . , c′_{r−1} are linearly independent, this gives µ_1 = · · · = µ_{r−1} = 0. Hence we have µ_r c_r = 0;
since c_r ≠ 0 we get µ_r = 0 and we are done.
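The matrix H in this proof is easy to generate. Here is a minimal Python sketch, assuming q prime (the theorem itself allows any prime power); note that Python's pow(0, 0, q) is 1, so the column for λ = 0 begins with 1 as required:

```python
# Generate the r x (q+1) matrix H from the proof of Theorem 6.21(3), q prime.
def mds_pcm(r, q):
    # one column (1, lam, lam^2, ..., lam^(r-1)) for each lam in F_q ...
    cols = [[pow(lam, i, q) for i in range(r)] for lam in range(q)]
    cols.append([0] * (r - 1) + [1])   # ... plus the extra column (0, ..., 0, 1)
    return [[col[i] for col in cols] for i in range(r)]

for row in mds_pcm(3, 5):   # a 3 x 6 parity-check matrix of a [6, 3, 4]-code over F_5
    print(row)
```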
Index
(n,M, d)-code, 3, 19
A_q(n, d), 6
H_{r,q}, 36
trick for constructing, 36
[n, k, d]-code, 19
[n, k]-code, 19
C⊥, 28
dimension of, 29
G_{11}, 40
G_{12}
minimum distance of, 39
F^n, 17
F_q, 18
≡, 36
Ham(r, q), 36
dimension of, 36
minimum distance of, 37
⌊x⌋, 9
G_{12}, 39
max(r, q), 42
R(s, t), 33
dimension of, 34
minimum distance of, 35
q-ary
alphabet, 2
code, 2
alphabet, 2
basis, 16
binary, 2
binary Hamming code, 35
capacity, 14
code, 2
coset, 25, 31
leader, 26
decoding process, 13, 26, 32
dimension, 16
distance, 2
dot product, 28
dual code, 28
dimension of, 29
equivalent codes, 3, 19
equivalent linear codes, 22
error-correcting, 2, 8
error-detecting, 2
extended ternary Golay code, 39
minimum distance of, 39
field, 15
finite fields, 18
generator matrix, 19, 28
Hamming bound, 8
Hamming code, 36
dimension of, 36
minimum distance of, 37
Hamming space, 2
leader, 26
linear code, 18
linear map, 16
defined by a matrix, 17
main coding theory problem, 6
matrix operations, 22, 24
maximum distance separable code, 42
MDS code, 42
minimum distance, 3
and linear independence, 40
of a linear code, 18
nearest-neighbour decoding process, 13, 26, 32
parity-check matrix, 29
perfect code, 9, 37, 40
Plotkin bound, 9
Rank–nullity Theorem, 17, 29
rate, 14
redundancy, 42
Reed–Muller code, 33
dimension of, 34
minimum distance of, 35
repetition code, 3, 6
Shannon’s Theorem, 14
Singleton bound, 6
for linear codes, 41
Slepian array, 26
quick method, 27
sphere, 8
standard form, 24, 31
subspace, 16
subspace test, 16
symbol error probability, 13
syndrome, 31
syndrome decoding, 31
syndrome look-up table, 32
quick method, 32
ternary, 2
ternary Golay code, 40
triangle inequality, 2
Vandermonde determinant, 43
vector space, 15
weight, 18
word, 2
word error probability, 13