MTH6108 Coding theory

Contents

1 Introduction and definitions
2 Good codes
  2.1 The main coding theory problem
  2.2 The Singleton bound
  2.3 The Hamming bound
  2.4 The Plotkin bound
3 Error probabilities and nearest-neighbour decoding
  3.1 Noisy channels and decoding processes
  3.2 Rate, capacity and Shannon's Theorem
4 Linear codes
  4.1 Revision of linear algebra
  4.2 Finite fields and linear codes
  4.3 The minimum distance of a linear code
  4.4 Bases and generator matrices
  4.5 Equivalence of linear codes
  4.6 Decoding with a linear code
5 Dual codes and parity-check matrices
  5.1 The dual code
  5.2 Syndrome decoding
6 Some examples of linear codes
  6.1 Reed–Muller codes
  6.2 Hamming codes
  6.3 The ternary Golay code
  6.4 Existence of codes and linear independence
  6.5 MDS codes

1 Introduction and definitions

We begin with the most important definitions.

Definition. An alphabet is a finite set of symbols. If an alphabet A contains exactly q symbols, we call it a q-ary alphabet. In fact, we usually say binary rather than 2-ary, and ternary rather than 3-ary.

Definition. If A is an alphabet, a word of length n over A is a string a1 . . . an of elements of A. The Hamming space A^n is the set of all words of length n over A. A code of length n over A is a subset of A^n, i.e. a set of words of length n over A. If A is a q-ary alphabet, we say that a code over A is a q-ary code.

Definition. Given two words x = x1 . . . xn and y = y1 . . . yn of length n over an alphabet A, we define the distance d(x, y) between them to be the number of values of i for which xi ≠ yi.

Lemma 1.1. d is a metric on A^n, i.e.:
1. d(x, x) = 0 for all x ∈ A^n;
2. d(x, y) > 0 for all x ≠ y ∈ A^n;
3. d(x, y) = d(y, x) for all x, y ∈ A^n;
4. (the triangle inequality) d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ A^n.

Proof. (1), (2) and (3) are very easy, so let's do (4). Now d(x, z) is the number of values i for which xi ≠ zi. Note that if xi ≠ zi, then either xi ≠ yi or yi ≠ zi. Hence
{i | xi ≠ zi} ⊆ {i | xi ≠ yi} ∪ {i | yi ≠ zi}.
So
d(x, z) = |{i | xi ≠ zi}| ≤ |{i | xi ≠ yi} ∪ {i | yi ≠ zi}| ≤ |{i | xi ≠ yi}| + |{i | yi ≠ zi}| = d(x, y) + d(y, z).

Now we can talk about error detection and correction.

Definition. Suppose C is a code of length n over an alphabet A and t ≥ 0.
• C is t-error-detecting if d(x, y) > t for any x ≠ y in C.
• C is t-error-correcting if there do not exist distinct words x, y ∈ C and z ∈ A^n such that d(x, z) ≤ t and d(y, z) ≤ t.

Informally, a code is t-error-detecting if, whenever we take a word from the code and change at most t of the symbols in it, we don't reach a different word in the code.
So if we send the new word to someone, he'll be able to tell that we've changed some symbols. A code is t-error-correcting if whenever we take a word in the code and change at most t of the symbols in it, we don't reach a different word in the code, and we don't even reach a word which can be obtained from a different starting word by changing at most t of the symbols. So if we send the new word to someone without telling him which symbols we changed, he will be able to tell us which word from the code we started with.

In fact, rather than thinking about error detection and correction, we'll talk about minimum distance.

Definition. If C is a code, we define its minimum distance d(C) to be the smallest distance between distinct words in C:
d(C) = min{d(x, y) | x ≠ y ∈ C}.

By definition, C is t-error-detecting if and only if d(C) > t. We can also express error correction in terms of minimum distance.

Lemma 1.2. A code C is t-error-correcting if and only if d(C) > 2t.

Proof. First suppose C is not t-error-correcting. This means that we can find words x, y ∈ C with x ≠ y, and another word z (not necessarily in C) such that d(x, z) ≤ t and d(y, z) ≤ t. By the triangle inequality, we get d(x, y) ≤ 2t, and therefore d(C) ≤ 2t.

Conversely, suppose d(C) ≤ 2t; we'll show that C is not t-error-correcting. Take words x ≠ y ∈ C such that d(x, y) = d ≤ 2t. Let i1, i2, . . . , id be the positions where x and y are different. Define a word z as follows: set zi = xi for i ∈ {i1, . . . , it}, set zi = yi for i ∈ {it+1, . . . , id}, and set zi = xi = yi for all other i. Then x and z differ only in positions it+1, . . . , id, so d(x, z) = d − t ≤ t; y and z differ only in positions i1, . . . , it, so d(y, z) = t. And so C is not t-error-correcting.

Key example. Suppose A is any alphabet, and n > 0. The repetition code of length n over A simply consists of all words aa . . . a, for a ∈ A.
For this code, any two distinct words differ in every position, and so d(x, y) = n for all x ≠ y in C. So the code is t-error-detecting for every t ≤ n − 1, and is t-error-correcting for every t ≤ (n − 1)/2.

Terminology. A q-ary (n, M, d)-code means a code of length n over a q-ary alphabet, containing exactly M words, and with minimum distance at least d.

Equivalent codes

Definition. If C and D are codes of length n over an alphabet A, we say that C is equivalent to D if we can get from C to D by a combination of the following operations.

Operation 1 – permutation of the positions. Choose a permutation σ of {1, . . . , n}, and for any word v = v1 . . . vn in A^n define vσ = vσ(1) . . . vσ(n). Now replace C with the code Cσ = {vσ | v ∈ C}.

Operation 2 – applying a permutation of A in a fixed position. Choose a permutation f of A and an integer i ∈ {1, . . . , n}, and for v = v1 . . . vn define v_{f,i} = v1 . . . vi−1 f(vi) vi+1 . . . vn. Now replace C with the code C_{f,i} = {v_{f,i} | v ∈ C}.

The point of equivalence is that equivalent codes have the same size and the same minimum distance; we can often simplify both decoding procedures and some of our proofs by replacing codes with equivalent codes.

Lemma 1.3. Suppose v and w are words of length n over an alphabet A, and suppose σ is a permutation of {1, . . . , n}. Define the words vσ and wσ as above. Then d(vσ, wσ) = d(v, w).

Proof. Write v = v1 . . . vn and w = w1 . . . wn, so that vσ = vσ(1) . . . vσ(n) and wσ = wσ(1) . . . wσ(n). Now d(v, w) is the number of positions in which v and w differ, i.e. d(v, w) = |{j | vj ≠ wj}|; but σ is a permutation, so each j can be uniquely written in the form σ(i). Hence
d(v, w) = |{i | vσ(i) ≠ wσ(i)}| = |{i | (vσ)i ≠ (wσ)i}| = d(vσ, wσ).

Corollary 1.4. Suppose C is a code of length n over A and σ is a permutation of {1, . . . , n}. Then d(Cσ) = d(C) and |C| = |Cσ|.

Proof. First note that for words v, w ∈ C,
v ≠ w ⟺ d(v, w) > 0 ⟺ d(vσ, wσ) > 0 ⟺ vσ ≠ wσ,
using Lemma 1.3 for the middle equivalence.
Hence
d(Cσ) = min{d(x, y) | x ≠ y ∈ Cσ} = min{d(vσ, wσ) | v ≠ w ∈ C} = min{d(v, w) | v ≠ w ∈ C} (by Lemma 1.3) = d(C).

To show that |Cσ| = |C|, we consider the function
φ : C → Cσ, v ↦ vσ,
which we claim is a bijection. φ is surjective, because Cσ is defined to be the image of φ. φ is also injective, because if v ≠ w ∈ C, then by the chain of equivalences above we have vσ ≠ wσ, i.e. φ(v) ≠ φ(w). So φ is a bijection, so |C| = |Cσ|.

Now we prove the same properties for Operation 2.

Lemma 1.5. Suppose v and w are words in A^n. Suppose f is a permutation of A and i ∈ {1, . . . , n}, and define the words v_{f,i} and w_{f,i} as above. Then d(v_{f,i}, w_{f,i}) = d(v, w).

Proof. Recall that
(v_{f,i})_j = vj if j ≠ i, and (v_{f,i})_i = f(vi).
We consider two cases.
• If vi = wi, then f(vi) = f(wi), so d(v, w) = |{j ≠ i | vj ≠ wj}| = d(v_{f,i}, w_{f,i}).
• If vi ≠ wi, then (since f is injective) f(vi) ≠ f(wi), and so d(v, w) = |{j ≠ i | vj ≠ wj}| + 1 = d(v_{f,i}, w_{f,i}).

Corollary 1.6. Suppose C is a code of length n over A, f is a permutation of A and i ∈ {1, . . . , n}. Then d(C_{f,i}) = d(C) and |C_{f,i}| = |C|.

Proof. Copy the proof of Corollary 1.4.

Corollary 1.7. Suppose C is an (n, M, d)-code over A, and D is a code equivalent to C. Then D is an (n, M, d)-code.

Proof. Obviously D is a code of length n. We get from C to D by repeatedly applying Operations 1 and/or 2, and by Corollaries 1.4 and 1.6 these operations do not change the size or the minimum distance of a code.

2 Good codes

2.1 The main coding theory problem

The most basic question we might ask about codes is: given q, n, M and d, does a q-ary (n, M, d)-code exist? More precisely, we make the following definition.

Definition. Suppose q, n, d > 0. A_q(n, d) is defined to be the maximum M such that a q-ary (n, M, d)-code exists.

The numbers A_q(n, d) are unknown in general, and calculating them is often referred to as the 'main coding theory problem'. Here are two very special cases.

Theorem 2.1.
1.
A_q(n, 1) = q^n.
2. A_q(n, n) = q.

Proof.
1. We can take C = A^n, the set of all words of length n. Any two distinct words must differ in at least one position, so the code has minimum distance at least 1. Obviously a q-ary code of length n can't be bigger than this.
2. Suppose we have a code of length n with at least q + 1 words. Then by the pigeonhole principle there must be two words with the same first symbol. These two words can therefore differ in at most n − 1 positions, and so the code has minimum distance less than n. So A_q(n, n) ≤ q. On the other hand, the repetition code described above is an (n, q, n)-code.

From the above proof we observe the following general principle, which is very useful in proofs.
• A_q(n, d) ≥ A if and only if a q-ary (n, A, d)-code exists.
• A_q(n, d) ≤ A if and only if there does not exist a q-ary (n, M, d)-code with M > A.

2.2 The Singleton bound

Now we prove our first substantial theorem.

Theorem 2.2 (Singleton bound).
1. Suppose n, d > 1. If a q-ary (n, M, d)-code exists, then a q-ary (n − 1, M, d − 1)-code exists.
2. Suppose n, d > 1. Then A_q(n, d) ≤ A_q(n − 1, d − 1).
3. Suppose n, d > 1. Then A_q(n, d) ≤ q^(n−d+1).

Proof.
1. Let C be a q-ary (n, M, d)-code, and for x ∈ C, let x′ be the word obtained from x by deleting the last symbol. Let C′ = {x′ | x ∈ C}.

Claim. If x ≠ y ∈ C, then d(x′, y′) ≥ d − 1.

Proof. We have d(x, y) ≥ d, so x and y differ in at least d positions; so there are at least d − 1 positions other than position n where x and y differ. Hence
d − 1 ≤ |{i < n | xi ≠ yi}| = d(x′, y′).

The first consequence of the claim is that, since d > 1, x′ and y′ are distinct when x and y are. So |C′| = M. The second consequence is that d(C′) ≥ d − 1. So C′ is an (n − 1, M, d − 1)-code.

2. To show that A_q(n, d) ≤ A_q(n − 1, d − 1), take an (n, M, d)-code C with M = A_q(n, d). Then by part (1) there is an (n − 1, M, d − 1)-code, which means that A_q(n − 1, d − 1) ≥ M = A_q(n, d).

3.
Applying part (2) repeatedly, we have
A_q(n, d) ≤ A_q(n − 1, d − 1) ≤ A_q(n − 2, d − 2) ≤ · · · ≤ A_q(n − d + 1, 1) = q^(n−d+1),
by Theorem 2.1.

Now we prove another result which only applies to binary codes. We begin with a lemma.

Lemma 2.3. Suppose x and y are words over the alphabet {0, 1}, both containing an even number of 1s. Then d(x, y) is even.

Proof.
d(x, y) = |{i | xi = 1 and yi = 0}| + |{i | xi = 0 and yi = 1}|
= (|{i | xi = 1 and yi = 0}| + |{i | xi = 1 and yi = 1}|) + (|{i | xi = 0 and yi = 1}| + |{i | xi = 1 and yi = 1}|) − 2 |{i | xi = 1 and yi = 1}|
= |{i | xi = 1}| + |{i | yi = 1}| − 2 |{i | xi = 1 and yi = 1}|,
which is the sum of three even numbers, so is even.

Theorem 2.4. Suppose n, d > 0 and d is even. If a binary (n − 1, M, d − 1)-code exists, then a binary (n, M, d)-code exists. Hence if d is even, then A2(n, d) ≥ A2(n − 1, d − 1).

Proof. Suppose we have a binary (n − 1, M, d − 1)-code C. Given a word x ∈ C, we form a word x̂ of length n by adding an extra symbol, which we choose to be 0 or 1 in such a way that x̂ contains an even number of 1s. Now for any x ≠ y ∈ C, we have d(x, y) ≥ d − 1, and clearly this gives d(x̂, ŷ) ≥ d − 1. But d − 1 is odd, and by Lemma 2.3 d(x̂, ŷ) is even, so in fact we have d(x̂, ŷ) ≥ d. So the code Ĉ = {x̂ | x ∈ C} is an (n, M, d)-code. If we do this with M = A2(n − 1, d − 1), we get A2(n, d) ≥ M = A2(n − 1, d − 1).

As a consequence of this theorem and the Singleton bound, we have A2(n, d) = A2(n − 1, d − 1) whenever d is even.

2.3 The Hamming bound

Lemma 2.5. Suppose v is a word of length n over a q-ary alphabet A, and d ≥ 0. Then the number of words w ∈ A^n such that d(v, w) = d is (q − 1)^d (n choose d).

Proof. In order to write a word w such that d(v, w) = d, we must choose d positions where v and w will be different, and then we must choose which symbols will appear in these positions in w.
We can choose the positions in (n choose d) ways; in each of these positions, we can choose any symbol from A other than the symbol appearing in this position in v, which leaves q − 1 choices for each position. So the total number of choices is (q − 1)^d (n choose d).

Definition. If x ∈ A^n is a word, then the sphere of radius t and centre x is
S(x, t) = {y ∈ A^n | d(x, y) ≤ t}.

Lemma 2.6. If A is a q-ary alphabet, x is a word over A of length n and t ≤ n, then the sphere S(x, t) contains exactly
(n choose 0) + (q − 1)(n choose 1) + (q − 1)^2 (n choose 2) + · · · + (q − 1)^t (n choose t)
words.

Proof. This follows from Lemma 2.5, by summing over d = 0, . . . , t.

Lemma 2.7. Suppose C is a code of length n over A. C is t-error-correcting if and only if for all x ≠ y ∈ C, the spheres S(x, t) and S(y, t) are disjoint.

Proof.
C is t-error-correcting
⟺ there do not exist x ≠ y ∈ C and z ∈ A^n such that d(x, z) ≤ t and d(y, z) ≤ t
⟺ there do not exist x ≠ y ∈ C and z ∈ A^n such that z ∈ S(x, t) and z ∈ S(y, t)
⟺ there do not exist x ≠ y ∈ C such that S(x, t) ∩ S(y, t) ≠ ∅
⟺ for all x ≠ y ∈ C, S(x, t) ∩ S(y, t) = ∅.

Theorem 2.8. Suppose A is a q-ary alphabet, and C is a code of length n over A which is t-error-correcting. Then
|C| ≤ q^n / ( (n choose 0) + (q − 1)(n choose 1) + (q − 1)^2 (n choose 2) + · · · + (q − 1)^t (n choose t) ).

Proof. Each word in C has a sphere of radius t around it, and by Lemma 2.7 these spheres are disjoint. So the total number of words in all these spheres together is
|C| × ( (n choose 0) + (q − 1)(n choose 1) + (q − 1)^2 (n choose 2) + · · · + (q − 1)^t (n choose t) ),
and this can't be bigger than the total number of words in A^n, which is q^n.

Corollary 2.9 (Hamming bound). For n, q, t > 0,
A_q(n, 2t + 1) ≤ q^n / ( (n choose 0) + (q − 1)(n choose 1) + (q − 1)^2 (n choose 2) + · · · + (q − 1)^t (n choose t) ).

Proof. If C is a q-ary (n, M, 2t + 1)-code, then by Lemma 1.2 C is t-error-correcting. So by Theorem 2.8,
|C| ≤ q^n / ( (n choose 0) + (q − 1)(n choose 1) + (q − 1)^2 (n choose 2) + · · · + (q − 1)^t (n choose t) ).

Definition. Suppose C is a q-ary code of length n, and t ≥ 0.
C is a perfect t-error-correcting code if d(C) = 2t + 1 and
|C| = q^n / ( (n choose 0) + (q − 1)(n choose 1) + (q − 1)^2 (n choose 2) + · · · + (q − 1)^t (n choose t) ).

2.4 The Plotkin bound

The Plotkin bound is more complicated, but more useful. There is a version for arbitrary q, but we'll address only binary codes. First we need to recall some notation: remember that if x ∈ R, then ⌊x⌋ is the largest integer which is less than or equal to x.

Lemma 2.10. If x ∈ R, then ⌊2x⌋ ≤ 2⌊x⌋ + 1.

Proof. Let y = ⌊x⌋; then x < y + 1, so 2x < 2y + 2, and so ⌊2x⌋ < 2y + 2. But both sides of this inequality are integers, so we get ⌊2x⌋ ≤ 2y + 1.

Now we can state the Plotkin bound – there are two cases, depending on whether d is even or odd.

Theorem 2.11 (Plotkin bound).
1. If d is even and d ≤ n < 2d, then A2(n, d) ≤ 2⌊d/(2d − n)⌋.
2. If d is odd and d ≤ n < 2d + 1, then A2(n, d) ≤ 2⌊(d + 1)/(2d + 1 − n)⌋.

We now begin the proof of this bound. The proof is not examinable, but you are encouraged to read it.

Non-examinable material

Lemma 2.12. Suppose N, M are integers with 0 ≤ N ≤ M. Then
N(M − N) ≤ M^2/4 (if M is even), and N(M − N) ≤ (M^2 − 1)/4 (if M is odd).

Proof. The graph of N(M − N) is an unhappy quadratic with its turning point at N = M/2, so to maximise it we want to make N as near as possible to this (remembering that N must be an integer). If M is even, then we can take N = M/2, while if M is odd we take N = (M − 1)/2.

The proof of the Plotkin bound is a double-counting argument. We suppose that our alphabet is {0, 1}, and if v = v1 . . . vn and w = w1 . . . wn are words in C, then we define v + w to be the word (v1 + w1)(v2 + w2) . . . (vn + wn), where we do addition modulo 2 (so 1 + 1 = 0). Now given a binary (n, M, d)-code C, we write down an (M choose 2) by n array B whose rows are all the words v + w for pairs of distinct words v, w ∈ C. We're going to count the number of 1s in this array in two different ways.

A really useful feature of the addition operation we have defined on words is the following.

Lemma 2.13.
Suppose v, w are binary words of length n. Then d(v, w) is the number of 1s in v + w.

Proof. By looking at the possibilities for vi and wi, we see that (v + w)i = 0 if vi = wi, and (v + w)i = 1 if vi ≠ wi. So d(v, w) = |{i | vi ≠ wi}| = |{i | (v + w)i = 1}|.

Lemma 2.14. The number of 1s in B is at least d (M choose 2).

Proof. We look at the number of 1s in each row. Since d(C) ≥ d, we have d(v, w) ≥ d for each v ≠ w ∈ C, so by Lemma 2.13 there are at least d 1s in each row. Summing over all the rows gives the result.

Now we count the 1s in B in a different way.

Lemma 2.15. The number of 1s in B is at most nM^2/4 if M is even, and at most n(M^2 − 1)/4 if M is odd.

Proof. We count the number of 1s in each column. Choose j ∈ {1, . . . , n}. The word v + w has a 1 in the jth position if and only if one of v and w has a 1 in the jth position, and the other has a 0. If we let N be the number of words in C which have a 1 in the jth position, then the number of ways of choosing a pair v, w such that v + w has a 1 in the jth position is N(M − N). So by Lemma 2.12 the number of 1s in the jth column of our array is
N(M − N) ≤ M^2/4 (if M is even) or (M^2 − 1)/4 (if M is odd).
This is true for every j, so by adding up we obtain the desired inequality.

Proof of the Plotkin bound.
1. Suppose we have a binary (n, M, d)-code C with M = A2(n, d), and construct the array B as above. There are two cases, according to whether M is even or odd.

Case 1: M even. By combining Lemma 2.14 and Lemma 2.15, we get
d (M choose 2) ≤ nM^2/4, i.e. dM(M − 1)/2 ≤ nM^2/4.
Since M > 0, we get
2d(M − 1) ≤ nM, i.e. (2d − n)M ≤ 2d.
By assumption 2d − n > 0, so we divide both sides by 2d − n to get
M ≤ 2d/(2d − n).
Since M is an integer, we can say
M ≤ ⌊2d/(2d − n)⌋ ≤ 2⌊d/(2d − n)⌋ + 1,
using Lemma 2.10 for the second inequality. But M is even and the right-hand side is odd, so
M ≤ 2⌊d/(2d − n)⌋.

Case 2: M odd. Now we combine Lemmas 2.14 and 2.15 to get
d (M choose 2) ≤ n(M^2 − 1)/4, i.e. dM(M − 1)/2 ≤ n(M + 1)(M − 1)/4.
Assuming M ≥ 2, we can cancel M − 1 > 0 to get
2dM ≤ n(M + 1), i.e. (2d − n)M ≤ n.
Again we have 2d − n > 0, so we get
M ≤ n/(2d − n) = 2d/(2d − n) − 1.
Since M is an integer, we get
M ≤ ⌊2d/(2d − n)⌋ − 1 ≤ 2⌊d/(2d − n)⌋ + 1 − 1,
using Lemma 2.10 again, so
A2(n, d) = M ≤ 2⌊d/(2d − n)⌋.

2. Now we consider the case where d is odd; this follows from the even case by Theorem 2.4. If d is odd and n < 2d + 1, then d + 1 is even and n + 1 < 2(d + 1). So by the even case of the Plotkin bound we have
A2(n + 1, d + 1) ≤ 2⌊(d + 1)/(2(d + 1) − (n + 1))⌋ = 2⌊(d + 1)/(2d + 1 − n)⌋,
and since d + 1 is even, A2(n + 1, d + 1) = A2(n, d) by the remark following Theorem 2.4.

End of non-examinable material

3 Error probabilities and nearest-neighbour decoding

3.1 Noisy channels and decoding processes

In this section, we consider the situations in which our codes might be used, and show why we try to get a large distance between words. The idea is that we choose a code C, choose a word v ∈ C, and transmit v along a 'noisy channel', which might introduce a few errors and produce a distorted word. The person receiving this word knows what code C we're using, and 'guesses' which word in C we started from.

Definition. Given a code C of length n over the alphabet A, a decoding process for C is a function from A^n to C. A nearest-neighbour decoding process for C is a function f : A^n → C such that
d(v, f(v)) ≤ d(v, w) for all v ∈ A^n, w ∈ C.

What this means is that a nearest-neighbour decoding process sends a word to the nearest possible word in C.

We make certain assumptions about our noisy channel, namely that all errors are independent and equally likely. This means that there is a fixed p (the symbol error probability) such that any symbol will be transmitted correctly with probability 1 − p, or incorrectly with probability p, and that if there is an error then all the incorrect symbols are equally likely. Moreover, errors on different symbols are independent – whether an error occurs in one symbol has no effect on whether errors occur in later symbols.
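Under these assumptions, word error probabilities can be computed exactly by summing the channel probabilities over all possible received words. A minimal Python sketch (the function names are illustrative, not part of the notes), using the length-3 binary repetition code, which is 1-error-correcting, so decoding fails exactly when two or three symbols are flipped:

```python
from fractions import Fraction
from itertools import product

def d(x, y):
    """Hamming distance: number of positions where x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def nearest_neighbour(code, w):
    """A nearest-neighbour decoding process: send w to a closest codeword
    (ties broken by the order in which the code lists its words)."""
    return min(code, key=lambda c: d(c, w))

def word_error_probability(code, v, p):
    """Exact probability that transmitting v through a binary symmetric
    channel with symbol error probability p decodes to something other
    than v, found by summing over all possible received words."""
    n = len(v)
    total = Fraction(0)
    for w in product('01', repeat=n):
        w = ''.join(w)
        k = d(v, w)                      # number of flipped symbols
        prob = p**k * (1 - p)**(n - k)   # channel probability of receiving w
        if nearest_neighbour(code, w) != v:
            total += prob
    return total

p = Fraction(1, 5)
rep = ['000', '111']   # the binary repetition code of length 3
print(word_error_probability(rep, '000', p))  # 3*p^2*(1-p) + p^3 = 13/125
```

Using exact fractions rather than floats keeps the answer in the same form as the hand computations in this section.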
Given a code C and a decoding process f : A^n → C, we consider the word error probability for each v ∈ C: this is the probability that if we transmit the word v through our channel, we end up with a word w such that f(w) ≠ v.

Example. Suppose A = {0, 1}, and C = {000, 011, 110}. A nearest-neighbour decoding process for C is
f : 000 ↦ 000, 001 ↦ 011, 010 ↦ 011, 011 ↦ 011, 100 ↦ 000, 101 ↦ 110, 110 ↦ 110, 111 ↦ 110.
(This is not the only possible choice of a nearest-neighbour decoding process.)

Suppose our noisy channel has symbol error probability 1/5. Let's work out the word error probability for the word 000. This is the probability that when we transmit 000, the word w that comes out has f(w) ≠ 000. This is
1 − P(f(w) = 000) = 1 − P(w = 000 or 100) = 1 − (4/5 · 4/5 · 4/5) − (1/5 · 4/5 · 4/5) = 9/25.

In general, word error probability depends on the symbol error probability, the code, the decoding process and the word chosen, and we seek a decoding process which minimises the maximum word error probability. It can be shown that (as long as p is small) the best decoding process in this respect is always a nearest-neighbour decoding process.

3.2 Rate, capacity and Shannon's Theorem

Definition. Given a q-ary (n, M, d)-code C, we define the rate of C to be
(log_q M) / n.

The rate of a code can be interpreted as the ratio of 'useful information' to 'total information' transmitted. For example, the q-ary repetition code of length 3 is a (3, q, 3)-code, so has rate 1/3. The useful information in a word can be thought of as the first digit – the rest of the digits are just redundant information included to reduce error probabilities.

Definition. The capacity of a symmetric binary channel with symbol error probability p is
1 + p log2 p + (1 − p) log2(1 − p).

The capacity of a channel is supposed to be a measure of how good the channel is for communicating. Note that if p = 1/2, then the capacity is 0.
This reflects the fact that it is hopeless transmitting through such a channel – given a received word, the word sent is equally likely to be any word in the code.

Theorem 3.1 (Shannon's Theorem). Suppose we have a noisy channel with capacity C. Suppose ε and ρ are positive real numbers with ρ < C. Then for any sufficiently large n, there exists a binary code of length n and rate at least ρ and a decoding process such that the word error probability is at most ε.

What does the theorem say? It says that as long as the rate of our code is less than the capacity of the channel, we can make the word error probability as small as we like. The proof of this theorem is well beyond the scope of this course.

4 Linear codes

For the rest of the course, we shall be restricting our attention to linear codes; these are codes in which the alphabet A is a finite field, and the code itself forms a vector space over A. These codes are of great interest because:
• they are easy to describe – we need only specify a basis for our code;
• it is easy to calculate the minimum distance of a linear code – we need only calculate the distance of each word from the word 00 . . . 0;
• it is easy to decode an error-correcting linear code, via syndrome decoding;
• many of the best codes known are linear; in particular, every known non-trivial perfect code has the same parameters (i.e. length, number of words and minimum distance) as some linear code.

Non-examinable material

4.1 Revision of linear algebra

Definition.
A field is a set F with distinguished elements 0 and 1 and binary operations + and × such that:
• F is an abelian group under +, with identity element 0 (that is, for all a, b, c ∈ F we have a + b = b + a, (a + b) + c = a + (b + c), a + 0 = a, and there exists an element −a of F such that −a + a = 0);
• F \ {0} is an abelian group under ×, with identity element 1 (that is, for all a, b, c ∈ F \ {0} we have a × b = b × a, (a × b) × c = a × (b × c), a × 1 = a, and there exists an element a⁻¹ of F such that a⁻¹ × a = 1);
• a × (b + c) = (a × b) + (a × c) for all a, b, c ∈ F.

Definition. If F is a field, a vector space over F is a set V with a distinguished element 0, a binary operation + and a function × : F × V → V (that is, a function which, given an element λ of F and an element v of V, produces a new element λ × v of V) such that:
• V is an abelian group under + with identity 0;
• for all λ, µ ∈ F and u, v ∈ V, we have (λ × µ) × v = λ × (µ × v), (λ + µ) × v = (λ × v) + (µ × v), λ × (u + v) = (λ × u) + (λ × v), and 1 × v = v.

We make all the familiar notational conventions: we may write a × b as ab; we write a × b⁻¹ as a/b; we write a + (−b) as a − b.

Definition. If V is a vector space over F, then a subspace is a subset of V which is also a vector space under the same operations; we write W ≤ V to mean that W is a subspace of V.

To check whether a subset W ⊆ V is a subspace, we can use the Subspace Test: W is a subspace if and only if
• 0 ∈ W,
• v + w ∈ W for all v, w ∈ W, and
• λv ∈ W for all v ∈ W, λ ∈ F.

Definition. Suppose V is a vector space over F and v1, . . . , vn ∈ V. Then we say that v1, . . . , vn are linearly independent if there do not exist λ1, . . . , λn ∈ F, not all zero, such that λ1v1 + · · · + λnvn = 0.

A linear combination of v1, . . . , vn is an expression λ1v1 + · · · + λnvn, where λ1, . . . , λn ∈ F. The set of all linear combinations of v1, . . . , vn is called the span of v1, . . . , vn, written as 〈v1, . . . , vn〉. If 〈v1, . . . , vn〉 = V, we say that v1, . . . , vn span V.

A basis for V is a set {v1, . . . , vn} of vectors in V which are linearly independent and span V.

Fact. If {v1, . . . , vn} is a basis of a vector space V, then every v ∈ V can be uniquely written in the form λ1v1 + · · · + λnvn, for λ1, . . . , λn ∈ F.

Fact. Any vector space V has at least one basis, and all bases of V have the same size.

Definition. If V is a vector space, the dimension of V (written dim(V)) is the number of elements in a basis of V.

Useful fact. If V is a vector space of dimension n and v1, . . . , vn are vectors in V which are linearly independent or span V, then {v1, . . . , vn} is a basis of V.

Definition. Suppose V, W are vector spaces over F. A linear map from V to W is a function α : V → W such that α(λu + µv) = λα(u) + µα(v) for all λ, µ ∈ F and u, v ∈ V.

If α is a linear map, the kernel of α is the subset ker(α) = {v ∈ V | α(v) = 0} of V, and the image of α is the subset Im(α) = {α(v) | v ∈ V} of W. ker(α) is a subspace of V, and we refer to its dimension as the nullity of α. Im(α) is a subspace of W, and we refer to its dimension as the rank of α. The Rank–Nullity Theorem says that if α is a linear map from V to W, then
nullity(α) + rank(α) = dim(V).

We shall need the following familiar property of fields.

Lemma 4.1. Let F be a field, and a, b ∈ F.
1. a0 = 0.
2. If ab = 0, then a = 0 or b = 0.

Proof.
1. We have 0 = 0 + 0, so a0 = a(0 + 0) = a0 + a0. Adding −(a0) to both sides gives 0 = a0.
2. Suppose ab = 0 but b ≠ 0. Then we have bb⁻¹ = 1. So 0 = 0b⁻¹ = (ab)b⁻¹ = a1 = a.

We shall only be interested in one particular type of vector space. For a non-negative integer n, we consider the set F^n, which we think of as the set of column vectors of length n over F, with componentwise operations
(x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn) and λ(x1, . . . , xn) = (λx1, . . . , λxn).
F^n is a vector space over F of dimension n.
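Because F^n is finite when F is, facts like the Rank–Nullity Theorem can be checked for small examples by brute force. A sketch over the field F_2 = {0, 1} (the matrix A is an arbitrary illustrative choice); it uses the fact, proved for Fq in §4.4 below, that a k-dimensional vector space over F_2 has exactly 2^k elements, so dimensions can be read off from sizes:

```python
from itertools import product
from math import log2

# A 2x3 matrix over F_2, defining a linear map from F_2^3 to F_2^2.
A = [[1, 0, 1],
     [0, 1, 1]]

def apply(A, v):
    """Matrix-vector product over F_2 (all arithmetic done mod 2)."""
    return tuple(sum(a * x for a, x in zip(row, v)) % 2 for row in A)

# Enumerate all 8 vectors of F_2^3 to find the kernel and the image.
kernel = [v for v in product((0, 1), repeat=3) if apply(A, v) == (0, 0)]
image = {apply(A, v) for v in product((0, 1), repeat=3)}

# Rank-nullity: dim ker + dim im = dim F_2^3 = 3, with dimensions
# recovered as log2 of the subspace sizes.
assert log2(len(kernel)) + log2(len(image)) == 3
print(len(kernel), len(image))  # 2 4
```

Here the kernel is {000, 111} (dimension 1) and the image is all of F_2^2 (dimension 2), so the two dimensions sum to 3 as the theorem predicts.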
Sometimes we will think of the elements of F^n as row vectors rather than column vectors, or as words of length n over F.

Given m, n and an n × m matrix A over F, we can define a linear map α : F^m → F^n by α(x) = Ax; that is, the ith entry of α(x1, . . . , xm) is Ai1x1 + · · · + Aimxm. Every linear map from F^m to F^n arises in this way. The rank of A is defined to be the rank of this linear map. The column rank of A is defined to be the dimension of 〈c1, . . . , cm〉, where c1, . . . , cm are the columns of A regarded as vectors in F^n, and the row rank is defined to be the dimension of 〈r1, . . . , rn〉, where r1, . . . , rn are the rows of A regarded as (row) vectors in F^m.

Fact. If A is an n × m matrix and α : F^m → F^n is the corresponding linear map, then rank(α) = column rank(A) = row rank(A).

End of non-examinable material

4.2 Finite fields and linear codes

The examples of fields you are most familiar with are Q, R and C. But these are of no interest in this course – we are concerned with finite fields. The classification of finite fields goes back to Galois.

Theorem 4.2. Let q be an integer greater than 1. Then a field of order q exists if and only if q is a prime power, and this field is unique.

If q is a prime power, then we refer to the unique field of order q as Fq. For example, if q is actually a prime, then Fq simply consists of the set {0, 1, . . . , q − 1}, with addition and multiplication mod q. If q is a prime power but not a prime, then the field Fq is not of this form – you don't get a field if you define it like this. Actually, it's awkward to describe what the field Fq looks like without developing lots of theory. But this doesn't matter – all that matters for us is that Fq is a field.

Definition. Suppose q is a prime power. A linear code over Fq is a subspace of Fq^n.

4.3 The minimum distance of a linear code

One of the advantages of linear codes that we mentioned earlier is that it's easy to find the minimum distance of a linear code.
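The remark above that {0, 1, . . . , q − 1} with mod-q arithmetic is a field exactly when q is prime can be checked directly for small q: the only field axiom that can fail is the existence of multiplicative inverses. A quick brute-force sketch (the function name is illustrative):

```python
def is_field_mod(q):
    """True if every non-zero element of {0, ..., q-1} has a
    multiplicative inverse mod q -- the axiom that fails for composite q."""
    return all(any((a * b) % q == 1 for b in range(1, q)) for a in range(1, q))

print([q for q in range(2, 20) if is_field_mod(q)])  # exactly the primes below 20
# Prime powers such as 4, 8, 9 and 16 fail: e.g. 2 * 2 = 0 (mod 4), so by
# Lemma 4.1 the element 2 can have no inverse. The field F_4 does exist,
# but it is a different construction, not arithmetic mod 4.
```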
Definition. Suppose q is a prime power and w ∈ Fq^n. The weight weight(w) of w is defined to be the number of non-zero symbols in w.

Lemma 4.3. Suppose q is a prime power, w, x, y ∈ Fq^n and λ is a non-zero element of Fq. Then:
1. d(w + y, x + y) = d(w, x);
2. d(λw, λx) = d(w, x);
3. d(w, x) = weight(w − x).

Proof.
1. d(w + y, x + y) = |{i | (w + y)i ≠ (x + y)i}| = |{i | wi + yi ≠ xi + yi}|. We have wi + yi = xi + yi if and only if wi = xi (because we can add ±yi to both sides), so d(w + y, x + y) = |{i | wi ≠ xi}| = d(w, x).
2. d(λw, λx) = |{i | (λw)i ≠ (λx)i}| = |{i | λwi ≠ λxi}|. We have λwi = λxi if and only if wi = xi (because we can multiply both sides by λ or λ⁻¹), so d(λw, λx) = |{i | wi ≠ xi}| = d(w, x).
3. By definition weight(w − x) = d(w − x, 00 . . . 0), which equals d(w − x, x − x), and by part (1) this equals d(w, x).

Corollary 4.4. Suppose C is a linear code over Fq. The minimum distance of C equals the minimum weight of a non-zero word in C.

Proof. Let D be the minimum weight of a non-zero word in C. If we let w be a word in C of weight D, then we have D = weight(w) = d(w, 00 . . . 0). Since w and 00 . . . 0 are different words in C, we must have d(C) ≤ D.

Now let x, y be words in C such that d(x, y) = d(C). Then we have x − y ∈ C because C is closed under subtraction, and weight(x − y) = d(x, y) = d(C). So D ≤ d(C).

So we have D ≥ d(C) and D ≤ d(C), so D = d(C).

Key example. Let q be any prime power, n ≥ 2 an integer, and define the parity-check code of length n over Fq to be
C = {v ∈ Fq^n | v1 + · · · + vn = 0}.
Then C is linear (apply the Subspace Test), and C has minimum distance 2. This is because C contains a word of weight 2, namely 1(−1)00 . . . 0, but C does not contain any words of weight 1: if v ∈ Fq^n has weight 1, then there is a unique i such that vi ≠ 0. But then v1 + · · · + vn = vi ≠ 0, so v ∉ C.

4.4 Bases and generator matrices

Lemma 4.5. A vector space V of dimension k over Fq contains exactly q^k elements.

Proof.
Suppose {e1, . . . , ek} is a basis for V. Then any v ∈ V can be written uniquely in the form λ1e1 + λ2e2 + · · · + λkek, for some choice of λ1, . . . , λk ∈ F_q. So the number of vectors in V is the number of expressions of this form, i.e. the number of choices of λ1, . . . , λk. Now there are q ways to choose each λi (since there are q elements of F_q to choose from), and so the total number of choices of these scalars is q^k.

Notation. Suppose q is a prime power. A linear [n, k]-code over F_q is a linear code over F_q of length n and dimension k. If in addition the code has minimum distance at least d, we say that C is a linear [n, k, d]-code.

As a consequence of Lemma 4.5, we see that a linear [n, k, d]-code over F_q is a q-ary (n, q^k, d)-code.

Definition. Suppose C is a linear [n, k]-code over F_q. A generator matrix for C is a k × n matrix with entries in F_q, whose rows form a basis for C.

4.5 Equivalence of linear codes

Recall the definition of equivalent codes from earlier: codes C and D over an alphabet A are equivalent if we can get from C to D via a sequence of Operations 1 and 2. There's a slight problem with applying this to linear codes, which is that if C is linear and D is equivalent to C, then D need not be linear. Operation 1 is OK, as we shall now show.

Lemma 4.6. Suppose C is a linear [n, k, d]-code over F_q, and σ is a permutation of {1, . . . , n}. Then the map

φ : C → F_q^n, v ↦ vσ

is linear, and Cσ is a linear [n, k, d]-code.

Proof. Suppose v, w ∈ C and λ, µ ∈ F_q. We need to show that φ(λv + µw) = λφ(v) + µφ(w), i.e. (φ(λv + µw))_j = (λφ(v) + µφ(w))_j for every j ∈ {1, . . . , n}. We have

(φ(λv + µw))_j = ((λv + µw)σ)_j = (λv + µw)_{σ(j)} = (λv)_{σ(j)} + (µw)_{σ(j)} = λv_{σ(j)} + µw_{σ(j)} = λ(vσ)_j + µ(wσ)_j = λ(φ(v))_j + µ(φ(w))_j = (λφ(v))_j + (µφ(w))_j = (λφ(v) + µφ(w))_j,

as required. Now Cσ is by definition the image of φ, and so is a subspace of F_q^n, i.e. a linear code.
We know that d(Cσ) = d(C) from before, and that |Cσ| = |C|. By Lemma 4.5 this implies that dim Cσ = dim C = k, so Cσ is an [n, k, d]-code.

Unfortunately, Operation 2 does not preserve linearity. So we define the following.

Operation 2′. Suppose C is a linear code of length n over F_q. Choose i ∈ {1, . . . , n} and a ∈ F_q \ {0}. For v = v1 . . . vn ∈ F_q^n define v_{a,i} = v1 . . . v_{i−1}(avi)v_{i+1} . . . vn. Now replace C with the code C_{a,i} = {v_{a,i} | v ∈ C}.

We want to show that Operation 2′ preserves linearity, dimension and minimum distance. We begin by showing that it's a special case of Operation 2.

Lemma 4.7. If F is a field and a ∈ F \ {0}, then the map

f : F → F, x ↦ ax

is a bijection, i.e. a permutation of F.

Proof. Since a ≠ 0 and F is a field, a has an inverse a^{−1}. So f has an inverse x ↦ a^{−1}x, and hence is a bijection.

Now we show that the operation which sends v to v_{a,i} is linear, which will mean that it sends linear codes to linear codes.

Lemma 4.8. Suppose C is a linear [n, k, d]-code over F_q, i ∈ {1, . . . , n} and 0 ≠ a ∈ F_q. Then the map

φ : C → F_q^n, v ↦ v_{a,i}

is linear, and C_{a,i} is a linear [n, k, d]-code over F_q.

Proof. Take v, w ∈ C and λ, µ ∈ F_q. We must show that

(φ(λv + µw))_j = (λφ(v) + µφ(w))_j

for each j ∈ {1, . . . , n}. For j ≠ i we have

(φ(λv + µw))_j = (λv + µw)_j = (λv)_j + (µw)_j = λv_j + µw_j = λ(φ(v))_j + µ(φ(w))_j = (λφ(v))_j + (µφ(w))_j = (λφ(v) + µφ(w))_j,

while for j = i we have

(φ(λv + µw))_j = a(λv + µw)_j = a((λv)_j + (µw)_j) = a(λv_j + µw_j) = aλv_j + aµw_j = λav_j + µaw_j = λ(φ(v))_j + µ(φ(w))_j = (λφ(v))_j + (µφ(w))_j = (λφ(v) + µφ(w))_j,

as required. Now C_{a,i} is by definition the image of φ, and this is a subspace of F_q^n, i.e. a linear code. We know from before (since Operation 2′ is a special case of Operation 2) that d(C_{a,i}) = d(C) and |C_{a,i}| = |C|, and this gives dim C_{a,i} = dim C = k, so that C_{a,i} is a linear [n, k, d]-code.
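Operations 1 and 2′ are easy to simulate. The sketch below is my own illustration (plain Python, assuming q prime so that integers mod q realise F_q): it applies a position permutation and then a position scaling to the ternary parity-check code of length 3, and checks that the size (hence, by Lemma 4.5, the dimension) and the minimum weight survive.

```python
def apply_sigma(code, sigma):
    """Operation 1: v -> v_sigma, where (v_sigma)_j = v_{sigma(j)}.
    sigma is given as a 0-indexed list."""
    return {tuple(v[sigma[j]] for j in range(len(v))) for v in code}

def apply_scale(code, a, i, q):
    """Operation 2': multiply position i of every word by a != 0."""
    return {tuple((a * x if j == i else x) % q for j, x in enumerate(v))
            for v in code}

def min_weight(code):
    return min(sum(1 for x in v if x) for v in code if any(v))

# the ternary parity-check code of length 3
C = {(a, b, (-a - b) % 3) for a in range(3) for b in range(3)}
D = apply_scale(apply_sigma(C, [2, 0, 1]), 2, 1, 3)

assert len(D) == len(C) == 9          # same dimension (Lemma 4.5)
assert min_weight(D) == min_weight(C) == 2
```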
In view of these results we re-define equivalence for linear codes.

Definition. Suppose C and D are linear codes over F_q. C and D are equivalent (as linear codes) if we can get from one to the other by applying Operations 1 and 2′ repeatedly.

As a consequence of Lemma 4.6 and Lemma 4.8, we see that if two linear codes are equivalent, then they have the same dimension and minimum distance.

Now we consider the relationship between equivalence and generator matrices. We define the following operations on matrices over F_q:

MO1. permuting the rows;
MO2. multiplying a row by a non-zero element of F_q;
MO3. adding a multiple of a row to another row.

You should recognise these as the 'elementary row operations' from Linear Algebra. Their importance is as follows.

Lemma 4.9. Suppose C is a linear [n, k]-code with generator matrix G. If the matrix H can be obtained from G by applying one of MO1–MO3, then H is also a generator matrix for C.

Proof. Since G is a generator matrix for C, we know that the rows of G are linearly independent and span C. So G has rank k (the number of rows) and row space C. We know from linear algebra that elementary row operations do not affect the rank or the row space of a matrix, so H also has rank k and row space C. So the rows of H are linearly independent and span C, so form a basis for C, i.e. H is a generator matrix for C.

Now we define two more matrix operations:

MO4. permuting the columns;
MO5. multiplying a column by a non-zero element of F_q.

Lemma 4.10. Suppose C is a linear [n, k]-code over F_q, with generator matrix G. If the matrix H is obtained from G by applying MO4 or MO5, then H is a generator matrix for a code D equivalent to C.

Proof. Suppose G has entries g_{jl}, for 1 ≤ j ≤ k and 1 ≤ l ≤ n. Let r1, . . . , rk be the rows of G, i.e. r_j = g_{j1}g_{j2} . . . g_{jn}. By assumption {r1, . . . , rk} is a basis for C; in particular, the rank of G is k.

• Suppose H is obtained using MO4, applying a permutation σ to the columns.
This means that h_{jl} = g_{jσ(l)}, so row j of H is the word g_{jσ(1)}g_{jσ(2)} . . . g_{jσ(n)}. But this is the word (r_j)σ as defined in equivalence operation 1. So the rows of H lie in the code Cσ, which is equivalent to C.

• Now suppose instead that we obtain H by applying MO5, multiplying column i by a ∈ F_q \ {0}. This means that

h_{jl} = g_{jl} if l ≠ i, and h_{ji} = ag_{ji},

so that row j of H is the word g_{j1}g_{j2} . . . g_{j(i−1)}(ag_{ji})g_{j(i+1)} . . . g_{jn}. But this is the word (r_j)_{a,i} as defined in equivalence operation 2′. So the rows of H lie in the code C_{a,i}, which is equivalent to C.

For either matrix operation, we have seen that the rows of H lie in a code D equivalent to C. We need to know that they form a basis for D. Since there are k rows and dim(D) = dim(C) = k, it suffices to show that the rows of H are linearly independent, i.e. to show that H has rank k. But matrix operations 4 and 5 are elementary column operations, and we know from linear algebra that these don't affect the rank of a matrix. So rank(H) = rank(G) = k.

We can summarise these results as follows.

Proposition 4.11. Suppose C is a linear [n, k]-code with a generator matrix G, and that the matrix H is obtained by applying a combination of matrix operations MO1–5. Then H is a generator matrix for a code D equivalent to C.

Proof. Each time we apply MO1–3 we get a new generator matrix for the same code, by Lemma 4.9. Each time we apply MO4 or MO5, we get a generator matrix for an equivalent code, by Lemma 4.10. So a combination of these operations will yield a generator matrix for a code equivalent to the one we started with.

Note that in the list of matrix operations 1–5, there is a sort of symmetry between rows and columns. In fact, you might expect that you can do another operation

MO6. adding a multiple of a column to another column

but you can't. Doing this can take you to a code with a different minimum distance.
For example, suppose q = 2, and that C is the parity-check code of length 3:

C = {000, 011, 101, 110}.

We have seen that d(C) = 2 and that C has a generator matrix

G = ( 1 0 1 )
    ( 0 1 1 ).

If we applied MO6 above, adding column 1 to column 2, we'd get the matrix

H = ( 1 1 1 )
    ( 0 1 1 ).

This is a generator matrix for the code

{000, 111, 011, 100},

which has minimum distance 1, so is not equivalent to C. So the difference between 'row operations' and 'column operations' is critical, and we do not allow MO6.

Armed with these operations, we can define a standard way in which we can write generator matrices.

Definition. Let G be a k × n matrix over F_q, with k ≤ n. We say that G is in standard form if

G = (I_k | B),

where I_k is the k × k identity matrix, and B is some k × (n − k) matrix.

Transforming a matrix into standard form

Now we show how to transform a generator matrix into standard form, using MO1–5. So suppose G is a k × n matrix over F_q, whose rows are linearly independent. Since the rows of G are linearly independent, G has rank k. MO1–5 do not affect the rank of a matrix, so at any point in the following procedure the matrix obtained will have rank k, and will therefore have linearly independent rows.

For i = 1, . . . , k we want to transform column i into the column with a 1 in position i and 0s everywhere else. Suppose we have already done this for columns 1, . . . , i − 1.

Step 1. Since the rows of our matrix are linearly independent, they are non-zero, so there must be some non-zero entry in the ith row. Furthermore, by what we know about columns 1, . . . , i − 1, this non-zero entry must occur in one of columns i, . . . , n. So we apply MO4, permuting columns i, . . . , n to get a non-zero entry in the (i, i)-position of our matrix.

Step 2. Suppose the (i, i)-entry of our matrix is a ≠ 0. Then we apply MO5, multiplying column i of our matrix by a^{−1}, to get a 1 in the (i, i)-position.

Step 3.
We now apply MO3, adding multiples of row i to the other rows in order to 'kill' the remaining non-zero entries in column i. Note that this operation does not affect columns 1, . . . , i − 1.

By applying Steps 1–3 for i = 1, . . . , k in turn, we get a matrix in standard form. Note that it is automatic from the proof that k ≤ n.

Proposition 4.12. Suppose C is a linear [n, k]-code over F_q. Then C is equivalent to a code with a generator matrix in standard form.

Proof. Let G be a generator matrix for C. G has linearly independent rows, so using the procedure described above we can transform G into a matrix H in standard form using MO1–5. By Proposition 4.11, H is a generator matrix for a code equivalent to C.

4.6 Decoding with a linear code

Definition. Suppose C is a linear [n, k]-code over F_q. For v ∈ F_q^n, define

v + C = {v + w | w ∈ C}.

The set v + C is called a coset of C.

The crucial properties of cosets are as follows.

Proposition 4.13. Suppose C is an [n, k]-code over F_q. Then:
1. every coset of C contains exactly q^k words;
2. every word in F_q^n is contained in some coset of C;
3. if the word v is contained in the coset u + C, then v + C = u + C;
4. any word in F_q^n is contained in at most one coset of C;
5. there are exactly q^{n−k} cosets of C.

Proof.
1. Given a coset v + C, we define

φ : C → v + C, w ↦ v + w,

and we claim that φ is a bijection. φ is certainly surjective, since by definition v + C is the image of φ. And φ is injective, since if w, x ∈ C with φ(w) = φ(x), then v + w = v + x, which gives w = x. So φ is a bijection, so |v + C| = |C| = q^k, by Lemma 4.5.
2. Suppose v ∈ F_q^n. Since C is linear, we have 00 . . . 0 ∈ C, and so v = v + 00 . . . 0 ∈ v + C.
3. Since v ∈ u + C, we have v = u + x for some x ∈ C. Now for every w ∈ C, we have v + w = u + (x + w) ∈ u + C; so every word in v + C is in u + C, i.e. v + C ⊆ u + C. But |v + C| = |u + C| by (1), so v + C = u + C.
4. Suppose u ∈ v + C and u ∈ w + C, where u, v, w ∈ F_q^n.
Then by part (3) we have v + C = u + C = w + C.
5. There are q^n words altogether in F_q^n, and by parts (2) and (4) each of them is contained in exactly one coset of C. Each coset has size q^k, and so the number of cosets must be q^n/q^k = q^{n−k}.

Definition. Given a linear [n, k]-code C over F_q and a coset w + C, we define a leader of w + C to be a word of minimal weight in w + C.

Now suppose we are given a linear [n, k]-code C over F_q. We define a Slepian array for C to be an array of words, constructed as follows:

• choose one leader from each coset (note that the word 00 . . . 0 must be the leader chosen from the coset 00 . . . 0 + C = C, since it has smaller weight than any other word);
• in the first row of the array, put all the words in C, with 00 . . . 0 at the left and the other words in any order;
• in the first column put all the chosen coset leaders – the word 00 . . . 0 is at the top, and the remaining leaders go in any order;
• now fill in the remaining entries by letting the entry in row i and column j be (leader at the start of row i) + (word at the top of column j).

Lemma 4.14. Suppose C is a linear [n, k]-code over F_q, and S is a Slepian array for C. Then every word in F_q^n appears once in S.

Proof. Take v ∈ F_q^n. Then v lies in some coset D, by Proposition 4.13(2). Let y be the chosen leader for D; then y appears in column 1, in row i, say. Since y ∈ D, we have D = y + C, by Proposition 4.13(3). So v ∈ y + C, and so we can write v = y + u, where u ∈ C. u lies in row 1 of the array, in column j, say, and so v lies in row i and column j of the array.

Definition. Let C be an [n, k]-code over F_q, and let S be a Slepian array for C. We define a decoding process f : F_q^n → C as follows. For v ∈ F_q^n, we find v in the array S (which we can do, by Lemma 4.14). Now we let f(v) be the word at the top of the same column as v.

Theorem 4.15.
Suppose C is a linear [n, k]-code over F_q and S is a Slepian array for C, and let f be the decoding process constructed above. Then f is a nearest-neighbour decoding process.

Proof. We need to show that for any v ∈ F_q^n and any w ∈ C, d(v, f(v)) ≤ d(v, w). v appears somewhere in S, and by construction f(v) is the word at the top of the same column. If we let u be the word at the start of the same row as v, then by the construction of S we have v = u + f(v). This gives

v − w = u + (f(v) − w) ∈ u + C.

We also have u = u + 00 . . . 0 ∈ u + C, and u was chosen to be a leader for this coset, which means that weight(u) ≤ weight(v − w). So

d(v, f(v)) = weight(v − f(v)) = weight(u) ≤ weight(v − w) = d(v, w).

Quick algorithm for constructing a Slepian array

• Row 1: write the words of C in a row, with the word 00 . . . 0 at the start, and the remaining words in any order.
• Subsequent rows: choose a word u that you haven't written yet of minimal weight. Put u at the start of the row, and then for the rest of the row, put u + (word at the top of column j) in column j.
• Stop when every word appears in the array.

5 Dual codes and parity-check matrices

5.1 The dual code

Definition. Suppose q is a prime power, and n > 0. For v, w ∈ F_q^n, define the dot product

v.w = v1w1 + · · · + vnwn.

Lemma 5.1. The dot product is symmetric and bilinear, i.e.

v.w = w.v and v.(λw + µx) = λ(v.w) + µ(v.x)

for all words v, w, x and all λ, µ ∈ F_q.

Proof. Exercise (non-examinable).

Definition. Suppose C is a linear [n, k]-code over F_q. The dual code is defined to be

C⊥ = { w ∈ F_q^n | v.w = 0 for all v ∈ C }.

Lemma 5.2. Suppose C is a linear [n, k]-code and G is a generator matrix for C. Then

w ∈ C⊥ ⟺ Gw^T = 0.

Note that we think of the elements of F_q^n as row vectors; if w is the row vector (w1 . . . wn), then w^T is the corresponding column vector. G is a k × n matrix, so Gw^T is a column vector of length k.

Proof. Suppose G has entries g_{ij}, for 1 ≤ i ≤ k and 1 ≤ j ≤ n.
Let g(i) denote the ith row of G. Then g(1), . . . , g(k) are words which form a basis for C, and g(i) = g_{i1} . . . g_{in}. Now

(Gw^T)_i = g_{i1}w1 + · · · + g_{in}wn = g(i).w,

so Gw^T = 0 if and only if g(i).w = 0 for all i.

Suppose w ∈ C⊥. Then v.w = 0 for all v ∈ C. In particular, g(i).w = 0 for i = 1, . . . , k. So Gw^T = 0.

Conversely, suppose Gw^T = 0. Given v ∈ C, we can write

v = λ1g(1) + · · · + λkg(k)

for λ1, . . . , λk ∈ F_q, since g(1), . . . , g(k) span C. So

v.w = (λ1g(1) + · · · + λkg(k)).w = λ1(g(1).w) + · · · + λk(g(k).w)   (by Lemma 5.1) = 0 + · · · + 0 = 0.

This applies for any v ∈ C, so w ∈ C⊥.

This lemma gives us another way to think of C⊥: it is the kernel of any generator matrix of C.

Theorem 5.3. If C is a linear [n, k]-code over F_q, then C⊥ is a linear [n, n − k]-code.

Proof. Let G be a generator matrix of C. Then Lemma 5.2 says that C⊥ is the kernel of the linear map

φ : F_q^n → F_q^k, w ↦ Gw^T,

so it is a subspace of F_q^n, i.e. a linear code of length n. The Rank–Nullity Theorem says that

rank(φ) + nullity(φ) = dim F_q^n,

so rank(G) + dim(C⊥) = n. G has k rows, which are linearly independent, and so rank(G) = k. Hence dim C⊥ = n − k.

Theorem 5.4. Suppose C is a linear [n, k]-code over F_q. Then (C⊥)⊥ = C.

Proof. First note that because C⊥ = {w ∈ F_q^n | v.w = 0 for all v ∈ C}, we have v.w = 0 whenever v ∈ C and w ∈ C⊥. This means that any v ∈ C is contained in

{x ∈ F_q^n | w.x = 0 for all w ∈ C⊥}.

But this is by definition (C⊥)⊥. So C ⊆ (C⊥)⊥. On the other hand C and (C⊥)⊥ are both vector spaces, and by Theorem 5.3

dim((C⊥)⊥) = n − dim(C⊥) = n − (n − k) = k = dim C,

so C = (C⊥)⊥.

Now we make a very important definition.

Definition. Suppose C is a linear code. A parity-check matrix for C is a generator matrix for C⊥.

We will see in the rest of the course that parity-check matrices are generally more useful than generator matrices. Here is an instance of this.

Lemma 5.5.
Suppose C is a linear [n, k]-code over F_q, H is a parity-check matrix for C, and v is a word in F_q^n. Then v ∈ C if and only if Hv^T = 0.

Proof. By Theorem 5.4 we have v ∈ C if and only if v ∈ (C⊥)⊥. Now H is a generator matrix for C⊥, and so by Lemma 5.2 we have v ∈ (C⊥)⊥ if and only if Hv^T = 0.

Non-examinable material

But can we find a parity-check matrix? The following lemma provides a start.

Lemma 5.6. Suppose C is a linear [n, k]-code over F_q with generator matrix G, and H is a matrix over F_q. Then H is a parity-check matrix for C if and only if

• H is an (n − k) × n matrix,
• the rows of H are linearly independent, and
• GH^T = 0.

In order to prove this lemma, we think about how we multiply matrices. Suppose A is an r × s matrix, and B is an s × t matrix, and let b(1), . . . , b(t) denote the columns of B. Then the columns of AB are Ab(1), . . . , Ab(t).

Proof of Lemma 5.6. (⇒) Suppose H is a parity-check matrix for C. Then it is a generator matrix for C⊥, which is a linear [n, n − k]-code, so H is an (n − k) × n matrix. If we let h(1), . . . , h(n − k) denote the rows of H, then these rows form a basis for C⊥; in particular, they are linearly independent. Now the columns of H^T are h(1)^T, . . . , h(n − k)^T, so the columns of GH^T are Gh(1)^T, . . . , Gh(n − k)^T. Since each h(i) lies in C⊥, we have Gh(i)^T = 0 by Lemma 5.2. So GH^T = 0.

(⇐) Suppose H is an (n − k) × n matrix with linearly independent rows, and that GH^T = 0. Again, we let h(1), . . . , h(n − k) denote the rows of H. Since GH^T = 0, we have Gh(i)^T = 0 for each i, so each h(i) lies in C⊥ by Lemma 5.2. So the rows of H are linearly independent words in C⊥. But the number of rows is n − k, which is the dimension of C⊥, so in fact the rows form a basis for C⊥, and hence H is a generator matrix for C⊥, i.e. a parity-check matrix for C.

This helps us to find a parity-check matrix if we already have a generator matrix.
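For very small parameters one can also find C⊥ by brute force, directly from Lemma 5.2: C⊥ is exactly the set of words killed by G. The sketch below is my own illustration (plain Python, assuming q prime so that integers mod q realise F_q).

```python
from itertools import product

def dual_code(G, q, n):
    """C-perp as the set of words w with G w^T = 0 (Lemma 5.2)."""
    return {w for w in product(range(q), repeat=n)
            if all(sum(g * x for g, x in zip(row, w)) % q == 0
                   for row in G)}

# a generator matrix of the binary parity-check code of length 4
G = [[1, 0, 0, 1],
     [0, 1, 0, 1],
     [0, 0, 1, 1]]
D = dual_code(G, 2, 4)

assert len(D) == 2 ** (4 - 3)              # Theorem 5.3: dim = n - k
assert D == {(0, 0, 0, 0), (1, 1, 1, 1)}   # the repetition code
```

In this example the dual of the length-4 binary parity-check code turns out to be the repetition code; any basis of the resulting set is then a parity-check matrix for C.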
If the generator matrix is in standard form, then a parity-check matrix is particularly easy to find. First we need a lemma about multiplying block matrices.

Lemma 5.7. Suppose F is a field, and A is an m × n matrix over F, B is an m × s matrix over F, C is an n × t matrix over F, D is an s × t matrix over F. Let E be the m × (n + s) block matrix (A | B), and let F′ be the (n + s) × t block matrix with C on top of D. Then EF′ = AC + BD.

Proof. Omitted (non-examinable).

Now we can find a parity-check matrix from a standard-form generator matrix.

End of non-examinable material

Proposition 5.8. Suppose C is a linear [n, k]-code over F_q, and that G = (I_k | B) is a standard-form generator matrix for C. Then the matrix

H = (−B^T | I_{n−k})

is a parity-check matrix for C.

Proof. (Non-examinable) Certainly H is an (n − k) × n matrix. The last n − k columns of H are the standard basis vectors, and so are linearly independent. So H has rank at least n − k, and hence the rows of H must be linearly independent. By Lemma 5.7 (with H^T the block matrix with −B on top of I_{n−k}) we have

GH^T = (I_k | B)(−B on top of I_{n−k}) = −B + B = 0,

and so H is a parity-check matrix, by Lemma 5.6.

5.2 Syndrome decoding

Definition. Suppose C is a linear [n, k]-code over F_q, and H is a parity-check matrix for C. Then for any word w ∈ F_q^n, the syndrome of w is the vector Hw^T ∈ F_q^{n−k}.

Note that although Hw^T is strictly speaking a column vector, we'll often write it as a row vector (i.e. word). We saw above that w ∈ C if and only if Hw^T = 0. So the syndrome of a word tells us whether it lies in our code. In fact, the syndrome tells us which coset of our code the word lies in.

Lemma 5.9. Suppose C is a linear [n, k]-code over F_q with parity-check matrix H, and v, w are words in F_q^n. Then v and w lie in the same coset of C if and only if they have the same syndrome.

Proof. The coset containing w is w + C, so

v and w lie in the same coset ⟺ v ∈ w + C
⟺ v = w + x, for some x ∈ C
⟺ v − w ∈ C
⟺ H(v^T − w^T) = 0 (by Lemma 5.5)
⟺ Hv^T − Hw^T = 0
⟺ Hv^T = Hw^T.
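Proposition 5.8 and the syndrome map can both be sketched in a few lines. The following is my own illustration in plain Python (the standard-form matrix G below is an arbitrary example over F_3, and q is assumed prime): it builds H = (−B^T | I_{n−k}) and checks that every row of G, hence every codeword, has syndrome zero.

```python
def parity_check_from_standard(G, q):
    """H = (-B^T | I_{n-k}) from G = (I_k | B) (Proposition 5.8)."""
    k, n = len(G), len(G[0])
    H = []
    for i in range(n - k):
        # row i of -B^T: entry j is -B[j][i] = -G[j][k+i]
        row = [(-G[j][k + i]) % q for j in range(k)]
        # followed by row i of the identity I_{n-k}
        row += [1 if l == i else 0 for l in range(n - k)]
        H.append(row)
    return H

def syndrome(H, w, q):
    """The syndrome H w^T of a word w."""
    return tuple(sum(h * x for h, x in zip(row, w)) % q for row in H)

G = [[1, 0, 2, 1],
     [0, 1, 1, 2]]          # an example standard-form G over F_3
H = parity_check_from_standard(G, 3)

assert H == [[1, 2, 1, 0], [2, 1, 0, 1]]
assert all(syndrome(H, row, 3) == (0, 0) for row in G)
```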
The point of this is that we can use syndromes to decode a linear code without having to write out a Slepian array.

Definition. Suppose C is a linear code, and H is a parity-check matrix for C. We construct a syndrome look-up table for C as follows.
1. Choose a leader from each coset.
2. Draw a table with two columns; in the first column put the chosen coset leaders, and next to each leader put its syndrome.

Remark. In a syndrome look-up table for an [n, k]-code over F_q, there are q^{n−k} rows, because there are q^{n−k} cosets (Proposition 4.13). Each coset leader has a different syndrome, so q^{n−k} different syndromes appear. The syndromes lie in F_q^{n−k}, so there are q^{n−k} possible syndromes. So every possible syndrome must appear in the table.

Definition. Suppose C is a linear [n, k]-code over F_q, and we have a syndrome look-up table for C. We define a decoding process g for C as follows.

• Given a word w ∈ F_q^n, calculate its syndrome.
• Find this syndrome in the syndrome look-up table; let v be the coset leader with this syndrome.
• Define g(w) = w − v.

Note that because w and v have the same syndrome, they lie in the same coset of C. So w − v ∈ C, so g really is a decoding process.

Lemma 5.10. Suppose C is a linear [n, k]-code over F_q, T is a syndrome look-up table for C, and S is a Slepian array for C with the same chosen coset leaders. Then the decoding process g for C obtained from T is the same as the decoding process f obtained from S; in particular, g is a nearest-neighbour decoding process.

Proof. Take w ∈ F_q^n, and let v be the chosen leader with the same syndrome as w. Then g(w) = w − v. Since v is one of the chosen coset leaders, it appears at the start of some row of S, say row i. Since g(w) ∈ C, it appears at the top of a column of S, say column j. So the word in row i and column j is v + g(w) = w. Hence f(w) is the word at the top of column j, which is g(w). So f = g.
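As a sketch (my own illustration in plain Python, assuming q prime), the look-up table of the definition above can be built by running through the cosets, choosing a word of minimal weight as leader in each; g then decodes by subtracting the leader with the matching syndrome.

```python
from itertools import product

def syndrome(H, w, q):
    return tuple(sum(h * x for h, x in zip(row, w)) % q for row in H)

def syndrome_table(code, H, q, n):
    """Map each syndrome to a chosen coset leader (minimal weight)."""
    table, seen = {}, set()
    for v in product(range(q), repeat=n):
        if v in seen:
            continue
        coset = {tuple((a + b) % q for a, b in zip(v, c)) for c in code}
        seen |= coset
        leader = min(coset, key=lambda w: sum(1 for x in w if x))
        table[syndrome(H, leader, q)] = leader
    return table

def g(table, H, w, q):
    """Decoding process: g(w) = w - (leader with w's syndrome)."""
    leader = table[syndrome(H, w, q)]
    return tuple((a - b) % q for a, b in zip(w, leader))

# binary repetition code of length 3, with a parity-check matrix H
C = [(0, 0, 0), (1, 1, 1)]
H = [[1, 1, 0], [1, 0, 1]]
T = syndrome_table(C, H, 2, 3)

assert len(T) == 2 ** 2                    # q^(n-k) syndromes
assert g(T, H, (0, 1, 0), 2) == (0, 0, 0)  # one error corrected
```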
Just as with Slepian arrays, there's a quicker way to find a syndrome look-up table than writing out all the cosets and choosing leaders.

Quick method for finding a syndrome look-up table

• Examine all words of weight 0, 1, 2, . . . in turn, computing the syndrome of each one.
• Each time you get a new syndrome, add a row to the table.
• Stop when all possible syndromes appear in the table.

6 Some examples of linear codes

6.1 Reed–Muller codes

Now we define a family of binary linear codes which have quite large minimum distance.

Definition. Suppose v, w are words of the same length over F_2. Define v||w to be the word obtained by joining v and w together.

Definition. Suppose 0 ≤ s ≤ t. The binary Reed–Muller code R(s, t) is a code of length 2^t defined recursively as follows:

• if s = 0, then R(s, t) is just the repetition code of length 2^t;
• if s = t, then R(s, t) consists of all words of length 2^t;
• if 0 < s < t, then

R(s, t) = { v||(v + w) | v ∈ R(s, t − 1), w ∈ R(s − 1, t − 1) }.

Proposition 6.1. R(s, t) is a linear code.

Proof. We use induction on t. If s = 0, then we're done, because we know that the repetition code is linear. If s = t, then R(s, t) = F_2^{2^t}, which is a subspace of itself, so is a linear code. So suppose 0 < s < t. Then by induction we may assume that R(s, t − 1) and R(s − 1, t − 1) are both linear. We apply the Subspace Test.

R(s, t) contains the zero word: By assumption R(s, t − 1) and R(s − 1, t − 1) both contain the zero word 00 . . . 0 of length 2^{t−1}. So R(s, t) contains the word 00 . . . 0||(00 . . . 0 + 00 . . . 0), which is the zero word of length 2^t.

R(s, t) is closed under addition: Suppose we have two words in R(s, t). These can be written in the form v||(v + w) and x||(x + y) for some v, x ∈ R(s, t − 1), w, y ∈ R(s − 1, t − 1). So the sum of these two words is (v + x)||(v + x + w + y).
By assumption R(s, t − 1) is closed under addition, so contains v + x, and R(s − 1, t − 1) is closed under addition, so contains the word w + y. So R(s, t) contains the word (v + x)||(v + x + w + y).

R(s, t) is closed under scalar multiplication: Since q = 2, there is nothing to check.

Next we'll work out the dimension of R(s, t).

Lemma 6.2. Suppose 0 < s < t. Then |R(s, t)| = |R(s, t − 1)| × |R(s − 1, t − 1)|.

Proof. We know that every word in R(s, t) is of the form v||(v + w) for v ∈ R(s, t − 1) and w ∈ R(s − 1, t − 1). The number of choices for v is |R(s, t − 1)|, and the number of choices for w is |R(s − 1, t − 1)|. So as long as different choices of v, w give us different words in R(s, t), we're done. So suppose we have v, v′ ∈ R(s, t − 1) and w, w′ ∈ R(s − 1, t − 1) such that v||(v + w) = v′||(v′ + w′). Looking at the first half of each of these words, we see that v = v′. And looking at the second half, we see that v + w = v′ + w′, and hence w = w′.

Now we need to recall a couple of results about binomial coefficients.

Lemma 6.3.
1. If t ≥ 0, then (t choose 0) + (t choose 1) + · · · + (t choose t) = 2^t.
2. If 0 < u < t, then (t choose u) = (t−1 choose u) + (t−1 choose u−1).

Proof. Exercise (non-examinable).

Proposition 6.4. If 0 ≤ s ≤ t, then

dim(R(s, t)) = (t choose 0) + (t choose 1) + · · · + (t choose s).

Proof. We use induction on t. If s = 0, then R(s, t) is the repetition code, which has dimension 1 = (t choose 0). If s = t, then R(s, t) = F_2^{2^t}, which has dimension 2^t, which by Lemma 6.3 equals (t choose 0) + (t choose 1) + · · · + (t choose t). Now suppose 0 < s < t. By induction, we can assume that

dim(R(s, t − 1)) = (t−1 choose 0) + (t−1 choose 1) + · · · + (t−1 choose s),
dim(R(s − 1, t − 1)) = (t−1 choose 0) + (t−1 choose 1) + · · · + (t−1 choose s−1).
Hence by Lemma 4.5 and Lemma 6.2,

|R(s, t)| = |R(s, t − 1)| × |R(s − 1, t − 1)| = 2^{(t−1 choose 0)+···+(t−1 choose s)} × 2^{(t−1 choose 0)+···+(t−1 choose s−1)},

so

dim R(s, t) = (t−1 choose 0) + (t−1 choose 1) + · · · + (t−1 choose s) + (t−1 choose 0) + (t−1 choose 1) + · · · + (t−1 choose s−1)
= (t choose 0) + (t choose 1) + (t choose 2) + · · · + (t choose s),

using Lemma 6.3(2) term by term (together with (t−1 choose 0) = (t choose 0)).

How to construct a basis for R(s, t)

The proof of the last proposition shows that for 0 < s < t,

dim(R(s, t)) = dim(R(s, t − 1)) + dim(R(s − 1, t − 1)).

This gives us a way to construct a basis. Take a basis B for R(s, t − 1), and a basis C for R(s − 1, t − 1); then the set

{ v||v | v ∈ B } ∪ { 00 . . . 0||w | w ∈ C }

is a basis for R(s, t).

Next we want to work out the minimum distance of R(s, t). Before we can do this, we need to know the following result.

Lemma 6.5. For any t > 0, R(0, t) ⊂ R(1, t) ⊂ · · · ⊂ R(t, t).

Proof. We prove that R(s, t) ⊂ R(s + 1, t) for all 0 ≤ s < t, using induction on t.

• If s = t − 1, then R(s + 1, t) is the whole of F_2^{2^t}, so obviously R(s, t) ⊂ R(s + 1, t). So assume s < t − 1.
• Now suppose s = 0. Then R(s, t) = {00 . . . 0, 11 . . . 1}. By induction we may assume that R(0, t − 1) ⊂ R(1, t − 1), i.e. R(1, t − 1) contains the words 00 . . . 0 and 11 . . . 1 of length 2^{t−1}. By construction R(1, t) contains all words v||v for v ∈ R(1, t − 1), so in particular it contains the words 00 . . . 0 and 11 . . . 1 of length 2^t.
• Now suppose 0 < s < t − 1. By induction we may assume that R(s − 1, t − 1) ⊂ R(s, t − 1) ⊂ R(s + 1, t − 1). Any word in R(s, t) has the form v||(v + w) for v ∈ R(s, t − 1) and w ∈ R(s − 1, t − 1). By assumption, this means v ∈ R(s + 1, t − 1) and w ∈ R(s, t − 1), so v||(v + w) ∈ R(s + 1, t).

Proposition 6.6. Suppose 0 ≤ s ≤ t. Then d(R(s, t)) = 2^{t−s}.

Proof. We use induction on t.

• If s = 0, then R(s, t) is the repetition code of length 2^t, and we know this has minimum distance 2^t.
• If s = t, then R(s, t) = F_2^{2^t}, which obviously has minimum distance 1.
• Now suppose 0 < s < t.
By induction we may assume that

d(R(s, t − 1)) = 2^{t−s−1},  d(R(s − 1, t − 1)) = 2^{t−s}.

Since R(s, t) is linear, we just need to show that the minimum weight of a non-zero word in R(s, t) is 2^{t−s}. Since d(R(s − 1, t − 1)) = 2^{t−s}, R(s − 1, t − 1) certainly contains a word w of weight 2^{t−s}. Then R(s, t) contains the word 00 . . . 0||w of weight 2^{t−s}.

Conversely, suppose we have a non-zero word x ∈ R(s, t), and write x = v||(v + w) for v ∈ R(s, t − 1), w ∈ R(s − 1, t − 1). We consider three cases:

v = 00 . . . 0: In this case, x = 00 . . . 0||w, so w is non-zero. So weight(x) = weight(w) ≥ d(R(s − 1, t − 1)) = 2^{t−s}.

v = w: In this case, x = v||00 . . . 0, so v is non-zero. And v = w ∈ R(s − 1, t − 1), so weight(x) = weight(v) ≥ d(R(s − 1, t − 1)) = 2^{t−s}.

v ≠ w and v ≠ 00 . . . 0: Since R(s − 1, t − 1) ⊂ R(s, t − 1), we have w ∈ R(s, t − 1), so v + w ∈ R(s, t − 1). So weight(x) = weight(v) + weight(v + w) ≥ d(R(s, t − 1)) + d(R(s, t − 1)) = 2^{t−s−1} + 2^{t−s−1} = 2^{t−s}.

6.2 Hamming codes

We begin with binary Hamming codes, which are slightly easier to describe.

Definition. Suppose r ≥ 2, and let H_{r,2} be an r × (2^r − 1) matrix whose columns are all the different non-zero vectors of length r over F_2, in any order. Define the binary Hamming code Ham(r, 2) to be the binary [2^r − 1, 2^r − r − 1]-code with H_{r,2} as its parity-check matrix, i.e.

Ham(r, 2) = { v ∈ F_2^{2^r−1} | H_{r,2}v^T = 0 }.

Note that Ham(r, 2) is not uniquely defined – it depends on the order you choose for the columns of H_{r,2}. But choosing different orders still gives equivalent codes, so we talk about the Hamming code Ham(r, 2).

Now we consider q-ary Hamming codes for an arbitrary prime power q. We impose a relation on the non-zero vectors in F_q^r by saying that v ≡ w if and only if v = λw for some non-zero λ ∈ F_q.

Lemma 6.7. ≡ is an equivalence relation.

Proof. We need to check three things.

• ≡ is reflexive: given a non-zero vector v, we have v = 1v, and 1 is a non-zero element of F_q, so v ≡ v.
• ≡ is symmetric: suppose v ≡ w.
Then v = λw for some non-zero λ ∈ F_q. Since λ ≠ 0, it has an inverse λ^{−1}. So w = λ^{−1}v, so w ≡ v.
• ≡ is transitive: suppose v ≡ w and w ≡ x. Then v = λw and w = µx for some non-zero λ, µ ∈ F_q. Then v = (λµ)x, so v ≡ x.

Now we count the equivalence classes of non-zero vectors.

Lemma 6.8. Under the equivalence relation ≡, each equivalence class contains exactly q − 1 vectors, and there are exactly (q^r − 1)/(q − 1) equivalence classes.

Proof. Take v ∈ F_q^r \ {0}. The equivalence class containing v consists of all vectors w such that w ≡ v, i.e. w = λv for non-zero λ ∈ F_q. There are q − 1 possible choices of λ, and these give distinct words: if λ ≠ µ and λv = µv, then we get (λ − µ)v = 0. λ − µ is non-zero, so it has an inverse. So we get v = 0, a contradiction. So there are exactly q − 1 vectors in each equivalence class. The number of equivalence classes is the number of non-zero vectors divided by the size of an equivalence class, i.e. (q^r − 1)/(q − 1).

Definition. Suppose q is a prime power and r > 0, and let N = (q^r − 1)/(q − 1). Choose non-zero vectors v1, . . . , vN ∈ F_q^r such that exactly one lies in each equivalence class. Define H_{r,q} to be an r × N matrix with columns v1, . . . , vN, and define the q-ary Hamming code Ham(r, q) to be the [N, N − r]-code over F_q with parity-check matrix H_{r,q}.

Again, the actual code depends on the order of v1, . . . , vN (and on the choice of v1, . . . , vN), but different choices give equivalent codes. It may not be obvious how to choose the vectors v1, . . . , vN, but there is a trick: choose all non-zero vectors whose first non-zero entry is a 1. You might like to prove as an exercise that this always works, but you can use this trick without justification in the exam.

Lemma 6.9. Ham(r, q) is an [N, N − r]-code over F_q, where as above N = (q^r − 1)/(q − 1).

Proof. H_{r,q} is an r × N matrix, and is a generator matrix for Ham(r, q)⊥, so Ham(r, q)⊥ is an [N, r]-code. So Ham(r, q) is an [N, N − r]-code, by Theorem 5.3.
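The construction, together with the 'first non-zero entry is a 1' trick, can be sketched in a few lines of plain Python (my own illustration, assuming q prime). For Ham(2, 3) it also confirms by brute force the minimum distance established in the next theorem.

```python
from itertools import product

def hamming_parity_check(r, q):
    """Columns of H_{r,q}: one representative per equivalence class,
    chosen as the vectors whose first non-zero entry is 1."""
    cols = [v for v in product(range(q), repeat=r)
            if any(v)                                  # non-zero, and
            and v[next(i for i, x in enumerate(v) if x)] == 1]
    # assemble H as an r x N matrix from the chosen columns
    return [[c[i] for c in cols] for i in range(r)]

def in_code(H, w, q):
    """w is a codeword iff H w^T = 0 (Lemma 5.5)."""
    return all(sum(h * x for h, x in zip(row, w)) % q == 0 for row in H)

H = hamming_parity_check(2, 3)     # Ham(2, 3): N = (3^2 - 1)/2 = 4
N = len(H[0])
code = [w for w in product(range(3), repeat=N) if in_code(H, w, 3)]

assert N == 4
assert len(code) == 3 ** (N - 2)   # an [N, N - r]-code (Lemma 6.9)
assert min(sum(1 for x in w if x) for w in code if any(w)) == 3
```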
Now we’ll see the key property of Hamming codes.

Theorem 6.10. Ham(r, q) has minimum distance 3.

Proof. Since Ham(r, q) is linear, we want to show that the minimum weight of a non-zero word in Ham(r, q) is 3, i.e. that there are no words of weight 1 or 2 in Ham(r, q), but there is a word of weight 3. We have a parity-check matrix H = H_{r,q}, and by Lemma 5.5 a word w lies in Ham(r, q) if and only if Hw^T = 0. Writing v_1, ..., v_N for the columns of H_{r,q}, and writing w as w_1 ... w_N, we have

Hw^T = w_1 v_1 + · · · + w_N v_N.

Ham(r, q) contains no words of weight 1. Suppose w is a word of weight 1. Letting w_i be the unique non-zero symbol in w, we find that Hw^T = w_i v_i. Now w_i ≠ 0 and v_i ≠ 0, so w_i v_i ≠ 0; so w ∉ Ham(r, q).

Ham(r, q) contains no words of weight 2. Suppose w is a word of weight 2, with non-zero symbols w_i, w_j. Then Hw^T = w_i v_i + w_j v_j. If this is zero, we get v_i = −w_j w_i^{−1} v_j (n.b. w_i^{−1} exists because w_i ≠ 0), which means that v_i ≡ v_j. But v_1, ..., v_N were chosen to be inequivalent; contradiction. So Hw^T is non-zero, so w ∉ Ham(r, q).

Ham(r, q) contains a word of weight 3. Let v = v_1 + v_2. Then v ≠ 0, because v_1 ≠ −v_2 (since −v_2 ≡ v_2, while v_1 ≢ v_2). Now we claim that v ≢ v_1. If v ≡ v_1, then we have v = λv_1, and this means that v_2 = (λ − 1)v_1, so that v_2 ≡ v_1; but this contradicts the choice of v_1, ..., v_N. So v ≢ v_1; similarly, we find v ≢ v_2. But v must be equivalent to some v_j, because v_1, ..., v_N are chosen so that one lies in each equivalence class. So for some j ≥ 3 we have v ≡ v_j; this means v = λv_j for some λ ∈ F_q. Now define w by w_1 = 1, w_2 = 1, w_j = −λ, w_k = 0 for all other k. Then w has weight 3, and we claim that w ∈ Ham(r, q): we have

Hw^T = 1·v_1 + 1·v_2 − λ·v_j = v_1 + v_2 − v = 0,

so w ∈ Ham(r, q).

Theorem 6.11. Ham(r, q) is a perfect 1-error-correcting code.

Proof. Since Ham(r, q) has minimum distance 3, it is 1-error-correcting, by Lemma 1.2.
To show that it is perfect, we have to show that equality holds in the Hamming bound, i.e.

|Ham(r, q)| = q^N / ( (N choose 0) + (q − 1)(N choose 1) ),

where N = (q^r − 1)/(q − 1). Since Ham(r, q) is an [N, N − r]-code, the left-hand side equals q^{N−r} by Lemma 4.5. The right-hand side equals

q^N / (1 + (q − 1)N) = q^N / (1 + (q − 1)·(q^r − 1)/(q − 1)) = q^N / (1 + q^r − 1) = q^{N−r},

and the equation follows.

Non-examinable material

6.3 The ternary Golay code

In this section, we’ll look at one particular very special code. We begin by taking a parity-check matrix for the Hamming code Ham(2, 3):

K = ( 1 0 1 1 )
    ( 0 1 1 2 )

Notice that KK^T = 0, which means that the rows of K lie in Ham(2, 3) (in fact, K is a generator matrix for Ham(2, 3)). Here’s a lemma about Ham(2, 3).

Lemma 6.12. Suppose u is a non-zero word in Ham(2, 3) and x ∈ F_3^4. Then weight(x + u) + weight(x − u) ≥ 3.

Proof.

weight(x + u) + weight(x − u) = d(x + u, 0000) + d(x − u, 0000)
                              = d(x, −u) + d(x, u)    by Lemma 4.3
                              ≥ d(u, −u)              by the triangle inequality.

Now u, −u are distinct words in Ham(2, 3), so d(u, −u) ≥ d(Ham(2, 3)) = 3.

Now let I denote the 4 × 4 identity matrix over F_3, and let J be the 4 × 4 matrix in which every entry is a 1. Notice that J^2 = J, and hence (I + J)^2 = I.

Now let H be the following 6 × 12 matrix over F_3, written in block form:

H = ( I  I+J  I+J )
    ( 0   K   −K  )

Lemma 6.13. The rows of H are linearly independent.

Proof. Let r_1, ..., r_6 be the rows, and suppose we have a linear relation λ_1 r_1 + · · · + λ_6 r_6 = 000000000000 (λ_1, ..., λ_6 ∈ F_3). Then for i = 1, ..., 6 in turn we can look at the ith entry of the word on the left-hand side and deduce that λ_i = 0.

As a consequence, H is a generator matrix for a linear code.

Definition. Define the extended ternary Golay code G_12 to be the linear [12, 6]-code over F_3 with generator matrix H.

Lemma 6.14. G_12^⊥ = G_12.

Proof. We want to show that H is a generator matrix for G_12^⊥, i.e. a parity-check matrix for G_12. We’ll use Lemma 5.6.
Certainly the rows of H are linearly independent and H is a (12 − 6) × 12 matrix, so we just need to check that HH^T = 0. Multiplying in block form (note that I + J is symmetric), we have

HH^T = ( I  I+J  I+J ) ( I     0    )     ( I^2 + 2(I+J)^2    (I+J)K^T − (I+J)K^T )
       ( 0   K   −K  ) ( I+J   K^T  )  =  ( K(I+J) − K(I+J)   KK^T + (−K)(−K^T)   )
                       ( I+J  −K^T  )

which is zero because KK^T = 0 and I^2 = (I+J)^2 = I, so that I^2 + 2(I+J)^2 = 3I = 0 over F_3. So H is a generator matrix for G_12^⊥; since H is also a generator matrix for G_12, we must have G_12^⊥ = G_12.

In particular, this result implies that v·v = 0 for any v ∈ G_12. This has an important consequence.

Lemma 6.15. Suppose v ∈ F_3^n. If v·v = 0, then the weight of v is divisible by 3.

Proof. We have v·v = Σ_{i=1}^n v_i^2. Since we’re working over F_3, we have v_i^2 = 0 if v_i = 0, and v_i^2 = 1 if v_i ≠ 0. So v·v is just the weight of v mod 3.

Proposition 6.16. The minimum distance of G_12 is 6.

Proof. Since G_12 is linear, we just need to show that the minimum weight of a non-zero word is 6. Obviously there are words in G_12 of weight 6 (the last two rows of H, for example), so by Lemma 6.15 we just need to show that there are no words of weight 3. So suppose that v ∈ G_12 has weight 3, and write v as λ_1 r_1 + · · · + λ_6 r_6. We write v as a sum of two words:

v = (λ_1 r_1 + λ_2 r_2 + λ_3 r_3 + λ_4 r_4) + (λ_5 r_5 + λ_6 r_6)
  = w||x||x + 0000||u||(−u)
  = w||(x + u)||(x − u),

where w, x ∈ F_3^4 and u ∈ Ham(2, 3).

Suppose u ≠ 0000. Then by Lemma 6.12 we have weight(x + u) + weight(x − u) ≥ 3, so we must have w = 0000. But w is the word λ_1 λ_2 λ_3 λ_4, so this means that λ_1 = λ_2 = λ_3 = λ_4 = 0, and hence v = 0000||u||(−u), which has even weight; but 3 is odd, a contradiction.

So we can assume u = 0000, so that v = w||x||x. Since v has weight 3, w must have odd weight, and hence has weight 1 or 3. If w has weight 1, then only one of λ_1, λ_2, λ_3, λ_4 is non-zero, so v is just a scalar multiple of one of the rows r_1, r_2, r_3, r_4, and so has weight 9; contradiction. So we can assume that w has weight 3. But then we must have x = 0000. But x = λ_1 λ_2 λ_3 λ_4 (I + J) = w(I + J), so w(I + J) = 0000.
Hence w = wI = w(I + J)^2 = (w(I + J))(I + J) = 0000(I + J) = 0000; contradiction.

Now we can define the ternary Golay code.

Definition. Define the ternary Golay code G_11 to be the code of length 11 over F_3 obtained by deleting the last symbol of each word in G_12.

Theorem 6.17. G_11 is a perfect 2-error-correcting code.

Proof. Since d(G_12) = 6, we have d(G_11) ≥ 5 by the proof of the Singleton bound. So G_11 is 2-error-correcting. To show that G_11 is perfect, we just need to show that

|G_11| = 3^11 / ( 1 + (3 − 1)(11 choose 1) + (3 − 1)^2 (11 choose 2) ).

We have |G_11| = |G_12| = 3^6, by Lemma 4.5. The right-hand side is

3^11 / (1 + 2 × 11 + 4 × 55) = 3^11 / 243 = 3^11 / 3^5 = 3^6,

as required.

End of non-examinable material

6.4 Existence of codes and linear independence

The proof that the Hamming codes have distance 3 relied on the following fact about their parity-check matrices: if we take a set consisting of at most 2 columns of H, then this set is linearly independent. This leads to a more general theorem on the existence of linear codes.

Theorem 6.18. Suppose C is an [n, k]-code with parity-check matrix H, and d ≤ n. Then C has minimum distance at least d if and only if every d − 1 columns of H are linearly independent.

Proof. Let c_1, ..., c_n be the columns of H.

Suppose first that there are columns c_{i_1}, ..., c_{i_{d−1}} which are linearly dependent. This means there are scalars λ_1, ..., λ_{d−1} (not all zero) such that

λ_1 c_{i_1} + · · · + λ_{d−1} c_{i_{d−1}} = 0.

Let w be the word which has λ_1, ..., λ_{d−1} in positions i_1, ..., i_{d−1}, and 0s elsewhere. Then

Hw^T = w_1 c_1 + · · · + w_n c_n = λ_1 c_{i_1} + · · · + λ_{d−1} c_{i_{d−1}} = 0,

so w ∈ C by Lemma 5.5. But w is a non-zero word of weight at most d − 1, so C has minimum distance less than d.

Conversely, suppose C has minimum distance less than d, and take a non-zero word w ∈ C of weight less than d. Then there are positions i_1, ..., i_{d−1} such that w_i = 0 for i ∉ {i_1, ..., i_{d−1}}. Hence

0 = Hw^T = w_1 c_1 + · · · + w_n c_n = w_{i_1} c_{i_1} + · · · + w_{i_{d−1}} c_{i_{d−1}},

so c_{i_1}, ...
, c_{i_{d−1}} are linearly dependent.

Corollary 6.19. Suppose d, k ≤ n. A linear [n, k, d]-code exists if and only if there is a sequence of n vectors in F_q^{n−k} such that every d − 1 of them are linearly independent.

Proof. Suppose C is such a code, and let H be a parity-check matrix for C. Then the columns of H are vectors in F_q^{n−k}, and by Theorem 6.18 every d − 1 of them are linearly independent.

Conversely, suppose c_1, ..., c_n are vectors in F_q^{n−k} such that every d − 1 of them are linearly independent. Let H be the matrix with columns c_1, ..., c_n, and let C be the code with parity-check matrix H. Then C is a linear [n, k]-code, and by Theorem 6.18 C has minimum distance at least d.

Notice that we say ‘sequence’ rather than ‘set’, since two columns of a matrix might be the same. However, if there are two equal columns, then they are linearly dependent, and so the minimum distance of our code would be at most 2.

6.5 MDS codes

We begin by recalling the Singleton bound, and applying it to linear codes.

Theorem 6.20 (Singleton bound for linear codes). If C is a linear [n, k, d]-code over F_q, then d ≤ n − k + 1.

Proof. C is a q-ary (n, M, d)-code, where M = q^k by Lemma 4.5. So

q^k = M ≤ A_q(n, d) ≤ q^{n−d+1}

by Theorem 2.2(3). Hence k ≤ n + 1 − d, so d ≤ n − k + 1.

The aim of this section is to look at codes which give equality in this bound.

Definition. Suppose C is a linear [n, k]-code. The number r = n − k is called the redundancy of C.

What Theorem 6.20 says is that if C is a linear code of redundancy r, then the minimum distance of C is at most r + 1.

Definition. A maximum distance separable code (or MDS code) of length n and redundancy r is a linear [n, n − r, r + 1]-code.

Theorem 6.18 says that an MDS code of length n and redundancy r exists if and only if we can find a sequence of n vectors in F_q^r of which any r are linearly independent.
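Theorem 6.18 can be illustrated by a brute-force computation. The sketch below is not part of the notes: it takes a parity-check matrix for the binary Hamming code Ham(3, 2) — every 2 of its columns are linearly independent, but some 3 columns are dependent — and searches all words to confirm that the minimum distance is exactly 3:

```python
from itertools import product

# Parity-check matrix H_{3,2}: the columns are the 7 non-zero vectors of F_2^3.
H = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]

def min_distance(H, q=2):
    """Minimum weight of a non-zero word w with H w^T = 0 (mod q),
    found by exhaustive search; only feasible for small codes."""
    n = len(H[0])
    best = None
    for w in product(range(q), repeat=n):
        if any(w) and all(sum(h * x for h, x in zip(row, w)) % q == 0
                          for row in H):
            wt = sum(1 for x in w if x != 0)
            best = wt if best is None else min(best, wt)
    return best

print(min_distance(H))  # 3, as Theorems 6.10 and 6.18 predict
```

The search runs over all q^n words, so this is a sanity check for small parameters rather than a practical algorithm.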
Clearly if we can do this for a given value of n, then we can do it for any smaller value of n, just by deleting some of the vectors. This implies that for each r (and q) there is a maximum n (possibly infinite) for which an MDS code of length n and redundancy r exists.

Definition. Suppose r > 0. We define max(r, q) to be the largest n such that an MDS code over F_q of length n and redundancy r exists.

The main question about MDS codes is to find the values of max(r, q). Note that max(r, q) may be infinite, even though there are only finitely many vectors in F_q^r, because we are allowed repeats in our sequence of vectors. Note, though, that if we have a repeated vector in our sequence, then the code cannot possibly have minimum distance more than 2.

Here is our main theorem on the values max(r, q).

Theorem 6.21.
1. max(1, q) = ∞.
2. If r ≥ q, then max(r, q) = r + 1.
3. If 2 ≤ r ≤ q, then max(r, q) ≥ q + 1.

Proof of Theorem 6.21(1). We must show that for any n and q, there exists an MDS code of length n and redundancy 1. Using Theorem 6.18, we just need to construct a 1 × n matrix such that any one column is linearly independent. Saying that one vector is linearly independent just means that it is non-zero, so we just need to find a 1 × n matrix in which every entry is non-zero; we could take the matrix ( 1 1 ... 1 ).

Proof of Theorem 6.21(2). To show that max(r, q) ≥ r + 1, we need to show that an MDS code of length r + 1 and redundancy r exists. But this is an [r + 1, 1, r + 1]-code, and the repetition code is such a code.

To show that max(r, q) ≤ r + 1, we have to show that we can’t find an MDS code of length r + 2 and redundancy r, i.e. an [r + 2, 2, r + 1]-code. Let’s suppose that we can find such a code C, and take a generator matrix. Applying MO1–5, we put this generator matrix in standard form; this gives a generator matrix for a code D which is equivalent to C and therefore is also an [r + 2, 2, r + 1]-code.
Let’s write our generator matrix in block form as

G = ( I_2 | B ),

where I_2 is the 2 × 2 identity matrix and B is a 2 × r matrix over F_q. The rows of G are non-zero words in D and d(D) = r + 1, so both rows must have weight at least r + 1; hence all the entries of B are non-zero. This means that we can define µ_i = b_{1i} b_{2i}^{−1} for i = 1, ..., r, and µ_i is a non-zero element of F_q. Since r ≥ q, there are only q − 1 < r non-zero elements in F_q, so there must be i ≠ j such that µ_i = µ_j. Now define

x = (row 1 of G) − µ_i (row 2 of G).

Then x ∈ D, and x ≠ 0 because the rows of G are linearly independent, so x has weight at least r + 1. But

x_{i+2} = b_{1i} − µ_i b_{2i} = b_{1i} − (b_{1i} b_{2i}^{−1}) b_{2i} = 0

and

x_{j+2} = b_{1j} − µ_i b_{2j} = b_{1j} − µ_j b_{2j} = 0,

so x contains at least two zeroes, and so has weight at most r; contradiction.

For the proof of Theorem 6.21(3), we need to look at determinants. We use the following fact: if A is an r × r matrix, then the columns of A are linearly independent if and only if det(A) ≠ 0. We need to look at a particular type of determinant, called a ‘Vandermonde determinant’.

Proposition 6.22. Suppose x_1, ..., x_r are distinct elements of a field F. Then the determinant

det ( 1          1          ...  1          )
    ( x_1        x_2        ...  x_r        )
    ( x_1^2      x_2^2      ...  x_r^2      )
    ( ...        ...             ...        )
    ( x_1^{r−1}  x_2^{r−1}  ...  x_r^{r−1}  )

is non-zero.

Proof. Not given. In fact, the determinant equals ∏_{i<j} (x_j − x_i), and this is non-zero, because each x_j − x_i is non-zero.

Now we can prove the last part of the theorem.

Proof of Theorem 6.21(3). Assume 2 ≤ r ≤ q. We need to show that there is an MDS code of length q + 1 and redundancy r, i.e. a [q + 1, q + 1 − r, r + 1]-code. Label the elements of F_q as λ_1, ..., λ_q in some order, and let

H = ( 1          1          ...  1          0 )
    ( λ_1        λ_2        ...  λ_q        0 )
    ( λ_1^2      λ_2^2      ...  λ_q^2      0 )
    ( ...        ...             ...        . )
    ( λ_1^{r−2}  λ_2^{r−2}  ...  λ_q^{r−2}  0 )
    ( λ_1^{r−1}  λ_2^{r−1}  ...  λ_q^{r−1}  1 )

Let C be the code with H as its parity-check matrix. Then C is a [q + 1, q + 1 − r]-code, and we claim that C has minimum distance at least r + 1.
Recall from Theorem 6.18 that this happens if and only if any r columns of H are linearly independent. H has r rows, so any r columns together form a square matrix, and we can check whether the columns are linearly independent by evaluating the determinant. So choose r columns c_1, ..., c_r of H, and let A be the matrix formed by them.

If the columns c_1, ..., c_r do not include the last column of H, then A has the form

( 1          1          ...  1          )
( x_1        x_2        ...  x_r        )
( x_1^2      x_2^2      ...  x_r^2      )
( ...        ...             ...        )
( x_1^{r−1}  x_2^{r−1}  ...  x_r^{r−1}  )

for some distinct x_1, ..., x_r, and so det(A) ≠ 0 by Proposition 6.22.

If c_1, ..., c_r do include the last column of H, then we have

A = ( 1            1            ...  1              0 )
    ( x_1          x_2          ...  x_{r−1}        0 )
    ( ...          ...               ...            . )
    ( x_1^{r−2}    x_2^{r−2}    ...  x_{r−1}^{r−2}  0 )
    ( x_1^{r−1}    x_2^{r−1}    ...  x_{r−1}^{r−1}  1 )

for some distinct x_1, ..., x_{r−1}. Let A′ be the matrix formed by the first r − 1 rows and the first r − 1 columns. Then det(A′) ≠ 0 by Proposition 6.22, and so the columns c′_1, ..., c′_{r−1} of A′ are linearly independent. To show that c_1, ..., c_r are linearly independent, suppose we have

µ_1 c_1 + · · · + µ_r c_r = 0,   (∗)

where µ_1, ..., µ_r ∈ F_q; we must show that µ_1 = · · · = µ_r = 0. If we take all the vectors in (∗) and delete the last entry from each one, we get

µ_1 c′_1 + · · · + µ_{r−1} c′_{r−1} + µ_r 0 = 0.

Since c′_1, ..., c′_{r−1} are linearly independent, this gives µ_1 = · · · = µ_{r−1} = 0. Hence we have µ_r c_r = 0; since c_r ≠ 0, we get µ_r = 0 and we are done.
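The column-by-column determinant check in this proof is easy to mechanise. The following Python sketch is not from the notes: it assumes q is prime (so arithmetic mod q models F_q) and uses a naive cofactor expansion for the determinant, which is fine for small r. It builds the matrix H from the proof for r = 3, q = 5 and verifies that every choice of r columns is linearly independent, so that C is a [6, 3, 4]-code, i.e. MDS:

```python
from itertools import combinations

def ext_rs_check_matrix(r, q):
    """The r x (q+1) parity-check matrix H from the proof of Theorem 6.21(3):
    columns (1, x, ..., x^(r-1)) for each x in F_q, plus (0, ..., 0, 1).
    Assumes q is prime, so F_q is modelled by the integers mod q."""
    cols = [[pow(x, i, q) for i in range(r)] for x in range(q)]
    cols.append([0] * (r - 1) + [1])
    return [[c[i] for c in cols] for i in range(r)]  # transpose to row form

def det_mod(M, q):
    """Determinant mod q by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0] % q
    return sum((-1) ** j * M[0][j]
               * det_mod([row[:j] + row[j + 1:] for row in M[1:]], q)
               for j in range(len(M))) % q

r, q = 3, 5
H = ext_rs_check_matrix(r, q)
# By Theorem 6.18, d(C) >= r + 1 iff every r columns of H are independent.
ok = all(det_mod([[H[i][j] for j in cols] for i in range(r)], q) != 0
         for cols in combinations(range(q + 1), r))
print(ok)  # True
```

For a prime power q one would replace the mod-q arithmetic by genuine F_q arithmetic, but the structure of the check is exactly the same.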
Index

(n, M, d)-code, 3, 19
A_q(n, d), 6
H_{r,q}, 36
  trick for constructing, 36
[n, k, d]-code, 19
[n, k]-code, 19
C^⊥, 28
  dimension of, 29
G_11, 40
G_12, 39
  minimum distance of, 39
F^n, 17
F_q, 18
≡, 36
Ham(r, q), 36
  dimension of, 36
  minimum distance of, 37
⌊x⌋, 9
max(r, q), 42
R(r, n), 33
  dimension of, 34
  minimum distance of, 35
q-ary
  alphabet, 2
  code, 2
alphabet, 2
basis, 16
binary, 2
binary Hamming code, 35
capacity, 14
code, 2
coset, 25, 31
  leader, 26
decoding process, 13, 26, 32
dimension, 16
distance, 2
dot product, 28
dual code, 28
  dimension of, 29
equivalent codes, 3, 19
equivalent linear codes, 22
error-correcting, 2, 8
error-detecting, 2
extended ternary Golay code, 39
  minimum distance of, 39
field, 15
finite fields, 18
generator matrix, 19, 28
Hamming bound, 8
Hamming code, 36
  dimension of, 36
  minimum distance of, 37
Hamming space, 2
leader, 26
linear code, 18
linear map, 16
  defined by a matrix, 17
main coding theory problem, 6
matrix operations, 22, 24
maximum distance separable code, 42
MDS code, 42
minimum distance, 3
  and linear independence, 40
  of a linear code, 18
nearest-neighbour decoding process, 13, 26, 32
parity-check matrix, 29
perfect code, 9, 37, 40
Plotkin bound, 9
Rank–nullity Theorem, 17, 29
rate, 14
redundancy, 42
Reed–Muller code, 33
  dimension of, 34
  minimum distance of, 35
repetition code, 3, 6
Shannon’s Theorem, 14
Singleton bound, 6
  for linear codes, 41
Slepian array, 26
  quick method, 27
sphere, 8
standard form, 24, 31
subspace, 16
subspace test, 16
symbol error probability, 13
syndrome, 31
syndrome decoding, 31
syndrome look-up table, 32
  quick method, 32
ternary, 2
ternary Golay code, 40
triangle inequality, 2
Vandermonde determinant, 43
vector space, 15
weight, 18
word, 2
word error probability, 13