辅导案例-COMP2610/COMP6261-Assignment 2
COMP2610/COMP6261 - Information Theory Assignment 2 Robert C. Williamson Released: 19th September 2019. Due: 10am, 21st October 2019. Read these instructions carefully! Instructions You are to design a question for a hypothetical future student of this information theory course in order to test their knowledge, skill and understanding of aspects of source coding that we have studied (by the end of lecture 16, on 24th september). COMP2610 students should design a question suitable for a future COMP2610 student; COMP6261 students should design one for future COMP6261 students. You need to submit three things: 1. Your question, which should comprise several severable parts, and needs to be a direct question about the content we have learned regarding source coding You can not ask a tricksy meta-question like I have done!. That is, your question should look something like an exam question (or questions) from previous years, or questions from the assignment from a previous year that I have appended. You should considering including components testing the understanding of the source coding theorem, what it says and does not say; questions requiring calculation of (say) a Huffman code and an arithmetic code for particular ensembles; and questions that probe the student’s understanding of the material (“why” things work they way they do). The bulk of your question should focus upon calculating some actual codes, and then answering questions about them (rather than requiring proofs of theorems). Your questions should including a marking guide for the student. 2. You also need to submit an exemplary answer to your questions, written in a manner that is as clear as possible and which fully answers the question 1 you posed. Please position answers to subquestions immediately following the sub-question. 3. Finally, you need to submit a rationale as to why you designed your question the way you did, and exactly what it is you are hoping the future student would learn from both attempting the question, and from your exemplary answer. You need to seperately and explicitly answer what knowledge, skills and understanding you are testing / teaching in setting your question. Thus the format of your submission should be Initial instructions, and situation set-up Subquestion 1 Answer to subquestion 1 Subquestion 2 Answer to subquestion 2 ... Subquestion n Answer to subquestion n Rationale You can either type or handwrite your submission, but if handwritten, it needs to be very clearly legible and neat, and the scan that you submit needs to be clear and sharp. I impose no restrictions on the details of what you ask (other than it tests knowledge, skills and understanding re source coding). You are at liberty to require the student to use software you provide, but I encourage that you focus on problems that can be answered with just pencil and paper. Resist the temptation tomake your question really complex, long and hard! The goal is to design something that tests the essence of what we have learned about source coding, and is tractable for an average student. You should largely stick to material covered in lectures. If you want to go beyond what was already covered, you need to take considerable care in walking the student through anything new. I strongly encourage sticking almost entirely to what was done in lectures! Your are at liberty to look at previous exams, the example I have appended, questions in the textbook or elsewhere, for inspiration of suitable questions to ask. But you must explicitly cite any sources you use. Your solution, and your rationale are significant aspects of the assignment, somerely finding a nice question elsewhere will not do the assignment for you. . . 2 It is not necessary to try and test every single thing we have covered re source coding. Indeed, any attempt to do so would be doomed (the poor hypothetical student would not have time). I am wanting you to reflect on the most important aspects of what we have learned, and turn that into an assignment for future students. What you choose to ask, is a key part of this assignment. Assessment This assignment is worth 15% of your overall grade. The assignment is perhaps unusual relative to other CS assignments because of its open-ended nature. It is deliberately designed this way to encourage you to reflect upon what you have learned, and to practice explaining the key concepts. (The final exam will give you plenty of straight-forward non-tricksy questions!) Since it is open-ended, I am not allocating numericalmarks in the usual fashion. The clarity of yourwritingwill play a role in the assessment for this assignment. You need to pay attention to the precision of your statements, lack of ambiguity, and correctness (of course). The mere length of your overall submission is not a major positive factor, and indeed concision and elegance will be applauded. Neither is the sheer number of sub-questions you pose. Your question should be designed such that an average student can solve it in 10 hours. (I know full well how hard this is to estimate). Your entire investment in this assignment should be less than 20 hours (ideally around 15). It is much more important that you have a highly polished submission with detailed rationale. In grading your submission, I will be taking account of its correctness, clarity, concision, elegance, inspiration, tractability, insight, and what the student would be expected to learn. I will take account of the degree to which it tests all three of knowledge, skills and understanding. To be clear, if you merely pose a trivial question which is answerable in a few minutes, that is not sufficient. Neither is it sufficient to reproduce a question you have found verbatim. I want you to design your own question. I explict want you to think hard about what is important in source coding, and craft a question that captures that in an elegant and concise manner (because this will help you learn that essence!). Your grade for this assignment will be according to the following categorical scale. Fail (5, 15, 25, 35 or 45% depending upon degree): There are important flaws in your submission. Perhaps you did not provide any rationale at all. Perhaps there are egregious ambiguities in the question, or your answer is signifi- cantly flawed. Your explanations are confusing and hard to follow. 3 Pass (55%): Your question is not flawed in any fatal way, but may well have some minor to medium severity bugs. It tests some non-trivial aspect of source coding in a manner that would take a student a fair time to solve. Your answer is largely correct. And you have provided some sort of rationale. Your explanations and writing are ok, but not very elegant or concise. The question is not really inspiring. It may test one of knowledge, skills and understanding, but does not do a good job testing all three. It is perhaps largely requiring a calculation. It is likely a minor variation of an existing question. Credit (65%): Your question and solution has only minor flaws. The writing is clear, but perhaps not particularly elegant. The question is straight- forward but not boring, and tests more than one of knowledge, skills and understanding. There is an element of interestingness to the question. The rationale is moderately clear. Distinction (75%): Neither your question nor answer has any detectable flaws. It is well written, and your rationale shows how you test all three of knowledge, skills and understanding. There is clearly a creative aspect to your question. It is elegant, concise, and inspiring. Your rationale is very clear. High Distinction (85, 90, 95, or 100% depending upon degree) A very well exe- cuted elegant question, solution and rationale, that demonstrates creativity, is exciting and inspiring and has no errors. The rationale makes it very clear what the students would learn. One might hope that students doing your question would thank you for a stimulating and fun experience. You articulate very well in your rationale how it tests all three of knowledge, skills and understanding. There are no flaws in the question, no ambiguities, it is tractable, and your answer is exemplary. For the highest marks, it is inspirational and teaches the students something they would not have learned from perfect attention to the lectures. Hints and Starting Points In case the open-ended nature of this assignment is daunting, here are some hints / starting points. Start with a couple of straight-forward questions that require the student to determine a source code for a given ensemble. Say a Huffman code for a fixed ensemble, and a simple example of arithmetic coding. Then add questions that test the understanding and insight regarding what they have done. Build some connection to the theory we have learned (so it is not mere cranking an algorithmic 4 handle). Construct the parts of the questions, your answers, and the rationale simultaneously (that is don’t put off the latter till the end). You need to ensure your question is tractable. Find some problem in the text, past exams or elsewhere, and simply modify some of the particulars (so that the future student can’t copy an existing answer verbatim). Make sure you explicitly cite what you have based your question on, and what you have changed. Keep iterating, adding additional subquestions until your answers are of a length and complexity that you judge could be answered in 10 hours or so. Then double and triple check everything for correctness. Polish the language. Explicitly check for vagueness and ambiguity and typos. I apppend an assignment from a previous year, with solutions. Focus on just questions 1-4 (note that the overall assignment was worth 20%). And of course, this example does not have a written rationale. If you are not sure where to start, take the questions there, and just tweak some of the constants. In order to get a high grade (D/HD) you would need to show more creativity. I stress that the total length of the questions you write should be substantially less than the attached assignment since you have to design the question, provide the rationale, as well as providing the answer. 5 COMP2610/COMP6261 - Information Theory: Assignment 3 Sample Solutions Australian National University Lecturers: Young Lee and Aditya Menon COMP2610/6261 Shared Questions 1. (20 pts) LetX be an ensemblewithAX = {x1,x2,x3} and probabilitiespX = (1/4, 1/3, 5/12). (a) (2 pt) Compute the cumulative distribution function F (xi ) for each outcome xi . (b) (2 pt) Compute the modified distribution function F¯ (xi ) for each outcome xi . (c) (2 pt) Compute the codeword lengths `(xi ) for each outcome xi . (d) (3 pt) Compute the (binary) symbol intervals [F (xi−1), F (xi ))2 for each outcome xi , assuming that F (x0) = 0. (e) (4 pt) Construct a Shannon-Fano-Elias code for X . (f) (3 pt) Compute the (binary) codeword intervals [bF¯ (xi )c`(xi ), bF¯ (xi )c`(xi ) + 2−`(xi ) )2 for each outcome xi . (g) (4 pt) Assuming the above code, decode the sequence 011001110, showing all your steps. (a) The cumulative distribution function is: F (1) = 1/4 = 0.012 F (2) = 1/4 + 1/3 = 7 12 = 0.10012 F (3) = 1/4 + 1/3 + 5/12 = 1 = 1.02. (b) The modified cumulative distribution function is: F¯ (1) = (1/2) (1/4) = 1/8 = 0.0012 F¯ (2) = 1/4 + (1/2) (1/3) = 5/12 = 0.011012 F¯ (3) = 1/4 + 1/3 + (1/2) (5/12) = 19/24 = 0.110012. (c) The code lengths are given by `(x ) = ⌈ log2 1 pi ⌉ + 1 so that `(1) = ⌈ log2 4 ⌉ + 1 = 3 `(2) = ⌈ log2 3 ⌉ + 1 = 3 `(3) = ⌈ log2 12/5 ⌉ + 1 = 3. 1 (d) From (a), the symbol intervals are: [0, 0.01)2, [0.01, 0.10012), [0.10012, 1)2, or [0, 1/4)10, [1/4, 7 12 )10, [ 7 12 , 1)10. (e) From (b) and (c), the code is C = {001, 011, 110}, as we just extract the first three bits in the binary expansion of F¯ (x ). (f) From (b), the codeword intervals are [0.001, 0.010)2, [0.011, 0.100)2, [0.110, 0.111)2, where in each casewe added a single bit of 1 to the final position of the corresponding left endpoint of the interval. (g) The decoding of 011001110 is x2 x1 x3. Proceeding step by step: • After the first bit, we have interval [0.0, 0.1)2. This overlaps with the intervals for x1 and x2, so we continue. • After the next bit, we have interval [0.01, 0.10)2. This is within the interval for x2. So, the first symbol must be x2. • We can skip the next bit since we know the length of representation for x2 is 3 bits. • After the next bit, we have interval [0.0, 0.1)2. This overlaps with the intervals for x1 and x2, so we continue. • After the next bit, we have interval [0.00, 0.01)2. This is exactly the interval for x1. So, the second symbol must be x1. • We can skip the next bit since we know the length of representation for x1 is 3 bits. • After the next bit, we have interval [0.1, 1.0)2. This overlaps with the intervals for x2 and x3, so we continue. • After the next bit, we have interval [0.11, 1.0)2. This is within the interval for x3. So, the first symbol must be x3. 2. (5 pts) In the Shannon-Fano-Elias code, one codes an outcome xi using the first `(xi ) = dlog2 1/p (xi )e + 1 bits of the symbol interval mid-point F¯ (xi ). In a Shannon-Fano-Elijah code, one codes an outcome xi using the first `(xi ) bits of the symbol interval left-point F (xi−1), where F (x0) = 0. Will the Shannon-Fano-Elijah code be prefix-free in general? If yes, explain why. If no, give an example of an ensemble where the code is not prefix-free. No, it is not prefix-free in general. Example: Consider an ensemble X with just two outcomes x1,x2. Let p (X = x1) = 1/8 and p (X = x2) = 78 . The lengths of codewords are `(x1) = log2 8 + 1 = 4 and `(x2) = dlog2(8/7)e + 1 = 2. The cumulative distribution function is F (x0) = 0 = 0.02, F (x1) = 1/8 = 0.0012, F (x2) = 1 = 0.12. The symbol intervals are [F (x0), F (x1)) = [0.0, 0.001)2 and [F (x1), F (x2)) = [0.001, 0.1)2. Since we use the left endpoints to code, we will thus have the encoding {0000, 00}. The result is not prefix-free. Note that in this case, bF (x1)c`(x2) = 002, which lies outside the interval [F (x1), F (x2)). Thus we can no longer guarantee non-overlapping codeword intervals. The fact that the truncated representation falls outside the interval is intuitive: by truncating the left endpoint, we will invariably reduce its value, thus pushing us outside the interval. 2 3. (20 pts) Let X be an ensemble with AX = {a, b, c,}. (a) (15 pt) Show the output of arithmetic coding for the sequence cab, where the probabilities at every iteration are adapted using the Dirichlet model from lectures. Assume an initial set of virtual counts ma = mb = mc = 1, and a constant end-of- stream probability p () = 0.125 at every iteration. (b) (5 pt) Suppose one codes the string a using arithmetic coding with two settings: the first uses the virtual countsma = mb = mc = 1, and the second uses the virtual countsma = 100,mb = mc = 1. In which of the two settings will the codeword for a be longer? Justify your answer. (You do not need to compute the actual encoding in either setting.) (a) We proceed as follows. • We know p () = 0.125 = 1/8. From the virtual counts, since at this stage we have not observed anything, we have probabilities for a, b, c via p (·|ϵ ) = 0 + 1 0 + 3 · (1 − p ()) = (1/3) · (7 8 ) = 7 24 ≈ 0.2917. So, we start off with the intervals[ 0.0, 7 24 ) , [ 7 24 , 14 24 ) , [ 14 24 , 21 24 ) , [ 21 24 , 1.0 ) = [ 0.0, 7 24 ) , [ 7 24 , 7 12 ) , [ 7 12 , 7 8 ) , [ 7 8 , 1.0 ) = [0.0000, 0.2917 ) , [0.2917, 0.5833 ) , [0.5833, 0.8750 ) , [0.8750, 1.0000 ) • The first symbol is a c. So, we slice up the interval [ 7 12 , 7 8 ) = [0.5833, 0.8750 ). From the virtual counts, we have probabilities for a, b, c via p (·|c) = (0 + 1 1 + 3 , 0 + 1 1 + 3 , 1 + 1 1 + 3 ) · (1 − p ()) = (1/4, 1/4, 1/2) · 7 8 = ( 7 32 , 7 32 , 7 16 ) = (0.21875, 0.21875, 0.4375). All probabilities must be scaled by the length of the interval, 724 = 0.8750 − 0.5833 ≈ 0.2917. This means we have increments( 49 32 · 24 , 49 32 · 24 , 49 16 · 24 , 7 24 · 8 ) ≈ (0.0638, 0.0638, 0.1276, 0.0365). 3 So, we have the intervals [0.5833, 0.6471 ) , [0.6471, 0.7109 ) , [0.7109, 0.8385 ) , [0.8385, 0.8750 ) . • The next symbol is a a. So, we slice up the interval [ 7 12 , 7 12 + 49 24·32 ) ≈ [0.5833, 0.6471). From the virtual counts, we have probabilities for a, b, c via p (·|ca) = (1 + 1 2 + 3 , 0 + 1 2 + 3 , 1 + 1 2 + 3 ) · (1 − p ()) = (2/5, 1/5, 2/5) · 7 8 = ( 7 20 , 7 40 , 7 20 ) = (0.35, 0.175, 0.35). All probabilities must be scaled by the length of the interval, 4924·32 ≈ 0.0638. This means we have increments( 343 32 · 24 · 20 , 343 32 · 24 · 40 , 343 16 · 24 · 20 , 49 24 · 32 · 8 ) ≈ (0.0223, 0.0111, 0.0223, 0.0079). So, we have the intervals [0.5833, 0.6057 ) , [0.6057, 0.6168 ) , [0.6168, 0.6392 ) , [0.6392, 0.6471 ) . • The next symbol is a b. So, we slice up the interval [0.6057, 0.6168). From the virtual counts, we have probabilities for a, b, c via p (·|cab) = (1 + 1 3 + 3 , 1 + 1 3 + 3 , 1 + 1 3 + 3 ) · (1 − p ()) = (1/3, 1/3, 1/3) · 7 8 = ( 7 24 , 7 24 , 7 24 ) ≈ (0.2917, 0.2917, 0.2917). All probabilities must be scaled by the length of the interval, 34332·24·40 ≈ 0.0111. This means we have increments( 2401 32 · 242 · 40 , 2401 32 · 242 · 40 , 2401 32 · 242 · 40 , 343 32 · 24 · 40 · 8 ) ≈ (0.0032, 0.0032, 0.0032, 0.0013). So, we have the intervals [0.6057, 0.6089 ) , [0.6089, 0.6122 ) , [0.6122, 0.6154 ) , [0.6154, 0.6168 ) . This is the end of the stream. So, we end up with the last interval, [0.6154, 0.6168). The final number that we choose for coding is the interval midpoint, 0.61613159. 4 The probability of the sequence is p (cab) = p (c|ϵ ) · p (a|c) · p (b|ca) · p (|cab) = p (c|ϵ ) · p (a|c) · p (b|ca) · p () = 7 24 · 7 32 · 7 40 · 1 8 ≈ 0.00139. We have codeword length `(cab) = d− log2 0.00139e + 1 = 11. So, the final code is 10011101101. (b) The code in the second setting will be shorter. Reason: Since we actually observe an a in the string to encode, we will use the subinterval reserved for this symbol. This subinterval will have length proportional to − log2 p (a|ϵ ). In the second setting, we consider a to be a-priori much more likely than in the first setting, since we assign many more “fake” occurrences of it in the virtual countsm. So, − log2 p (a|ϵ ) will be much smaller in the second setting, since p (a|ϵ ) will be much closer to 1. Since the codeword length is proportional to the interval length, we can expect that the second setting will lead to a shorter codeword. 4. (6 pts) Let X be an ensemble over outcomes AX = {a, b,}, with XN the corresponding extended ensemble for some integerN . Suppose one codes a sequencex = x1x2 . . . xN−1 from XN using arithmetic coding with fixed ensemble probabilities pX at every iteration. (a) (3 pt) Suppose x has na,nb occurrences of a, b respectively. Provide an expression for the length of the codeword for x , expressing your answer in terms of na,nb, as well as the probabilities pX of the individual symbols. (b) (3 pt) Suppose now one uses arithmetic coding with adaptive probabilities using the Dirichlet model with virtual counts ma = mb = 1. Will the resulting codeword be identical to that when using fixed ensemble probabilities? Explain why or why not. (a) In general, the length is d− logp (x1x2 . . . xN)e + 1. Since we know that the proba- bilities stay the same at every iteration, p (x1x2 . . . xN ) = p (a) na · p (b)nb · p ()n . Thus, the length of the sequence is d−na · logp (a) − nb · logp (b) − logp ()e + 1 (b) No, in general the two will give different results. Reason: with adaptive probabilities, the interval sizes will be different depending on the particular sequence that we observe. With fixed probabilities, however, the interval sizes stay the same at every single iteration. While we expect the two to be increasingly similar for large sequences, it is thus not hard to construct cases where the two give different answers. 5 5. (20 pts) Suppose we wish to compress the string aoxomoxoa. (a) (4 pt) Suppose we code the string using LZ77 with a window sizeW = 1. What is the output at each step of the algorithm? Show all your working. (b) (8 pt) Suppose we code the string using LZ77 with a window sizeW = 3. What is the output at each step of the algorithm? Show all your working. (c) (8 pt) Suppose we code the string using LZ78. What is the output at each step of the algorithm? Show all your working. (a) The final output will be: (0,a),(0,o),(0,x),(0,o),(0,m),(0,o),(0,x),(0,o),(0,a)/ This is because with a window size of 1, we will only be able to have a pointer output when we have two successive characters the same. (b) The final output will be: (0,a),(0,o),(0,x),(1,2,1),(0,m),(1,2,1),(0,x),(1,2,1),(0,a). The full processing is as per Table 1. Processed Unprocessed Output aoxomoxoa (0, a) a oxomoxoa (0, o) ao xomoxoa (0, x) aox omoxoa (1, 2, 1) oxo moxoa (0, m) xom oxoa (1, 2, 1) omo xoa (0, x) mox oa (1, 2, 1) oxo a (0, a) Table 1: LZ77 output. (c) The final output will be: (0,a),(0,o),(0,x),(2,m),(2,x),(2,a). The full processing is as per Figure 1. The numbering of the nodes indicates the iteration at which the node is created. Alternately, in tabular form, we have the output as per Table 2. 6. (9 pts) For each of the following channels, explain whether or not they are symmetric. (a) (2 pt) Q = 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 1/3 . 6 01 a 2 4 m 5 x 6 a o 3 x Figure 1: LZ78 output. Processing Output aoxomoxoa (0, a) oxomoxoa (0, o) xomoxoa (0, x) omoxoa (2, m) oxoa (2, x) oa (2, a) Table 2: LZ78 output. (b) (3 pt) Q = 0.7 0.1 0.2 0.2 0.1 0.7 . (c) (3 pt) Q = 0.7 0.2 0.2 0.1 0.1 0.7 . (a) Yes, this is trivially symmetric as-is: every row and column are permutations of one another. (b) Yes, this is symmetric if we partition the matrix into rows {1, 3} and {2}. This is because the matrix [ 0.7 0.1 0.1 0.7 ] has every row and column permutation of one another; and the matrix [ 0.2 0.2 ] is trivially symmetric. (c) No, there is no feasible partition which results in every row and column being a permutation of one another. 7. (18 pts) Consider a channel over inputs {1, 2} and outputs {1, 2, 3}, with transition matrix Q = 0 1/2 1/4 0 3/4 1/2 . 7 Let X ,Y be random variables over the inputs and outputs respectively. Suppose X has probabilities (θ , 1 − θ ) for some θ ∈ [0, 1]. (a) (3 pt) Compute the distribution p (Y ), expressing your answer in terms of θ . (b) (3 pt) Compute the entropy H (Y ), expressing your answer in terms of θ . (c) (3 pt) Compute the conditional entropy H (Y |X ), expressing your answer in terms of θ . (d) (3 pt) Compute the mutual information I (X ;Y ), expressing your answer in terms of θ . (e) (5 pt) Compute the channel capacity C of Q . Provide the choice of θ that attains this capacity. (f) (2 pt) Is it possible to construct a block code that achieves a rate of 2C bits per transmission over Q , and arbitrarily small probability of bit error? (g) (2 pt) Is it possible to construct a block code that achieves a rate of 2C bits per transmission over Q , and 49% probability of bit error? (a) We have p (Y = 1) = p (X = 1) · p (Y = 1|X = 1) + p (X = 2) · p (Y = 1|X = 2) = 0 + (1 − θ ) · (1/2) = 1 − θ 2 p (Y = 2) = p (X = 1) · p (Y = 2|X = 1) + p (X = 2) · p (Y = 2|X = 2) = θ · (1/4) + 0 = θ 4 p (Y = 3) = p (X = 1) · p (Y = 3|X = 1) + p (X = 2) · p (Y = 3|X = 2) = θ · (3/4) + (1 − θ ) · (1/2) = (3/4)θ + (1 − θ )/2 = 2 + θ 4 (b) From (a), we have H (Y ) = − ∑ y p (Y = y) · log2 p (Y = y) = −1 − θ 2 · log2 1 − θ 2 − θ 4 log2 θ 4 − 2 + θ 4 · log2 2 + θ 4 = −1 − θ 2 log2(1 − θ ) + 1 − θ 2 − θ 4 log2 θ + θ 2 − 2 + θ 4 log2(2 + θ ) + 2 + θ 2 = −1 − θ 2 log2(1 − θ ) − θ 4 log2 θ − 2 + θ 4 log2(2 + θ ) + 3 + θ 2 8 (c) Suppose H2(·) denotes the binary entropy function. Due to the zero entries in each column, we have H (Y |X = 1) = H2(3/4) H (Y |X = 2) = H2(1/2) = 1. So, H (Y |X ) = p (X = 1) · H (Y |X = 1) + p (X = 2) · H (Y |X = 2) = θ · H2(3/4) + (1 − θ ) = θ · (H2(3/4) − 1) + 1. (d) By definition, I (X ;Y ) = H (Y ) − H (Y |X ) = −1 − θ 2 log2(1 − θ ) − θ 4 log2 θ − 2 + θ 4 log2(2 + θ ) + 3 + θ 2 − θ · (H2(3/4) − 1) − 1 = −1 − θ 2 log2(1 − θ ) − θ 4 log2 θ − 2 + θ 4 log2(2 + θ )+ θ · (3 2 − H2(3/4) ) + 1 2 . (e) We need to find the θ which maximises I (X ;Y ). This can be done with a calculator or computer. Algebraically, recalling that f (x ) = x · log2 x has derivative log2 x + 1log 2 , we have dI (X ;Y ) dθ = 1 2 · log2(1 − θ ) − 1 4 · log2 θ − 1 4 · log2(2 + θ ) + 3 2 − H2(3/4) = 1 4 · log2(1 − θ )2 − 1 4 · log2 θ − 1 4 · log2(2 + θ ) + 3 2 − H2(3/4) = 1 4 · log2 (1 − θ )2 θ · (2 + θ ) + 3 2 − H2(3/4). 9 This equals zero when (1 − θ )2 θ · (2 + θ ) = c0 ⇐⇒ 1 + θ 2 − 2 · θ θ2 + 2 · θ = c0 ⇐⇒ 1 + θ2 − 2 · θ = c0 · (θ2 + 2 · θ ) ⇐⇒ 1 + θ2 − 2 · θ = c0 · (θ2 + 2 · θ ) ⇐⇒ (1 − c0) · θ2 − 2 · (1 + c0)θ + 1 = 0 ⇐⇒ θ = 2 · (1 + c0) ± √ 4 · (1 + c0)2 − 4 · (1 − c0) 2 · (1 − c0) ⇐⇒ θ = 2 · (1 + c0) ± √ 4 + 4c20 + 8c0 − 4 + 4c0 2 · (1 − c0) ⇐⇒ θ = 2 · (1 + c0) ± √ 4c20 + 12c0 2 · (1 − c0) ⇐⇒ θ = (1 + c0) ± √ c20 + 3c0 1 − c0 , for c0 = 24·(H2 (3/4)−3/2). Since θ ∈ [0, 1], we pick the solution θ = (1 + c0) − √ c20 + 3c0 1 − c0 ≈ 0.5461. The resulting channel capacity is computed by plugging in this value into part (d). We find that the capacity is C ≈ 0.3957. (f) No, it’s not possible. By the noisy channel coding theorem, we can’t attain rates above the capacity while having arbitrarily small error probability. (g) Yes, it’s possible. By the noisy channel coding theorem part (b), for a bit error probability of pb = 0.4, we can attain rates below C/(1 − H2(0.4)). We can easily check that C1−H2 (0.4) > 2C. So, we can design such a code. COMP6261 Extension Questions This section is for COMP6261 students only. There is no bonus awarded to COMP2610 students attempting this question. 8. (5 pts) Given some binary string B (e.g. B = 0100), recall that [0.B, 0.B1)2 is its codeword interval. (a) (2 pt) Are there binary intervals [a,b)2 ⊆ [0, 1)2 which are not codeword intervals for any binary string B? If yes, give an example of such an interval, and justify your answer. If no, explain why not. 10 (b) (4 pt) Let B,B′ be two distinct binary strings, and suppose that 0.B2 < 0.B′2. Is it possible that 0.B′2 < 0.B12 < 0.B ′12? If yes, give an example of strings B,B′ for which this is so. If no, explain why not. Hint: It may be useful to recall the intuitive meaning of the codeword interval. (a) Yes, this is possible. Example: [0.0, 0.11)2. This can’t be the codeword interval for any string, since any candidate B has to comprise only zeros; and for such a B, the right endpoint has to be either 0.1, 0.01, and so on. (b) No, this is not possible. Reason: The codeword interval IB = [0.B, 0.B1)2 corresponds to all strings that start with B, i.e.all strings for which B is a prefix. If 0.B′2 < 0.B12, then B ′ ∈ IB. Consequently, B is a prefix of B′. If further 0.B12 < 0.B′12, then there are strings x such that x ∈ IB′ but x < IB, i.e.there are strings that start with B′ but not B. This is however not possible: if B is a prefix of B′, then every string that starts with B′ by definition also starts with B. Hence, the given condition cannot hold. 9. (5 pts) You have a noisy channel Q that takes as input a symbol from X, and outputs a symbol fromY. A scientist offers you a function f (y) that produces a new channelQ′ as follows: for any input x ∈ X, one uses the original channel to get an output y ∈ Y, and then uses the function to output y′ = f (y) ∈ Y. Is it possible that Q′ can have greater capacity than Q? Explain why or why not. No, this is not possible. Reason: Suppose X is a random variable over the inputs, Y a variable over the outputs in Q , and Y ′ over the outputs in Q′. Evidently, we have a Markov chain X → Y → Y ′, where Y ′ = f (Y ). By the data processing inequality, we know that I (X ;Y ) ≥ I (X ;Y ′). Since the capacity is the maximal mutual information over all possible input distributions, this means that the capacity of Q′ cannot exceed that of Q . Formally, suppose X ∗Q ′ is the choice of random variable over inputs which results in the capacity for Q′. Then, C (Q ) = max pX I (X ;Y ) ≥ I (X ∗Q ′;Y ) by definition of max ≥ I (X ∗Q ′;Y ′) by data-processing inequality = C (Q′) by definition of capacity. 11