辅导案例-COMP2610/COMP6261-Assignment 2

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

COMP2610/COMP6261 - Information Theory
Assignment 2
Robert C. Williamson
Released: 19th September 2019. Due: 10am, 21st October 2019.
Read these instructions carefully!
Instructions
You are to design a question for a hypothetical future student of this information
theory course in order to test their knowledge, skill and understanding of aspects of
source coding that we have studied (by the end of lecture 16, on 24th september).
COMP2610 students should design a question suitable for a future COMP2610
student; COMP6261 students should design one for future COMP6261 students.
You need to submit three things:
1. Your question, which should comprise several severable parts, and needs
to be a direct question about the content we have learned regarding source
coding You can not ask a tricksy meta-question like I have done!. That is,
your question should look something like an exam question (or questions)
from previous years, or questions from the assignment from a previous
year that I have appended. You should considering including components
testing the understanding of the source coding theorem, what it says and
does not say; questions requiring calculation of (say) a Huffman code and
an arithmetic code for particular ensembles; and questions that probe the
student’s understanding of the material (“why” things work they way they
do). The bulk of your question should focus upon calculating some actual
codes, and then answering questions about them (rather than requiring proofs
of theorems). Your questions should including a marking guide for the
student.
2. You also need to submit an exemplary answer to your questions, written in
a manner that is as clear as possible and which fully answers the question
1
you posed. Please position answers to subquestions immediately following
the sub-question.
3. Finally, you need to submit a rationale as to why you designed your question
the way you did, and exactly what it is you are hoping the future student
would learn from both attempting the question, and from your exemplary
answer. You need to seperately and explicitly answer what knowledge, skills
and understanding you are testing / teaching in setting your question.
Thus the format of your submission should be
Initial instructions, and situation set-up
Subquestion 1
Answer to subquestion 1
Subquestion 2
Answer to subquestion 2
...
Subquestion n
Answer to subquestion n
Rationale
You can either type or handwrite your submission, but if handwritten, it needs to
be very clearly legible and neat, and the scan that you submit needs to be clear
and sharp.
I impose no restrictions on the details of what you ask (other than it tests
knowledge, skills and understanding re source coding). You are at liberty to
require the student to use software you provide, but I encourage that you focus on
problems that can be answered with just pencil and paper.
Resist the temptation tomake your question really complex, long and hard! The
goal is to design something that tests the essence of what we have learned about
source coding, and is tractable for an average student. You should largely stick to
material covered in lectures. If you want to go beyond what was already covered,
you need to take considerable care in walking the student through anything new. I
strongly encourage sticking almost entirely to what was done in lectures!
Your are at liberty to look at previous exams, the example I have appended,
questions in the textbook or elsewhere, for inspiration of suitable questions to
ask. But you must explicitly cite any sources you use. Your solution, and your
rationale are significant aspects of the assignment, somerely finding a nice question
elsewhere will not do the assignment for you. . .
2
It is not necessary to try and test every single thing we have covered re source
coding. Indeed, any attempt to do so would be doomed (the poor hypothetical
student would not have time). I am wanting you to reflect on the most important
aspects of what we have learned, and turn that into an assignment for future
students. What you choose to ask, is a key part of this assignment.
Assessment
This assignment is worth 15% of your overall grade. The assignment is perhaps
unusual relative to other CS assignments because of its open-ended nature. It is
deliberately designed this way to encourage you to reflect upon what you have
learned, and to practice explaining the key concepts. (The final exam will give
you plenty of straight-forward non-tricksy questions!)
Since it is open-ended, I am not allocating numericalmarks in the usual fashion.
The clarity of yourwritingwill play a role in the assessment for this assignment.
You need to pay attention to the precision of your statements, lack of ambiguity,
and correctness (of course).
The mere length of your overall submission is not a major positive factor, and
indeed concision and elegance will be applauded. Neither is the sheer number of
sub-questions you pose. Your question should be designed such that an average
student can solve it in 10 hours. (I know full well how hard this is to estimate).
Your entire investment in this assignment should be less than 20 hours (ideally
around 15). It is much more important that you have a highly polished submission
with detailed rationale.
In grading your submission, I will be taking account of its correctness, clarity,
concision, elegance, inspiration, tractability, insight, and what the student would
be expected to learn. I will take account of the degree to which it tests all three
of knowledge, skills and understanding. To be clear, if you merely pose a trivial
question which is answerable in a few minutes, that is not sufficient. Neither is it
sufficient to reproduce a question you have found verbatim. I want you to design
your own question. I explict want you to think hard about what is important in
source coding, and craft a question that captures that in an elegant and concise
manner (because this will help you learn that essence!).
Your grade for this assignment will be according to the following categorical
scale.
Fail (5, 15, 25, 35 or 45% depending upon degree): There are important flaws in
your submission. Perhaps you did not provide any rationale at all. Perhaps
there are egregious ambiguities in the question, or your answer is signifi-
cantly flawed. Your explanations are confusing and hard to follow.
3
Pass (55%): Your question is not flawed in any fatal way, but may well have some
minor to medium severity bugs. It tests some non-trivial aspect of source
coding in a manner that would take a student a fair time to solve. Your
answer is largely correct. And you have provided some sort of rationale.
Your explanations and writing are ok, but not very elegant or concise. The
question is not really inspiring. It may test one of knowledge, skills and
understanding, but does not do a good job testing all three. It is perhaps
largely requiring a calculation. It is likely a minor variation of an existing
question.
Credit (65%): Your question and solution has only minor flaws. The writing
is clear, but perhaps not particularly elegant. The question is straight-
forward but not boring, and tests more than one of knowledge, skills and
understanding. There is an element of interestingness to the question. The
rationale is moderately clear.
Distinction (75%): Neither your question nor answer has any detectable flaws. It
is well written, and your rationale shows how you test all three of knowledge,
skills and understanding. There is clearly a creative aspect to your question.
It is elegant, concise, and inspiring. Your rationale is very clear.
High Distinction (85, 90, 95, or 100% depending upon degree) A very well exe-
cuted elegant question, solution and rationale, that demonstrates creativity,
is exciting and inspiring and has no errors. The rationale makes it very
clear what the students would learn. One might hope that students doing
your question would thank you for a stimulating and fun experience. You
articulate very well in your rationale how it tests all three of knowledge,
skills and understanding. There are no flaws in the question, no ambiguities,
it is tractable, and your answer is exemplary. For the highest marks, it is
inspirational and teaches the students something they would not have learned
from perfect attention to the lectures.
Hints and Starting Points
In case the open-ended nature of this assignment is daunting, here are some hints
/ starting points.
Start with a couple of straight-forward questions that require the student to
determine a source code for a given ensemble. Say a Huffman code for a fixed
ensemble, and a simple example of arithmetic coding. Then add questions that
test the understanding and insight regarding what they have done. Build some
connection to the theory we have learned (so it is not mere cranking an algorithmic
4
handle). Construct the parts of the questions, your answers, and the rationale
simultaneously (that is don’t put off the latter till the end). You need to ensure
your question is tractable. Find some problem in the text, past exams or elsewhere,
and simply modify some of the particulars (so that the future student can’t copy
an existing answer verbatim). Make sure you explicitly cite what you have based
your question on, and what you have changed. Keep iterating, adding additional
subquestions until your answers are of a length and complexity that you judge
could be answered in 10 hours or so. Then double and triple check everything for
correctness. Polish the language. Explicitly check for vagueness and ambiguity
and typos.
I apppend an assignment from a previous year, with solutions. Focus on just
questions 1-4 (note that the overall assignment was worth 20%). And of course,
this example does not have a written rationale. If you are not sure where to start,
take the questions there, and just tweak some of the constants. In order to get a
high grade (D/HD) you would need to show more creativity. I stress that the total
length of the questions you write should be substantially less than the attached
assignment since you have to design the question, provide the rationale, as well as
providing the answer.
5
COMP2610/COMP6261 - Information Theory:
Assignment 3 Sample Solutions
Australian National University
Lecturers: Young Lee and Aditya Menon
COMP2610/6261 Shared Questions
1. (20 pts) LetX be an ensemblewithAX = {x1,x2,x3} and probabilitiespX = (1/4, 1/3, 5/12).
(a) (2 pt) Compute the cumulative distribution function F (xi ) for each outcome xi .
(b) (2 pt) Compute the modified distribution function F¯ (xi ) for each outcome xi .
(c) (2 pt) Compute the codeword lengths `(xi ) for each outcome xi .
(d) (3 pt) Compute the (binary) symbol intervals [F (xi−1), F (xi ))2 for each outcome xi ,
assuming that F (x0) = 0.
(e) (4 pt) Construct a Shannon-Fano-Elias code for X .
(f) (3 pt) Compute the (binary) codeword intervals [bF¯ (xi )c`(xi ), bF¯ (xi )c`(xi ) + 2−`(xi ) )2
for each outcome xi .
(g) (4 pt) Assuming the above code, decode the sequence 011001110, showing all your
steps.
(a) The cumulative distribution function is:
F (1) = 1/4 = 0.012
F (2) = 1/4 + 1/3 =
7
12
= 0.10012
F (3) = 1/4 + 1/3 + 5/12 = 1 = 1.02.
(b) The modified cumulative distribution function is:
F¯ (1) = (1/2) (1/4) = 1/8 = 0.0012
F¯ (2) = 1/4 + (1/2) (1/3) = 5/12 = 0.011012
F¯ (3) = 1/4 + 1/3 + (1/2) (5/12) = 19/24 = 0.110012.
(c) The code lengths are given by `(x ) =
⌈
log2
1
pi
⌉
+ 1 so that
`(1) =
⌈
log2 4
⌉
+ 1 = 3
`(2) =
⌈
log2 3
⌉
+ 1 = 3
`(3) =
⌈
log2 12/5
⌉
+ 1 = 3.
1
(d) From (a), the symbol intervals are: [0, 0.01)2, [0.01, 0.10012), [0.10012, 1)2, or
[0, 1/4)10, [1/4,
7
12 )10, [
7
12 , 1)10.
(e) From (b) and (c), the code is C = {001, 011, 110}, as we just extract the first three
bits in the binary expansion of F¯ (x ).
(f) From (b), the codeword intervals are [0.001, 0.010)2, [0.011, 0.100)2, [0.110, 0.111)2,
where in each casewe added a single bit of 1 to the final position of the corresponding
left endpoint of the interval.
(g) The decoding of 011001110 is x2 x1 x3. Proceeding step by step:
• After the first bit, we have interval [0.0, 0.1)2. This overlaps with the intervals
for x1 and x2, so we continue.
• After the next bit, we have interval [0.01, 0.10)2. This is within the interval for
x2. So, the first symbol must be x2.
• We can skip the next bit since we know the length of representation for x2 is 3
bits.
• After the next bit, we have interval [0.0, 0.1)2. This overlaps with the intervals
for x1 and x2, so we continue.
• After the next bit, we have interval [0.00, 0.01)2. This is exactly the interval for
x1. So, the second symbol must be x1.
• We can skip the next bit since we know the length of representation for x1 is 3
bits.
• After the next bit, we have interval [0.1, 1.0)2. This overlaps with the intervals
for x2 and x3, so we continue.
• After the next bit, we have interval [0.11, 1.0)2. This is within the interval for
x3. So, the first symbol must be x3.
2. (5 pts) In the Shannon-Fano-Elias code, one codes an outcome xi using the first `(xi ) =
dlog2 1/p (xi )e + 1 bits of the symbol interval mid-point F¯ (xi ). In a Shannon-Fano-Elijah
code, one codes an outcome xi using the first `(xi ) bits of the symbol interval left-point
F (xi−1), where F (x0) = 0.
Will the Shannon-Fano-Elijah code be prefix-free in general? If yes, explain why. If no,
give an example of an ensemble where the code is not prefix-free.
No, it is not prefix-free in general.
Example: Consider an ensemble X with just two outcomes x1,x2. Let p (X = x1) = 1/8
and p (X = x2) = 78 . The lengths of codewords are `(x1) = log2 8 + 1 = 4 and `(x2) =
dlog2(8/7)e + 1 = 2. The cumulative distribution function is F (x0) = 0 = 0.02, F (x1) =
1/8 = 0.0012, F (x2) = 1 = 0.12. The symbol intervals are [F (x0), F (x1)) = [0.0, 0.001)2
and [F (x1), F (x2)) = [0.001, 0.1)2. Since we use the left endpoints to code, we will thus
have the encoding {0000, 00}. The result is not prefix-free.
Note that in this case, bF (x1)c`(x2) = 002, which lies outside the interval [F (x1), F (x2)).
Thus we can no longer guarantee non-overlapping codeword intervals. The fact that
the truncated representation falls outside the interval is intuitive: by truncating the left
endpoint, we will invariably reduce its value, thus pushing us outside the interval.
2
3. (20 pts) Let X be an ensemble with AX = {a, b, c,}.
(a) (15 pt) Show the output of arithmetic coding for the sequence cab, where the
probabilities at every iteration are adapted using the Dirichlet model from lectures.
Assume an initial set of virtual counts ma = mb = mc = 1, and a constant end-of-
stream probability p () = 0.125 at every iteration.
(b) (5 pt) Suppose one codes the string a using arithmetic coding with two settings:
the first uses the virtual countsma = mb = mc = 1, and the second uses the virtual
countsma = 100,mb = mc = 1. In which of the two settings will the codeword for
a be longer? Justify your answer. (You do not need to compute the actual encoding
in either setting.)
(a) We proceed as follows.
• We know p () = 0.125 = 1/8. From the virtual counts, since at this stage we
have not observed anything, we have probabilities for a, b, c via
p (·|ϵ ) = 0 + 1
0 + 3
· (1 − p ())
= (1/3) · (7
8
)
=
7
24
≈ 0.2917.
So, we start off with the intervals[
0.0,
7
24
)
,
[ 7
24
,
14
24
)
,
[ 14
24
,
21
24
)
,
[ 21
24
, 1.0
)
=
[
0.0,
7
24
)
,
[ 7
24
,
7
12
)
,
[ 7
12
,
7
8
)
,
[ 7
8
, 1.0
)
= [0.0000, 0.2917 ) , [0.2917, 0.5833 ) , [0.5833, 0.8750 ) , [0.8750, 1.0000 )
• The first symbol is a c. So, we slice up the interval
[
7
12 ,
7
8
)
= [0.5833, 0.8750 ).
From the virtual counts, we have probabilities for a, b, c via
p (·|c) =
(0 + 1
1 + 3
,
0 + 1
1 + 3
,
1 + 1
1 + 3
)
· (1 − p ())
= (1/4, 1/4, 1/2) · 7
8
=
( 7
32
,
7
32
,
7
16
)
= (0.21875, 0.21875, 0.4375).
All probabilities must be scaled by the length of the interval, 724 = 0.8750 −
0.5833 ≈ 0.2917. This means we have increments( 49
32 · 24 ,
49
32 · 24 ,
49
16 · 24 ,
7
24 · 8
)
≈ (0.0638, 0.0638, 0.1276, 0.0365).
3
So, we have the intervals
[0.5833, 0.6471 ) , [0.6471, 0.7109 ) , [0.7109, 0.8385 ) , [0.8385, 0.8750 ) .
• The next symbol is a a. So, we slice up the interval
[
7
12 ,
7
12 +
49
24·32
)
≈
[0.5833, 0.6471). From the virtual counts, we have probabilities for a, b, c via
p (·|ca) =
(1 + 1
2 + 3
,
0 + 1
2 + 3
,
1 + 1
2 + 3
)
· (1 − p ())
= (2/5, 1/5, 2/5) · 7
8
=
( 7
20
,
7
40
,
7
20
)
= (0.35, 0.175, 0.35).
All probabilities must be scaled by the length of the interval, 4924·32 ≈ 0.0638.
This means we have increments( 343
32 · 24 · 20 ,
343
32 · 24 · 40 ,
343
16 · 24 · 20 ,
49
24 · 32 · 8
)
≈ (0.0223, 0.0111, 0.0223, 0.0079).
So, we have the intervals
[0.5833, 0.6057 ) , [0.6057, 0.6168 ) , [0.6168, 0.6392 ) , [0.6392, 0.6471 ) .
• The next symbol is a b. So, we slice up the interval [0.6057, 0.6168). From the
virtual counts, we have probabilities for a, b, c via
p (·|cab) =
(1 + 1
3 + 3
,
1 + 1
3 + 3
,
1 + 1
3 + 3
)
· (1 − p ())
= (1/3, 1/3, 1/3) · 7
8
=
( 7
24
,
7
24
,
7
24
)
≈ (0.2917, 0.2917, 0.2917).
All probabilities must be scaled by the length of the interval, 34332·24·40 ≈ 0.0111.
This means we have increments( 2401
32 · 242 · 40 ,
2401
32 · 242 · 40 ,
2401
32 · 242 · 40 ,
343
32 · 24 · 40 · 8
)
≈ (0.0032, 0.0032, 0.0032, 0.0013).
So, we have the intervals
[0.6057, 0.6089 ) , [0.6089, 0.6122 ) , [0.6122, 0.6154 ) , [0.6154, 0.6168 ) .
This is the end of the stream. So, we end up with the last interval, [0.6154, 0.6168).
The final number that we choose for coding is the interval midpoint, 0.61613159.
4
The probability of the sequence is
p (cab) = p (c|ϵ ) · p (a|c) · p (b|ca) · p (|cab)
= p (c|ϵ ) · p (a|c) · p (b|ca) · p ()
=
7
24
· 7
32
· 7
40
· 1
8
≈ 0.00139.
We have codeword length
`(cab) = d− log2 0.00139e + 1 = 11.
So, the final code is 10011101101.
(b) The code in the second setting will be shorter.
Reason: Since we actually observe an a in the string to encode, we will use the
subinterval reserved for this symbol. This subinterval will have length proportional
to − log2 p (a|ϵ ). In the second setting, we consider a to be a-priori much more likely
than in the first setting, since we assign many more “fake” occurrences of it in the
virtual countsm. So, − log2 p (a|ϵ ) will be much smaller in the second setting, since
p (a|ϵ ) will be much closer to 1. Since the codeword length is proportional to the
interval length, we can expect that the second setting will lead to a shorter codeword.
4. (6 pts) Let X be an ensemble over outcomes AX = {a, b,}, with XN the corresponding
extended ensemble for some integerN . Suppose one codes a sequencex = x1x2 . . . xN−1
from XN using arithmetic coding with fixed ensemble probabilities pX at every iteration.
(a) (3 pt) Suppose x has na,nb occurrences of a, b respectively. Provide an expression
for the length of the codeword for x , expressing your answer in terms of na,nb, as
well as the probabilities pX of the individual symbols.
(b) (3 pt) Suppose now one uses arithmetic coding with adaptive probabilities using the
Dirichlet model with virtual counts ma = mb = 1. Will the resulting codeword be
identical to that when using fixed ensemble probabilities? Explain why or why not.
(a) In general, the length is d− logp (x1x2 . . . xN)e + 1. Since we know that the proba-
bilities stay the same at every iteration,
p (x1x2 . . . xN ) = p (a)
na · p (b)nb · p ()n .
Thus, the length of the sequence is d−na · logp (a) − nb · logp (b) − logp ()e + 1
(b) No, in general the two will give different results.
Reason: with adaptive probabilities, the interval sizes will be different depending
on the particular sequence that we observe. With fixed probabilities, however, the
interval sizes stay the same at every single iteration. While we expect the two to be
increasingly similar for large sequences, it is thus not hard to construct cases where
the two give different answers.
5
5. (20 pts) Suppose we wish to compress the string aoxomoxoa.
(a) (4 pt) Suppose we code the string using LZ77 with a window sizeW = 1. What is
the output at each step of the algorithm? Show all your working.
(b) (8 pt) Suppose we code the string using LZ77 with a window sizeW = 3. What is
the output at each step of the algorithm? Show all your working.
(c) (8 pt) Suppose we code the string using LZ78. What is the output at each step of the
algorithm? Show all your working.
(a) The final output will be:
(0,a),(0,o),(0,x),(0,o),(0,m),(0,o),(0,x),(0,o),(0,a)/
This is because with a window size of 1, we will only be able to have a pointer output
when we have two successive characters the same.
(b) The final output will be:
(0,a),(0,o),(0,x),(1,2,1),(0,m),(1,2,1),(0,x),(1,2,1),(0,a).
The full processing is as per Table 1.
Processed Unprocessed Output
aoxomoxoa (0, a)
a oxomoxoa (0, o)
ao xomoxoa (0, x)
aox omoxoa (1, 2, 1)
oxo moxoa (0, m)
xom oxoa (1, 2, 1)
omo xoa (0, x)
mox oa (1, 2, 1)
oxo a (0, a)
Table 1: LZ77 output.
(c) The final output will be:
(0,a),(0,o),(0,x),(2,m),(2,x),(2,a).
The full processing is as per Figure 1. The numbering of the nodes indicates the
iteration at which the node is created.
Alternately, in tabular form, we have the output as per Table 2.
6. (9 pts) For each of the following channels, explain whether or not they are symmetric.
(a) (2 pt)
Q =

1/3 1/3 1/3
1/3 1/3 1/3
1/3 1/3 1/3
 .
6
01
a
2
4
m
5
x
6
a
o
3
x
Figure 1: LZ78 output.
Processing Output
aoxomoxoa (0, a)
oxomoxoa (0, o)
xomoxoa (0, x)
omoxoa (2, m)
oxoa (2, x)
oa (2, a)
Table 2: LZ78 output.
(b) (3 pt)
Q =

0.7 0.1
0.2 0.2
0.1 0.7
 .
(c) (3 pt)
Q =

0.7 0.2
0.2 0.1
0.1 0.7
 .
(a) Yes, this is trivially symmetric as-is: every row and column are permutations of one
another.
(b) Yes, this is symmetric if we partition the matrix into rows {1, 3} and {2}. This is
because the matrix
[
0.7 0.1
0.1 0.7
]
has every row and column permutation of one another;
and the matrix
[
0.2 0.2
]
is trivially symmetric.
(c) No, there is no feasible partition which results in every row and column being a
permutation of one another.
7. (18 pts) Consider a channel over inputs {1, 2} and outputs {1, 2, 3}, with transition matrix
Q =

0 1/2
1/4 0
3/4 1/2
 .
7
Let X ,Y be random variables over the inputs and outputs respectively. Suppose X has
probabilities (θ , 1 − θ ) for some θ ∈ [0, 1].
(a) (3 pt) Compute the distribution p (Y ), expressing your answer in terms of θ .
(b) (3 pt) Compute the entropy H (Y ), expressing your answer in terms of θ .
(c) (3 pt) Compute the conditional entropy H (Y |X ), expressing your answer in terms of
θ .
(d) (3 pt) Compute the mutual information I (X ;Y ), expressing your answer in terms of
θ .
(e) (5 pt) Compute the channel capacity C of Q . Provide the choice of θ that attains this
capacity.
(f) (2 pt) Is it possible to construct a block code that achieves a rate of 2C bits per
transmission over Q , and arbitrarily small probability of bit error?
(g) (2 pt) Is it possible to construct a block code that achieves a rate of 2C bits per
transmission over Q , and 49% probability of bit error?
(a) We have
p (Y = 1) = p (X = 1) · p (Y = 1|X = 1) + p (X = 2) · p (Y = 1|X = 2)
= 0 + (1 − θ ) · (1/2)
=
1 − θ
2
p (Y = 2) = p (X = 1) · p (Y = 2|X = 1) + p (X = 2) · p (Y = 2|X = 2)
= θ · (1/4) + 0
=
θ
4
p (Y = 3) = p (X = 1) · p (Y = 3|X = 1) + p (X = 2) · p (Y = 3|X = 2)
= θ · (3/4) + (1 − θ ) · (1/2)
= (3/4)θ + (1 − θ )/2
=
2 + θ
4
(b) From (a), we have
H (Y ) = −
∑
y
p (Y = y) · log2 p (Y = y)
= −1 − θ
2
· log2
1 − θ
2
− θ
4
log2
θ
4
− 2 + θ
4
· log2
2 + θ
4
= −1 − θ
2
log2(1 − θ ) +
1 − θ
2
− θ
4
log2 θ +
θ
2
− 2 + θ
4
log2(2 + θ ) +
2 + θ
2
= −1 − θ
2
log2(1 − θ ) −
θ
4
log2 θ −
2 + θ
4
log2(2 + θ ) +
3 + θ
2
8
(c) Suppose H2(·) denotes the binary entropy function. Due to the zero entries in each
column, we have
H (Y |X = 1) = H2(3/4)
H (Y |X = 2) = H2(1/2) = 1.
So,
H (Y |X ) = p (X = 1) · H (Y |X = 1) + p (X = 2) · H (Y |X = 2)
= θ · H2(3/4) + (1 − θ )
= θ · (H2(3/4) − 1) + 1.
(d) By definition,
I (X ;Y ) = H (Y ) − H (Y |X )
= −1 − θ
2
log2(1 − θ ) −
θ
4
log2 θ −
2 + θ
4
log2(2 + θ ) +
3 + θ
2
− θ · (H2(3/4) − 1) − 1
= −1 − θ
2
log2(1 − θ ) −
θ
4
log2 θ −
2 + θ
4
log2(2 + θ )+
θ ·
(3
2
− H2(3/4)
)
+
1
2
.
(e) We need to find the θ which maximises I (X ;Y ). This can be done with a calculator
or computer.
Algebraically, recalling that f (x ) = x · log2 x has derivative log2 x + 1log 2 , we have
dI (X ;Y )
dθ
=
1
2
· log2(1 − θ ) −
1
4
· log2 θ −
1
4
· log2(2 + θ ) +
3
2
− H2(3/4)
=
1
4
· log2(1 − θ )2 −
1
4
· log2 θ −
1
4
· log2(2 + θ ) +
3
2
− H2(3/4)
=
1
4
· log2
(1 − θ )2
θ · (2 + θ ) +
3
2
− H2(3/4).
9
This equals zero when
(1 − θ )2
θ · (2 + θ ) = c0
⇐⇒ 1 + θ
2 − 2 · θ
θ2 + 2 · θ = c0
⇐⇒ 1 + θ2 − 2 · θ = c0 · (θ2 + 2 · θ )
⇐⇒ 1 + θ2 − 2 · θ = c0 · (θ2 + 2 · θ )
⇐⇒ (1 − c0) · θ2 − 2 · (1 + c0)θ + 1 = 0
⇐⇒ θ = 2 · (1 + c0) ±
√
4 · (1 + c0)2 − 4 · (1 − c0)
2 · (1 − c0)
⇐⇒ θ =
2 · (1 + c0) ±
√
4 + 4c20 + 8c0 − 4 + 4c0
2 · (1 − c0)
⇐⇒ θ =
2 · (1 + c0) ±
√
4c20 + 12c0
2 · (1 − c0)
⇐⇒ θ =
(1 + c0) ±
√
c20 + 3c0
1 − c0 ,
for c0 = 24·(H2 (3/4)−3/2). Since θ ∈ [0, 1], we pick the solution
θ =
(1 + c0) −
√
c20 + 3c0
1 − c0 ≈ 0.5461.
The resulting channel capacity is computed by plugging in this value into part (d).
We find that the capacity is C ≈ 0.3957.
(f) No, it’s not possible. By the noisy channel coding theorem, we can’t attain rates
above the capacity while having arbitrarily small error probability.
(g) Yes, it’s possible. By the noisy channel coding theorem part (b), for a bit error
probability of pb = 0.4, we can attain rates below C/(1 − H2(0.4)). We can easily
check that C1−H2 (0.4) > 2C. So, we can design such a code.
COMP6261 Extension Questions
This section is for COMP6261 students only. There is no bonus awarded to COMP2610
students attempting this question.
8. (5 pts) Given some binary string B (e.g. B = 0100), recall that [0.B, 0.B1)2 is its codeword
interval.
(a) (2 pt) Are there binary intervals [a,b)2 ⊆ [0, 1)2 which are not codeword intervals
for any binary string B? If yes, give an example of such an interval, and justify your
answer. If no, explain why not.
10
(b) (4 pt) Let B,B′ be two distinct binary strings, and suppose that 0.B2 < 0.B′2. Is it
possible that 0.B′2 < 0.B12 < 0.B
′12? If yes, give an example of strings B,B′ for
which this is so. If no, explain why not.
Hint: It may be useful to recall the intuitive meaning of the codeword interval.
(a) Yes, this is possible.
Example: [0.0, 0.11)2. This can’t be the codeword interval for any string, since any
candidate B has to comprise only zeros; and for such a B, the right endpoint has to be
either 0.1, 0.01, and so on.
(b) No, this is not possible.
Reason: The codeword interval IB = [0.B, 0.B1)2 corresponds to all strings that
start with B, i.e.all strings for which B is a prefix. If 0.B′2 < 0.B12, then B
′ ∈ IB.
Consequently, B is a prefix of B′.
If further 0.B12 < 0.B′12, then there are strings x such that x ∈ IB′ but x < IB, i.e.there
are strings that start with B′ but not B. This is however not possible: if B is a prefix
of B′, then every string that starts with B′ by definition also starts with B. Hence, the
given condition cannot hold.
9. (5 pts) You have a noisy channel Q that takes as input a symbol from X, and outputs a
symbol fromY. A scientist offers you a function f (y) that produces a new channelQ′ as
follows: for any input x ∈ X, one uses the original channel to get an output y ∈ Y, and
then uses the function to output y′ = f (y) ∈ Y.
Is it possible that Q′ can have greater capacity than Q? Explain why or why not.
No, this is not possible.
Reason: Suppose X is a random variable over the inputs, Y a variable over the outputs
in Q , and Y ′ over the outputs in Q′. Evidently, we have a Markov chain X → Y → Y ′,
where Y ′ = f (Y ). By the data processing inequality, we know that I (X ;Y ) ≥ I (X ;Y ′).
Since the capacity is the maximal mutual information over all possible input distributions,
this means that the capacity of Q′ cannot exceed that of Q . Formally, suppose X ∗Q ′ is the
choice of random variable over inputs which results in the capacity for Q′. Then,
C (Q ) = max
pX
I (X ;Y )
≥ I (X ∗Q ′;Y ) by definition of max
≥ I (X ∗Q ′;Y ′) by data-processing inequality
= C (Q′) by definition of capacity.
11