Contents
Lecture 1: What is MAT1830 about?
Lecture 2: Divisors and primes
Lecture 3: Congruences
Lecture 4: Logic
Lecture 5: Tautologies and logical equivalence
Lecture 6: Rules of inference
Lecture 7: Predicates and quantifiers
Lecture 8: Predicate logic
Lecture 9: Mathematical induction
Lecture 10: Induction and well-ordering
Lecture 11: Sets
Lecture 12: Operations on sets
Lecture 13: Functions
Lecture 14: Examples of functions
Lecture 15: Composition and inversion
Lecture 16: Relations
Lecture 17: Equivalence relations
Lecture 18: Order relations
Lecture 19: Selections and arrangements
Lecture 20: Pascal’s triangle
Lecture 21: Probability
Lecture 22: Conditional probability and Bayes’ theorem
Lecture 23: Random variables
Lecture 24: Expectation and variance
Lecture 25: Discrete distributions
Lecture 26: Recursion
Lecture 27: Recursive algorithms
Lecture 28: Recursion, lists and sequences
Lecture 29: Graphs
Lecture 30: Walks, paths and trails
Lecture 31: Degree
Lecture 32: Trees
Lecture 33: Trees, queues and stacks
Unit Information
MAT1830 – Semester 2 2018
Course coordinator
• Name: Prof. David Wood
• Office: 9Rnf/424 (Clayton)
• Email: [email protected]
Lecturer (Malaysia)
• Name: Dr. Tham Weng Kee
• Email: [email protected]
Moodle Page
The course’s moodle page can be accessed through the my.monash portal at
https://my.monash.edu.au/
Unit Guide
The full unit guide can be found on the moodle page. It contains a topic outline and information on
assessment, special consideration, plagiarism etc.
Course Information
• Lectures: Three hours per week (lectures run in weeks 1–12).
– Wed 4:00 – 6:00
– Thu 3:00 – 4:00
• Support classes: One per week as allocated (support classes run in weeks 2–12).
• Required materials: Course notes booklet – available as a pdf from the course Moodle page.
There is no required textbook for the course.
Recordings of the lectures will be available through the moodle page.
Assessment
• Ten assignments worth 4% each (one due each week from week 3 to week 12).
• Final examination worth 60% (held in the examination period).
In order to pass the course you must receive at least 50% overall AND at least 40% for the assignments
AND at least 40% for the exam.
The assignments will be issued in lectures and will be available from the course Moodle page. Assign-
ments are to be submitted to your tutor during your support class. They will then be marked and
returned to you in your next support class. No calculators or other materials will be allowed in the
final exam.
Lecture 1: What is MAT1830 about?
Discrete mathematics studies objects which
have distinct separated values (e.g. integers), as
opposed to objects which vary smoothly (e.g.
real numbers). You can think of it as being
“digital” mathematics rather than “analogue”
mathematics.
Discrete mathematics is particularly impor-
tant in computer science and the two fields are
very closely linked.
This course covers a wide range of topics in
discrete mathematics including the following:
• Numbers
• Logic
• Induction and recursion
• Sets, functions and relations
• Probability
• Graph theory
1.1 What to expect
What we do here might be a bit different to a lot
of the maths you’ve done in the past. We’ll be
concentrating on really understanding the con-
cepts, rather than simply learning how to solve
certain types of questions.
For a lot of the questions we ask, there won’t
be a single fixed procedure you can apply to get
the answer. Instead, you’ll have to think care-
fully about what the question is asking and try
to work out what is really going on. Don’t be
afraid to try different things, play around, and
look at examples.
We’ll also be emphasising the importance of
proving results.
1.2 Proofs
A proof is essentially just a water-tight argu-
ment that a certain statement must be true. As
we’ll see, even if you are pretty sure that some-
thing is true, it can be really useful to have a
proof of it, for a number of reasons.
1.3 Maths in computer science
As we mentioned above, maths and computer
science are very closely related. The topics in
this course all have many applications to com-
puter science. For example:
• Number theory is used in cryptography
to enable secure communication, identity
verification, online banking and shopping
etc.
• Logic is used in digital circuit design and
in program control flow.
• Induction and recursion are used to study
algorithms and their effectiveness.
• Functions are important in the theory of
programming and relations are vital in
database theory and design.
• Probability is vital for understanding ran-
domised algorithms and for creating sys-
tems to deal with uncertain situations.
• Graph theory is used in software which
solves allocation and scheduling problems.
Questions
1.1 What maths that you’ve done in the past
would count as discrete? What would
count as continuous instead? Are there
grey areas?
1.2 Why might proofs be important to math-
ematicians and computer scientists?
1.3 Can you think of other links between
maths and computer science?
Lecture 2: Divisors and primes
We say that integer a divides integer b if
b = qa for some integer q.
Example. 2 divides 6 because 6 = 3× 2.
This is the same as saying that division with
remainder gives remainder 0. Thus a does not
divide b when the remainder is ≠ 0.
Example. 3 does not divide 14 because it leaves
remainder 2: 14 = 4× 3 + 2.
When a divides b we also say:
• a is a divisor of b,
• a is a factor of b,
• b is divisible by a,
• b is a multiple of a.
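In most programming languages, divisibility is tested with the remainder operator; a brief Python illustration of the two examples above:

```python
# a divides b exactly when the remainder b % a is 0.
print(6 % 2)    # 0, so 2 divides 6
print(14 % 3)   # 2, so 3 does not divide 14
```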
2.1 Primes
A positive integer p > 1 is a prime if its only
positive integer divisors are 1 and p. Thus the
first few prime numbers are
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, . . .
The number 1 is not counted as a prime, as this
would spoil the
Fundamental Theorem of Arithmetic.
Each integer > 1 can be expressed in exactly
one way, up to order, as a product of primes.
Example. 210 = 2 × 3 × 5 × 7, and this is the
only product of primes which equals 210.
This would not be true if 1 were counted as
a prime, because many factorisations involve 1.
E.g.
210 = 1 × 2 × 3 × 5 × 7 = 1² × 2 × 3 × 5 × 7 = . . .
2.2 Recognising primes
If an integer n > 1 has a divisor, it has a divisor
≤ √n, because for any divisor a > √n we also
have the divisor n/a, which is < √n.
Thus to test whether 10001 is prime, say, we
only have to see whether any of the numbers
2, 3, 4, . . . , 100 divide 10001, since √10001 <
101. (The least divisor found is in fact 73, be-
cause 10001 = 73× 137.)
This explains a common algorithm for recog-
nising whether n is prime: try dividing n by
a = 2, 3, . . . while a ≤ √n.
The algorithm is written with a boolean vari-
able prime, and n is prime if prime = T (true)
when the algorithm terminates.
assign a the value 2.
assign prime the value T.
while a ≤ √n and prime = T
if a divides n
give prime the value F
else
increase the value of a by 1.
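The pseudocode above translates almost line for line into Python; a sketch (not the notes' official code), where `math.isqrt(n)` gives the integer part of √n, so `a <= math.isqrt(n)` matches the test a ≤ √n:

```python
import math

def is_prime(n):
    """Trial division: test divisors a = 2, 3, ... while a <= sqrt(n)."""
    if n <= 1:
        return False
    a = 2
    prime = True
    while a <= math.isqrt(n) and prime:
        if n % a == 0:
            prime = False   # a divides n, so n is composite
        else:
            a += 1
    return prime

print(is_prime(10001))  # False: 10001 = 73 x 137
print(is_prime(101))    # True
```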
2.3 Finding divisors
This algorithm also finds a prime divisor of n.
Either
the least a ≤ √n which divides n,
or,
if we do not find a divisor among the a ≤ √n, n
itself is prime.
2.4 The greatest common divisor of
two numbers
It is remarkable that we can find the greatest
common divisor of positive integers m and n,
gcd(m,n), without finding their prime divisors.
This is done by the famous Euclidean al-
gorithm, which repeatedly divides the greater
number by the smaller, keeping the smaller num-
ber and the remainder.
Euclidean Algorithm.
Input: positive integers m and n with m > n
Output: gcd(m,n)
a := m, b := n
r := remainder when a is divided by b
while r ≠ 0 do
a := b
b := r
r := remainder when a is divided by b
end
return b
Example. m = 237, n = 105
The first values are a = 237, b = 105,
so r = 237− 2× 105 = 27.
The next values are a = 105, b = 27,
so r = 105− 3× 27 = 24.
The next values are a = 27, b = 24,
so r = 27− 1× 24 = 3.
The next values are a = 24, b = 3,
so r = 24− 8× 3 = 0.
Thus the final value of b is 3, which is
gcd(237, 105).
This can be set out more neatly:
237 = 2 × 105 + 27
105 = 3 × 27 + 24
27 = 1 × 24 + 3
24 = 8 × 3 + 0
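As a sketch, the same computation in Python, where `a % b` gives the remainder when a is divided by b:

```python
def gcd(m, n):
    """Euclidean algorithm for positive integers m and n with m > n."""
    a, b = m, n
    r = a % b
    while r != 0:
        a, b = b, r          # keep the smaller number and the remainder
        r = a % b
    return b

print(gcd(237, 105))  # 3, matching the worked example
```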
2.5 The Euclidean algorithm works!
We start with the precondition m > n > 0.
Then the division theorem tells us there is a re-
mainder r < b when a = m is divided by b = n.
Repeating the process gives successively smaller
remainders, and hence the algorithm eventually
returns a value.
That the returned value is actually
gcd(m,n) relies on the following fact.
Fact. If a, b and k are integers, then
gcd(a− kb, b) = gcd(a, b).
By using this fact repeatedly, we can show that
after each execution of the while loop in the al-
gorithm gcd(b, r) = gcd(m,n). When the algo-
rithm terminates, this means b = gcd(b, 0) =
gcd(m,n). (Equivalently, in the neat layout
given above, the gcd of the numbers in the last
two columns is always gcd(m,n).)
2.6 Extended Euclidean algorithm
If we have used the Euclidean algorithm to find
that gcd(m,n) = d, we can “work backwards”
through its steps to find integers a and b such
that am+ bn = d.
Example. For our m = 237, n = 105 example
above:
3 = 27− 1×24
3 = 27− 1(105− 3×27) = −105 + 4×27
3 = −105 + 4(237− 2×105) = 4×237− 9×105
So we see that a = 4 and b = −9 is a solution in
this case.
Our first line above was a rearrangement of
the second last line of our original Euclidean al-
gorithm working. In the second line we made a
substitution for 24 based on the second line of
our original Euclidean algorithm working. In
the third line we made a substitution for 27
based on the first line of our original Euclidean
algorithm working.
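The back-substitution can also be organised iteratively, carrying coefficients of m and n through each division step. A Python sketch (the function name and variable names are our own):

```python
def extended_gcd(m, n):
    """Return (d, a, b) with d = gcd(m, n) and a*m + b*n = d."""
    old_r, r = m, n          # remainders
    old_a, a = 1, 0          # coefficients of m
    old_b, b = 0, 1          # coefficients of n
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_a, a = a, old_a - q * a
        old_b, b = b, old_b - q * b
    return old_r, old_a, old_b

print(extended_gcd(237, 105))  # (3, 4, -9): 4*237 - 9*105 = 3
```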
Questions
2.1 Write down multiples of 13, and multiples
of 21, until you find a multiple of 13 and
a multiple of 21 which differ by 1.
2.2 Can a multiple of 15 and a multiple of 21
differ by 1? If not, what is the small-
est positive difference between such mul-
tiples?
2.3 Find gcd(13, 21) and gcd(15, 21), and sug-
gest how they are related to the results in
Questions 2.1 and 2.2.
2.4 Work out the prime factorisations of 999
and 1000.
2.5 You should find no common prime factor
of 999 and 1000. How could you see this
without factorising the numbers? (Hint: a
common divisor of 1000 and 999 is also a
common divisor of . . . what?)
Lecture 3: Congruences
We’re used to classifying the integers as ei-
ther even or odd. The even integers are those
that can be written as 2k for some integer k.
The odd integers are those that can be written
as 2k + 1 for some integer k.
even . . . ,−6,−4,−2, 0, 2, 4, 6, . . .
odd . . . ,−5,−3,−1, 1, 3, 5, . . .
This classification is useful because even and
odd integers have particular properties. For ex-
ample, the sum of any two odd integers is even.
Similarly we can split the integers into three
classes: those that are 3k for some integer k,
those that are 3k + 1 for some integer k, and
those that are 3k + 2 for some integer k.
3k . . . ,−9,−6,−3, 0, 3, 6, 9, . . .
3k + 1 . . . ,−8,−5,−2, 1, 4, 7, 10, . . .
3k + 2 . . . ,−7,−4,−1, 2, 5, 8, 11, . . .
These classes also have particular properties.
For example, the sum of an integer in the sec-
ond class and an integer in the third class will
always be in the first class.
We don’t have to stop with 3. We could di-
vide integers into 4 different classes according to
their remainders when divided by 4, and so on.
3.1 Congruences
Let n ≥ 2 be an integer. We say integers a
and b are congruent modulo n and write
a ≡ b (mod n)
when n divides a− b.
Example.
19 ≡ 13 (mod 6) because 6 divides 19 − 13
12 ≡ 20 (mod 4) because 4 divides 20 − 12
22 ≡ 13 (mod 3) because 3 divides 22 − 13
3.2 Working with congruences
When working with congruences modulo some
fixed integer n, we can “substitute in” just like
we can with equalities.
If a ≡ b (mod n) and b ≡ c (mod n), then
a ≡ c (mod n).
Example. Suppose x ≡ 13 (mod 7). Then
x ≡ 6 (mod 7) because 13 ≡ 6 (mod 7).
We can add, subtract and multiply congru-
ences just like we can with equations.
If a1 ≡ b1 (mod n) and a2 ≡ b2 (mod n), then
• a1 + a2 ≡ b1 + b2 (mod n)
• a1 − a2 ≡ b1 − b2 (mod n)
• a1a2 ≡ b1b2 (mod n).
Example. If x ≡ 3 (mod 8) and y ≡ 2 (mod 8),
then
• x+ y ≡ 5 (mod 8)
• x− y ≡ 1 (mod 8)
• xy ≡ 6 (mod 8).
We can also deduce that x+ 4 ≡ 7 (mod 8),
that 4x ≡ 12 (mod 8) and so on, because obvi-
ously 4 ≡ 4 (mod 8). Note as well that 4x ≡
12 (mod 8) can be simplified to 4x ≡ 4 (mod 8).
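These rules can be sanity-checked numerically; in the Python snippet below, the representatives 3 + 8·5 and 2 − 8·3 are arbitrary choices of integers congruent to 3 and 2 modulo 8:

```python
n = 8
x = 3 + n * 5    # x = 43, and x ≡ 3 (mod 8)
y = 2 - n * 3    # y = -22, and y ≡ 2 (mod 8)

# Sum, difference and product fall in the classes the rules predict.
print((x + y) % n)  # 5
print((x - y) % n)  # 1
print((x * y) % n)  # 6
```

(Python's `%` always returns a remainder in the range 0 to n − 1, even for negative numbers, which is convenient here.)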
In some situations we can also “divide
through” a congruence by an integer.
If a ≡ b (mod n) and d divides a, b and n,
then
a/d ≡ b/d (mod n/d).
3.3 Solving linear congruences
Think of a congruence like 7x ≡ 5 (mod 9). This
will hold if 9 divides 7x− 5 or in other words if
there is an integer y such that 7x − 5 = 9y. So
to solve our original congruence we can find an
integer solution to 7x− 9y = 5.
Some congruences don’t have solutions.
For example, there is no solution to 10x ≡
6 (mod 20) because there are no integers x and
y such that 10x− 20y = 6.
We can find an expression for all the integers
x that satisfy a congruence like ax ≡ b (mod n)
in the following way:
1. Find d = gcd(a, n).
2. If d doesn't divide b, then there are no so-
lutions.
3. If d divides b, then divide through the con-
gruence by d to get an equivalent congru-
ence (a/d)x ≡ b/d (mod n/d).
4. Find integers x′ and y′ such that
(a/d)x′ − (n/d)y′ = b/d. The integers x
that satisfy the original congruence are
exactly those for which x ≡ x′ (mod n/d).
Example. Find all integers x such that 36x ≡
10 (mod 114).
Using the Euclidean algorithm we find
gcd(36, 114) = 6. So 6 divides 36x − 114y
for any integers x and y, and consequently
36x − 114y ≠ 10. This means that there are
no integers x such that 36x ≡ 10 (mod 114).
Example. Find all integers x such that
24x ≡ 8 (mod 44).
Using the Euclidean algorithm we find
gcd(24, 44) = 4. So we divide through by 4 to
get the equivalent congruence 6x ≡ 2 (mod 11).
Using the extended Euclidean algorithm we see
that 2×6−1×11 = 1, and hence 4×6−2×11 = 2.
Thus the integers x such that 24x ≡ 8 (mod 44)
are exactly the integers x ≡ 4 (mod 11).
3.4 Modular inverses
A modular multiplicative inverse of an inte-
ger a modulo n is an integer x such that
ax ≡ 1 (mod n).
From the last section we know that such an in-
verse will exist if and only if gcd(a, n) = 1. If
inverses do exist then we can find them using
the extended Euclidean algorithm (there will
be lots of inverses, but they will all be in one
congruence class modulo n). These inverses
have important applications to cryptography
and random number generation.
Example. 8 should have a multiplicative in-
verse modulo 45 because gcd(8, 45) = 1. Using
the extended Euclidean algorithm we see that
−3 × 45 + 17 × 8 = 1. So 8 × 17 ≡ 1 (mod 45).
This means that 17 is a multiplicative inverse of
8 modulo 45.
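Putting the pieces together, a modular inverse can be computed by running the extended Euclidean algorithm on a and n and reducing the coefficient of a modulo n. A minimal sketch (our own helper, returning None when no inverse exists):

```python
def mod_inverse(a, n):
    """Return x with a*x ≡ 1 (mod n), or None if gcd(a, n) != 1."""
    old_r, r = a, n
    old_x, x = 1, 0          # coefficients of a
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
    if old_r != 1:
        return None          # gcd(a, n) > 1, so no inverse
    return old_x % n

print(mod_inverse(8, 45))   # 17, matching the worked example
print(mod_inverse(10, 20))  # None: gcd(10, 20) = 10
```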
Questions
3.1 Are the following true or false?
• 6 ≡ 3 (mod 3)
• 9 ≡ 18 (mod 8)
• 5x+ 6 ≡ 2x (mod 3)
3.2 Prove all of the facts about congruences
that were stated in this lecture (use the
definition of congruence modulo n and the
definition of divides).
3.3 Find an expression for all the integers x
that satisfy 9x ≡ 36 (mod 60).
Lecture 4: Logic
The simplest and most commonly used part
of logic is the logic of “and”, “or” and “not”,
which is known as propositional logic.
A proposition is any sentence which has a
definite truth value (true = T or false = F), such
as
1 + 1 = 2, or
10 is a prime number.
but not
What is your name? or
This sentence is false.
Propositions are denoted by letters such as
p, q, r, . . . , and they are combined into com-
pound propositions by connectives such as ∧
(and), ∨ (or) and ¬ (not).
4.1 Connectives ∧,∨ and ¬
∧,∨ and ¬ are called “connectives” because they
can be used to connect two sentences p and q
into one. These particular connectives are de-
fined so that they agree with the most com-
mon interpretations of the words “and”, “or”
and “not”.
To define p∧ q, for example, we only have to
say that p ∧ q is true only when p is true and q
is true.
We define ∧ by the following truth table:
p q p ∧ q
T T T
T F F
F T F
F F F
Similarly, p ∨ q is true when p is true or q is
true, but now we have to be more precise, be-
cause “or” has at least two meanings in ordinary
speech.
We define ∨ by the truth table
p q p ∨ q
T T T
T F T
F T T
F F F
This is the inclusive sense of “p or q” (often writ-
ten “p and/or q” and meaning at least one of p,
q is true).
Finally, “not” ¬ (also called negation) is de-
fined as follows.
We define ¬ by the truth table
p ¬p
T F
F T
The connectives ∧, ∨ and ¬, are functions of the
propositional variables p and q, which can take
the two values T and F. For this reason, ∧, ∨
and ¬ are also called truth functions.
4.2 Implication
Another important truth function is p → q,
which corresponds to “if p then q” or “p implies
q” in ordinary speech.
In ordinary speech the value of p → q de-
pends only on what happens when p is true. For
example to decide whether
MCG flooded→ the cricket is off
it is enough to see what happens when the MCG
is flooded. Thus we agree that p → q is true
when p is false.
We define → by the truth table
p q p→ q
T T T
T F F
F T T
F F T
4.3 Other connectives
Two other important connectives are ↔ (“if and
only if”) and ⊕ (“exclusive or”).
The sentence p↔ q is true exactly when the
truth values of p and q agree.
We define ↔ by the truth table
p q p↔ q
T T T
T F F
F T F
F F T
We could also write p↔ q as (p→ q)∧ (q → p).
We’ll see how to prove this in the next lecture.
The sentence p ⊕ q is true exactly when the
truth values of p and q disagree.
We define ⊕ by the truth table
p q p ⊕ q
T T F
T F T
F T T
F F F
4.4 Remarks
1. The symbols ∧ and ∨ are intentionally sim-
ilar to the symbols ∩ and ∪ for set intersection
and union because
x ∈ A ∩B ⇔ (x ∈ A) ∧ (x ∈ B)
x ∈ A ∪B ⇔ (x ∈ A) ∨ (x ∈ B)
(We study sets later.)
2. The “exclusive or” function ⊕ is written XOR
in some programming languages.
3. If we write 0 for F and 1 for T then ⊕ becomes
the function
p q p ⊕ q
1 1 0
1 0 1
0 1 1
0 0 0
This is also known as the “mod 2 sum”, because
1 + 1 = 2 ≡ 0 (mod 2). (It could also be called
the “mod 2 difference” because a + b is the same
as a − b, mod 2.)
4. The mod 2 sum occurs in many homes where
two switches p, q control the same light. The
truth value of p ⊕ q tells whether the light is on
or not, and the light can be switched to the op-
posite state by switching the value of either p or
q.
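In Python, the bitwise operator `^` acts as exclusive or on 0/1 values, so the mod 2 description can be checked directly:

```python
# Exclusive or agrees with the mod 2 sum on every pair of bits.
for p in (0, 1):
    for q in (0, 1):
        assert (p ^ q) == (p + q) % 2
print("exclusive or matches the mod 2 sum")
```

Replacing `(p + q) % 2` by `(p - q) % 2` checks the “mod 2 difference” remark as well.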
Questions
4.1 Which of the following are propositions?
1 + 1 = 3, 1 + 1, 3 divides 7, 3÷ 7
4.2 Let f be the proposition “foo” and let b be
the proposition “bar”. Write the following
propositions in symbols, using f, b,→ and
¬.
• if foo, then bar.
• bar if foo.
• bar only if foo.
• foo implies not bar.
• foo is sufficient for bar.
• foo is necessary for bar.
4.3 In the following examples, is the “or” in-
tended to be inclusive or exclusive?
• Would you like coffee or tea?
• Oranges or lemons are a good source
of vitamin C.
• He will arrive in a minute or two.
Lecture 5: Tautologies and logical equivalence
A major problem in logic is to recognise
statements that are “always true” or “always
false”.
5.1 Tautologies and contradictions
A sentence φ in propositional logic is a formula
with variables p, q, r, . . . which can take the val-
ues T and F. The possible interpretations of φ
are all possible assignments of values to its vari-
ables.
A sentence in propositional logic is
• a tautology if it has value T under all
interpretations;
• a contradiction if it has value F under
all interpretations.
We can check whether φ is a tautology, a
contradiction, or neither by computing its value
for all possible values of its variables.
Example. (¬p) ∨ p
For p = T we have
(¬T) ∨ T = F ∨ T = T,
using the known values of the ¬ and ∨ functions.
For p = F we have
(¬F) ∨ F = T ∨ F = T,
hence (¬p) ∨ p is a tautology. (It is known as
the law of the excluded middle).
We can similarly compute the values of any
truth function φ, so this is an algorithm for
recognising tautologies. However, if φ has n
variables, there are 2ⁿ assignments of values to
them, so the amount of computation grows
rapidly with n.
One of the biggest unsolved problems of logic
and computer science is to find an efficient algo-
rithm for recognising tautologies.
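The brute-force check is straightforward to program, even though its running time grows like 2ⁿ. A Python sketch using a hypothetical helper `is_tautology` that takes a truth function and its number of variables:

```python
from itertools import product

def is_tautology(f, n):
    """True iff f evaluates to True under all 2**n interpretations."""
    return all(f(*values) for values in product((True, False), repeat=n))

# The law of the excluded middle, (¬p) ∨ p:
print(is_tautology(lambda p: (not p) or p, 1))      # True
# (¬p) ∨ q (i.e. p → q) is not a tautology:
print(is_tautology(lambda p, q: (not p) or q, 2))   # False
```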
5.2 Logical equivalence
Sentences φ and ψ are logically equivalent if they
are the same truth function, which also means
φ ↔ ψ is a tautology. This relation between
sentences is written φ⇔ ψ or φ ≡ ψ.
Example. p→ q ≡ (¬p) ∨ q
We know p→ q has the truth table
p q p→ q
T T T
T F F
F T T
F F T
Now (¬p) ∨ q has the truth table
p q ¬p (¬p) ∨ q
T T F T
T F F F
F T T T
F F T T
So p → q and (¬p) ∨ q have the same truth
table (looking just at their columns). It follows
from this that p → q can always be rewritten
as (¬p) ∨ q. In fact, all truth functions can be
expressed in terms of ∧,∨, and ¬.
This is like finding identities in algebra – one
uses known equivalences to rearrange, expand
and simplify.
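Logical equivalence can be tested the same way: two sentences are equivalent exactly when their truth functions agree on every interpretation. A Python sketch (again with our own helper names):

```python
from itertools import product

def equivalent(f, g, n):
    """True iff truth functions f and g agree on all 2**n interpretations."""
    return all(f(*v) == g(*v) for v in product((True, False), repeat=n))

implies = lambda p, q: q if p else True   # the truth table of p → q

print(equivalent(implies, lambda p, q: (not p) or q, 2))  # True
print(equivalent(implies, lambda p, q: p and q, 2))       # False
```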
5.3 Useful equivalences
The following equivalences are the most fre-
quently used in this “algebra of logic”.
Equivalence law
p↔ q ≡ (p→ q) ∧ (q → p)
Implication law
p→ q ≡ (¬p) ∨ q
Double Negation law
¬¬p ≡ p
Idempotent laws
p ∧ p ≡ p
p ∨ p ≡ p
Commutative laws
p ∧ q ≡ q ∧ p
p ∨ q ≡ q ∨ p
Associative laws
p ∧ (q ∧ r) ≡ (p ∧ q) ∧ r
p ∨ (q ∨ r) ≡ (p ∨ q) ∨ r
Distributive laws
p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)
p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r)
De Morgan’s laws
¬(p ∧ q) ≡ (¬p) ∨ (¬q)
¬(p ∨ q) ≡ (¬p) ∧ (¬q)
Identity laws
p ∧ T ≡ p
p ∨ F ≡ p
Annihilation laws
p ∧ F ≡ F
p ∨ T ≡ T
Inverse laws
p ∧ (¬p) ≡ F
p ∨ (¬p) ≡ T
Absorption laws
p ∧ (p ∨ q) ≡ p
p ∨ (p ∧ q) ≡ p
Remarks
1. The commutative laws are used to rear-
range terms, as in ordinary algebra. The law
p ∨ q ≡ q ∨ p is like p + q = q + p in ordinary
algebra, and p ∧ q ≡ q ∧ p is like pq = qp.
2. The associative laws are used to remove
brackets. Since p ∨ (q ∨ r) ≡ (p ∨ q) ∨ r, we
can write either side as p ∨ q ∨ r. This is like
p+ (q+ r) = (p+ q) + r = p+ q+ r in ordinary
algebra.
3. The distributive laws are used to “expand”
combinations of ∧ and ∨.
p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)
is like
p(q + r) = pq + pr.
The other distributive law is not like anything
in ordinary algebra.
4. Some of these laws are redundant, in the sense
that other laws imply them. For example, the
absorption law
p ∧ (p ∨ q) ≡ p
follows from the distributive, idempotent, iden-
tity and annihilation laws:
p ∧ (p ∨ q) ≡ (p ∧ p) ∨ (p ∧ q)
by distributive law
≡ p ∨ (p ∧ q)
by idempotent law
≡ (p ∧ T) ∨ (p ∧ q)
by identity law
≡ p ∧ (T ∨ q)
by distributive law
≡ p ∧ T
by annihilation law
≡ p
by identity law
Questions
5.1 Explain why there are 8 ways to assign
truth values to variables p, q, r; 16 ways
to assign truth values to variables p, q, r, s;
and in general 2ⁿ ways to assign truth val-
ues to n variables.
5.2 Use truth tables to verify the de Morgan’s
laws and absorption laws.
5.3 If p ⊕ q is the “exclusive or” discussed last
lecture, see whether it satisfies the dis-
tributive laws
p ∧ (q ⊕ r) ≡ (p ∧ q) ⊕ (p ∧ r)
p ⊕ (q ∧ r) ≡ (p ⊕ q) ∧ (p ⊕ r)
Lecture 6: Rules of inference
Last time we saw how to recognise tautolo-
gies and logically equivalent sentences by com-
puting their truth tables. Another way is to in-
fer new sentences from old by rules of inference.
6.1 The rule of replacement
This rule says that any sentence may be replaced
by an equivalent sentence. Any series of such re-
placements therefore leads to a sentence equiv-
alent to the one we started with.
Using replacement is like the usual method
of proving identities in algebra – make a series
of replacements until the left hand side is found
equal to the right hand side.
Examples
1. x→ y ≡ (¬y)→ (¬x)
Proof.
x→ y ≡ (¬x) ∨ y
by implication law
≡ y ∨ (¬x)
by commutative law
≡ (¬¬y) ∨ (¬x)
by law of double negation
≡ (¬y)→ (¬x)
by implication law
x→ y ≡ (¬y)→ (¬x)
(¬y)→ (¬x) is the contrapositive of x→ y.
Example. The contrapositive of
MCG flooded→ cricket is off
is
Cricket is on→ MCG not flooded.
An implication and its contrapositive are
equivalent: they mean the same thing!
2. p→ (q → p) ≡ T
Proof.
p→ (q → p)
≡ (¬p) ∨ (q → p)
by implication law
≡ (¬p) ∨ ((¬q) ∨ p)
by implication law
≡ (¬p) ∨ (p ∨ (¬q))
by commutative law
≡ ((¬p) ∨ p) ∨ (¬q)
by associative law
≡ (p ∨ (¬p)) ∨ (¬q)
by commutative law
≡ T ∨ (¬q) by inverse law
≡ T by annihilation law
3. ((p→ q) ∧ p)→ q ≡ T
Proof.
((p→ q) ∧ p)→ q
≡ ¬((p→ q) ∧ p) ∨ q
by implication law
≡ (¬(p→ q) ∨ (¬p)) ∨ q
by de Morgan’s law
≡ ¬(p→ q) ∨ ((¬p) ∨ q)
by associative law
≡ ¬(p→ q) ∨ (p→ q)
by implication law
≡ (p→ q) ∨ ¬(p→ q)
by commutative law
≡ T by inverse law
This tautology says that “if p implies q and p is
true then q is true”.
6.2 Modus ponens
The tautology ((p→ q) ∧ p)→ q also translates
into a rule of inference known as modus ponens:
from sentences p → q and p we can infer the
sentence q.
6.3 Logical consequence
A sentence ψ is a logical consequence of a sen-
tence φ, if ψ = T whenever φ = T. We write
this as φ⇒ ψ.
It is the same to say that φ→ ψ is a tautol-
ogy, but φ⇒ ψ makes it clearer that we are dis-
cussing a relation between the sentences φ and
ψ.
Any sentence ψ logically equivalent to φ is
a logical consequence of φ, but not all conse-
quences of φ are equivalent to it.
Example. p ∧ q ⇒ p
p is a logical consequence of p∧q, because p = T
whenever p ∧ q = T. However, we can have
p ∧ q = F when p = T (namely, when q = F).
Hence p ∧ q and p are not equivalent.
This example shows that ⇒ is not symmet-
ric:
(p ∧ q) ⇒ p but p ⇏ (p ∧ q)
This is where⇒ differs from ≡, because if φ ≡ ψ
then ψ ≡ φ.
In fact, we build the relation ≡ from ⇒ the
same way ↔ is built from →:
φ ≡ ψ means (φ⇒ ψ) and (ψ ⇒ φ).
Questions
6.1 The slogan “no pain, no gain” stands for
an implication. What is it?
6.2 What is the contrapositive of “no pain, no
gain”?
6.3 Write the following sentences as implica-
tions, and then write their contrapositives.
• You can’t make an omelette without
breaking eggs.
• If n is even, so is n².
• Haste makes waste.
6.4 Show that p → (q → (r → p)) is a tautol-
ogy using the laws of logic.
6.5 Find a tautology with n variables which
is p → (q → p) for n = 2 and p →
(q → (r → p)) for n = 3.
Lecture 7: Predicates and quantifiers
We get a more expressive language than
propositional logic by admitting predicates like
P (n), Q(x, y), R(a, b, c)
These stand for properties or relations such as
P (n) : n is prime
Q(x, y) : x ≤ y
R(a, b, c) : a+ b = c.
Those with one variable, such as “n is prime,”
are usually called properties, while those with
two or more variables, such as “x ≤ y,” are usu-
ally called relations.
7.1 Predicates
A predicate such as “n is prime” is not a sen-
tence because it is neither true nor false. Rather,
it is a function P (n) of n with the Boolean val-
ues T (true) or F (false). In this case, P (n) is a
function of natural numbers defined by
P(n) = T if n is prime, and P(n) = F otherwise.
Similarly, the “x ≤ y” predicate is a function of
pairs of real numbers, defined by
R(x, y) = T if x ≤ y, and R(x, y) = F otherwise.
Since most of mathematics involves properties
and relations such as these, only a language with
predicates is adequate for mathematics (and
computer science).
7.2 Building sentences from predi-
cates
One way to create a sentence from a predicate
is to replace its variables by constants. For ex-
ample, when P (n) is the predicate “n is prime,”
P (3) is the sentence “3 is prime.”
Another way is to use quantifiers:
• ∀ (meaning “for all”) and
• ∃ (meaning “there exists” or “there is”).
Example. ∃nP (n) is the (true) sentence
there exists an n such that n is prime.
∀nP (n) is the (false) sentence
for all n, n is prime.
Note that when ∃n is read “there exists an
n” we also add a “such that.”
7.3 Quantifiers and connectives
We can also combine quantifiers with connec-
tives from propositional logic.
Example. Let Sq(n) be the predicate “n is a
square,” and let Pos(n) be the predicate “n is
positive” as above. Then we can symbolise the
following sentences:
There is a positive square:
∃n(Pos(n) ∧ Sq(n)).
There is a positive integer which is not a
square:
∃n(Pos(n) ∧ ¬Sq(n))
All squares are positive:
∀n(Sq(n)→ Pos(n))
Notice that the “All. . . are” combination in
English actually involves an implication. This
is needed because we are making a claim only
about squares and the implication serves to
“narrow down” the set we are examining.
7.4 Alternating quantifiers
Combinations of quantifiers like ∀x∃y . . . , “for
all x there is a y . . .” are common in mathe-
matics, and can be confusing. It helps to have
some examples in mind to recall the difference
between ∀x∃y . . . and ∃y∀x . . .
The relation x < y is convenient to illustrate
such combinations; we write x < y as the pred-
icate L(x, y).
Then
∀x∃yL(x, y)
is the (true) sentence
for all x there is a y such that x < y,
which says that there is no greatest number.
But with the opposite combination of quan-
tifiers we have
∃y∀xL(x, y)
is the false sentence
there is a y such that for all x, x < y,
which says there is a number greater than all
numbers.
Even though these statements are usually
written without brackets they are effectively
bracketed “from the centre”. So ∀x∃yL(x, y)
means ∀x(∃yL(x, y)) and ∃y∀xL(x, y) means
∃y(∀xL(x, y)).
7.5 An example from Abraham Lin-
coln
You can fool all of the people some of the time
and
you can fool some of the people all of the time
but
you can’t fool all of the people all of the time.
Let F (p, t) be the predicate:
person p can be fooled at time t.
Then
∀p∃tF (p, t) says
you can fool all of the people some of the time,
∃p∀tF (p, t) says
you can fool some of the people all of the time,
¬∀p∀tF (p, t) says
you can’t fool all of the people all of the time.
Hence Lincoln’s sentence in symbols is:
∀p∃tF (p, t) ∧ ∃p∀tF (p, t) ∧ ¬∀p∀tF (p, t)
Remark. Another way to say “you can’t fool
all of the people all of the time” is
∃p∃t¬F (p, t).
Questions
7.1 Write “roses are red” in the language of
predicate logic, using
rose(x) for “x is a rose”
red(x) for “x is red.”
7.2 If P (n) stands for “n is prime” and E(n)
stands for “n is even,” what does P (n) ∧
(¬E(n)) say about n?
7.3 Using the predicates
pol(x) for “x is a politician”
liar(x) for “x is a liar”
• all politicians are liars
• some politicians are liars
• no politicians are liars
• some politicians are not liars.
Are any of these sentences logically equiv-
alent?
Lecture 8: Predicate logic
8.1 Valid sentences
The language of predicate logic is based on
predicate symbols, variables, constants, brack-
ets, ∀,∃ and connectives. The examples from
last lecture illustrate how these ingredients are
used to form sentences.
A sentence in predicate logic is valid if it has
value T under all interpretations.
This is similar to the definition of a tautology
in propositional logic. But now “all interpreta-
tions” means all interpretations of the predicate
symbols, which is more complicated. The inter-
pretation of a symbol P (n), say, must include
both the range of the variable n and a specifi-
cation of those n for which P (n) is true.
8.2 Interpretations
For example, one interpretation of P (n) is “n is
positive,” where n ranges over the real numbers.
Under this interpretation, ∀nP (n) is false.
A different interpretation of P (n) is “n is
positive,” where n ranges over the numbers > 2.
Under this interpretation, ∀nP (n) is true.
Unlike in propositional logic, there are in-
finitely many different interpretations of each
formula. Thus there is no truth table method
for predicate logic. We cannot decide whether a
formula is valid by testing all interpretations.
8.3 Recognising valid sentences
Nevertheless, in some cases, we can see that a
sentence is true for all interpretations.
Example. ∀xP (x) → ∃xP (x) is true for all
properties P , and hence is valid.
Likewise, we can sometimes see that a sen-
tence is not valid by finding an interpretation
which makes it false.
Example. The sentence
∀x∃yQ(x, y)→ ∃x∀yQ(x, y)
is false if we interpret Q(x, y) as x ≤ y on the
real numbers. With this interpretation
∀x∃yQ(x, y) is true
(for any number there is a larger number), but
∃x∀yQ(x, y) is false
(there is no number ≤ all numbers). Hence the
implication is false.
8.4 Consequence and equivalence
As in propositional logic, a sentence ψ is a logi-
cal consequence of a sentence φ if any interpre-
tation which makes φ true makes ψ true. Again
we write φ ⇒ ψ if ψ is a consequence of φ, and
this is the same as saying φ→ ψ is valid.
Example. Any interpretation which makes
∀nP (n) true makes ∃nP (n) true, and this is why
∀xP (x)→ ∃xP (x) is valid.
Similarly, sentences ψ and φ are equivalent,
written ψ ≡ φ, if each is a consequence of the
other. Some sentences are equivalent for “propo-
sitional logic reasons.”
Example. We have
∀x(P (x) ∧Q(x)) ≡ ∀x(Q(x) ∧ P (x))
simply because
P (x) ∧Q(x) ≡ Q(x) ∧ P (x)
for any x.
However there are also equivalences that
genuinely involve quantifiers.
8.5 Useful equivalences
Two important equivalences involving quanti-
fiers are
¬∀xP (x) ≡ ∃x¬P (x)
¬∃xP (x) ≡ ∀x¬P (x)
These make sense intuitively. For example,
¬∀xP (x) means P (x) is not true for all x, hence
there is an x for which P (x) is false, that is,
∃x¬P (x).
They can also be viewed as “infinite De Mor-
gan’s laws.” If x ranges over {1, 2, 3, . . .} for ex-
ample, then
∀xP (x) ≡ P (1) ∧ P (2) ∧ P (3) ∧ · · ·
and
∃xP (x) ≡ P (1) ∨ P (2) ∨ P (3) ∨ · · ·
Hence
¬∀xP (x) ≡ ¬ (P (1) ∧ P (2) ∧ P (3) ∧ · · · )
≡ (¬P (1)) ∨ (¬P (2)) ∨ (¬P (3)) ∨ · · ·
by de Morgan’s law
≡ ∃x¬P (x).
And similarly
¬∃xP (x) ≡ ¬ (P (1) ∨ P (2) ∨ P (3) ∨ · · · )
≡ (¬P (1)) ∧ (¬P (2)) ∧ (¬P (3)) ∧ · · ·
by de Morgan’s law
≡ ∀x¬P (x).
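On a finite domain, ∀ and ∃ become Python's all() and any(), so the two equivalences above can be checked mechanically. This is only an illustrative sketch: the domain and the interpretation of P are arbitrary choices, not part of the notes.

```python
# Checking the "infinite De Morgan" equivalences on a finite domain,
# with all() playing the role of ∀ and any() the role of ∃.

domain = range(1, 11)
P = lambda x: x % 2 == 0   # interpret P(x) as "x is even" (arbitrary choice)

# ¬∀x P(x) ≡ ∃x ¬P(x)
assert (not all(P(x) for x in domain)) == any(not P(x) for x in domain)

# ¬∃x P(x) ≡ ∀x ¬P(x)
assert (not any(P(x) for x in domain)) == all(not P(x) for x in domain)
```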
8.6 Simplification
The infinite de Morgan’s laws allow a certain
simplification of predicate formulas by “pushing
¬ inside quantifiers.”
Example.
¬∀x∃yQ(x, y) ≡ ∃x¬∃yQ(x, y)
≡ ∃x∀y¬Q(x, y).
It is in fact possible to transform any quanti-
fied statement in predicate logic to an equivalent
with all quantifiers at the front.
8.7* Completeness and undecidabil-
ity
In 1930, Gödel proved that there is a complete
set of rules of inference for predicate logic. This
means, in particular, that there is an algorithm
to list the valid sentences.
However, in 1936, Church and Turing proved
that there is no algorithm to list the logically
false sentences. This means, in particular, that
predicate logic is undecidable: there is no algo-
rithm which, for any sentence φ, decides whether
φ is valid or not.
This negative result is due to the power of
predicate logic: it can express all mathemati-
cal or computational problems, and it is known
that some of these problems cannot be solved by
algorithm.
Questions
8.1 Give interpretations which make the fol-
lowing sentences false.
∃nP (n)→ ∀nP (n)
∀x∀y (R(x, y)→ R(y, x))
∀m∃nS(m,n)
8.2 Give interpretations which show that the
sentences
∃x (P (x) ∧ L(x))
and
∃x (P (x) ∧ ¬L(x))
are not equivalent.
8.3 Is ∃y∀xR(x, y) a logical consequence of
∀x∃yR(x, y)?
If so, explain why. If not, give an interpre-
tation which makes ∀x∃yR(x, y) true and
∃y∀xR(x, y) false.
8.4 Is ∀x∃yR(x, y) a logical consequence of
∃y∀xR(x, y)?
If so, explain why. If not, give an interpre-
tation which makes ∃y∀xR(x, y) true and
∀x∃yR(x, y) false.
8.5 Explain why ¬∀p∀tF (p, t) ≡ ∃p∃t¬F (p, t).
Lecture 9: Mathematical induction
Since the natural numbers 0, 1, 2, 3, . . . are
generated by a process which begins with 0 and
repeatedly adds 1, we have the following.
Property P is true for all natural numbers if
1. P (0) is true.
2. P (k)⇒ P (k + 1) for all k ∈ N.
This is called the principle of mathematical
induction.
It is used in a style of proof called proof by
induction, which consists of two steps.
Base step: Proof that the required property P
is true for 0.
Induction step: Proof that if P (k) is true
then P (k + 1) is true, for each k ∈ N.
9.1 Examples
Example 1. Prove that 3 divides n³ + 2n for
all n ∈ N.
Let P(n) be “3 divides n³ + 2n”.
Base step. 3 divides 0³ + 2 × 0 = 0, so P(0) is
true.
Induction step. We want to prove
3 divides k³ + 2k
⇒ 3 divides (k + 1)³ + 2(k + 1).
Well,
(k + 1)³ + 2(k + 1)
= k³ + 3k² + 3k + 1 + 2k + 2
= k³ + 2k + 3k² + 3k + 3
= k³ + 2k + 3(k² + k + 1).
Therefore,
3 divides k³ + 2k
⇒ 3 divides k³ + 2k + 3(k² + k + 1)
⇒ 3 divides (k + 1)³ + 2(k + 1)
as required. This completes the induction step,
and hence completes the proof.
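The claim of Example 1 can be checked empirically for many values of n. Such a check is of course no substitute for the induction proof, but it is how one might guess the statement in the first place.

```python
# Empirical check of Example 1: 3 divides n**3 + 2*n for many n.
for n in range(1000):
    assert (n**3 + 2*n) % 3 == 0
```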
Example 2. Prove there are 2ⁿ n-bit binary
strings.
Let P(n) be “there are 2ⁿ n-bit binary strings”.
Base step. There is 2⁰ = 1 0-bit binary string
(the empty string) so P(0) is true.
Induction step. We want to prove that
there are 2ᵏ k-bit binary strings
⇒ there are 2ᵏ⁺¹ (k + 1)-bit binary strings.
Well, a (k + 1)-bit binary string is either W0 or
W1, where W is any k-bit binary string. Thus
if there are 2ᵏ k-bit binary strings W, there are
2 × 2ᵏ = 2ᵏ⁺¹ (k + 1)-bit binary strings.
This completes the induction step, and hence
completes the proof.
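The count in Example 2 can be confirmed for small n by actually generating the strings; itertools.product plays the role of appending 0 or 1 to each shorter string.

```python
# Counting n-bit binary strings confirms the 2**n formula for small n.
from itertools import product

for n in range(8):
    strings = list(product("01", repeat=n))   # all n-bit strings
    assert len(strings) == 2**n
```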
9.2 Starting the base step higher
It is not always appropriate to start the induc-
tion at 0. Some properties are true only from a
certain positive integer upwards, in which case
the induction starts at that integer.
Example 3. Prove n! > 2ⁿ for all integers
n ≥ 4.
Let P(n) be “n! > 2ⁿ”.
Base step. 4! = 4 × 3 × 2 × 1 = 24 > 16 = 2⁴, so
P(4) is true.
Induction step. We want to prove k! > 2ᵏ ⇒
(k + 1)! > 2ᵏ⁺¹ for all integers k ≥ 4.
Now, for k ≥ 4, if k! > 2ᵏ,
(k + 1)! = (k + 1) × k! > (k + 1) × 2ᵏ > 2 × 2ᵏ = 2ᵏ⁺¹.
(The first > holds because we are assuming
k! > 2ᵏ and the second holds because k ≥ 4.)
Thus k! > 2ᵏ ⇒ (k + 1)! > 2ᵏ⁺¹, as required to
complete the induction.
So n! > 2ⁿ for all n ≥ 4.
Example 4. Prove any integer value n ≥ 8 (in
cents) is obtainable with 3c and 5c stamps.
Let P(n) be “n cents is obtainable with 3c and
5c stamps”.
Base step. 8c can be obtained by a 3c plus a 5c
stamp. So P(8) is true.
Induction step. We have to show that if k cents
is obtainable, so is (k + 1) cents, when k ≥ 8.
Case 1. The k cents is obtained using a 5c
stamp (among others). Replace the 5c stamp by
two 3c stamps, thus obtaining k + 1 cents.
Case 2. If the k cents is obtained using only
3c stamps, there are at least three of them (since
k ≥ 8). In this case, replace three 3c stamps by
two 5c stamps, again obtaining k + 1 cents.
Thus in either case, when k ≥ 8, P(k) ⇒
P(k + 1). This completes the induction proof
that n cents is obtainable from 3c and 5c
stamps, for all integers n ≥ 8.
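The induction step of Example 4 is constructive, so it translates directly into a procedure: start from 8 = 3 + 5 and apply one of the two replacement cases once per extra cent. The function name stamps is an illustrative choice, not from the notes.

```python
# Sketch of the inductive construction in Example 4.

def stamps(n):
    """Return (threes, fives) with 3*threes + 5*fives == n, for n >= 8."""
    threes, fives = 1, 1          # base case: 8 = 3 + 5
    for _ in range(n - 8):
        if fives > 0:             # Case 1: swap a 5c stamp for two 3c stamps
            threes, fives = threes + 2, fives - 1
        else:                     # Case 2: swap three 3c stamps for two 5c stamps
            threes, fives = threes - 3, fives + 2
    return threes, fives

for n in range(8, 100):
    t, f = stamps(n)
    assert 3*t + 5*f == n and t >= 0 and f >= 0
```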
9.3 Sums of series
Induction is often used to prove that sum for-
mulas are correct.
Example 5. Prove 1 + 2 + 3 + ··· + n = n(n + 1)/2
for all integers n ≥ 1.
Let P(n) be “1 + 2 + 3 + ··· + n = n(n + 1)/2”.
Base step. When n = 1, the left hand side is 1,
and the right hand side is 1(1 + 1)/2 = 2/2 = 1,
so P(1) is true.
Induction step. We have to prove that
1 + 2 + ··· + k = k(k + 1)/2
⇒ 1 + 2 + ··· + k + (k + 1) = (k + 1)(k + 2)/2.
Now, if P(k) is true,
1 + 2 + ··· + k + (k + 1)
= (1 + 2 + ··· + k) + (k + 1)
= k(k + 1)/2 + (k + 1)    using P(k)
= (k + 1)(k/2 + 1)
= (k + 1)(k + 2)/2
as required.
This completes the induction proof.
Remark. Another proof is to write down
1 + 2 + 3 + · · ·+ (n− 1) + n
n+ (n− 1) + · · ·+ 3 + 2 + 1
and observe that each of the n columns sums
to n + 1. Thus twice the sum of the series is
n(n + 1), and hence the sum of the series itself
is n(n + 1)/2. One could argue that this proof
uses induction stealthily, to prove that the sum
of each column is the same.
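A quick check of Example 5 for many values of n (this only tests the formula; it does not prove it):

```python
# Checking 1 + 2 + ... + n == n*(n+1)/2 for many n.
for n in range(1, 200):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
```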
Questions
In most induction problems set for students we
skip the experimental part, which is finding what
to prove. Before trying to prove that 3 divides
n3 + 2n, for example, someone needs to guess
that it is true, perhaps by trying n = 1, 2, 3, 4.
9.1 In this question, try to guess what ? stands
for, by trying a few values of n.
• ? divides n² + n
• The sum of the first n odd numbers
is ?
• 1/(1×2) + 1/(2×3) + 1/(3×4) + ··· + 1/(n(n+1)) = 1 − ?
9.2 If you correctly guessed the sum
1/(1×2) + 1/(2×3) + 1/(3×4) + ··· + 1/(n(n+1)),
you might wonder why it is so simple.
Here is a clue:
1/(1×2) = 1/1 − 1/2.
What is 1/(2×3)? 1/(3×4)?
How does this lead to a simple formula for
1/(1×2) + 1/(2×3) + 1/(3×4) + ··· + 1/(n(n+1))?
OK, if we can guess formulas correctly,
why bother proving them by induction?
The reason is that a statement which fits
many values of n can still be wrong.
9.3 Show that n² + n + 41 is a prime number
for n = 1, 2, 3, 4 (and go further, if you
like). Do you think n² + n + 41 is prime
for all natural numbers n?
Lecture 10: Induction and well-ordering
In the previous lecture we were able to prove
a property P holds for 0, 1, 2, . . . as follows:
Base step. Prove P (0)
Induction step. Prove P (k) ⇒ P (k + 1) for
each natural number k.
This is sufficient to prove that P (n) holds for
all natural numbers n, but it may be difficult to
prove that P (k + 1) follows from P (k). It may
in fact be easier to prove the induction step
P (0) ∧ P (1) ∧ · · · ∧ P (k)⇒ P (k + 1).
That is, it may help to assume P holds for
all numbers before k + 1. Induction with this
style of induction step is sometimes called the
strong form of mathematical induction.
10.1 Examples of strong induction
Example 1. Prove that every integer ≥ 2 is a
product of primes. (Just a prime by itself counts
as a “product”.)
Let P (n) be “n is a product of primes”.
Base step. 2 is a prime, hence a product of (one)
prime. So P (2) is true.
Induction step. Suppose 2, 3, . . . , k are products
of primes. We wish to prove that k+1 is a prod-
uct of primes.
This is certainly true if k + 1 is a prime. If not
k + 1 = i× j,
for some natural numbers i and j less than k+1.
But then i and j are products of primes by our
assumption, hence so is i× j = k + 1.
This completes the induction proof.
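The strong-induction argument of Example 1 is really a recursive procedure: if n is not prime, split it as i × j and factorise the strictly smaller factors. A sketch (the helper name prime_factors is ours, not from the notes):

```python
# Recursive factorisation mirroring the strong-induction proof.
from math import prod

def prime_factors(n):
    """Return a list of primes whose product is n, for n >= 2."""
    for i in range(2, n):
        if n % i == 0:                      # n = i * j with i, j < n
            return prime_factors(i) + prime_factors(n // i)
    return [n]                              # no proper divisor: n is prime

for n in range(2, 200):
    assert prod(prime_factors(n)) == n
```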
Example 2. Prove that every positive integer
is a sum of distinct powers of 2. (Just a power
of two by itself counts as a “sum”.)
The idea behind this proof is to repeatedly sub-
tract the largest possible power of 2. We illus-
trate with the number 27.
27− largest power of 2 less than 27
= 27− 16 = 11
11− largest power of 2 less than 11
= 11− 8 = 3
3− largest power of 2 less than 3
= 3− 2 = 1
Hence 27 = 16 + 8 + 2 + 1 = 2⁴ + 2³ + 2¹ + 2⁰.
(It is only interesting to find distinct powers
of 2, because of course each integer ≥ 1 is a sum
of 1s, and 1 = 2⁰.)
The strong induction proof goes as follows.
Let P (n) be “n is a sum of distinct powers of
2”.
Base step. 1 = 2⁰, so 1 is a sum of (one) power
of 2. Thus P (1) is true.
Induction step. Suppose each of the numbers
1, 2, 3, . . . , k is a sum of distinct powers of 2. We
wish to prove that k+1 is a sum of distinct pow-
ers of 2.
This is certainly true if k + 1 is a power of 2. If
not, let 2ʲ be the greatest power of 2 less than
k + 1. Then
i = k + 1 − 2ʲ
is one of the numbers 1, 2, 3, . . . , k, and hence it
is a sum of distinct powers of 2.
Also, the powers of 2 that sum to i are all
less than 2ʲ, otherwise 2ʲ is less than half k + 1,
contrary to the choice of 2ʲ as the largest power
of 2 less than k + 1.
Hence k + 1 = 2ʲ + (powers of 2 that sum to
i) is a sum of distinct powers of 2.
This completes the induction proof.
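The "subtract the largest power of 2" idea can be written as a loop. This sketch is not part of the notes; it reproduces the worked example 27 = 16 + 8 + 2 + 1, and the result is exactly n's binary representation.

```python
# Repeatedly subtract the largest power of 2, as in Example 2.

def distinct_powers_of_two(n):
    """Return distinct powers of 2 summing to n, largest first (n >= 1)."""
    powers = []
    while n > 0:
        p = 1
        while 2 * p <= n:      # find the largest power of 2 that is <= n
            p *= 2
        powers.append(p)
        n -= p
    return powers

assert distinct_powers_of_two(27) == [16, 8, 2, 1]
for n in range(1, 500):
    ps = distinct_powers_of_two(n)
    assert sum(ps) == n and len(ps) == len(set(ps))
```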
10.2 Well-ordering and descent
Induction expresses the fact that each natural
number n can be reached by starting at 0 and
going upwards (e.g. adding 1) a finite number
of times.
Equivalent facts are that it is only a finite
number of steps downwards from any natural
number to 0, that any descending sequence of
natural numbers is finite, and that any set of
natural numbers has a least element.
This property is called well-ordering of the
natural numbers. It is often convenient to ar-
range a proof to “work downwards” and appeal
to well-ordering by saying that the process of
working downwards must eventually stop.
Such proofs are equivalent to induction,
though they are sometimes called “infinite de-
scent” or similar.
10.3 Proofs by descent
Example 1. Prove that any integer ≥ 2 has a
prime divisor.
If n is prime, then it is a prime divisor of itself.
If not, let n1 < n be a divisor of n.
If n1 is prime, it is a prime divisor of n. If
not, let n2 < n1 be a divisor of n1 (and hence of
n).
If n2 is prime, it is a prime divisor of n. If
not, let n3 < n2 be a divisor of n2, etc.
The sequence n > n1 > n2 > n3 > · · · must
eventually terminate, and this means we find a
prime divisor of n.
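The descent in Example 1 can be run as a loop: repeatedly replace n by a proper divisor until the current number is prime. The sequence of replacements is strictly decreasing, so the loop must stop. (The name prime_divisor is an illustrative choice.)

```python
# Descent: keep replacing n by a proper divisor until it is prime.

def prime_divisor(n):
    """Return a prime divisor of n, for n >= 2."""
    while True:
        for d in range(2, n):
            if n % d == 0:     # n is not prime: descend to the divisor d
                n = d
                break
        else:
            return n           # no proper divisor found: n is prime

for n in range(2, 200):
    p = prime_divisor(n)
    assert n % p == 0
    assert all(p % d != 0 for d in range(2, p))   # p is prime
```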
Example 2. Prove √2 is irrational.
Suppose that √2 = m/n for natural numbers m
and n. We will show this is impossible. Since the
square of an odd number is odd, we can argue
as follows.
√2 = m/n
⇒ 2 = m²/n²    squaring both sides
⇒ m² = 2n²
⇒ m² is even
⇒ m is even
since the square of an odd number is odd
⇒ m = 2m₁ say
⇒ 2n² = m² = 4m₁²
⇒ n² = 2m₁²
⇒ n is even, = 2n₁ say
But then √2 = m₁/n₁, and we can repeat
the argument to show that m₁ and n₁ are both
even, so m₁ = 2m₂ and n₁ = 2n₂, and so on.
Since the argument can be repeated indefinitely,
we get an infinite descending sequence of
natural numbers
m > m₁ > m₂ > ···
which is impossible.
Hence there are no natural numbers m and
n with √2 = m/n.
Questions
10.1 For each of the following statements, say
which is likely to require strong induction
for its proof.
• 1 + a + a² + ··· + aⁿ = (aⁿ⁺¹ − 1)/(a − 1)
• ¬(p₁ ∨ p₂ ∨ p₃ ∨ ··· ∨ pₙ) ≡ (¬p₁) ∧
(¬p₂) ∧ (¬p₃) ∧ ··· ∧ (¬pₙ)
• Each fraction n/m < 1 is a sum of
distinct fractions with numerator 1
(for example, 11/12 = 1/2 + 1/3 + 1/12).
10.2 There is something else which tells you
every integer ≥ 1 is a sum of distinct powers
of 2. What is it?
10.3 Is every integer ≥ 1 a sum of distinct
powers of 3?
Lecture 11: Sets
Sets are vital in expressing mathematics for-
mally and are also very important data struc-
tures in computer science.
A set is basically just an unordered collection
of distinct objects, which we call its elements or
members. Note that there is no notion of order
for a set, even though we often write down its
elements in some order for convenience. Also,
there is no notion of multiplicity: an object is
either in a set or not – it cannot be in the set
multiple times.
Sets A and B are equal when every element
of A is an element of B and vice-versa.
11.1 Set notation
• x ∈ S means x is an element of set S.
• {x1, x2, x3, . . .} is the set with elements
x1, x2, x3, . . . .
• {x : P (x)} is the set of all x with property
P .
Example.
17 ∈ {x : x is prime} = {2, 3, 5, 7, 11, 13, . . .}
{1, 2, 3} = {3, 1, 2}
{1, 1, 1} = {1}
For a finite set S, we write |S| for the number
of elements of S.
11.2 Universal set
The idea of a “set of all sets” leads to logical
difficulties. Difficulties are avoided by always
working within a local “universal set” which in-
cludes only those objects under consideration.
For example, when discussing arithmetic it
might be sufficient to work just with the num-
bers 0, 1, 2, 3, . . .. Our universal set could then
be taken as
N = {0, 1, 2, 3, . . .},
and other sets of interest, e.g. {x : x is prime},
are parts of N.
11.3 Subsets
We say that A is a subset of B and write
A ⊆ B when each element of A is an element
of B.
Example. The set of primes forms a subset of
N, that is {x : x is prime} ⊆ N.
11.4 Characteristic functions
A subset A of B can be specified by its charac-
teristic function χA, which tells which elements
of B are in A and which are not.
χA(x) =
{
1 if x ∈ A
0 if x /∈ A
Example. The subset A = {a, c} of B =
{a, b, c} has the characteristic function χA with
χA(a) = 1, χA(b) = 0, χA(c) = 1.
We also write this function more simply as
a b c
1 0 1
In fact we can list all characteristic functions
on {a, b, c}, and hence all subsets of {a, b, c}, by
listing all sequences of three binary digits:
characteristic function subset
a b c
0 0 0 {}
0 0 1 {c}
0 1 0 {b}
0 1 1 {b, c}
1 0 0 {a}
1 0 1 {a, c}
1 1 0 {a, b}
1 1 1 {a, b, c}
We could similarly list all the subsets of a
four-element set, and there would be 2⁴ = 16 of
them, corresponding to the 2⁴ sequences of 0s
and 1s.
In the same way, we find that an n-element
set has 2ⁿ subsets, because there are 2ⁿ binary
sequences of length n. (Each of the n places in
the sequence can be filled in two ways.)
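The correspondence between subsets and binary sequences gives a standard way to enumerate all 2ⁿ subsets: read the bits of a counter from 0 to 2ⁿ − 1 as a characteristic function. A sketch in Python (the helper name subsets is ours):

```python
# Enumerate subsets of {a, b, c} via binary sequences, as in the table:
# bit i of the counter is the characteristic function's value on element i.

def subsets(elements):
    n = len(elements)
    return [{elements[i] for i in range(n) if (mask >> i) & 1}
            for mask in range(2**n)]

subs = subsets(["a", "b", "c"])
assert len(subs) == 2**3 == 8
assert set() in subs and {"a", "c"} in subs and {"a", "b", "c"} in subs
```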
11.5 Power set
The set of all subsets of a set U is called the
power set P(U) of U .
Example. We see from the previous table that
P({a, b, c}) is the set
{{}, {c}, {b}, {b, c}, {a}, {a, c}, {a, b}, {a, b, c}}.
If U has n elements, then P(U) has 2ⁿ elements.
(The reason P(U) is called the “power” set
is probably that the number of its elements is
this power of 2. In fact, the power set of U is
sometimes written 2U .)
11.6 Sets and properties
We mentioned at the beginning that {x : P (x)}
stands for the set of objects x with property P .
Thus sets correspond to properties.
Properties of the natural numbers
0, 1, 2, 3, . . . , for example, correspond to sub-
sets of the set N = {0, 1, 2, 3, . . .}. Thus the
subset
{0, 2, 4, 6, . . .} = {n ∈ N : n is even}
corresponds to the property of being even.
Similarly, the set
{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, . . .}
corresponds to the property of being prime. The
power set P(N) corresponds to all possible prop-
erties of natural numbers.
11.7* What are numbers?
The most common approach to building mathe-
matics up from logical foundations considers all
mathematical objects to be fundamentally made
of sets. One simple way to define numbers using
sets (due to von Neumann) is the following.
0 = {}
1 = {0}
2 = {0, 1}
...
n+ 1 = {0, 1, 2, . . . , n}
We are not going to use this definition in this
course. Still, it is interesting that numbers can
be defined in such a simple way.
Questions
11.1 Suppose E(x) stands for “x is even” and
F (x) stands for “5 divides x.”
• What is the set {x : E(x) ∧ F (x)}?
• Write a formula using E(x) and
F (x) which describes the set
{5, 15, 25, 35, . . .}.
11.2 How many subsets does the set
{2, 5, 10, 20} have?
11.3 Consider the infinitely many sets
{x : 0 < x < 1}
{x : 0 < x < 1/2}
{x : 0 < x < 1/3}
{x : 0 < x < 1/4}
...
Do they have any element in common?
Lecture 12: Operations on sets
There is an “arithmetic” of sets similar to or-
dinary arithmetic. There are operations similar
to addition, subtraction and multiplication.
12.1 Venn diagrams
The simple operations on sets can be visualised
with the help of Venn diagrams, which show sets
A,B,C, . . . as disks within a rectangle represent-
ing the universal set U .
[Venn diagram: disks A and B inside a rectangle representing U]
12.2 Union A ∪B
The union A ∪ B of sets A and B consists of
the elements in A or B, and is indicated by the
shaded region in the following Venn diagram.
[Venn diagram: A ∪ B shaded]
12.3 Intersection A ∩B
The intersection A∩B of sets A and B consists
of the elements in A and B, indicated by the
shaded region in the following Venn diagram.
[Venn diagram: A ∩ B shaded]
12.4 Difference A−B
The difference A−B of sets A and B consists of
the elements in A and not in B, indicated by the
shaded region in the following Venn diagram.
[Venn diagram: A − B shaded]
The difference U − B relative to the universal
set U is called the complement B̄ of B. Here
is the Venn diagram of B̄.
[Venn diagram: B̄ shaded]
12.5 Symmetric difference A △ B
The union of A − B and B − A is called the
symmetric difference A △ B of A and B.
[Venn diagram: A △ B shaded]
A △ B consists of the elements of one of A, B
but not the other.
It is clear from the diagram that we have not
only
A △ B = (A − B) ∪ (B − A),
but also
A △ B = (A ∪ B) − (A ∩ B).
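Python's set type has all of these operations built in (^ for symmetric difference, - for difference, | for union, & for intersection), so the two identities can be checked on examples. The particular sets here are arbitrary choices.

```python
# Checking the two expressions for the symmetric difference.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

assert A ^ B == (A - B) | (B - A)        # A △ B = (A − B) ∪ (B − A)
assert A ^ B == (A | B) - (A & B)        # A △ B = (A ∪ B) − (A ∩ B)
assert A ^ B == {1, 2, 5, 6}
```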
12.6 Ordered Pairs
Sometimes we do want order to be important.
In computer science arrays are ubiquitous ex-
amples of ordered data structures. In maths,
ordered pairs are frequently used. An ordered
pair (a, b) consists simply of a first object a and
a second object b. The objects a and b are some-
times called the entries or coordinates of the or-
dered pair.
Two ordered pairs (a, b) and (c, d) are equal
if and only if a = c and b = d.
Example. {0, 1} = {1, 0} but (0, 1) 6= (1, 0).
There’s no reason we need to stop with pairs.
We can similarly define ordered triples, quadru-
ples, and so on. When there are k coordinates,
we call the object an ordered k-tuple. Two or-
dered k-tuples are equal if and only if their ith
coordinates are equal for i = 1, 2, . . . , k.
12.7 Cartesian product A×B
The set of ordered pairs
A×B = {(a, b) : a ∈ A and b ∈ B}
is the cartesian product of sets A and B.
The commonest example is where A = B =
R (the set of real numbers, or the number line).
Then the pairs (a, b) are points in the plane, so
R× R is the plane.
[Diagram: the point (a, b) in the plane, with coordinates a and b marked on the axes]
Because Descartes used this idea in geome-
try, the cartesian product is named after him.
12.8 A×B and multiplication
If A has |A| elements and B has |B| elements,
then A×B has |A| × |B| elements.
Similarly, if L is a line of length l, and W is
a line of length w, then L×W is a rectangle of
area l × w. In fact, we call it an “l × w rectan-
gle.” This is probably the reason for using the
× sign, and for calling A×B a “product.”
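itertools.product builds cartesian products directly, and the size formula |A × B| = |A| × |B| can be checked on small examples (A and B here are arbitrary choices):

```python
# The cartesian product and its size.
from itertools import product

A = {"a", "b", "c"}
B = {0, 1}
pairs = set(product(A, B))
assert len(pairs) == len(A) * len(B) == 6
assert ("a", 0) in pairs
```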
Questions
12.1 Draw a Venn diagram for A ∩ B̄. What is
another name for this set?
12.2 Check the de Morgan laws by drawing
Venn diagrams for the complement of A ∪ B,
for Ā ∩ B̄, for the complement of A ∩ B,
and for Ā ∪ B̄.
12.3 Find which of the following is true by
drawing suitable Venn diagrams.
A ∩ (B △ C) = (A ∩ B) △ (A ∩ C)?
A △ (B ∩ C) = (A △ B) ∩ (A △ C)?
12.4 If plane = line × line, what do you think
line × circle is? What about circle × cir-
cle?
Lecture 13: Functions
A function can be thought of as a “black
box” which accepts inputs and, for each input,
produces a single output.
13.1 Defining functions via sets
Formally we represent a function f as a set X
of possible inputs, a set Y so that every out-
put of f is guaranteed to be in Y , and a set
of (input,output) pairs from X × Y . The vital
property of a function is that each input gives
exactly one output.
A function f consists of a domain X, a
codomain Y , and a set of ordered pairs from
X × Y which has exactly one ordered pair
(x, y) for each x ∈ X.
When (a, b) is in this set we write f(a) = b.
The set of y values occurring in these pairs is
the image of f .
Note that the image of a function is always
a subset of its codomain but they may or may
not be equal.
If the image of a function is equal to its
codomain, we say the function is onto.
Examples.
1. The squaring function square(x) = x² with
domain R, codomain R, and pairs
{(x, x²) : x ∈ R},
which form what we usually call the plot of the
squaring function.
[Plot of y = x²]
The image of this function (the set of y values)
is the set R≥0 of real numbers ≥ 0.
2. The square root function sqrt(x) = √x with
domain R≥0, codomain R, and pairs
{(x, √x) : x ∈ R and x ≥ 0}.
[Plot of y = √x]
The image of this function (the set of y values)
is the set R≥0.
3. The cubing function cube(x) = x³ with
domain R, codomain R, and pairs
{(x, x³) : x ∈ R}.
[Plot of y = x³]
The image of this function is the whole of the
codomain R, so it is onto.
13.2 Arrow notation
If f is a function with domain A and
codomain B we write
f : A→ B,
and we say that f is from A to B.
For example, we could define
square : R→ R.
We could also define
square : R → R≥0.
Likewise, we could define
cube : R→ R.
However we could not define
cube : R → R≥0,
because for some x ∈ R, cube(x) is negative.
For example, cube(−1) = −1.
13.3 One-to-one functions
A function f : X → Y is one-to-one if for
each y in the image of f there is only one
x ∈ X such that f(x) = y.
For example, the function cube(x) is one-to-
one because each real number y is the cube of
exactly one real number x.
The function square: R → R is not one-to-
one because the real number 1 is the square of
two different real numbers, 1 and −1. (In fact
each real y > 0 is the square of two different real
numbers, √y and −√y.)
On the other hand, square : R≥0 → R is one-
to-one because each real number y in R≥0 is the
square of only one real number in R≥0, namely
√y.
The last example shows that the domain of
a function is an important part of its descrip-
tion, because changing the domain can change
the properties of the function.
13.4 Proving a function is one-to-one
There is an equivalent way of phrasing the def-
inition of one-to-one: a function f : X → Y is
one-to-one when, for all x₁, x₂ ∈ X,
f(x₁) = f(x₂) ⇒ x₁ = x₂.
This can be useful for proving that some
functions are or are not one-to-one.
Example. The function f : R → R given by
f(x) = 6x+ 2 is one-to-one because
f(x₁) = f(x₂)
⇒ 6x₁ + 2 = 6x₂ + 2
⇒ 6x₁ = 6x₂
⇒ x₁ = x₂.
Example. The function f : R → R given
by f(x) = x2 + 1 is not one-to-one because
f(−1) = 2 and f(1) = 2 and so
f(−1) = f(1).
Questions
13.1 Some of the following “rules” do not define
genuine functions. Which are they?
• For each set S of natural numbers, let
f(S) be the least element of S.
• For each set X of real numbers be-
tween 0 and 1, let g(X) be the least
element of X.
• For each circle C in the (x, y) plane,
let h(C) be the minimum distance
from C to the x-axis.
• For each pair A,B of sets of real num-
bers, let s(A,B) be the smallest set
containing both A and B.
• For each pair A,B of sets of real num-
bers, let t(A,B) be the largest set
contained in both A and B.
13.2 For each of the following, say which can be
defined with domain R and codomain R.
x², 1/x, log x, √x, ∛x
Lecture 14: Examples of functions
The functions discussed in the last lecture
were familiar functions of real numbers. Many
other examples occur elsewhere, however.
14.1 Functions of several variables
We might define a function
sum : R× R→ R by sum(x, y) = x+ y.
Because the domain of this function is R×R, the
inputs to this function are ordered pairs (x, y)
of real numbers. Because its codomain is R, we
are guaranteed that each output will be a real
number. This function can be thought of as a
function of two variables x and y.
Similarly we might define a function
binomial : R× R× N→ R
by
binomial(a, b, n) = (a+ b)n.
Here the inputs are ordered triples (x, y, n) such
that x and y are real numbers and n is a natural
number. We can think of this as a function of
three variables.
14.2 Sequences
An infinite sequence of numbers, such as
1, 1/2, 1/4, 1/8, 1/16, . . . ,
can be viewed as the function f : N → R de-
fined by f(n) = 2−n. In this case, the inputs to
f are natural numbers, and its outputs are real
numbers.
Any infinite sequence a0, a1, a2, a3, . . . can be
viewed as a function g(n) = an from N to some
set containing the values an.
14.3 Characteristic functions
A subset of N = {0, 1, 2, 3, . . .} can be repre-
sented by its characteristic function. For exam-
ple, the set of squares is represented by the func-
tion χ : N→ {0, 1} defined by
χ(n) =
{
1 if n is a square
0 if n is not a square
which has the following sequence of values
110010000100000010000000010000000000100 . . .
(with 1s at the positions of the squares
0, 1, 4, 9, 16, 25, 36, . . .).
Any property of natural numbers can like-
wise be represented by a characteristic function.
For example, the function χ above represents
the property of being a square.
Thus any set or property of natural numbers
is represented by a function
χ : N→ {0, 1}.
Characteristic functions of two or more vari-
ables represent relations between two or more
objects. For example, the relation x ≤ y between
real numbers x and y has the characteristic
function χ : R × R → {0, 1} defined by
χ(x, y) =
{
1 if x ≤ y
0 otherwise.
14.4 Boolean functions
The connectives ∧,∨ and ¬ are functions of vari-
ables whose values come from the set B = {T,F}
of Boolean values (named after George Boole).
¬ is a function of one variable, so
¬ : B→ B
and it is completely defined by giving its values
on T and F, namely
¬T = F and ¬F = T.
This is what we previously did by giving the
truth table of ¬.
∧ and ∨ are functions of two variables, so
∧ : B× B→ B
and
∨ : B× B→ B
They are completely defined by giving their val-
ues on the pairs (T,T), (T,F), (F,T), (F,F) in
B× B, which is what their truth tables do.
14.5* Characteristic functions and
subsets of N
Mathematicians say that two (possibly infinite)
sets A and B have the same cardinality (size) if
there is a one-to-one and onto function from A
to B. This function associates each element of A
with a unique element of B and vice-versa. With
this definition, it is not too hard to show that,
for example, N and Z have the same cardinality
(they are both “countably infinite”).
It turns out, though, that P(N) has a strictly
greater cardinality than N. We can prove this
by showing: no sequence f0, f1, f2, f3, . . . in-
cludes all characteristic functions for subsets of
N. (This shows that there are more characteris-
tic functions than natural numbers.)
In fact, for any infinite list f0, f1, f2, f3, . . .
of characteristic functions, we can define a char-
acteristic function f which is not on the list.
Imagine each function given as the infinite se-
quence of its values, so the list might look like
this:
f0 values 0101010101 . . .
f1 values 0000011101 . . .
f2 values 1111111111 . . .
f3 values 0000000000 . . .
f4 values 1001001001 . . .
...
Now if we switch each of the diagonal values
fₙ(n) to its opposite, we get a characteristic function
f(n) =
{
1 if fn(n) = 0
0 if fn(n) = 1
which is different from each function on the list.
In fact, it has a different value from fn on the
number n.
For the given example, f has values
11011 . . .
The construction of f is sometimes called a “di-
agonalisation argument”, because we get its val-
ues by switching values along the diagonal in the
table of values of f0, f1, f2, f3, . . ..
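Here is a finite sketch of the diagonalisation: given any (finite, for illustration) list of characteristic functions, flipping the diagonal values produces a function that disagrees with every function on the list. The five functions below encode the example table above.

```python
# Diagonalisation sketch: f differs from the n-th listed function at n.

fs = [
    lambda n: n % 2,                            # f0: 0101010101...
    lambda n: 1 if n in (5, 6, 7, 9) else 0,    # f1: 0000011101...
    lambda n: 1,                                # f2: 1111111111...
    lambda n: 0,                                # f3: 0000000000...
    lambda n: 1 if n % 3 == 0 else 0,           # f4: 1001001001...
]

f = lambda n: 1 - fs[n](n)    # flip the diagonal value f_n(n)

for n in range(len(fs)):
    assert f(n) != fs[n](n)   # f disagrees with f_n at n

assert [f(n) for n in range(5)] == [1, 1, 0, 1, 1]   # the values 11011...
```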
Questions
14.1 Suggest domains and codomains for the
following functions, writing the domain as
a cartesian product where applicable.
gcd, reciprocal, remainder ∩, ∪
14.2 If A and B are subsets of N with charac-
teristic functions χA and χB respectively,
what set does the function χA(n)χB(n)
represent?
14.3 How many Boolean functions of n vari-
ables are there?
Lecture 15: Composition and inversion
Complicated functions are often built from
simple parts. For example, the function f : R→
R defined by f(x) = (x2 + 1)3 is computed by
doing the following steps in succession:
• square,
• add 1,
• cube.
We say that f(x) = (x2 + 1)3 is the composite
of the functions (from R to R)
• square(x)=x2,
• successor(x)=x+ 1,
• cube(x)=x3.
15.1 Notation for composite func-
tions
In the present example we write
f(x) = cube(successor(square(x))),
or
f = cube ◦ successor ◦ square.
Let h : X → Y and g : Y → Z be functions.
The function g ◦ h : X → Z is defined by
g ◦ h(x) = g(h(x))
and is called the composite of g and h.
Warning: Remember that g ◦ h means “do
h first, then g.” g ◦h is usually different from
h ◦ g.
Example.
square(successor(x)) = (x + 1)² = x² + 2x + 1
successor(square(x)) = x² + 1
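In Python, composition can be written as a small helper, which makes the "do h first, then g" order concrete and shows that the two orders generally differ (compose is an illustrative name):

```python
# g ∘ h means "do h first, then g".

def compose(g, h):
    return lambda x: g(h(x))

square = lambda x: x**2
successor = lambda x: x + 1

assert compose(square, successor)(3) == (3 + 1)**2 == 16
assert compose(successor, square)(3) == 3**2 + 1 == 10   # a different result
```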
15.2 Conditions for composition
Composite functions do not always exist.
Example. If reciprocal : R − {0} → R is de-
fined by reciprocal(x) = 1/x and predecessor :
R → R is defined by predecessor(x) = x − 1,
then reciprocal ◦ predecessor does not exist, be-
cause predecessor(1) = 0 is not a legal input for
reciprocal.
To avoid this problem, we demand that the
codomain of h be equal to the domain of g for
g ◦ h to exist. This ensures that each output of
h will be a legal input for g.
Let h : A → B and g : C → D be func-
tions. Then g ◦ h : A→ D exists if and only
if B = C.
15.3 The identity function
On each set A the function iA : A → A defined
by
iA(x) = x,
is called the identity function (on A).
15.4 Inverse functions
Functions f : A→ A and g : A→ A are said to
be inverses (of each other) if
f ◦ g = g ◦ f = iA.
Example. square and sqrt are inverses of each
other on the set R≥0 of reals ≥ 0.
sqrt(square(x)) = x and square(sqrt(x)) = x.
In fact, this is exactly what sqrt is supposed
to do – reverse the process of squaring. How-
ever, this works only if we restrict the domain to
R≥0. On R we do not have sqrt(square(x)) = x
because, for example,
sqrt(square(−1)) = sqrt(1) = 1.
This problem arises whenever we seek an in-
verse for a function which is not one-to-one. The
squaring function on R sends both 1 and −1 to
1, but we want a single value 1 for sqrt(1). Thus
we have to restrict the squaring function to R≥0.
15.5 Conditions for inversion
A function f can have an inverse without its
domain and codomain being equal.
The inverse of a function f : A → B is a
function f−1 : B → A such that
f−1 ◦ f = iA and f ◦ f−1 = iB.
Note that f−1 ◦ f and f ◦ f−1 are both iden-
tity functions but they have different domains.
Not every function has an inverse, but we
can neatly classify the ones that do.
Let f : A → B be a function. Then f−1 :
B → A exists if and only if f is one-to-one
and onto.
Example: eˣ and log
Consider f : R → R≥0 − {0} defined by f(x) =
eˣ. We know that eˣ is one-to-one (e.g. because
it is strictly increasing), and onto. So it has an
inverse f⁻¹ on R≥0 − {0}.
[Plot of y = e^x.]
In fact, f−1 = log, where
log : R>0 → R.
Now
e^(log x) = x and log(e^x) = x,
so e^(log x) and log(e^x) are both identity functions, but they have different domains.
The domain of e^(log x) is R>0 (note log is defined only for reals > 0). The domain of log(e^x) is R.
15.6 Operations
An operation is a particular type of function,
with domain A×A×A× . . .×A and codomain
A, for some set A.
For example, the addition function f(a, b) =
a + b is called an operation on R, because f :
R × R → R. (That is, addition is a function of
two real variables, which takes real values.)
An operation with one variable is called
unary, an operation with two variables is called
binary, an operation with three variables is
called ternary, and so on.
Examples
1. Addition is a binary operation on R.
2. Successor is a unary operation on N.
3. Intersection is a binary operation on P(A)
for any set A.
4. Complementation is a unary operation on
P(A) for any set A.
Questions
15.1 Suppose f,m and s are the following func-
tions on the set of people.
m(x) = mother of x
f(x) = father of x
s(x) = spouse of x
What are the English terms for the follow-
ing composite functions?
m ◦ s, f ◦ s, m ◦m, f ◦m, s ◦ s
15.2 Write the following functions as compos-
ites of square(x), sqrt(x), successor(x) and
cube(x).
√(1 + x^3),   x^(3/2),   (1 + x)^3,   (1 + x^3)^2
15.3 What interesting feature do the following
functions have in common? (Hint: con-
sider their inverses.)
• ¬ on B
• The reciprocal, f(x) = 1/x, on R − {0}
• The function g(x) = x/(x − 1), on R − {1}.
Lecture 16: Relations
Mathematical objects can be related in var-
ious ways, and any particular way of relating
objects is called a relation on the set of objects
in question.
(This also applies to relations in the every-
day sense. For example, “parent of” is a relation
on the set of people.)
A binary relation R on a set A consists of A
and a set of ordered pairs from A×A.
When (a, b) is in this set we write aRb.
Similarly, a ternary relation on A would be
defined by a set of ordered triples from A×A×A,
and so on. (A unary relation on A is just a sub-
set of A.)
16.1 Relations and functions
Any function f : X → Y can be viewed as a
relation R on X ∪ Y . The relation is defined by
xRy if and only if y = f(x).
However, not every relation is a function.
Remember that a function must have exactly
one output y for each input x in its domain. In
a relation, on the other hand, an element x may
be related to many elements y, or to none at all.
16.2 Examples
1. Equality on R.
This is the relation consisting of the pairs
(x, x) for x ∈ R. Thus it is the following
subset of the plane.
[Graph: the line y = x.]
This relation is also a function (the identity
function on R), since there is exactly one pair
for each x ∈ R.
2. The < relation on R.
This relation consists of all the pairs (x, y)
with x < y. It is the following shaded subset
of the plane.
[Graph: the shaded half-plane above the line y = x.]
(The dashed line indicates that the points
where x = y are omitted.)
3. Algebraic curves.
An algebraic curve consists of the points
(x, y) satisfying an equation p(x, y) = 0,
where p is a polynomial.
E.g. the unit circle x^2 + y^2 − 1 = 0.
[Graph: the unit circle.]
Notice that this relation is not a function,
because there are two pairs with the same x,
e.g. (0, 1) and (0,−1).
Likewise, the curve y2 = x2(x + 1) is not a
function.
[Graph: the curve y^2 = x^2(x + 1).]
4. The subset relation ⊆.
This consists of the ordered pairs of sets
(A,B) such that A ⊆ B. A and B must both
be subsets of some universal set U .
5. Congruence modulo n.
For a fixed n, congruence modulo n is a bi-
nary relation. It consists of all the ordered
pairs of integers (a, b) such that n divides
a− b.
16.3 Properties of congruence
As the symbol ≡ suggests, congruence mod n is
a lot like equality. Numbers a and b which are
congruent mod n are not necessarily equal, but
they are “equal up to multiples of n,” because
they have equal remainders when divided by n.
Because congruence is like equality, congruences a ≡ b (mod n) behave a lot like equations.
In particular, they have the following three prop-
erties.
1. Reflexive property.
a ≡ a (mod n)
for any number a.
2. Symmetric property.
a ≡ b (mod n)⇒ b ≡ a (mod n)
for any numbers a and b.
3. Transitive property.
a ≡ b (mod n) and b ≡ c (mod n)⇒
a ≡ c (mod n)
for any numbers a, b and c.
These properties are clear if one remembers
that a ≡ b (mod n) means a and b have the same
remainder on division by n.
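The three properties can also be checked mechanically. A brute-force Python sketch (our own illustration; a finite range of integers stands in for "all integers"):

```python
# Brute-force check that congruence mod n is reflexive, symmetric
# and transitive, over a finite sample of integers.
def congruent(a, b, n):
    return (a - b) % n == 0

n = 5
nums = range(-20, 21)
assert all(congruent(a, a, n) for a in nums)                       # reflexive
assert all(congruent(b, a, n)
           for a in nums for b in nums if congruent(a, b, n))      # symmetric
assert all(congruent(a, c, n)
           for a in nums for b in nums for c in nums
           if congruent(a, b, n) and congruent(b, c, n))           # transitive
print("reflexive, symmetric and transitive on the sample")
```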
Questions
16.1 Which of the following relations R(x, y)
satisfy ∀x∃yR(x, y)?
• x ∧ y = T (for propositions x, y)
• x ⊆ y (for sets x, y of natural num-
bers)
• x > y (for real numbers x, y )
• x divides y (for natural numbers
x, y)
16.2 Use logic symbols and the ≤ relation to
write a relation between real numbers x, y
which says that the point (x, y) lies in the
square with corners (0,0), (1,0), (0,1) and
(1,1).
Lecture 17: Equivalence relations
An equivalence relation R on a set A is a bi-
nary relation with the following three prop-
erties.
1. Reflexivity.
aRa
for all a ∈ A.
2. Symmetry.
aRb⇒ bRa
for all a, b ∈ A.
3. Transitivity.
aRb and bRc⇒ aRc
for all a, b, c ∈ A.
Equality and congruence mod n (for fixed n) are
examples of equivalence relations.
17.1 Other equivalence relations
1. Equivalence of fractions.
Two fractions are equivalent if they reduce
to the same fraction when the numerator and
denominator of each are divided by their gcd.
E.g. 2/4 and 3/6 are equivalent because both reduce to 1/2.
2. Congruence of triangles.
Triangles ABC and A′B′C ′ are congruent if
AB = A′B′, BC = B′C ′ and CA = C ′A′. E.g.
the following triangles are congruent.
[Figure: two congruent triangles ABC and A′B′C′.]
3. Similarity of triangles.
Triangles ABC and A′B′C′ are similar if

AB/A′B′ = BC/B′C′ = CA/C′A′.

E.g. the following triangles are similar.
[Figure: two similar triangles ABC and A′B′C′.]
4. Parallelism of lines.
The relation L‖M (L is parallel to M) is an
equivalence relation.
Remark
In all these cases the relation is an equivalence
because it says that objects are the same in some
respect.
1. Equivalent fractions have the same re-
duced form.
2. Congruent triangles have the same side
lengths.
3. Similar triangles have the same shape.
4. Parallel lines have the same direction.
Sameness is always reflexive (a is the same as
a), symmetric (if a is the same as b, then b is
the same as a) and transitive (if a is the same
as b and b is the same as c, then a is the same
as c).
17.2 Equivalence classes
Conversely, we can show that if R is a reflexive,
symmetric and transitive relation then aRb says
that a and b are the same in some respect: they
have the same R-equivalence class.
If R is an equivalence relation we define the
R-equivalence class of a to be
[a] = {s : sRa}.
Thus [a] consists of all the elements related to
a. It can also be defined as {s : aRs}, because
sRa if and only if aRs, by symmetry of R.
Examples
• The parallel equivalence class of a line L
consists of all lines parallel to L.
• The equivalence class of 1 for congruence
mod 2 is the set of all odd numbers.
17.3 Equivalence class properties
Claim. If two elements are related by an equiv-
alence relation R on a set A, their equivalence
classes are equal.
Proof. Suppose a, b ∈ A and aRb. Now
s ∈ [a] ⇒ sRa by definition of [a]
⇒ sRb by transitivity of R
since sRa and aRb
⇒ s ∈ [b] by definition of [b].
Thus all elements of [a] belong to [b]. Similarly,
all elements of [b] belong to [a], hence [a] = [b].
□
Claim. If R is an equivalence relation on a
set A, each element of A belongs to exactly one
equivalence class.
Proof. Suppose a, b, c ∈ A, and c ∈ [a] ∩ [b].
c ∈ [a] and c ∈ [b]
⇒ cRa and cRb
by definition of [a] and [b]
⇒ aRc and cRb by symmetry
⇒ aRb by transitivity
⇒ [a] = [b] by the previous claim.
Also, a ∈ [a] by reflexivity of R, so each element belongs to at least one class. □
17.4 Partitions and equivalence
classes
A partition of a set S is a set of subsets of S
such that each element of S is in exactly one
of the subsets.
Using what we showed in the last section, we
have the following.
If R is an equivalence relation on a set A,
then the equivalence classes of R form a par-
tition of A. Two elements of A are related if
and only if they are in the same equivalence
class.
Example. Let R be the relation on Z defined
by aRb if and only if a ≡ b (mod 3). The three
equivalence classes of R are
{x : x ≡ 0 (mod 3)} = {3k : k ∈ Z}
{x : x ≡ 1 (mod 3)} = {3k + 1 : k ∈ Z}
{x : x ≡ 2 (mod 3)} = {3k + 2 : k ∈ Z}.
These partition the set Z.
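This partition can be reproduced computationally. A Python sketch (our own illustration; a finite slice of Z stands in for the whole set):

```python
# Grouping integers by remainder mod 3 reproduces the three
# equivalence classes above.
from collections import defaultdict

classes = defaultdict(list)
for x in range(-6, 7):
    classes[x % 3].append(x)   # Python's % always returns 0, 1 or 2 here

for r in sorted(classes):
    print(r, classes[r])
# Each integer in the range lands in exactly one class (a partition).
assert sum(len(c) for c in classes.values()) == len(range(-6, 7))
```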
Questions
17.1 Which of the following relations between
integers x and y are equivalence relations?
• |x| = |y|
• x3 − y3 = 0
• x3 − y3 = 1
• x divides y
• 5 divides x− y
17.2 For those relations in Question 17.1 that
are not equivalence relations, say which
properties of equivalence they fail to sat-
isfy.
17.3 For those that are equivalence relations,
say what is the “same” about the related
objects.
17.4 Also, for those relations that are equiva-
lence relations, describe their equivalence
classes.
Lecture 18: Order relations
18.1 Partial order relations
A partial order relation R on a set A is a bi-
nary relation with the following three prop-
erties.
1. Reflexivity.
aRa
for all a ∈ A.
2. Antisymmetry.
aRb and bRa⇒ a = b
for all a, b ∈ A.
3. Transitivity.
aRb and bRc⇒ aRc
for all a, b, c ∈ A.
Examples.
1. ≤ on R.
Reflexive: a ≤ a for all a ∈ R.
Antisymmetric: a ≤ b and b ≤ a ⇒ a = b for all a, b ∈ R.
Transitive: a ≤ b and b ≤ c ⇒ a ≤ c for all a, b, c ∈ R.
2. ⊆ on P(N).
Reflexive: A ⊆ A for all A ∈ P(N).
Antisymmetric: A ⊆ B and B ⊆ A⇒ A = B
for all A,B ∈ P(N).
Transitive: A ⊆ B and B ⊆ C ⇒ A ⊆ C for
all A,B,C ∈ P(N).
3. Divisibility on N.
The relation “a divides b” on natural num-
bers is reflexive, antisymmetric and transi-
tive. We leave checking this as an exercise.
4. Alphabetical order of words.
Words on the English alphabet are alphabet-
ically ordered by comparing the leftmost let-
ter at which they differ. We leave checking
that this relation is reflexive, antisymmetric
and transitive as an exercise.
18.2 Total order relations
A total order relation is a special kind of partial
order relation that “puts everything in order”.
A total order relation R on a set A is a par-
tial order relation that also has the property
aRb or bRa for all a, b ∈ A.
Examples.
1. ≤ on R.
This is a total order relation because for all real numbers a and b we have a ≤ b or b ≤ a.
2. ⊆ on P(N).
This is not a total order because, for example,
{1, 2} ⊈ {1, 3} and {1, 3} ⊈ {1, 2}.
3. Divisibility on N.
This is not a total order because, for exam-
ple, 2 does not divide 3 and 3 does not divide
2.
4. Alphabetical order of words.
This is a total order because given any two
different words, one will appear before the
other in alphabetical order.
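The divisibility example can be checked by machine on a finite set. A Python sketch (our own illustration, restricted to {1, ..., 12}):

```python
# Check that "a divides b" is reflexive, antisymmetric and transitive
# on {1, ..., 12}, but not total.
A = range(1, 13)
divides = lambda a, b: b % a == 0

assert all(divides(a, a) for a in A)                            # reflexive
assert all(a == b
           for a in A for b in A
           if divides(a, b) and divides(b, a))                  # antisymmetric
assert all(divides(a, c)
           for a in A for b in A for c in A
           if divides(a, b) and divides(b, c))                  # transitive
assert not divides(2, 3) and not divides(3, 2)                  # not total
print("divisibility: a partial order on {1, ..., 12}, but not total")
```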
18.3 Hasse diagrams
A partial order relation R on a finite set A can be
represented as a Hasse diagram. The elements
of A are written on the page and connected by
lines so that, for any a, b ∈ A, aRb exactly when
b can be reached from a by travelling upward
along the lines.
Example. A Hasse diagram for the relation ⊆
on the set P({1, 2}) can be drawn as follows.
{1, 2}
{1}   {2}
∅
Example. A Hasse diagram for the relation “di-
vides” on the set {1, 2, 3, 5, 6, 10, 15, 30} can be
drawn as follows.
30
6   10   15
2   3   5
1
Example. A Hasse diagram for the relation ≤ on the set {1, 2, 3, 4, 5} can be drawn as follows.
5
4
3
2
1
Notice how this last Hasse diagram can be drawn simply as a vertical chain, whereas the previous two are "wider" and more complicated. This corresponds to the fact that the last example is a total order relation while the previous two are not.
18.4 Well-ordering
A well-order relation on a set is a total order
relation that also has the property that each
nonempty set of its elements contains a least el-
ement.
A well-order relation R on a set A is a total order relation such that, for all nonempty S ⊆ A, there exists an ℓ ∈ S such that ℓRa for all a ∈ S.
Example. The relation ≤ on N is a well-order relation because every nonempty subset of N has a least element.
The well-ordering of N is the basis of proofs
by induction.
Example. The relation ≤ on Z is not a well-order relation. For example, Z itself has no least element.
Example. The relation ≤ on {x : x ∈ R, x > 0} is not a well-order relation. For example, the subset {x : x ∈ R, x > 3} has no least element.
Questions
18.1 Explain why “antisymmetric” does not
mean “not symmetric”. Give an example
of a relation which is neither symmetric
nor antisymmetric.
18.2 Draw a diagram of the positive divisors of
42 under the relation “divides.” Why does
it resemble the diagram for the positive divisors of 30?
18.3 Invent a partial order relation on N × N.
Is your ordering a total ordering? Is your
ordering a well-ordering?
Lecture 19: Selections and Arrangements
19.1 Ordered selections without
repetition
A reviewer is going to compare ten phones and
list, in order, a top three. In how many ways can
she do this? More generally, how many ways are
there to arrange r objects chosen from a set of
n objects?
In our example, the reviewer has 10 options
for her favourite, but then only 9 for her second-
favourite, and 8 for third-favourite. So there are
10× 9× 8 ways she could make her list.
For an ordered selection without repetition
of r elements from a set of n elements there are
n options for the 1st element
n− 1 options for the 2nd element
n− 2 options for the 3rd element
...
...
n− r + 1 options for the rth element.
So we have the following formula.
The number of ordered selections without repetition of r elements from a set of n elements (0 ≤ r ≤ n) is

n(n − 1) · · · (n − r + 1) = n!/(n − r)!.
When r = n and all the elements of a set S
are ordered, we just say that this is a permuta-
tion of S. Our formula tells us there are n! such
permutations. For example, there are 3! = 6
permutations of the set {a, b, c}:
(a, b, c), (a, c, b), (b, a, c), (b, c, a), (c, a, b), (c, b, a).
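The standard library can enumerate these directly. A Python sketch (our own illustration of the count, not part of the notes):

```python
# itertools.permutations enumerates ordered selections without
# repetition; the count matches n!/(n - r)!.
from itertools import permutations
from math import factorial

top_three = list(permutations(range(10), 3))   # the reviewer's ordered lists
print(len(top_three))                          # 720 = 10 * 9 * 8

assert len(top_three) == factorial(10) // factorial(10 - 3)
assert len(list(permutations("abc"))) == 6     # the 3! permutations of {a, b, c}
```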
19.2 Unordered selections without
repetition
What if our reviewer instead chose an unordered
top three? In how many ways could she do that?
More generally, how many ways are there to
choose (without order) r objects from a set of
n objects?
A combination of r elements from a set S is
a subset of S with r elements.
For every unordered list our reviewer could
make there are 3! = 6 corresponding possible or-
dered lists. And we’ve seen that she could make
10 × 9 × 8 ordered lists. So the number of unordered lists she could make is (10 × 9 × 8)/6.
For every combination of r elements from a
set of n elements there are r! corresponding per-
mutations. So, using our formula for the number
of permutations we have the following.
The number of combinations of r elements from a set of n elements (0 ≤ r ≤ n) is

n(n − 1) · · · (n − r + 1)/r! = n!/(r!(n − r)!) = C(n, r).

Notice that the notation C(n, r) is used for n!/(r!(n − r)!). Expressions like this are called binomial coefficients. We'll see why they are called this in the next lecture.
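Binomial coefficients are also available in the standard library. A Python sketch (our own illustration):

```python
# math.comb computes the binomial coefficient C(n, r), and
# itertools.combinations enumerates the actual r-element subsets.
from itertools import combinations
from math import comb, factorial

n, r = 10, 3
assert comb(n, r) == factorial(n) // (factorial(r) * factorial(n - r))
assert comb(n, r) == len(list(combinations(range(n), r)))
print(comb(n, r))  # 120 unordered top-three lists
```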
19.3 Ordered selections with
repetition
An ordered selection of r elements from a set X
is really just a sequence of length r with each
term in X. If X has n elements, then there are
n possibilities for each term and so:
The number of sequences of r terms, each from some set of n elements, is

n × n × · · · × n (r factors) = n^r.
19.4 Unordered selections with
repetition
A shop has a special deal on any four cans of soft
drink. Cola, lemonade and sarsaparilla flavours
are available. In how many ways can you select
four cans?
We can write a selection in a table, for example,

C | L  | S              C | L | S
• | •• | •      and       | • | •••

We can change a table like this into a string of zeroes and ones, by moving from left to right, reading a "•" as a 0 and a column separator as a 1. The tables above would be converted into

0 1 0 0 1 0   and   1 0 1 0 0 0
Notice that each string has four zeroes (one
for each can selected) and two ones (one fewer
than the number of flavours). We can choose
a string like this by beginning with a string of
six ones and then choosing four ones to change
to zeroes. There are C(6, 4) ways to do this and so there are C(6, 4) possible can selections.
An unordered selection of r elements, with
repetition allowed, from a set X of n elements
can be thought of as a multiset with r elements,
each in X. As in the example, we can represent
each such multiset with a string of r zeroes and
n − 1 ones. We can choose a string like this by
beginning with a string of n + r − 1 ones and
then choosing r ones to change to zeroes.
The number of multisets of r elements, each from a set of n elements, is

C(n + r − 1, r) = (n + r − 1)!/(r!(n − 1)!).
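The soft-drink example can be enumerated directly. A Python sketch (our own illustration):

```python
# itertools.combinations_with_replacement enumerates the multisets
# counted by C(n + r - 1, r); here n = 3 flavours, r = 4 cans.
from itertools import combinations_with_replacement
from math import comb

flavours = ["cola", "lemonade", "sarsaparilla"]
selections = list(combinations_with_replacement(flavours, 4))
print(len(selections))                          # 15
assert len(selections) == comb(3 + 4 - 1, 4)    # C(6, 4) = 15
```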
19.5 The pigeonhole principle
The pigeonhole principle is a reasonably obvious
statement, but can still be very useful.
If n items are placed in m containers with
n > m, then at least one container has at
least two items.
Example. If a drawer contains only blue, black
and white socks and you take out four socks
without looking at them, then you are guaran-
teed to have two of the same colour.
We can generalise the pigeonhole principle
as follows.
If n items are placed in m containers, then at least one container has at least ⌈n/m⌉ items.
In the above, ⌈n/m⌉ means the smallest integer greater than or equal to n/m (n/m "rounded up").
Example. If 21 tasks have been distributed be-
tween four processor cores, the busiest core must
have been assigned at least 6 tasks.
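The bound in this example is just a ceiling computation. A minimal Python sketch (our own illustration):

```python
# The generalised pigeonhole bound ceil(n/m) for the busiest
# container, applied to the 21-tasks-on-4-cores example.
from math import ceil

n_items, m_containers = 21, 4
print(ceil(n_items / m_containers))  # 6: some core gets at least 6 tasks
```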
Questions
19.1 A bank requires a PIN that is a string of
four decimal digits. How many such PINs
are there? How many are made of four
different digits?
19.2 How many binary strings of length 5 are
there? How many of these contain exactly
two 1s?
19.3 In a game, each of ten players holds red,
blue and green marbles, and places one
marble in a bag. How many possibilities
are there for the colours of marbles in the
bag? If each player chooses their colour at
random are all of these possibilities equally
likely?
19.4 How many ways are there to partition
(that is, split up) a set with 10 elements
into one class of 5 elements, one class of
3 elements and one class of 2 elements?
How many ways are there to partition a
set with 10 elements into one class of 6 el-
ements and two classes of 2 elements each?
Lecture 20: Pascal’s triangle
20.1 Pascal’s triangle
We can write the binomial coefficients in an (infinite) triangular array as follows:

C(0,0)
C(1,0)  C(1,1)
C(2,0)  C(2,1)  C(2,2)
C(3,0)  C(3,1)  C(3,2)  C(3,3)
C(4,0)  C(4,1)  C(4,2)  C(4,3)  C(4,4)
C(5,0)  C(5,1)  C(5,2)  C(5,3)  C(5,4)  C(5,5)
C(6,0)  C(6,1)  C(6,2)  C(6,3)  C(6,4)  C(6,5)  C(6,6)
...
Here are the first ten rows with the entries as
integers:
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1
1 10 45 120 210 252 210 120 45 10 1
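These rows can be generated by computer from the addition rule discussed below. A Python sketch (our own illustration):

```python
# Build Pascal's triangle from the rule that each internal entry is
# the sum of the two entries above it, then check against math.comb.
from math import comb

def pascal(rows):
    triangle = [[1]]
    for _ in range(rows - 1):
        prev = triangle[-1]
        triangle.append([1]
                        + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
                        + [1])
    return triangle

t = pascal(11)
print(t[10])  # [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
assert all(t[n][r] == comb(n, r) for n in range(11) for r in range(n + 1))
```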
This triangular array is often called Pascal’s
triangle (although Pascal was nowhere near the
first to discover it).
20.2 Patterns
Writing the binomial coefficients this way reveals
a lot of different patterns in them. Perhaps the
most obvious is that every row reads the same
left-to-right and right-to-left. Choosing r ele-
ments from a set of n elements to be in a combi-
nation is equivalent to choosing n − r elements
from the same set to not be in the combination.
So:

C(n, r) = C(n, n − r) for 0 ≤ r ≤ n.

This shows that every row reads the same left-to-right and right-to-left.
Another pattern is that every “internal” en-
try in the triangle is the sum of the two entries
above it. To see why this is, we’ll begin with an
example.
Example. Why is C(6, 2) = C(5, 2) + C(5, 1)?
There are C(6, 2) combinations of 2 elements of {1, 2, 3, 4, 5, 6}. Every such combination either
• does not contain a 6, in which case it is one of the C(5, 2) combinations of 2 elements of {1, 2, 3, 4, 5}; or
• does contain a 6, in which case the rest of the combination is one of the C(5, 1) combinations of 1 element from {1, 2, 3, 4, 5}.
So C(6, 2) = C(5, 2) + C(5, 1).
We can make a similar argument in general. Let X be a set of n elements and let x be a fixed element of X. For any r ∈ {1, . . . , n}, there are C(n − 1, r) combinations of r elements of X that do not contain x and there are C(n − 1, r − 1) combinations of r elements of X that do contain x. So:

C(n, r) = C(n − 1, r) + C(n − 1, r − 1) for 1 ≤ r ≤ n.

This shows that every internal entry in Pascal's triangle is the sum of the two above it.
20.3 The binomial theorem
(x + y)^0 = 1
(x + y)^1 = x + y
(x + y)^2 = x^2 + 2xy + y^2
(x + y)^3 = x^3 + 3x^2y + 3xy^2 + y^3
(x + y)^4 = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4
(x + y)^5 = x^5 + 5x^4y + 10x^3y^2 + 10x^2y^3 + 5xy^4 + y^5
Notice that the coefficients on the right are
exactly the same as the entries in Pascal’s tri-
angle. Why does this happen? Think about
expanding (x + y)^3 and finding the coefficient of xy^2, for example.

(x + y)(x + y)(x + y) = xxx + xxy + xyx + xyy + yxx + yxy + yyx + yyy
                      = x^3 + 3x^2y + 3xy^2 + y^3

The coefficient of xy^2 is 3 because three terms in the sum above contain two y's (namely xyy, yxy and yyx). This is because there are C(3, 2) ways to choose two of the three factors in a term to be y's.
The same logic holds in general. The coefficient of x^(n−r) y^r in (x + y)^n will be C(n, r) because there will be C(n, r) ways to choose r of the n factors in a term to be y's. This fact is called the binomial theorem.
Binomial theorem. For any n ∈ N,

(x + y)^n = C(n, 0) x^n y^0 + C(n, 1) x^(n−1) y^1 + C(n, 2) x^(n−2) y^2 + · · · + C(n, n−1) x^1 y^(n−1) + C(n, n) x^0 y^n.
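The theorem is easy to spot-check numerically. A Python sketch (our own illustration):

```python
# Numerical spot-check of the binomial theorem:
# (x + y)^n = sum over r of C(n, r) * x^(n-r) * y^r.
from math import comb

def binomial_expand(x, y, n):
    return sum(comb(n, r) * x**(n - r) * y**r for r in range(n + 1))

for x, y, n in [(2, 3, 5), (1, -1, 7), (10, 1, 4)]:
    assert binomial_expand(x, y, n) == (x + y)**n
print(binomial_expand(1, 1, 6))  # 64 = 2^6, the sum of row 6 of the triangle
```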
20.4 Inclusion-exclusion
A school gives out prizes to its best ten students
in music and its best eight students in art. If
three students receive prizes in both, how many
students get a prize? If we try to calculate this
as 10 + 8 then we have counted the three over-
achievers twice. To compensate we need to sub-
tract three and calculate 10 + 8− 3 = 15.
In general, if A and B are finite sets then we
have
|A ∪B| = |A|+ |B| − |A ∩B|.
With a bit more care we can see that if A, B
and C are sets then we have
|A ∪B ∪ C| = |A|+ |B|+ |C| − |A ∩B|
− |A ∩ C| − |B ∩ C|+ |A ∩B ∩ C|.
This is part of a more general law called the
inclusion-exclusion principle.
LetX1, X2, . . . , Xt be finite sets. To calculate
|X1 ∪X2 ∪ · · · ∪Xt|:
• add the sizes of the sets;
• subtract the sizes of the 2-way intersections;
• add the sizes of the 3-way intersections;
• subtract the sizes of the 4-way intersections;
...
• add/subtract the size of the t-way intersection.
To see why this works, think of an element x that
is in n of the sets X1, X2, . . . , Xt. It is counted

C(n, 1) − C(n, 2) + C(n, 3) − C(n, 4) + · · · ± C(n, n)
times. By the Binomial theorem with x = 1 and
y = −1 (see Question 20.1), this is equal to 1.
So each element is counted exactly once.
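The three-set formula can be verified on concrete sets. A Python sketch (our own illustration; the choice of multiples of 2, 3 and 7 up to 60 is arbitrary):

```python
# Verify |A ∪ B ∪ C| = |A| + |B| + |C| - |A∩B| - |A∩C| - |B∩C| + |A∩B∩C|
# on three concrete finite sets.
A = {x for x in range(1, 61) if x % 2 == 0}
B = {x for x in range(1, 61) if x % 3 == 0}
C = {x for x in range(1, 61) if x % 7 == 0}

direct = len(A | B | C)
by_ie = (len(A) + len(B) + len(C)
         - len(A & B) - len(A & C) - len(B & C)
         + len(A & B & C))
print(direct, by_ie)
assert direct == by_ie
```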
Questions
20.1 Substitute x = 1 and y = −1 into the
statement of the binomial theorem. What
does this tell you about the rows of Pas-
cal’s triangle?
20.2 Find a pattern in the sums of the rows
in Pascal’s triangle. Prove your pattern
holds using the binomial theorem. Also
prove it holds by considering the powerset
of a set.
20.3 Use inclusion-exclusion to work out how
many numbers in the set {1, . . . , 100} are
divisible by 2 or 3 or 5.
Lecture 21: Probability
Probability gives us a way to model ran-
dom processes mathematically. These processes
could be anything from the rolling of dice, to
radioactive decay of atoms, to the performance
of a stock market index. The mathematical en-
vironment we work in when dealing with prob-
abilities is called a probability space.
21.1 Probability spaces
We’ll start with a formal definition and then
look at some examples of how the definition is
used.
A probability space consists of
• a set S called a sample space which con-
tains all the possible outcomes of the ran-
dom process; and
• a probability function Pr : S → [0, 1] such
that the sum of the probabilities of the out-
comes in S is 1.
Each time the process occurs it should produce
exactly one outcome (never zero or more than
one). The probability of an outcome is a mea-
sure of the likeliness that it will occur. It is
given as a real number between 0 and 1 inclu-
sive, where 0 indicates that the outcome cannot
occur and 1 indicates that the outcome must oc-
cur.
Example.
[Figure: a spinner with regions labelled 1, 2, 3 and 4.]
The spinner above might be modeled by a prob-
ability space with sample space S = {1, 2, 3, 4}
and probability function given as follows.
Pr(s) = 1/2 for s = 1
        1/4 for s = 2
        1/8 for s = 3
        1/8 for s = 4.
It can be convenient to give this as a table:

s     | 1   | 2   | 3   | 4
Pr(s) | 1/2 | 1/4 | 1/8 | 1/8
Example. Rolling a fair six-sided die could
be modeled by a probability space with sample
space S = {1, 2, 3, 4, 5, 6} and probability func-
tion Pr given as follows.
s     | 1   | 2   | 3   | 4   | 5   | 6
Pr(s) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6
A sample space like this one where every out-
come has an equal probability is sometimes
called a uniform sample space. Outcomes from
a uniform sample space are said to have been
taken uniformly at random.
21.2 Events
An event is a subset of the sample space.
An event is just a collection of outcomes we are
interested in for some reason.
Example. In the die rolling example with S =
{1, 2, 3, 4, 5, 6}, we could define the event of
rolling at least a 3. Formally, this would be the
set {3, 4, 5, 6}. We could also define the event of
rolling an odd number as the set {1, 3, 5}.
The probability of an event A is the sum of
the probabilities of the outcomes in A.
Example. In the spinner example, for the event
A = {1, 2, 4}, we have
Pr(A) = Pr(1) + Pr(2) + Pr(4) = 1/2 + 1/4 + 1/8 = 7/8.
In a uniform sample space (where all out-
comes are equally likely) the probability of an
event A can be calculated as:
Pr(A) = (number of outcomes in A)/(number of outcomes) = |A|/|S|.
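This counting formula translates directly into code. A Python sketch (our own illustration, using the fair-die example; `fractions.Fraction` keeps the arithmetic exact):

```python
# Event probabilities in a uniform sample space by counting.
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}

def pr(event):
    return Fraction(len(event & S), len(S))

at_least_3 = {3, 4, 5, 6}
odd = {1, 3, 5}
print(pr(at_least_3))  # 2/3
print(pr(odd))         # 1/2
```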
21.3 Operations on events
Because events are defined as sets we can per-
form set operations on them. If A and B are
events for a sample space S, then
• A ∪B is the event “A or B,”
• A ∩B is the event “A and B,”
• Ā is the event "not A."
We always take the sample space as our universal set, so Ā means S − A.
21.4 Probabilities of unions
We saw in the section on the inclusion-exclusion principle that |A ∪ B| = |A| + |B| − |A ∩ B| for
finite sets A and B. We have a similar law in
probability.
For any two events A and B,
Pr(A ∪B) = Pr(A) + Pr(B)− Pr(A ∩B).
Example. In our die rolling example, let A =
{1, 2} and B = {2, 3, 4} be events. Then
Pr(A ∪ B) = |A|/|S| + |B|/|S| − |A ∩ B|/|S| = 2/6 + 3/6 − 1/6 = 2/3.
Two events A and B are mutually exclusive
if Pr(A ∩ B) = 0, that is, if A and B cannot
occur together. For mutually exclusive events,
we have
Pr(A ∪B) = Pr(A) + Pr(B).
21.5 Independent events
We say that two events are independent when
the occurrence or non-occurrence of one event
does not affect the likelihood of the other occur-
ring.
Two events A and B are independent if
Pr(A ∩B) = Pr(A)Pr(B).
Example. A binary string of length 3 is gener-
ated uniformly at random. The event A that the
first bit is a 1 is independent of the event B that
the second bit is a 1. But A is not independent
of the event C that the string contains exactly
two 1s.
Formally, the sample space is S = {000, 001, 010, 011, 100, 101, 110, 111} and Pr(s) = 1/8 for any s ∈ S. So,

A = {100, 101, 110, 111}   Pr(A) = 1/2
B = {010, 011, 110, 111}   Pr(B) = 1/2
C = {011, 101, 110}        Pr(C) = 3/8
A ∩ B = {110, 111}         Pr(A ∩ B) = 1/4
A ∩ C = {101, 110}         Pr(A ∩ C) = 1/4

So Pr(A ∩ B) = Pr(A)Pr(B) but Pr(A ∩ C) ≠ Pr(A)Pr(C).
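The same counts can be produced by enumeration. A Python sketch (our own illustration):

```python
# Enumerate the 8 binary strings of length 3 and test the
# independence condition Pr(A ∩ B) = Pr(A)Pr(B) by counting.
from fractions import Fraction
from itertools import product

S = ["".join(bits) for bits in product("01", repeat=3)]
pr = lambda e: Fraction(len(e), len(S))

A = {s for s in S if s[0] == "1"}           # first bit is 1
B = {s for s in S if s[1] == "1"}           # second bit is 1
C = {s for s in S if s.count("1") == 2}     # exactly two 1s

assert pr(A & B) == pr(A) * pr(B)           # A and B are independent
assert pr(A & C) != pr(A) * pr(C)           # A and C are not
print(pr(A & C), pr(A) * pr(C))             # 1/4 versus 3/16
```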
21.6 Warning
Random processes can occur in both discrete
and continuous settings, and probability theory
can be applied in either setting. In this lecture,
and in the next four lectures, we are discussing
only the discrete case. Many of the definitions
and results we state apply only in this case. Our
definition of a probability space, for example, is
actually the definition of a discrete probability
space, and so on.
The discrete setting provides a good envi-
ronment to learn most of the vital concepts and
intuitions of probability theory. What you learn
here is very useful in itself, and will act as a good
base if you go on to study continuous probabil-
ity.
Questions
An integer is chosen uniformly at random from
the set {1, 2, . . . , 30}. Let A be the event that
the integer is at most 20. Let B be the event
that the integer is divisible by 6. Let C be the
event that the integer’s last digit is a 5.
21.1 Write A,B and C as sets, and find their
probabilities.
21.2 Find the probabilities of A∪B, A∪C and
B ∪ C. Which pairs of A,B,C are mutu-
ally exclusive?
21.3 Find the probabilities of A∩B, A∩C and
B ∩ C. Which pairs of A,B,C are inde-
pendent?
Lecture 22: Conditional probability and Bayes’ theorem
Your friend believes that Python coding has
become more popular than AFL in Melbourne.
She bets you $10 that the next person to pass
you on the street will be a Python program-
mer. You feel confident about this bet. How-
ever, when you see a man in a “Hello, world!”
t-shirt approaching, you don’t feel so confident
any more. Why is this?
We can think about this with a diagram.
The rectangle represents the set of people in
Melbourne, the circle P is the set of Python
coders, and the circle T is the set of “Hello,
world!” t-shirt owners.
T P
Initially, you feel confident because the circle
P takes up a small proportion of the rectan-
gle. But when you learn that your randomly
selected person is in the circle T , you feel bad
because the circle P covers almost all of T . In
mathematical language, the probability that a
random Melbournian is a Python coder is low,
but the probability that a random Melbournian
is a Python coder given that they own a “Hello,
world!” t-shirt is high.
22.1 Conditional probability
Conditional probabilities measure the likelihood
of an event, given that some other event occurs.
For events A and B, the conditional probability of A given B is

Pr(A|B) = Pr(A ∩ B)/Pr(B).

This definition also implies that

Pr(A ∩ B) = Pr(A|B)Pr(B).
Example. The spinner from the last lecture is
spun. Let A be the event that the result was at
least 3 and B be the event that the result was
even. What is Pr(A|B)?
Pr(A ∩ B) = Pr(4) = 1/8
Pr(B) = Pr(2) + Pr(4) = 1/4 + 1/8 = 3/8
Thus,
Pr(A|B) = Pr(A ∩ B)/Pr(B) = (1/8)/(3/8) = 1/3.
Example. A binary string of length 6 is gener-
ated uniformly at random. Let A be the event
that the first bit is a 1 and B be the event that
the string contains two 1s. What is Pr(A|B)?
There are 2^6 strings in our sample space. Now A ∩ B occurs when the first bit is 1 and the rest of the string contains one 1. There are C(5, 1) such strings and so Pr(A ∩ B) = C(5, 1)/2^6. Also, there are C(6, 2) strings containing two 1s and so Pr(B) = C(6, 2)/2^6. Thus,

Pr(A|B) = Pr(A ∩ B)/Pr(B) = C(5, 1)/C(6, 2) = 1/3.
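This answer can be confirmed by brute force. A Python sketch (our own illustration):

```python
# Verify Pr(A|B) = 1/3 for the length-6 string example by direct
# enumeration of the 2^6 equally likely strings.
from fractions import Fraction
from itertools import product

S = list(product("01", repeat=6))
pr = lambda e: Fraction(len(e), len(S))

A = {s for s in S if s[0] == "1"}           # first bit is 1
B = {s for s in S if s.count("1") == 2}     # exactly two 1s

pr_A_given_B = pr(A & B) / pr(B)
print(pr_A_given_B)  # 1/3
```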
22.2 Independence again
Our definition of conditional probability gives us
another way of defining independence. We can
say that events A and B are independent if
Pr(A) = Pr(A|B).
This makes sense intuitively: it is a formal way
of saying that the likelihood of A does not de-
pend on whether or not B occurs.
22.3 Independent repeated trials
Generally if we perform exactly the same action
multiple times, the results for each trial will be
independent of the others. For example, if we
roll a die twice, then the result of the first roll
will be independent of the result of the second.
For two independent repeated trials, each
from a sample space S, our overall sample
space is S × S and our probability function will
be given by Pr((s1, s2)) = Pr(s1)Pr(s2). For
three independent repeated trials the sample
space is S × S × S and the probability function
Pr((s1, s2, s3)) = Pr(s1)Pr(s2)Pr(s3), and so on.
Example. The spinner from the previous ex-
ample is spun twice. What is the probability
that the results add to 5?
A total of 5 can be obtained as (1,4), (4,1), (2,3)
or (3,2). Because the spins are independent:
Pr((1, 4)) = Pr((4, 1)) = 1/2 × 1/8 = 1/16
Pr((2, 3)) = Pr((3, 2)) = 1/4 × 1/8 = 1/32
So, because (1,4), (4,1), (2,3) and (3,2) are mutually exclusive, the probability of the total being 5 is 1/16 + 1/16 + 1/32 + 1/32 = 3/16.
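The same sum can be computed over the product sample space. A Python sketch (our own illustration):

```python
# Two-spin calculation: sum Pr(s1)Pr(s2) over the product sample
# space S × S, using the spinner's probability table.
from fractions import Fraction

spinner = {1: Fraction(1, 2), 2: Fraction(1, 4),
           3: Fraction(1, 8), 4: Fraction(1, 8)}

total_5 = sum(spinner[a] * spinner[b]
              for a in spinner for b in spinner if a + b == 5)
print(total_5)  # 3/16
```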
22.4 Bayes’ theorem
Bayes’ theorem gives a way of calculating the
conditional probability of an event A given an
event B when we already know the probabilities
of A, of B given A, and of B given Ā (the
complement of A).
Bayes’ theorem. For the events A and B,
Pr(A|B) = Pr(B|A)Pr(A) / (Pr(B|A)Pr(A) + Pr(B|Ā)Pr(Ā)).
Note that the denominator above is simply an
expression for Pr(B). The fact that
Pr(B) = Pr(B|A)Pr(A) + Pr(B|Ā)Pr(Ā)
is due to the law of total probability.
22.5 Bayes’ theorem examples
Example. Luke Skywalker discovers that some
porgs have an extremely rare genetic mutation
that makes them powerful force users. He de-
velops a test for this mutation that is right 99%
of the time and decides to test all the porgs on
Ahch-To. Suppose there are 100 mutant porgs
in the population of 24 million. We would guess
that the test would come up positive for 99 of the
100 mutants, but also for 239 999 non-mutants.
We are assuming that the conditional proba-
bility of a porg testing positive given it’s a mu-
tant is 0.99. But what is the conditional prob-
ability of it being a mutant given that it tested
positive? From our guesses, we would expect
this to be 99/(99 + 239999) ≈ 0.0004. Bayes’ theorem
gives us a way to formalise this:
Pr(M|P) = Pr(P|M)Pr(M) / (Pr(P|M)Pr(M) + Pr(P|M̄)Pr(M̄))
= (100/24000000 × 0.99) / (100/24000000 × 0.99 + (1 − 100/24000000) × 0.01)
= 99/(99 + 239999)
≈ 0.0004.
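The arithmetic is easy to get wrong by hand, so it can help to package Bayes’ theorem as a function. A sketch in Python (the function name is ours):

```python
def bayes(pr_a, pr_b_given_a, pr_b_given_not_a):
    """Return Pr(A|B) from Pr(A), Pr(B|A) and Pr(B|A-complement),
    using Bayes' theorem with the law of total probability
    as the denominator."""
    numerator = pr_b_given_a * pr_a
    return numerator / (numerator + pr_b_given_not_a * (1 - pr_a))

# Porg example: Pr(M) = 100/24000000, Pr(P|M) = 0.99, Pr(P|not M) = 0.01.
pr_mutant_given_positive = bayes(100 / 24_000_000, 0.99, 0.01)
print(pr_mutant_given_positive)  # ≈ 0.0004
```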
Example. A binary string is created so that the
first bit is a 0 with probability 1/3 and then each
subsequent bit is the same as the preceding one
with probability 3/4. What is the probability that
the first bit is 0, given that the second bit is 0?
Let F be the event that the first bit is 0 and
let S be the event that the second bit is 0. So
Pr(F) = 1/3. If F occurs then the second bit will
be 0 with probability 3/4 and so Pr(S|F) = 3/4. If
F does not occur then the second bit will be 0
with probability 1/4 and so Pr(S|F̄) = 1/4. So, by
Bayes’ theorem,
Pr(F|S) = Pr(F)Pr(S|F) / (Pr(F)Pr(S|F) + Pr(F̄)Pr(S|F̄))
= (1/3 × 3/4) / (1/3 × 3/4 + 2/3 × 1/4)
= (1/4)/(5/12)
= 3/5.
Questions
22.1 An integer is selected uniformly at random
from the set {1, 2, . . . , 15}. What is the
probability that it is divisible by 5, given
that it is odd?
22.2 A standard die is rolled twice. What is the
probability that the first roll is a 1, given
that the sum of the rolls is 6?
22.3 A bag contains three black marbles and
two white marbles and they are randomly
selected and removed, one at a time un-
til the bag is empty. Use Bayes’ theo-
rem to calculate the probability that the
first marble selected is black, given that
the second marble selected is black.
Lecture 23: Random variables
In a game, three standard dice will be rolled
and the number of sixes will be recorded. We
could let X stand for the number of sixes rolled.
Then X is a special kind of variable whose value
is based on a random process. These are called
random variables.
Because the value of X is random, it doesn’t
make sense to ask whether X = 0, for exam-
ple. But we can ask what the probability is that
X = 0 or that X > 2. This is because “X = 0”
and “X > 2” correspond to events from our
sample space.
23.1 Formal definition
Formally, a random variable is defined as a func-
tion from the sample space to R. In the example
above, X is a function from the process’s sample
space that maps every outcome to the number
of sixes in that outcome.
Example. Let X be the number of 1s in
a binary string of length 2 chosen uniformly
at random. Formally, X is a function from
{00, 01, 10, 11} to {0, 1, 2} such that
X(00) = 0, X(01) = 1, X(10) = 1, X(11) = 2.
For most purposes, however, we can think of X
as simply a special kind of variable.
23.2 Probability distribution
We can describe the behaviour of a random vari-
able X by listing, for each value x that X can
take, the probability that X = x. This gives the
probability distribution of the random variable.
Again, formally this listing is a function from
the values of X to their probabilities.
Example. Continuing with the last example,
the probability distribution of X is given by
Pr(X = x) = 1/4 if x = 0,
            1/2 if x = 1,
            1/4 if x = 2.
It can be convenient to give this as a table:
x          0    1    2
Pr(X = x)  1/4  1/2  1/4
Example. A standard die is rolled three times.
Let X be the number of sixes rolled. What is the
probability distribution of X? Obviously X can
only take values in {0, 1, 2, 3}. On each roll there
is a six with probability 1/6 and not a six with
probability 5/6. The rolls are independent.
Pr(X = 0) = 5/6 × 5/6 × 5/6
Pr(X = 1) = (1/6)(5/6)(5/6) + (5/6)(1/6)(5/6) + (5/6)(5/6)(1/6)
Pr(X = 2) = (1/6)(1/6)(5/6) + (1/6)(5/6)(1/6) + (5/6)(1/6)(1/6)
Pr(X = 3) = 1/6 × 1/6 × 1/6
So the probability distribution of X is
x          0        1       2       3
Pr(X = x)  125/216  75/216  15/216  1/216
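This distribution can also be recovered by enumerating all 6^3 equally likely outcomes, which is a useful sanity check on the hand calculation. A Python sketch:

```python
from itertools import product
from fractions import Fraction
from collections import Counter

# Count the number of sixes in each of the 6^3 equally likely outcomes.
counts = Counter(rolls.count(6) for rolls in product(range(1, 7), repeat=3))
dist = {x: Fraction(c, 6**3) for x, c in sorted(counts.items())}

print(dist[0], dist[3])  # 125/216 1/216
```

Note that `Fraction` reduces automatically, so 75/216 will display as 25/72.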
23.3 Independence
We have seen that two events are independent
when the occurrence or non-occurrence of one
event does not affect the likelihood of the other
occurring. Similarly two random variables are
independent if the value of one does not affect
the likelihood that the other will take a certain
value.
Random variables X and Y are independent
if, for all x and y,
Pr(X = x ∧ Y = y) = Pr(X = x)Pr(Y = y).
Example. An integer is generated uniformly at
random from the set {10, 11, . . . , 29}. Let X and
Y be its first and second (decimal) digits. Then
X and Y are independent random variables be-
cause, for x ∈ {1, 2} and y ∈ {0, 1, . . . , 9},
Pr(X = x ∧ Y = y) = 1/20
= 1/2 × 1/10
= Pr(X = x)Pr(Y = y).
23.4 Operations
From a random variable X, we can create new
random variables such as X + 1, 2X and X2.
These variables work as you would expect them
to.
Example. If X is the random variable with dis-
tribution
x          −1   0    1
Pr(X = x)  1/6  1/3  1/2
then the distributions of X + 1, 2X and X² are
y              0    1    2
Pr(X + 1 = y)  1/6  1/3  1/2

y           −2   0    2
Pr(2X = y)  1/6  1/3  1/2

y           0    1
Pr(X² = y)  1/3  2/3
23.5 Sums and products
From random variables X and Y we can define a
new random variable Z = X + Y . Working out
the distribution of Z can be complicated, how-
ever. We give an example below of doing this
when X and Y are independent.
Example. Let X and Y be independent ran-
dom variables with
x          0    1    2
Pr(X = x)  1/4  1/2  1/4

y          0    1    2    3
Pr(Y = y)  1/6  1/3  1/3  1/6
Let Z = X + Y . To find Pr(Z = z) for some
value of z, we must consider all the ways that
X + Y could equal z. For example, X + Y = 3
could occur as (X,Y ) = (0, 3), (X,Y ) = (1, 2)
or (X,Y ) = (2, 1). Because X and Y are inde-
pendent, we can find the probability that each
of these occur
Pr(X = 0 ∧ Y = 3) = 1/4 × 1/6 = 1/24,
Pr(X = 1 ∧ Y = 2) = 1/2 × 1/3 = 1/6,
Pr(X = 2 ∧ Y = 1) = 1/4 × 1/3 = 1/12.
So, because the three are mutually exclusive,
Pr(Z = 3) = 1/24 + 1/6 + 1/12 = 7/24.
Doing similar calculations for each possible
value, we see that the distribution of Z is
z          0     1    2     3     4    5
Pr(Z = z)  1/24  1/6  7/24  7/24  1/6  1/24
The distribution of a product of two indepen-
dent random variables can be found in a similar
way.
Finding the distribution of sums or prod-
ucts of dependent random variables is even more
complicated. In general, this requires knowing
the probability of each possible combination of
values the variables can take.
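For independent variables, the "consider all the ways" procedure above is mechanical enough to automate. A sketch in Python (the helper name is ours), applied to the example's distributions:

```python
from fractions import Fraction

def sum_distribution(dist_x, dist_y):
    """Distribution of Z = X + Y for INDEPENDENT X and Y, each
    given as a dict mapping value -> probability. For each pair
    (x, y), independence gives Pr(X=x and Y=y) = Pr(X=x)Pr(Y=y)."""
    dist_z = {}
    for x, px in dist_x.items():
        for y, py in dist_y.items():
            dist_z[x + y] = dist_z.get(x + y, 0) + px * py
    return dist_z

dist_x = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
dist_y = {0: Fraction(1, 6), 1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 6)}
dist_z = sum_distribution(dist_x, dist_y)
print(dist_z[3])  # 7/24
```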
Questions
23.1 An elevator is malfunctioning. Every
minute it is equally likely to ascend one
floor, descend one floor, or stay where it
is. When it begins malfunctioning it is on
level 5. Let X be the level it is on three
minutes later. Find the probability distri-
bution for X.
23.2 An integer is generated uniformly at ran-
dom from the set {11, 12, . . . , 30}. Let X
and Y be its first and second (decimal)
digit. Are the random variables X and Y
independent?
23.3 Let X and Y be independent random vari-
ables with distributions
x          0    2
Pr(X = x)  1/4  3/4

y          0    1    2    3
Pr(Y = y)  1/4  1/4  1/4  1/4
Find the probability distribution of Z =
X + Y .
Lecture 24: Expectation and variance
A standard die is rolled some number of
times and the average of the rolls is calculated.
If the die is rolled only once this average is just
the value rolled and is equally likely to be 1, 2,
3, 4, 5 or 6. If the die is rolled ten times, then
the average might be between 1 and 2 but this
is pretty unlikely – it’s much more likely to be
between 3 and 4. If the die is rolled ten thou-
sand times, then we can be almost certain that
the average will be very close to 3.5. We will
see that 3.5 is the expected value of a random
variable representing the die roll.
24.1 Expected value
When we said “average” above, we really meant
“mean”. Remember that the mean of a collec-
tion of numbers is the sum of the numbers di-
vided by how many of them there are. So the
mean of x1, . . . , xt is (x1 + · · · + xt)/t. The mean
of 2, 2, 3 and 11 is (2 + 2 + 3 + 11)/4 = 4.5, for example.
The expected value of a random variable is
calculated as a weighted average of its possible
values.
If X is a random variable with distribution
x          x1  x2  · · ·  xt
Pr(X = x)  p1  p2  · · ·  pt
then the expected value of X is
E[X] = p1x1 + p2x2 + · · ·+ ptxt.
Example. If X is a random variable represent-
ing a die roll, then
E[X] = 1/6 × 1 + 1/6 × 2 + · · · + 1/6 × 6 = 3.5.
Example. Someone estimates that each year
the share price of Acme Corporation has a 10%
chance of increasing by $10, a 50% chance of in-
creasing by $4, and a 40% chance of falling by
$10. Assuming that this estimate is good, are
Acme shares likely to increase in value over the
long term?
We can represent the change in the Acme
share price by a random variable X with distri-
bution
x          −10  4    10
Pr(X = x)  2/5  1/2  1/10
Then
E[X] = 2/5 × (−10) + 1/2 × 4 + 1/10 × 10 = −1.
Because this value is negative, Acme shares will
almost certainly decrease in value over the long
term.
Notice that it was important that we
weighted our average using the probabilities
here. If we had just taken the average of -10,
4 and 10 we would have gotten the wrong an-
swer by ignoring the fact that some values were
more likely than others.
24.2 Law of large numbers
Our initial die-rolling example hinted that the
average of a large number of independent trials
will get very close to the expected value. This
is mathematically guaranteed by a famous the-
orem called the law of large numbers.
Let X1, X2, . . . be independent random vari-
ables, all with the same distribution and ex-
pected value µ. Then
lim_{n→∞} (1/n)(X1 + · · · + Xn) = µ.
24.3 Linearity of expectation
We saw in the last lecture that adding random
variables can be difficult. Finding the expected
value of a sum of random variables is easy if we
know the expected values of the variables.
If X and Y are random variables, then
E[X + Y ] = E[X] + E[Y ].
This works even if X and Y are not independent.
Similarly, finding the expected value of a
scalar multiple of a random variable is easy if
we know the expected value of the variable.
If X is a random variable and s ∈ R, then
E[sX] = sE[X].
Example. Two standard dice are rolled. What
is the expected total?
Let X1 and X2 be random variables repre-
senting the first and second die rolls. From the
earlier example E[X1] = E[X2] = 3.5 and so
E[X1 +X2] = E[X1] + E[X2] = 3.5 + 3.5 = 7.
Example. What is the expected number of ‘11’
substrings in a binary string of length 5 chosen
uniformly at random?
For i = 1, . . . , 4, let Xi be a random vari-
able that is equal to 1 if the ith and (i + 1)th
bits of the string are both 1 and is equal to 0
otherwise. Then X1 + · · · + X4 is the number
of ‘11’ substrings in the string. Because the bits
are independent, Pr(Xi = 1) = 1/2 × 1/2 = 1/4 and
E[Xi] = 1/4 for i = 1, . . . , 4. So,
E[X1 + · · · + X4] = E[X1] + · · · + E[X4] = 4 × 1/4 = 1.
Note that the variables X1, . . . , X4 in the above
example were not independent, but we were still
allowed to use linearity of expectation.
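The answer from linearity of expectation can be verified by enumerating all 2^5 equally likely strings and averaging the count directly. A Python sketch:

```python
from itertools import product
from fractions import Fraction

def count_11(s):
    """Number of positions i where bits i and i+1 are both '1'."""
    return sum(1 for i in range(len(s) - 1) if s[i] == "1" and s[i + 1] == "1")

# Average the count over all 2^5 equally likely 5-bit strings.
total = sum(count_11(s) for s in product("01", repeat=5))
expectation = Fraction(total, 2**5)
print(expectation)  # 1
```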
24.4 Variance
Think of the random variables X, Y and Z
whose distributions are given below.
x          −1      99
Pr(X = x)  99/100  1/100

y          −1   1
Pr(Y = y)  1/2  1/2

z          −50  50
Pr(Z = z)  1/2  1/2
These variables are very different. Perhaps X
corresponds to buying a raffle ticket, Y to mak-
ing a small bet on a coin flip, and Z to making
a large bet on a coin flip. However, if you only
consider expected value, all of these variables
look the same – they each have expected value
0.
To give a bit more information about a ran-
dom variable we can define its variance, which
measures how “spread out” its distribution is.
If X is a random variable with E[X] = µ, then
Var[X] = E[(X − µ)²].
So the variance is a measure of how much we
expect the variable to differ from its expected
value.
Example. The variable X above will be 1
smaller than its expected value with probabil-
ity 99/100 and will be 99 larger than its expected
value with probability 1/100. So
Var[X] = 99/100 × (−1)² + 1/100 × 99² = 99.
Similarly,
Var[Y] = 1/2 × (−1)² + 1/2 × 1² = 1
Var[Z] = 1/2 × (−50)² + 1/2 × 50² = 2500.
Notice that the variance of X is much smaller
than the variance of Z because X is very likely
to be close to its expected value whereas Z will
certainly be far from its expected value.
Example. Let X be a random variable with
distribution given by
x          0    2    6
Pr(X = x)  1/6  1/2  1/3
Then the expected value of X is
E[X] = 1/6 × 0 + 1/2 × 2 + 1/3 × 6 = 3.
So, the variance of X is
Var[X] = 1/6 × (0 − 3)² + 1/2 × (2 − 3)² + 1/3 × (6 − 3)² = 5.
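Both definitions translate directly into short functions over a distribution table. A sketch (the function names are ours), checked against the example above:

```python
from fractions import Fraction

def expectation(dist):
    """E[X] = sum of p * x over a dict mapping value -> probability."""
    return sum(p * x for x, p in dist.items())

def variance(dist):
    """Var[X] = E[(X - mu)^2], the weighted average squared
    deviation from the expected value mu."""
    mu = expectation(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())

dist = {0: Fraction(1, 6), 2: Fraction(1, 2), 6: Fraction(1, 3)}
print(expectation(dist), variance(dist))  # 3 5
```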
Questions
24.1 Do you agree or disagree with the following
statement? “The expected value of a ran-
dom variable is the value it is most likely
to take.”
24.2 Let X be the sum of 1000 spins of the spin-
ner from Lecture 21, and let Y be 1000
times the result of a single spin. Find E[X]
and E[Y ]. Which of X and Y do you think
would have greater variance?
24.3 Let X be the number of heads occurring
when three fair coins are flipped. Find
E[X] and Var[X].
Lecture 25: Discrete distributions
In this lecture we’ll introduce some of the
most common and useful (discrete) probability
distributions. These arise in various different
real-world situations.
25.1 Discrete uniform distribution
This type of distribution arises when we choose
one of a set of consecutive integers so that all
choices are equally likely.
The discrete uniform distribution with pa-
rameters a, b ∈ Z (a ≤ b) is given by
Pr(X = k) = 1/(b − a + 1) for k ∈ {a, a + 1, . . . , b}.
We have E[X] = (a + b)/2 and Var[X] = ((b − a + 1)² − 1)/12.
[Bar chart of Pr(X = k) against k: uniform distribution with a = 3, b = 8]
25.2 Bernoulli distribution
This type of distribution arises when we have a
single process that succeeds with probability p
and fails otherwise. Such a process is called a
Bernoulli trial.
The Bernoulli distribution with parameter
p ∈ [0, 1] is given by
Pr(X = k) = p      for k = 1,
            1 − p  for k = 0.
We have E[X] = p and Var[X] = p(1− p).
25.3 Geometric distribution
This distribution gives the probability that, in a
sequence of independent Bernoulli trials, we see
exactly k failures before the first success.
The geometric distribution with parameter
p ∈ [0, 1] is given by
Pr(X = k) = p(1 − p)^k for k ∈ N.
We have E[X] = (1 − p)/p and Var[X] = (1 − p)/p².
[Bar chart of Pr(X = k) against k: geometric distribution with p = 0.5]
Example. If every minute there is a 1% chance
that your internet connection fails then the
probability of staying online for exactly x con-
secutive minutes is approximated by a geometric
distribution with p = 0.01. It follows that the
expected value is (1 − 0.01)/0.01 = 99 minutes and the
variance is (1 − 0.01)/(0.01)² = 9900.
25.4 Binomial distribution
This distribution gives the probability that, in
a sequence of n independent Bernoulli trials, we
see exactly k successes.
The binomial distribution with parameters
n ∈ Z+ and p ∈ [0, 1] is given by
Pr(X = k) = (n choose k) p^k (1 − p)^(n−k)
for k ∈ {0, . . . , n}.
We have E[X] = np and Var[X] = np(1− p).
[Bar chart of Pr(X = k) against k: binomial distribution with n = 20, p = 0.5]
Example. If 1000 people search a term on a
certain day and each of them has a 10% chance
of clicking a sponsored link, then the number of
clicks on that link is approximated by a binomial
distribution with n = 1000 and p = 0.1. It fol-
lows that the expected value is 1000×0.1 = 100
clicks and the variance is 1000× 0.1× 0.9 = 90.
25.5 Poisson distribution
In many situations where we know that an aver-
age of λ events occur per time period, this dis-
tribution gives a good model of the probability
that k events occur in a time period.
The Poisson distribution with parameter
λ ∈ R (where λ > 0) is given by
Pr(X = k) = λ^k e^(−λ) / k! for k ∈ N.
We have E[X] = λ and Var[X] = λ.
[Bar chart of Pr(X = k) against k: Poisson distribution with λ = 4]
Example. If a call centre usually receives 6 calls
per minute, then a Poisson distribution with
λ = 6 approximates the probability it receives
k calls in a certain minute. It follows that the
expected value is 6 calls and the variance is 6.
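The three pmf formulas in this lecture are short enough to implement directly. A Python sketch (function names are ours), using the call-centre parameters as a smoke test:

```python
from math import comb, exp, factorial

def geometric_pmf(p, k):
    """Pr(X = k): exactly k failures before the first success."""
    return p * (1 - p) ** k

def binomial_pmf(n, p, k):
    """Pr(X = k): exactly k successes in n independent Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    """Pr(X = k): exactly k events when the average rate is lam."""
    return lam**k * exp(-lam) / factorial(k)

# Call centre: probability of exactly 6 calls in a minute when lambda = 6.
print(poisson_pmf(6, 6))
```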
Questions
25.1 There is a 95% chance of a packet being
received after being sent down a noisy line,
and the packet is resent until it is received.
What is the probability that the packet is
received within the first three attempts?
25.2 A factory aims to have at most 2% of the
components it makes be faulty. What is
the probability of a quality control test of
20 random components finding that 2 or
more are faulty, if the factory is exactly
meeting its 2% target?
25.3 The number of times a machine needs ad-
justing during a day approximates a Pois-
son distribution, and on average the ma-
chine needs to be adjusted three times per
day. What is the probability it does not
need adjusting on a particular day?
Lecture 26: Recursion
Just as the structure of the natural numbers
supports induction as a method of proof, it sup-
ports induction as a method of definition or of
computation.
When used in this way, induction is usually
called recursion, and one speaks of a recursive
definition or a recursive algorithm.
26.1 Recursive Definitions
Many well known functions f(n) are most easily
defined in the “base step, induction step” for-
mat, because f(n + 1) depends in some simple
way on f(n).
The induction step in the definition is more
commonly called the recurrence relation for f ,
and the base step the initial value.
Example. The factorial f(n) = n!
Initial value. 0! = 1.
Recurrence relation. (k + 1)! = (k + 1)× k!
Many programming languages allow this
style of definition, and the value of the func-
tion is then computed by a descent to the initial
value.
For example, to compute 4!, the machine
successively computes
4! = 4× 3!
= 4× (3× 2!)
= 4× (3× (2× (1!)))
= 4× (3× (2× (1× 0!)))
which can finally be evaluated since 0! = 1.
Remark. The numbers 4, 3, 2, 1 have to be
stored on a “stack” before the program reaches
the initial value 0! = 1 which finally enables it
to evaluate 4!.
Thus a recursive program, though short,
may run slowly and even cause “stack overflow.”
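The recursive definition of factorial transcribes almost verbatim into code. A Python sketch:

```python
def factorial(n):
    """Recursive definition of n!:
    initial value 0! = 1, recurrence (k+1)! = (k+1) * k!."""
    if n == 0:
        return 1                      # initial value
    return n * factorial(n - 1)       # descent toward the initial value

print(factorial(4))  # 24
```

Each pending multiplication sits on the call stack until the descent reaches 0!, which is exactly the stack behaviour described in the remark above.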
Example. The Fibonacci sequence
0, 1, 1, 2, 3, 5, 8, . . .
The nth number F (n) in this sequence is defined
by
Initial values. F (0) = 0, F (1) = 1.
Recurrence relation. F (k+1) = F (k)+F (k−1).
Remark. Using a recursive program to com-
pute Fibonacci numbers can easily lead to stack
overflow, because each value depends on two
previous values (each of which depends on an-
other two, and so on).
A more efficient way to use the recursive defi-
nition is to use three variables to store F (k+1),
F (k) and F (k − 1). The new values of these
variables, as k increases by 1, depend only on
the three stored values, not on all the previous
values.
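A sketch of this more efficient approach in Python. One common variant keeps just the two most recent values, since the new value depends only on those:

```python
def fib(n):
    """Compute F(n) iteratively, storing only the latest values
    instead of recomputing them recursively."""
    a, b = 0, 1              # F(0), F(1)
    for _ in range(n):
        a, b = b, a + b      # slide the stored window forward by one
    return a

print([fib(n) for n in range(7)])  # [0, 1, 1, 2, 3, 5, 8]
```

Unlike the naive recursive program, this runs in time proportional to n and uses constant extra space, so there is no risk of stack overflow.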
26.2 Properties of recursively defined
functions
These are naturally proved by induction, using
a base step and induction step which parallel
those in the definition of the function.
Example. For n ≥ 5, 10 divides n!
Proof Base step.
5! = 5× 4× 3× 2× 1 = 10× 4× 3,
hence 10 divides 5!.
Induction step. We have to show
10 divides k! =⇒ 10 divides (k + 1)!
Since (k + 1)! = (k + 1) × k! by the recurrence
relation for factorial, the induction step is clear,
and hence the induction is complete.
Example. F (0) + F (1) + · · · + F (n) = F (n +
2)− 1.
Proof Base step. F (0) = 0 = F (2)− 1,
because F (2) = 1.
Induction step. We have to show
F (0) + F (1) + · · ·+ F (k)
= F (k + 2)− 1
⇒ F (0) + F (1) + · · ·+ F (k + 1)
= F (k + 3)− 1.
Well,
F (0) + F (1) + · · ·+ F (k)
= F (k + 2)− 1
⇒ F (0) + F (1) + · · ·+ F (k + 1)
= F (k + 2) + F (k + 1)− 1,
by adding F (k + 1) to both sides
⇒ F (0) + F (1) + · · ·+ F (k + 1)
= F (k + 3)− 1
since F (k + 2) + F (k + 1) = F (k + 3)
by the Fibonacci recurrence relation
This completes the induction.
26.3 Problems with recursive solu-
tions
Sometimes a problem about n reduces to a prob-
lem about n− 1 (or smaller numbers), in which
case the solution may be a known recursively
defined function.
Example. Find how many n-bit binary
strings contain no two consecutive 0s.
We can divide this problem into two cases.
1. Strings which end in 1, e.g. 1101101.
In this case, the string before the 1 (110110
here) can be any (n−1) bit string with no
consecutive 0s.
2. Strings which end in 0, e.g. 1011010.
In this case, the string must actually end
in 10, to avoid consecutive 0s, but the
string before the 10 (10110 here) can be
any (n− 2) bit string with no consecutive
0s.
Thus among strings with no consecutive 0s we
find
1. Those with n bits ending in 1 correspond
to those with (n− 1) bits.
2. Those with n bits ending in 0 correspond
to those with (n− 2) bits.
Hence if we let f(n) be the number of such
strings with n bits we have
f(n) = f(n− 1) + f(n− 2).
This is the Fibonacci recurrence relation.
It can also be checked that
f(1) = 2 = F (3) and f(2) = 3 = F (4),
hence it follows (by induction) that
f(n) = number of n bit strings
with no consecutive 0s
= F (n+ 2).
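The recurrence can be checked numerically. A Python sketch (the function name is ours) that computes f(n) directly from the recurrence with the initial values f(1) = 2 and f(2) = 3:

```python
def no_double_zero(n):
    """Number of n-bit binary strings with no two consecutive 0s,
    via the recurrence f(n) = f(n-1) + f(n-2)."""
    if n == 1:
        return 2     # "0" and "1"
    if n == 2:
        return 3     # "01", "10", "11"
    return no_double_zero(n - 1) + no_double_zero(n - 2)

print([no_double_zero(n) for n in range(1, 7)])  # [2, 3, 5, 8, 13, 21]
```

As expected, the values are the Fibonacci numbers shifted by two places: f(n) = F(n + 2). (Like the naive Fibonacci program, this is exponentially slow for large n.)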
Questions
26.1 A function s(n) is defined recursively by
Initial value: s(0) = 0
Recurrence relation: s(n+ 1) = s(n) + 2n+ 1
Write down the first few values of s(n),
and guess what function s is.
26.2 Check that the function s you guessed in
Question 26.1 satisfies
s(0) = 0 and s(n+ 1) = s(n) + 2n+ 1
(This proves by induction that your guess
is correct.)
26.3 If a sequence satisfies the Fibonacci recur-
rence relation,
f(n) = f(n− 1) + f(n− 2),
must it agree with the Fibonacci sequence
from some point onward?
Lecture 27: Recursive Algorithms
Recursion may be used to define functions
whose definition normally involves “ · · · ”, to give
algorithms for computing these functions, and to
prove some of their properties.
27.1 Sums
Example. 1 + 2 + 3 + · · ·+ n
This is the function f(n) defined by
Initial value. f(1) = 1
Recurrence relation. f(k + 1) = f(k) + (k + 1)
Example. 1 + a+ a2 + · · ·+ an
This is the function g(n) defined by
Initial value. g(0) = 1
Recurrence relation. g(k + 1) = g(k) + ak+1
We can use this relation to prove by induc-
tion that g(n) = (a^(n+1) − 1)/(a − 1) (a formula for the
sum of a geometric series), provided a ≠ 1.
Proof
Base step. For n = 0, 1 = g(0) = (a^(0+1) − 1)/(a − 1),
as required.
Induction step. We want to prove
g(k) = (a^(k+1) − 1)/(a − 1) ⇒ g(k + 1) = (a^(k+2) − 1)/(a − 1).
Well,
g(k) = (a^(k+1) − 1)/(a − 1)
⇒ g(k + 1) = (a^(k+1) − 1)/(a − 1) + a^(k+1)
            = (a^(k+1) − 1 + (a − 1)a^(k+1))/(a − 1)
            = (a^(k+2) + a^(k+1) − a^(k+1) − 1)/(a − 1)
            = (a^(k+2) − 1)/(a − 1), as required.
This completes the induction.
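The recursive definition of g and the closed formula can be compared mechanically for small cases. A Python sketch using exact arithmetic:

```python
from fractions import Fraction

def g(n, a):
    """g(n) = 1 + a + a^2 + ... + a^n via the recurrence
    g(0) = 1, g(k+1) = g(k) + a^(k+1)."""
    if n == 0:
        return Fraction(1)           # initial value
    return g(n - 1, a) + Fraction(a) ** n

# Compare against the closed formula (a^(n+1) - 1)/(a - 1), a != 1.
a = Fraction(3)
checks = [g(n, a) == (a ** (n + 1) - 1) / (a - 1) for n in range(6)]
print(all(checks))  # True
```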
27.2 Products
Example. 1× 2× 3× · · · × n
This is the function n! defined recursively by
Initial value. 0! = 1
Recurrence relation. (k + 1)! = (k + 1)× k!
27.3 Sum and product notation
1 + 2 + 3 + · · · + n is written Σ_{k=1}^{n} k,
1 + a + a^2 + · · · + a^n is written Σ_{k=0}^{n} a^k.
Σ is capital sigma, standing for “sum.”
1 × 2 × 3 × · · · × n is written Π_{k=1}^{n} k.
Π is capital pi, standing for “product.”
27.4 Binary search algorithm
Given a list of n numbers in order
x1 < x2 < · · · < xn,
we can find whether a given number a is in the
list by repeatedly “halving” the list.
The algorithm binary search is specified
recursively by a base step and a recursive step.
Base step. If the list is empty,
report ‘a is not in the list.’
Recursive step. If the list is not empty, see
whether its middle element is a. If so, report
‘a found.’
Otherwise, if the middle element m > a,
binary search the list of elements < m.
And if the middle element m < a, binary
search the list of elements > m.
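The base step / recursive step specification translates directly into code. A Python sketch (written for clarity rather than efficiency; the slicing copies sublists):

```python
def binary_search(xs, a):
    """Recursive binary search on a sorted list xs for a value a,
    following the base step / recursive step specification."""
    if not xs:                                  # base step: empty list
        return False                            # "a is not in the list"
    mid = len(xs) // 2
    m = xs[mid]                                 # middle element
    if m == a:
        return True                             # "a found"
    if m > a:
        return binary_search(xs[:mid], a)       # search elements < m
    return binary_search(xs[mid + 1:], a)       # search elements > m

print(binary_search([1, 3, 5, 8, 13], 8))  # True
```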
27.5 Correctness
We prove that the algorithm works on a list of
n items by strong induction on n.
Base step. The algorithm works correctly on
a list of 0 numbers, by reporting that a is not in
the list.
Induction step. Assuming the algorithm
works correctly on any list of < k + 1 numbers,
suppose we have a list of k + 1 numbers.
The recursive step either finds a as the mid-
dle number in the list, or else produces a list of
< k+1 numbers to search, which by assumption
it will do correctly.
This completes the induction.
Remark. This example shows how easy it is to
prove correctness of recursive algorithms, which
may be why they are popular despite the prac-
tical difficulties in implementing them.
27.6 Running time
log2 n is the number x such that n = 2^x.
For example, 1024 = 2^10, and therefore
log2 1024 = 10.
Similarly log2 512 = 9, and hence log2 1000 is
between 9 and 10.
Repeatedly dividing 1000 by 2 (and discard-
ing remainders of 1) runs for 9 steps:
500, 250, 125, 62, 31, 15, 7, 3, 1
The 10 halving steps for 1024 are
512, 256, 128, 64, 32, 16, 8, 4, 2, 1
This means that the binary search algorithm
would do at most 9 “halvings” in searching a
list of 1000 numbers and at most 10 “halvings”
for 1024 numbers.
More generally, binary search needs at most
⌊log2 n⌋ “halvings” to search a list of n numbers,
where ⌊log2 n⌋ is the floor of log2 n, the greatest
integer ≤ log2 n.
Remark. In a telephone book with 1,000,000
names, which is just under 2^20, it takes at most
20 halvings (using alphabetical order) to find
whether a given name is present.
27.7 20 questions
A mathematically ideal way to play 20 questions
would be to divide the number of possibilities in
half with each question.
E.g. if the answer is an integer, do binary
search on the list of possible answers. If the
answer is a word, do binary search on the list
of possible answers (ordered alphabetically). If
this is done, then 20 questions suffice to find the
correct answer out of 2^20 = 1,048,576 possibili-
ties.
Questions
27.1 Rewrite the following sums using Σ nota-
tion.
• 1 + 4 + 9 + 16 + · · ·+ n2
• 1− 2 + 3− 4 + · · · − 2n
27.2 Which of the proofs in this lecture uses
strong induction?
27.3 Imagine a game where the object is to
identify a natural number between 1 and
2^20 using 20 questions with YES-NO an-
swers. The lecture explains why 20 ques-
tions are sufficient to identify any such
number.
Explain why fewer than 20 YES-NO ques-
tions are not always sufficient.
Lecture 28: Recursion, lists and sequences
A list or sequence of objects from a set X is
a function f from {1, 2, . . . , n} to X, or
(if infinite) from {1, 2, 3, . . .} to X.
We usually write f(k) as xk and the list as
〈x1, x2, . . . , xn〉, or 〈x1, x2, x3, . . .〉. Thus
f(1) = x1 = first item on list
f(2) = x2 = second item on list
...
The empty list is written 〈〉.
Example.
〈m, a, t, h, s〉 is a function f from {1, 2, 3, 4, 5}
into the English alphabet, with f(1) = m,
f(2) = a, etc.
28.1 Sequences
A sequence is also a list, but when we use the
term sequence we are usually interested in the
rule by which the successive terms t1, t2, . . . are
defined.
Often, the rule is a recurrence relation.
Example. Arithmetic sequence
a, a+ d, a+ 2d, a+ 3d, . . .
This is defined by
Initial value. t1 = a
Recurrence relation. tk+1 = tk + d
“Unfolding” this recurrence relation from tn
back to t1, we see that d gets added n−1 times,
hence
tn = a+ (n− 1)d.
Example. Geometric sequence
a, ar, ar2, ar3, . . .
Initial value. t1 = a
Recurrence relation. tk+1 = rtk
“Unfolding” tn, we see that multiplication
by r is done n − 1 times, hence
tn = ar^(n−1).
The above recurrence relations are called
first order because tk+1 depends on only the
previous value, tk. (Or, because the values of
all terms follow from one initial value.)
A second order recurrence relation requires
two initial values, and is usually harder to un-
fold.
Example. A simple sequence in disguise
Initial values. t0 = 1, t1 = 2
Recurrence relation. tk+1 = 2tk − tk−1
Calculating the first values, we find
t2 = 2t1 − t0 = 2× 2− 1 = 3,
t3 = 2t2 − t1 = 2× 3− 2 = 4,
t4 = 2t3 − t2 = 2× 4− 3 = 5.
It looks like tn = n + 1, and indeed we can
prove this by induction. For the base step we
have the initial values t0 = 1 = 0 + 1 and
t1 = 2 = 1 + 1. We do the induction step by
strong induction: assuming tn = n + 1 for all
n ≤ k, we deduce that tk+1 = k + 2.
In fact we have
tk+1 = 2tk − tk−1
by the recurrence relation
= 2(k + 1)− k
by our assumption
= 2k + 2− k = k + 2
as required.
This completes the induction.
Example. Fibonacci sequence
Initial values. t0 = 0, t1 = 1
Recurrence relation. tk+1 = tk + tk−1
It is possible to write tn directly as a func-
tion of n. The function is not at all obvious,
because it involves √5:
tn = (1/√5) (((1 + √5)/2)^n − ((1 − √5)/2)^n).
We do not go into how to find such a formula in
this unit. However, if someone gave you such a
formula you could prove it is correct by induc-
tion.
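Such an inductive check can also be done numerically for small n. A Python sketch that compares the closed formula against values generated by the recurrence (rounding is needed because √5 is computed in floating point):

```python
from math import sqrt

def binet(n):
    """The closed formula for the Fibonacci numbers, rounded to
    the nearest integer to absorb floating-point error."""
    phi = (1 + sqrt(5)) / 2
    psi = (1 - sqrt(5)) / 2
    return round((phi**n - psi**n) / sqrt(5))

# Generate Fibonacci numbers from the recurrence for comparison.
fibs = [0, 1]
for _ in range(10):
    fibs.append(fibs[-1] + fibs[-2])

print(all(binet(n) == fibs[n] for n in range(12)))  # True
```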
28.2 Relations – homogeneous and in-
homogeneous
Recurrence relations such as
tk+1 = 2tk
and
tk+1 = tk + tk−1
in which each term is a multiple of some tj , are
called homogeneous.
The characteristic property of any homoge-
neous equation is that if tn = f(n) is a solution,
then so is tn = cf(n), for any constant c.
E.g. tn = 2^n is a solution of tk+1 = 2tk, and
so is tn = c × 2^n, for any constant c.
Relations like tk+1 = tk + 3, in which there
is a term other than the tj terms, are called in-
homogeneous.
Homogeneous recurrence relations are usu-
ally easier to solve, and in fact there is a gen-
eral method for solving them (which we will not
cover in this unit).
There is no general method for solving in-
homogeneous recurrence relations, though they
can often be solved if the term other than the tj
terms is simple.
Questions
28.1 Find the next four values of each of the fol-
lowing recurrence relations. What order is
each recurrence relation? Which are ho-
mogeneous and which are inhomogeneous?
(a) rk+1 = rk + k^2, r0 = 0.
(b) sk+1 = 3sk − 2sk−1, s0 = 1, s1 = 2.
(c) tk+1 = tk+tk−2+1, t0 = 1, t1 = 1, t2 = 1.
28.2 Let Tn be the number of ways of tiling a
2× n strip with 2× 1 tiles (which may be
rotated so they are 1 × 2). Find Tn for
n = 1, 2, 3, 4. Find a recurrence relation
for Tn.
28.3 Let Cn be the number of ways of writ-
ing down n pairs of brackets so that they
are correctly matched. Find Cn for n =
1, 2, 3, 4. Can you explain why
Ck+1 = Σ_{i=0}^{k} Ci Ck−i?
Lecture 29: Graphs
A graph consists of a set of objects called ver-
tices and a list of pairs of vertices, called edges.
Graphs are normally represented by pic-
tures, with vertex A represented by a dot la-
belled A and each edge AB represented by a
curve joining A and B.
Such pictures are helpful for displaying data
or relationships, and they make it easy to recog-
nise properties which might otherwise not be no-
ticed.
The description by lists of vertices and edges
is useful when graphs have to be manipulated by
computer. It is also a useful starting point for
precise definitions of graph concepts.
29.1 Examples of graphs
Description: Vertices: A, B, C; Edges: AB, BC, CA.
[Picture: triangle on vertices A, B, C]
Such a graph, with at most one edge be-
tween each pair of vertices, and no vertex joined
to itself, is called a simple graph.
Description: Vertices: A, B, C, D;
Edges: AB, AB, BC, BC, AD, BD, CD.
[Picture: graph on vertices A, B, C, D with doubled edges AB and BC]
The two edges AB which join the same pair
of vertices are called parallel edges.
Description: Vertices: A, B; Edges: AA, AB, AB.
[Picture: loop at A and two parallel edges between A and B]
The edge joining A to A is called a loop.
The name multigraph is used when loops
and/or parallel edges are allowed.
Warning: A graph can be represented by pictures
that look very different. This last example could
be redrawn as:
[Alternative picture of the same multigraph on A and B]
29.2 Problems given by graphs
Many problems require vertices to be connected
by a “path” of successive edges. We shall define
paths (and related concepts) next lecture, but
the following examples illustrate the idea and
show how often it comes up.
They also show how helpful it is to have
graph pictures when searching for paths.
1. Gray Codes
The binary strings of length n are taken as the
vertices, with an edge joining any two vertices
that differ in only one digit. This graph is called
the n-cube.
E.g. the 2-digit binary strings form a square
(a “2-cube”).
[picture: a square with vertices 00, 01, 11, 10]
and the 3-digit binary strings form an ordinary
cube (a “3-cube”).
[picture: a cube with vertices 000, 001, 010, 011, 100, 101, 110, 111]
A Gray code of length n is a path which in-
cludes each vertex of the n-cube exactly once.
E.g. here is a path in the 3-cube which gives the
Gray code
000, 001, 011, 010, 110, 111, 101, 100
[picture: the path above drawn on the 3-cube]
Remark. The n-cube has been popular as a
computer architecture in recent years. Proces-
sors are placed at the vertices of an n-cube (for
n = 15 or so) and connected along its edges.
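One standard way to build such a path, not covered in the notes, is the "reflected" construction. A sketch in Python (the function name is mine):

```python
def gray_code(n):
    """Return all n-digit binary strings in an order where
    successive strings differ in exactly one digit."""
    if n == 0:
        return [""]
    shorter = gray_code(n - 1)
    # Prefix 0 to the shorter code, then 1 to its reverse: the join
    # 0...→1... changes one digit, and each half inherits the property.
    return ["0" + s for s in shorter] + ["1" + s for s in reversed(shorter)]

print(gray_code(3))
# ['000', '001', '011', '010', '110', '111', '101', '100']
```

For n = 3 this reproduces exactly the Gray code listed above.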
2. Travelling salesman problem
Vertices are towns. Two towns are joined by
an edge labelled ` if there is a road of length `
between them. E.g. [picture: towns A, B, C, D
joined by roads of lengths 3, 4, 5, 6 and 6]
The problem is to find a path of minimal
length which includes all towns, in this case
BADC.
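For a handful of towns the problem can be solved by brute force over all visiting orders. A sketch, with invented road lengths (the figure's exact labelling is not recoverable here, so these are not the values above):

```python
from itertools import permutations

# Hypothetical road lengths between four towns.
length = {('A', 'B'): 4, ('A', 'C'): 2, ('A', 'D'): 7,
          ('B', 'C'): 3, ('B', 'D'): 5, ('C', 'D'): 6}

def road(u, v):
    """Length of the road between u and v (roads are undirected)."""
    return length.get((u, v)) or length[(v, u)]

def path_length(path):
    """Total length of the roads along a visiting order."""
    return sum(road(u, v) for u, v in zip(path, path[1:]))

# Try every order of visiting the four towns.
best = min(permutations('ABCD'), key=path_length)
print(best, path_length(best))   # ('A', 'C', 'B', 'D') 10
```

Brute force is fine for four towns, but the number of orders grows factorially, which is why the travelling salesman problem is hard in general.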
3. Jug problem
Suppose we have three jugs, which hold ex-
actly 3, 4 and 6 litres respectively. We fill the 6-
litre jug, and then pour from one jug to another,
always stopping when the jug being poured to
becomes full or when the jug being poured from
becomes empty.
Is it possible to reach a state where one jug
contains 1 litre and another contains 5 litres?
We represent each possible state by a triple
(a, b, c), where
a = number of litres in the 3-litre jug
b = number of litres in the 4-litre jug
c = number of litres in the 6-litre jug
Each state is a vertex, and if state (a′, b′, c′) can
be reached from (a, b, c) by pouring as described
above, we put a directed edge in the graph:
(a, b, c) (a′, b′, c′)
If (a, b, c) can also be reached from (a′, b′, c′), we
join them by an ordinary edge.
Then, listing the states that can be reached
from (0, 0, 6), we find the following graph.
(0, 4, 2)
(3, 1, 2)
(3, 3, 0)
(0, 0, 6)
(3, 0, 3)
(0, 3, 3)
(0, 1, 5)
(2, 4, 0)
A graph like this, with some directed edges, is
called a directed graph or digraph.
This particular digraph shows that
(0, 0, 6)→ (0, 4, 2)→ (3, 1, 2)→ (0, 1, 5)
hence we can start with a full 6-litre jug and in
three pourings get 1 litre in the 4-litre jug and
5 litres in the 6-litre jug.
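The pouring rule is easy to express in code. This sketch (the function name pours is mine) verifies that the three pourings above are all legal moves:

```python
CAP = (3, 4, 6)   # capacities of the three jugs

def pours(state):
    """All states reachable from `state` by one pouring: pour jug i
    into jug j until j is full or i is empty."""
    result = set()
    for i in range(3):
        for j in range(3):
            if i != j and state[i] > 0 and state[j] < CAP[j]:
                amount = min(state[i], CAP[j] - state[j])
                new = list(state)
                new[i] -= amount
                new[j] += amount
                result.add(tuple(new))
    return result

# Check the three-pour solution read off the digraph.
route = [(0, 0, 6), (0, 4, 2), (3, 1, 2), (0, 1, 5)]
for a, b in zip(route, route[1:]):
    assert b in pours(a)
print("legal route:", route)
```

The same function, driven by a breadth first search over states, would generate the whole digraph.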
Questions
29.1 Write down descriptions of the following
graphs
[pictures of three graphs on the vertices A, B, C, D]
29.2 Draw pictures of graphs with the following
descriptions.
Vertices: A, B, C
Edges: AA, BB, CC
Vertices: A, B, C, D
Edges: AB, AC, AD
29.3 Use the following picture of the 4-cube to
find a Gray code of length 4.
Lecture 30: Walks, paths and trails
There are several ways of “travelling” around
the edges of a graph.
A walk is a sequence
V1, e1, V2, e2, V3, e3, . . . , en−1, Vn,
where each ei is an edge joining vertex Vi to ver-
tex Vi+1. (In a simple graph, where at most one
edge joins Vi and Vi+1, it is sufficient to list the
vertices alone.)
If Vn = V1 the walk is said to be closed.
A path is a walk with no repeated vertices.
A trail is a walk with no repeated edges.
30.1 Examples
In these pictures, a walk is indicated by a di-
rected curve running alongside the actual edges
in the walk.
A walk which is not a trail or a path. (Repeated
edge, repeated vertex.)
A trail which is not a path. (Repeated vertex.)
A nonclosed walk and a closed walk.
30.2 Adjacency matrix
If two vertices are joined by an edge we say that
they are adjacent.
A simple graph G with vertices V1, V2, . . . , Vn
is described by an adjacency matrix which has
(i, j) entry (ith row and jth column) aij given by
aij = 1 if Vi is adjacent to Vj in G,
aij = 0 otherwise.
For example, the graph
V2 V3
V1
has adjacency matrix
0 0 1
0 0 1
1 1 0
30.3 Adjacency matrix powers
The product of matrices
a11 a12 · · ·        b11 b12 · · ·
a21 a22 · · ·    ×   b21 b22 · · ·
· · · · · · · · ·    · · · · · · · · ·
is the matrix whose (i, j) entry is
ai1 b1j + ai2 b2j + ai3 b3j + · · · ,
the "dot product" of the ith row
ai1 ai2 ai3 · · ·
of the matrix on the left with the jth column
b1j
b2j
b3j
...
of the matrix on the right.
The (i, j) entry in the kth power of the ad-
jacency matrix gives the number of walks of
length k between Vi and Vj . The length of a
walk is the number of “steps” (edges) in it.
For example, suppose we want the number of
walks of length 2 from V3 to V3 in the graph
V2 V3
V1
The adjacency matrix M tells us that the
following edges exist: the 1s in column 3 (rows
1 and 2) record the edges V1 to V3 and V2 to V3,
and the 1s in row 3 (columns 1 and 2) record
the edges V3 to V1 and V3 to V2.
So when we square this matrix, the (3, 3) entry
in M2 is the dot product of row 3 with column 3:
(1 1 0) · (1, 1, 0) = 1 × 1 + 1 × 1 + 0 × 0 = 2.
It counts the walks from V3 to V3, namely
V3 → V1 → V3 and V3 → V2 → V3.
Similarly, the (i, j) entry in M2 is the num-
ber of walks of length 2 from Vi to Vj . The (i, j)
entry in M3 is the number of walks of length 3
from Vi to Vj, and so on.
In fact, M2 × M is the product
1 1 0     0 0 1
1 1 0  ×  0 0 1
0 0 2     1 1 0
and its (3, 2) entry is the dot product
(0 0 2) · (0, 0, 1) = 0 + 0 + 2 × 1 = 2.
Hence the number of walks of length 3 from V3
to V2 is 2.
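These counts can be checked mechanically. A minimal sketch in plain Python (the helper names mat_mult and mat_pow are mine):

```python
def mat_mult(A, B):
    """Multiply two n x n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(M, k):
    """k-th power of M (k >= 0), starting from the identity matrix."""
    n = len(M)
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        P = mat_mult(P, M)
    return P

# Adjacency matrix of the graph with edges V1V3 and V2V3.
M = [[0, 0, 1],
     [0, 0, 1],
     [1, 1, 0]]

print(mat_pow(M, 2)[2][2])   # walks of length 2 from V3 to V3 -> 2
print(mat_pow(M, 3)[2][1])   # walks of length 3 from V3 to V2 -> 2
```

(Indices are 0-based in the code, so entry (3, 3) of the lecture is `[2][2]` here.)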
30.4 General adjacency matrix
The adjacency matrix can be generalised to
multigraphs by making the (i, j) entry the num-
ber of edges from Vi to Vj (special case: count
each loop twice).
For example, the graph V1 V2
has adjacency matrix
N =
2 2
2 0
and so
N2 =
8 4
4 4 .
The (1, 1) entry 2×2+2×2 in N2, for example,
indicates that there are 8 walks of length 2 from
V1 to V1: 4 walks twice around the loop, and 4
walks from V1 to V2 (2 ways) then V2 to V1 (2
ways).
This count distinguishes between different
directions around the loop. It may help to re-
gard the loop as a pair of opposite directed loops.
We can generalise the adjacency matrix M
to directed graphs by letting the (i, j) entry be
the number of directed edges (which include or-
dinary edges) from Vi to Vj .
With this definition, the (i, j) entry of Mk
gives the number of directed walks of length k
from Vi to Vj (i.e. walks that obey the directions
of edges).
Questions
30.1 Draw the graph/digraph/multigraph with
adjacency matrix
M =
1 0 1
0 1 0
1 0 1 ,
using V1, V2 and V3 as names for the ver-
tices corresponding to columns 1, 2 and 3
respectively.
30.2 Calculate M2, and use it to find the num-
ber of walks of length 2 from V1 to V3.
Does this calculation give the number you
would expect from the graph?
30.3 Without any calculation, show that the
middle row of any power Mk is (0 1 0).
Lecture 31: Degree
The degree of a vertex A in a graph G is the
number of times A occurs in the list of edges of
G.
For example, if G is A B
then the list of edges is AA, AB, AB, and hence
degree(A) = 4.
Intuitively speaking, degree(A) is the num-
ber of “ends” of edges occurring at A. In par-
ticular, a loop at A contributes 2 to the degree
of A.
31.1 The handshaking lemma
In any graph,
sum of degrees = 2× number of edges.
The reason for the name is that if each edge
is viewed as a handshake,
then at each vertex V
degree(V ) = number of hands.
Hence
sum of degrees
= total number of hands
= 2 × number of handshakes.
An important consequence
The handshaking lemma implies that in any
graph the sum of degrees is even (being
2 × something). Thus it is impossible, e.g., for
a graph to have degrees 1, 2, 3, 4, 5.
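The lemma can be checked directly on the edge-list description of a graph; a minimal sketch using Python's Counter (the helper name is mine):

```python
from collections import Counter

def degrees(edges):
    """Degree of each vertex: the number of times it appears in the
    edge list. A loop like ('A', 'A') contributes 2, as in the notes."""
    count = Counter()
    for u, v in edges:
        count[u] += 1
        count[v] += 1
    return count

edges = [('A', 'A'), ('A', 'B'), ('A', 'B')]   # the example graph above
deg = degrees(edges)
print(deg['A'])                                # 4
assert sum(deg.values()) == 2 * len(edges)     # the handshaking lemma
```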
31.2 The seven bridges of Königsberg
In 18th century Königsberg there were seven
bridges connecting islands in the river to the
banks as follows.
The question came up: is it possible for a
walk to cross all seven bridges without crossing
the same bridge twice?
An equivalent question is whether there is
a trail which includes all edges in the following
graph.
A
D
C
B
31.3 Euler’s solution
Euler (1737) observed that the answer is no, be-
cause
1. Each time a walk enters and leaves a ver-
tex it “uses up” 2 from the degree.
2. Hence if all edges are used by the walk, all
vertices except the first and last must have
even degree.
3. The seven bridges graph in fact has four
vertices of odd degree.
31.4 Euler’s theorem
The same argument shows in general that
A graph with > 2 odd degree vertices has no
trail using all its edges.
And a similar argument shows
A graph with any odd degree vertices has no
closed trail using all its edges.
(Because in this case the first and last vertex
are the same, and its degree is “used up” by a
closed trail as follows: 1 at the start, 2 each time
through, 1 at the end.)
31.5 The converse theorem
If, conversely, we have a graph G whose vertices
all have even degree, is there a closed trail using
all the edges of G?
Not necessarily. For example, G might be
disconnected :
We say a graph is connected if any two of
its vertices are connected by a walk (or equiva-
lently, by a trail or a path). We call a trail using
all edges of a graph an Euler trail. Then we have
A connected graph with no odd degree ver-
tices has a closed Euler trail.
Such a closed trail can be constructed as follows:
1. Starting at any vertex V1 , follow a trail t1
as long as possible.
2. The trail t1 eventually returns to V1, be-
cause it can leave any other vertex it en-
ters. (Immediately after the start, V1 has
one “used” edge, and hence an odd num-
ber of “unused” edges. Any other vertex
has an even number of “unused” edges.)
3. If t1 does not use all edges, retrace it to
the first vertex V2 where t1 meets an edge
not in t1.
4. At V2 add a “detour” to t1 by following a
trail out of V2 as long as possible, not using
edges in t1. As before, this trail eventually
returns to its starting point V2, where we
resume the trail t1. Let t2 be the trail t1
plus the detour from V2.
5. If t2 does not use all the edges, retrace t2
to the first vertex V3 where t2 meets an
edge not in t2. Add a detour at V3, and so
on.
6. Since a graph has only a finite number of
edges, this process eventually halts. The
result will be a closed trail which uses all
the edges (this requires the graph to be
connected, since any unused edge would
be connected to used ones, and thus would
have eventually been used).
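Steps 1–6 are essentially what is known as Hierholzer's algorithm; the sketch below uses a stack to handle the retracing and detours implicitly. The function name and example graph are mine:

```python
def closed_euler_trail(adj):
    """Closed Euler trail of a connected graph with all degrees even.
    `adj` maps each vertex to a list of its neighbours."""
    remaining = {v: list(ns) for v, ns in adj.items()}  # unused edges
    stack = [next(iter(adj))]       # start at any vertex
    trail = []
    while stack:
        v = stack[-1]
        if remaining[v]:
            # Follow a trail out of v as long as possible.
            w = remaining[v].pop()
            remaining[w].remove(v)  # erase the edge at both ends
            stack.append(w)
        else:
            # v is finished: back up (this splices in the detours).
            trail.append(stack.pop())
    return trail[::-1]

# The square A-B-C-D-A: connected, every vertex has degree 2.
square = {'A': ['B', 'D'], 'B': ['A', 'C'],
          'C': ['B', 'D'], 'D': ['A', 'C']}
print(closed_euler_trail(square))
```

The trail printed starts and ends at the same vertex and uses each of the four edges exactly once.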
31.6 Bridges
A bridge in a connected graph G is an edge
whose removal disconnects G. E.g. the edge
B is a bridge in the following graph.
B
The construction of an Euler trail is improved
by doing the following (Fleury's algorithm).
• Erase each edge as soon as it is used.
• Use a bridge in the remaining graph only
if there is no alternative.
It turns out, when this algorithm is used, that it
is not necessary to make any detours. The im-
provement, however, comes at the cost of need-
ing an algorithm to recognise bridges.
Questions
31.1 For each of the following sequences, con-
struct a graph whose vertices have those
degrees, or explain why no such graph ex-
ists.
• 1, 2, 3, 4
• 1, 2, 1, 2, 1
• 1, 2, 2, 2, 1
31.2 A graph G has adjacency matrix
0 1 1 0 0
1 0 0 0 0
1 0 0 0 0
0 0 0 2 1
0 0 0 1 2
Decide, without drawing the graph,
whether G is connected or not.
31.3 A graph H has adjacency matrix
2 0 1 0 1
0 2 0 1 0
1 0 2 0 1
0 1 0 2 0
1 0 1 0 2
What are the degrees of its vertices? Does
H have a closed Euler trail?
Lecture 32: Trees
A cycle is a closed trail (with at least one
edge) that doesn’t repeat any vertex except that
it ends where it started. A tree is a connected
graph with no cycles.
For example,
is a tree.
32.1 The number of edges in a tree
A tree with n vertices has n− 1 edges.
The proof is by strong induction on n.
Base step. A tree with 1 vertex has 0 edges
(since any loop would itself be a cycle).
Induction step. Supposing any tree with
j ≤ k vertices has j − 1 edges, we have to deduce
that a tree with k + 1 vertices has k edges.
Well, given a tree Tk+1 with k + 1 vertices,
we consider any edge e in Tk+1, with ends A
and B.
Removing e disconnects the ends A and B
of e. (If they were still connected, by some path
p, then p and e together would form a cycle in
Tk+1, contrary to its being a tree.)
Thus Tk+1−{e} consists of two trees, say Ti
and Tj with i and j vertices respectively. We
have i + j = k + 1 but both i, j ≤ k, so our
induction assumption gives
Ti has i− 1 edges, Tj has j − 1 edges.
But then Tk+1 = Ti ∪ Tj ∪ {e} has
(i− 1) + (j − 1) + 1 = (i+ j)− 1 = k edges, as
required.
Remarks
1. This proof also shows that any edge in a
tree is a bridge.
2. Since a tree has one more vertex than edge,
it follows that m trees have m more ver-
tices than edges.
3. The theorem also shows that adding any
edge to a tree (without adding a vertex)
creates a cycle. (Since the graph remains
connected, but has too many edges to be
a tree.)
These remarks can be used to come up with
several equivalent definitions of tree.
Next we see how any connected graph can
be related to trees.
32.2 Spanning trees
A spanning tree of a graph G is a tree contained
in G which includes all vertices of G.
For example,
is a spanning tree of
Any connected graph G contains a spanning
tree.
This is proved by induction on the number of
edges.
Base step. If G has no edge but is connected
then it consists of a single vertex. Hence G itself
is a spanning tree of G.
Induction step. Suppose any connected
graph with ≤ k edges has a spanning tree, and
we have to find a spanning tree of a connected
graph Gk+1 with k + 1 edges.
If Gk+1 has no cycle then Gk+1 is itself a
tree, hence a spanning tree of itself.
If Gk+1 has a cycle p we can remove any edge
e from p and Gk+1 − {e} is connected (because
vertices previously connected via e are still con-
nected via the rest of p). Since Gk+1 − {e} has
one edge less, it contains a spanning tree T by
induction, and T is also a spanning tree of Gk+1.
Remark It follows from these two theorems
that a graph G with n vertices and n− 2 edges
(or less) is not connected.
If it were, G would have a spanning tree T ,
with the same n vertices. But then T would
have n− 1 edges, which is impossible, since it is
more than the number of edges of G.
32.3 The greedy algorithm
Given a connected graph with weighted edges,
a minimal weight spanning tree T of G may be
constructed as follows.
1. Start with T empty.
2. While T is not a spanning tree for
G, add to T an edge ek+1 of minimal
weight among those which do not close
a cycle in T , together with the vertices
of ek+1.
This is also known as Kruskal’s algorithm.
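A sketch of the algorithm in Python, using a simple union-find structure to test whether an edge would close a cycle; the weights and names below are invented for illustration:

```python
def kruskal(vertices, weighted_edges):
    """Greedy (Kruskal's) algorithm: build a minimal weight spanning
    tree from a list of (weight, u, v) triples."""
    parent = {v: v for v in vertices}

    def find(v):
        # Root of v's component in the union-find forest.
        while parent[v] != v:
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(weighted_edges):   # cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:                         # edge does not close a cycle
            parent[ru] = rv                  # merge the two components
            tree.append((w, u, v))
    return tree

edges = [(1, 'A', 'B'), (4, 'B', 'C'), (3, 'A', 'C'),
         (2, 'C', 'D'), (5, 'B', 'D')]
tree = kruskal('ABCD', edges)
print(tree)                           # n - 1 = 3 edges
print(sum(w for w, _, _ in tree))     # total weight 6
```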
Remarks
1. T is not necessarily a tree at all steps of
the algorithm, but it is at the end.
2. For a graph with n vertices, the algorithm
runs for n − 1 steps, because this is the
number of edges in a tree with n vertices.
3. The algorithm is called “greedy” because
it always takes the cheapest step available,
without considering how this affects future
steps. For example, an edge of weight 4
may be chosen even though this prevents
an edge of length 5 being chosen at the
next step.
4. The algorithm always works, though this
is not obvious, and the proof is not re-
quired for this course. (You can find it,
e.g. in Chartrand’s Introductory Graph
Theory.)
5. Another problem which can be solved by
a "greedy" algorithm is splitting a natural
number n into powers of 2. Begin by
subtracting the largest such power 2^m ≤ n
from n, then repeat the process with
n − 2^m, etc.
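A sketch of this greedy splitting (the function name is mine):

```python
def binary_split(n):
    """Greedily split n > 0 into decreasing powers of 2: at each step
    subtract the largest power of 2 that still fits."""
    powers = []
    while n > 0:
        p = 1
        while 2 * p <= n:   # largest power of 2 with p <= n
            p *= 2
        powers.append(p)
        n -= p
    return powers

print(binary_split(45))   # [32, 8, 4, 1]
```

Here the greedy choice always succeeds, and it recovers exactly the binary representation of n.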
Questions
32.1 Which of the following graphs are trees?
In each case we insist that m ≠ n.
• vertices 1, 2, 3, 5, 7
an edge between m and n
if m divides n or n divides m
• vertices 1, 2, 3, 4, 5
an edge between m and n
if m divides n or n divides m
• vertices 2, 3, 4, 5, 6
an edge between m and n
if m divides n or n divides m
32.2 Find spanning trees of the following
graphs (cube and dodecahedron).
32.3 Also find spanning trees of the cube and
dodecahedron which are paths.
Lecture 33: Trees, queues and stacks
To search a graph G systematically, it helps
to have a spanning tree T , together with an or-
dering of the vertices of T .
33.1 Breadth first ordering
The easiest ordering to understand is called
breadth first, because it orders vertices “across”
the tree in “levels.”
Level 0 is a given “root” vertex.
Level 1 is the vertices one edge
away from the root.
Level 2 are the vertices two edges
away from the root,
. . . and so on.
Example.
A, B, C, D, E, F, G
is a breadth first ordering of
A
B C
D E F G
33.2 Queues
Breadth first ordering amounts to putting ver-
tices in a queue - a list processed on a “first
come, first served” or “first in, first out” basis.
• The root vertex is first in the queue (hence
first out).
• Vertices adjacent to the head vertex v in
the queue go to the tail of the queue (hence
they come out after v), if they are not al-
ready in it.
• The head vertex v does not come out of
the queue until all vertices adjacent to v
have gone in.
33.3 Breadth first algorithm
For any connected graph G, this algorithm not
only orders the vertices of G in a queue Q, it
also builds a spanning tree T of G by attaching
each vertex v to a “predecessor” among the ad-
jacent vertices of v already in T . An arbitrary
vertex is chosen as the root V0 of T .
1. Initially, T = tree with just one vertex V0,
Q = the queue containing only V0.
2. While Q is nonempty
2.1. Let V be the vertex at the head of Q
2.2. If there is an edge e = VW in G
where W is not in T
2.2.1. Add e and W to T
2.2.2. Insert W in Q (at the tail).
2.3. Else remove V from Q.
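The algorithm translates almost line for line into Python. This is a sketch: the function name and the example graph are mine, not the lecture's picture of G:

```python
from collections import deque

def breadth_first_tree(adj, root):
    """Spanning tree of the component containing `root`, built with a
    queue exactly as in steps 1-2.3 above. Returns the tree's edges."""
    in_tree = {root}
    tree_edges = []
    queue = deque([root])
    while queue:
        v = queue[0]                          # head of the queue
        outside = [w for w in adj[v] if w not in in_tree]
        if outside:
            w = outside[0]
            in_tree.add(w)
            tree_edges.append((v, w))         # add edge vw and w to T
            queue.append(w)                   # insert w at the tail
        else:
            queue.popleft()                   # remove v from the queue
    return tree_edges

# An illustrative graph (my own example).
G = {'A': ['B', 'C'], 'B': ['A', 'D', 'E'],
     'C': ['A', 'D'], 'D': ['B', 'C'], 'E': ['B']}
print(breadth_first_tree(G, 'A'))
# [('A', 'B'), ('A', 'C'), ('B', 'D'), ('B', 'E')]
```

Note the returned tree has 4 = 5 − 1 edges, as a tree on 5 vertices must.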
Remarks
1. If the graph G is not connected, the al-
gorithm gives a spanning tree of the con-
nected component containing the root ver-
tex A, the part of G containing all vertices
connected to A.
2. Thus we can recognise whether G is con-
nected by seeing whether all its vertices
are included when the algorithm termi-
nates.
3. Being able to recognise connectedness en-
ables us, e.g., to recognise bridges.
Example.
If G = [picture: a graph on the vertices
A, B, C, D, E], with root vertex A.
Then Q and T grow as follows:
Step Q T (edges so far)
1 A (vertex A only)
2 AB AB
3 ABC AB, AC
4 BC (unchanged)
5 BCD AB, AC, BD
6 BCDE AB, AC, BD, BE
7 CDE (unchanged)
8 DE (unchanged)
9 E (unchanged)
33.4 Depth first algorithm
This is the same except it has a stack S instead
of a queue Q. S is “last in, first out,” so we
insert and remove vertices from the same end of
S (called the top of the stack).
1. Initially, T = tree with just one vertex V0,
S = the stack containing only V0.
2. While S is nonempty
2.1. Let V be the vertex at the top of S
2.2. If there is an edge e = VW in G
where W is not in T
2.2.1. Add e and W to T
2.2.2. Insert W in S (at the top).
2.3. Else remove V from S.
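The depth first version is the same sketch with a plain list as the stack; again the function name and the example graph are mine:

```python
def depth_first_tree(adj, root):
    """Spanning tree built with a stack (last in, first out) in place
    of the queue. Returns the tree's edges."""
    in_tree = {root}
    tree_edges = []
    stack = [root]
    while stack:
        v = stack[-1]                         # top of the stack
        outside = [w for w in adj[v] if w not in in_tree]
        if outside:
            w = outside[0]
            in_tree.add(w)
            tree_edges.append((v, w))         # add edge vw and w to T
            stack.append(w)                   # push w onto the stack
        else:
            stack.pop()                       # remove v from the stack
    return tree_edges

# An illustrative graph (my own example).
G = {'A': ['B', 'C'], 'B': ['A', 'D', 'E'],
     'C': ['A', 'D'], 'D': ['B', 'C'], 'E': ['B']}
print(depth_first_tree(G, 'A'))
# [('A', 'B'), ('B', 'D'), ('D', 'C'), ('B', 'E')]
```

On the same graph the depth first tree runs "deeper" than the breadth first one before branching, which is typical of the two orderings.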
Remark. The breadth first and depth first al-
gorithms give two ways to construct a spanning
tree of a connected graph.
Example.
We use the same G, and take the top of S to be
its right hand end.
Step S T (edges so far)
1 A (vertex A only)
2 AB AB
3 ABC AB, BC
4 ABCE AB, BC, CE
5 ABCED AB, BC, CE, ED
6 ABCE (unchanged)
7 ABC (unchanged)
8 AB (unchanged)
9 A (unchanged)
Questions
33.1 The following list gives the state, at suc-
cessive stages, of either a queue or a stack.
A
AB
ABC
BC
BCD
CD
D
Which is it: a queue or a stack?
33.2 Construct a breadth first spanning tree for
the graph
[picture: a graph on the vertices A, B, C, D, E]
33.3 Construct a depth first spanning tree for
the graph in Question 33.2.
Useful notation
Sets of numbers
N the set of natural numbers {0, 1, 2, 3, . . .}
Z the set of integers {. . . ,−2,−1, 0, 1, 2, . . .}
Q the set of rational numbers {a/b : a, b ∈ Z, b ≠ 0}
R the set of real numbers
Number Theory
a | b a divides b b = qa for some q ∈ Z
gcd(a, b) greatest common divisor of a and b
a ≡ b (mod n) a and b are congruent modulo n n | (a− b)
Logic
¬p not p
p ∧ q p and q
p ∨ q p or q
p→ q p implies q
∀x for all x
∃x there exists x
Sets
x ∈ A x is an element of A
{x : P (x)} the set of x such that P (x)
|A| the number of elements in A
A ⊆ B A is a subset of B
A ∩B A intersect B {x : x ∈ A ∧ x ∈ B}
A ∪B A union B {x : x ∈ A ∨ x ∈ B}
A−B set difference A minus B {x : x ∈ A ∧ x /∈ B}
Functions
f : A→ B f is a function from A to B
Probability
Pr(E) probability of E
Pr(A|B) conditional probability of A given B
E[X] expected value of X
Var[X] variance of X
Sums and products
∑ from i = a to b of f(i): sum of f(i) from i = a to i = b, i.e. f(a) + f(a + 1) + · · · + f(b)
∏ from i = a to b of f(i): product of f(i) from i = a to i = b, i.e. f(a) × f(a + 1) × · · · × f(b)
Useful formulas
Logic Laws
p↔ q ≡ (p→ q) ∧ (q → p)
p→ q ≡ (¬p) ∨ q
¬¬p ≡ p
p ∧ p ≡ p
p ∨ p ≡ p
p ∧ q ≡ q ∧ p
p ∨ q ≡ q ∨ p
p ∧ (q ∧ r) ≡ (p ∧ q) ∧ r
p ∨ (q ∨ r) ≡ (p ∨ q) ∨ r
p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)
p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r)
¬(p ∧ q) ≡ (¬p) ∨ (¬q)
¬(p ∨ q) ≡ (¬p) ∧ (¬q)
p ∧ T ≡ p
p ∨ F ≡ p
p ∧ F ≡ F
p ∨ T ≡ T
p ∧ (¬p) ≡ F
p ∨ (¬p) ≡ T
p ∧ (p ∨ q) ≡ p
p ∨ (p ∧ q) ≡ p
¬∀xP (x) ≡ ∃x¬P (x)
¬∃xP (x) ≡ ∀x¬P (x)
Ordered selections without repetition
n(n − 1) · · · (n − r + 1) = n!/(n − r)!
Unordered selections without repetition
n(n − 1) · · · (n − r + 1)/r! = n!/(r!(n − r)!) = (n choose r)
Ordered selections with repetition
n^r
Unordered selections with repetition
(n + r − 1 choose r) = (n + r − 1)!/(r!(n − 1)!)
Binomial theorem
(x + y)^n = (n choose 0) x^n + (n choose 1) x^(n−1) y + (n choose 2) x^(n−2) y² + · · · + (n choose n−1) x y^(n−1) + (n choose n) y^n
Conditional probability
Pr(A|B) = Pr(A ∩ B) / Pr(B)
Bayes’ theorem
Pr(A|B) = Pr(B|A) Pr(A) / [Pr(B|A) Pr(A) + Pr(B|¬A) Pr(¬A)]
Discrete uniform distribution
Pr(X = k) = 1/(b − a + 1) for k ∈ {a, a + 1, . . . , b}
E[X] = (a + b)/2, Var[X] = ((b − a + 1)² − 1)/12
Bernoulli distribution
Pr(X = 1) = p, Pr(X = 0) = 1 − p
E[X] = p, Var[X] = p(1− p)
Geometric distribution
Pr(X = k) = p(1 − p)^k for k ∈ N
E[X] = (1 − p)/p, Var[X] = (1 − p)/p²
Binomial distribution
Pr(X = k) = (n choose k) p^k (1 − p)^(n−k) for k ∈ {0, . . . , n}
E[X] = np, Var[X] = np(1 − p)
Poisson distribution
Pr(X = k) = λ^k e^(−λ) / k! for k ∈ N
E[X] = λ, Var[X] = λ