Contents

Lecture 1: What is MAT1830 about?
Lecture 2: Divisors and primes
Lecture 3: Congruences
Lecture 4: Logic
Lecture 5: Tautologies and logical equivalence
Lecture 6: Rules of inference
Lecture 7: Predicates and quantifiers
Lecture 8: Predicate logic
Lecture 9: Mathematical induction
Lecture 10: Induction and well-ordering
Lecture 11: Sets
Lecture 12: Operations on sets
Lecture 13: Functions
Lecture 14: Examples of functions
Lecture 15: Composition and inversion
Lecture 16: Relations
Lecture 17: Equivalence relations
Lecture 18: Order relations
Lecture 19: Selections and arrangements
Lecture 20: Pascal’s triangle
Lecture 21: Probability
Lecture 22: Conditional probability and Bayes’ theorem
Lecture 23: Random variables
Lecture 24: Expectation and variance
Lecture 25: Discrete distributions
Lecture 26: Recursion
Lecture 27: Recursive algorithms
Lecture 28: Recursion, lists and sequences
Lecture 29: Graphs
Lecture 30: Walks, paths and trails
Lecture 31: Degree
Lecture 32: Trees
Lecture 33: Trees, queues and stacks

Unit Information

MAT1830 – Semester 2 2018

Course coordinator
• Name: Prof. David Wood
• Office: 9Rnf/424 (Clayton)
• Email:
[email protected]

Lecturer (Malaysia)
• Name: Dr. Tham Weng Kee
• Email:
[email protected]

Moodle Page

The course’s Moodle page can be accessed through the my.monash portal at https://my.monash.edu.au/

Unit Guide

The full unit guide can be found on the Moodle page. It contains a topic outline and information on assessment, special consideration, plagiarism etc.

Course Information

• Lectures: Three hours per week (lectures run in weeks 1–12).
  – Wed 4:00 – 6:00
  – Thu 3:00 – 4:00
• Support classes: One per week as allocated (support classes run in weeks 2–12).
• Required materials: Course notes booklet – available as a pdf from the course Moodle page.

There is no required textbook for the course. Recordings of the lectures will be available through the Moodle page.

Assessment

• Ten assignments worth 4% each (one due each week from week 3 to week 12).
• Final examination worth 60% (held in the examination period).

In order to pass the course you must receive at least 50% overall AND at least 40% for the assignments AND at least 40% for the exam.

The assignments will be issued in lectures and will be available from the course Moodle page. Assignments are to be submitted to your tutor during your support class. They will then be marked and returned to you in your next support class.

No calculators or other materials will be allowed in the final exam.

Lecture 1: What is MAT1830 about?

Discrete mathematics studies objects which have distinct separated values (e.g. integers), as opposed to objects which vary smoothly (e.g. real numbers). You can think of it as being “digital” mathematics rather than “analogue” mathematics. Discrete mathematics is particularly important in computer science and the two fields are very closely linked.

This course covers a wide range of topics in discrete mathematics including the following:
• Numbers
• Logic
• Induction and recursion
• Sets, functions and relations
• Probability
• Graph theory

1.1 What to expect

What we do here might be a bit different to a lot of the maths you’ve done in the past.
We’ll be concentrating on really understanding the concepts, rather than simply learning how to solve certain types of questions. For a lot of the questions we ask, there won’t be a single fixed procedure you can apply to get the answer. Instead, you’ll have to think carefully about what the question is asking and try to work out what is really going on. Don’t be afraid to try different things, play around, and look at examples. We’ll also be emphasising the importance of proving results.

1.2 Proofs

A proof is essentially just a water-tight argument that a certain statement must be true. As we’ll see, even if you are pretty sure that something is true, it can be really useful to have a proof of it, for a number of reasons.

1.3 Maths in computer science

As we mentioned above, maths and computer science are very closely related. The topics in this course all have many applications to computer science. For example:
• Number theory is used in cryptography to enable secure communication, identity verification, online banking and shopping etc.
• Logic is used in digital circuit design and in program control flow.
• Induction and recursion are used to study algorithms and their effectiveness.
• Functions are important in the theory of programming and relations are vital in database theory and design.
• Probability is vital for understanding randomised algorithms and for creating systems to deal with uncertain situations.
• Graph theory is used in software which solves allocation and scheduling problems.

Questions

1.1 What maths that you’ve done in the past would count as discrete? What would count as continuous instead? Are there grey areas?

1.2 Why might proofs be important to mathematicians and computer scientists?

1.3 Can you think of other links between maths and computer science?

Lecture 2: Divisors and primes

We say that integer a divides integer b if b = qa for some integer q.

Example. 2 divides 6 because 6 = 3 × 2.
This is the same as saying that division with remainder gives remainder 0. Thus a does not divide b when the remainder is ≠ 0.

Example. 3 does not divide 14 because it leaves remainder 2: 14 = 4 × 3 + 2.

When a divides b we also say:
• a is a divisor of b,
• a is a factor of b,
• b is divisible by a,
• b is a multiple of a.

2.1 Primes

A positive integer p > 1 is a prime if its only positive integer divisors are 1 and p. Thus the first few prime numbers are

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, . . .

The number 1 is not counted as a prime, as this would spoil the Fundamental Theorem of Arithmetic. Each integer > 1 can be expressed in exactly one way, up to order, as a product of primes.

Example. 210 = 2 × 3 × 5 × 7, and this is the only product of primes which equals 210.

This would not be true if 1 was counted as a prime, because many factorisations involve 1. E.g.

210 = 1 × 2 × 3 × 5 × 7 = 1² × 2 × 3 × 5 × 7 = . . .

2.2 Recognising primes

If an integer n > 1 has a divisor, it has a divisor ≤ √n, because for any divisor a > √n we also have the divisor n/a, which is < √n.

Thus to test whether 10001 is prime, say, we only have to see whether any of the numbers 2, 3, 4, . . . ≤ 100 divide 10001, since √10001 < 101. (The least divisor found is in fact 73, because 10001 = 73 × 137.)

This explains a common algorithm for recognising whether n is prime: try dividing n by a = 2, 3, . . . while a ≤ √n.

The algorithm is written with a boolean variable prime, and n is prime if prime = T (true) when the algorithm terminates.

assign a the value 2
assign prime the value T
while a ≤ √n and prime = T
    if a divides n
        give prime the value F
    else
        increase the value of a by 1

2.3 Finding divisors

This algorithm also finds a prime divisor of n: either the least a ≤ √n which divides n, or, if we do not find a divisor among the a ≤ √n, n itself is prime.
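As a sketch outside the notes, the trial-division algorithm above can be written in Python; `math.isqrt` is used so that the test a ≤ √n avoids floating-point square roots (the function name `is_prime` is my own):

```python
from math import isqrt

def is_prime(n):
    """Trial division: try divisors a = 2, 3, ... while a <= sqrt(n)."""
    if n < 2:
        return False
    a = 2
    while a <= isqrt(n):
        if n % a == 0:
            return False   # a is the least divisor > 1, and is necessarily prime
        a += 1
    return True

print(is_prime(10001))  # False, since 10001 = 73 * 137
```

As in Section 2.3, the first divisor this loop finds is automatically prime, since any smaller factor of it would have been found earlier.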
2.4 The greatest common divisor of two numbers

It is remarkable that we can find the greatest common divisor of positive integers m and n, gcd(m, n), without finding their prime divisors. This is done by the famous Euclidean algorithm, which repeatedly divides the greater number by the smaller, keeping the smaller number and the remainder.

Euclidean Algorithm.
Input: positive integers m and n with m > n
Output: gcd(m, n)

a := m, b := n
r := remainder when a is divided by b
while r ≠ 0 do
    a := b
    b := r
    r := remainder when a is divided by b
end
return b

Example. m = 237, n = 105

The first values are a = 237, b = 105, so r = 237 − 2 × 105 = 27.
The next values are a = 105, b = 27, so r = 105 − 3 × 27 = 24.
The next values are a = 27, b = 24, so r = 27 − 1 × 24 = 3.
The next values are a = 24, b = 3, so r = 24 − 8 × 3 = 0.

Thus the final value of b is 3, which is gcd(237, 105).

This can be set out more neatly:

237 = 2 × 105 + 27
105 = 3 × 27 + 24
 27 = 1 × 24 + 3
 24 = 8 × 3 + 0

2.5 The Euclidean algorithm works!

We start with the precondition m > n > 0. Then the division theorem tells us there is a remainder r < b when a = m is divided by b = n. Repeating the process gives successively smaller remainders, and hence the algorithm eventually returns a value.

That the returned value is actually gcd(m, n) relies on the following fact.

Fact. If a, b and k are integers, then gcd(a − kb, b) = gcd(a, b).

By using this fact repeatedly, we can show that after each execution of the while loop in the algorithm gcd(b, r) = gcd(m, n). When the algorithm terminates, this means b = gcd(b, 0) = gcd(m, n). (Equivalently, in the neat set-out given above, the gcd of the numbers in the last two columns is always gcd(m, n).)

2.6 Extended Euclidean algorithm

If we have used the Euclidean algorithm to find that gcd(m, n) = d, we can “work backwards” through its steps to find integers a and b such that am + bn = d.

Example.
For our m = 237, n = 105 example above:

3 = 27 − 1 × 24
3 = 27 − 1 × (105 − 3 × 27) = −105 + 4 × 27
3 = −105 + 4 × (237 − 2 × 105) = 4 × 237 − 9 × 105

So we see that a = 4 and b = −9 is a solution in this case.

Our first line above was a rearrangement of the second last line of our original Euclidean algorithm working. In the second line we made a substitution for 24 based on the second line of our original Euclidean algorithm working. In the third line we made a substitution for 27 based on the first line of our original Euclidean algorithm working.

Questions

2.1 Write down multiples of 13, and multiples of 21, until you find a multiple of 13 and a multiple of 21 which differ by 1.

2.2 Can a multiple of 15 and a multiple of 21 differ by 1? If not, what is the smallest positive difference between such multiples?

2.3 Find gcd(13, 21) and gcd(15, 21), and suggest how they are related to the results in Questions 2.1 and 2.2.

2.4 Work out the prime factorisations of 999 and 1000.

2.5 You should find no common prime factor of 999 and 1000. How could you see this without factorising the numbers? (Hint: a common divisor of 1000 and 999 is also a common divisor of . . . what?)
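The Euclidean algorithm of Section 2.4, and the “working backwards” of Section 2.6, can be sketched in Python as follows (the function names `gcd` and `ext_gcd` are my own, not from the notes):

```python
def gcd(m, n):
    """Euclidean algorithm: repeatedly replace (a, b) by (b, remainder)."""
    a, b = m, n
    while b != 0:
        a, b = b, a % b
    return a

def ext_gcd(m, n):
    """Extended Euclidean algorithm: return (d, a, b) with a*m + b*n = d = gcd(m, n)."""
    if n == 0:
        return m, 1, 0
    # d = a*n + b*(m % n) = a*n + b*(m - (m // n)*n); regroup on m and n:
    d, a, b = ext_gcd(n, m % n)
    return d, b, a - (m // n) * b

print(gcd(237, 105))      # 3
print(ext_gcd(237, 105))  # (3, 4, -9), since 4*237 - 9*105 = 3
```

The recursive `ext_gcd` performs exactly the back-substitution of the worked example, one division step at a time.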
For example, the sum of an integer in the sec- ond class and an integer in the third class will always be in the first class. We don’t have to stop with 3. We could di- vide integers into 4 different classes according to their remainders when divided by 4, and so on. 3.1 Congruences Let n > 2 be an integer. We say integers a and b are congruent modulo n and write a ≡ b (mod n) when n divides a− b. Example. 19 ≡ 13 (mod 6) because 6 divides 19-13 12 ≡ 20 (mod 4) because 4 divides 20-12 22 ≡ 13 (mod 3) because 3 divides 22-13 3.2 Working with congruences When working with congruences modulo some fixed integer n, we can “substitute in” just like we can with equalities. If a ≡ b (mod n) and b ≡ c (mod n), then a ≡ c (mod n). Example. Suppose x ≡ 13 (mod 7). Then x ≡ 6 (mod 7) because 13 ≡ 6 (mod 7). We can add, subtract and multiply congru- ences just like we can with equations. If a1 ≡ b1 (mod n) and a2 ≡ b2 (mod n), then • a1 + a2 ≡ b1 + b2 (mod n) • a1 − a2 ≡ b1 − b2 (mod n) • a1a2 ≡ b1b2 (mod n). Example. If x ≡ 3 (mod 8) and y ≡ 2 (mod 8), then • x+ y ≡ 5 (mod 8) • x− y ≡ 1 (mod 8) • xy ≡ 6 (mod 8). We can also deduce that x+ 4 ≡ 7 (mod 8), that 4x ≡ 12 (mod 8) and so on, because obvi- ously 4 ≡ 4 (mod 8). Note as well that 4x ≡ 12 (mod 8) can be simplified to 4x ≡ 4 (mod 8). In some situations we can also “divide through” a congruence by an integer. If a ≡ b (mod n) and d divides a, b and n, then a d ≡ bd (mod nd ). 3.3 Solving linear congruences Think of a congruence like 7x ≡ 5 (mod 9). This will hold if 9 divides 7x− 5 or in other words if there is an integer y such that 7x − 5 = 9y. So to solve our original congruence we can find an integer solution to 7x− 9y = 5. Some congruences don’t have solutions. For example, there is no solution to 10x ≡ 6 (mod 20) because there are no integers x and y such that 10x− 20y = 6. We can find an expression for all the integers x that satisfy a congruence like ax ≡ b (mod n) in the following way: 1. Find d = gcd(a, n). 2. 
If d doesn’t divide b, then there are no solutions.

3. If d divides b, then divide through the congruence by d to get an equivalent congruence (a/d)x ≡ b/d (mod n/d).

4. Find integers x′ and y′ such that (a/d)x′ − (n/d)y′ = b/d. The integers x that satisfy the original congruence are exactly those for which x ≡ x′ (mod n/d).

Example. Find all integers x such that 36x ≡ 10 (mod 114).

Using the Euclidean algorithm we find gcd(36, 114) = 6. So 6 divides 36x − 114y for any integers x and y, and consequently 36x − 114y ≠ 10. This means that there are no integers x such that 36x ≡ 10 (mod 114).

Example. Find all integers x such that 24x ≡ 8 (mod 44).

Using the Euclidean algorithm we find gcd(24, 44) = 4. So we divide through by 4 to get the equivalent congruence 6x ≡ 2 (mod 11). Using the extended Euclidean algorithm we see that 2 × 6 − 1 × 11 = 1, and hence 4 × 6 − 2 × 11 = 2. Thus the integers x such that 24x ≡ 8 (mod 44) are exactly the integers x ≡ 4 (mod 11).

3.4 Modular inverses

A modular multiplicative inverse of an integer a modulo n is an integer x such that

ax ≡ 1 (mod n).

From the last section we know that such an inverse will exist if and only if gcd(a, n) = 1. If inverses do exist then we can find them using the extended Euclidean algorithm (there will be lots of inverses, but they will all be in one congruence class modulo n). These inverses have important applications to cryptography and random number generation.

Example. 8 should have a multiplicative inverse modulo 45 because gcd(8, 45) = 1.

Using the extended Euclidean algorithm we see that −3 × 45 + 17 × 8 = 1. So 8 × 17 ≡ 1 (mod 45). This means that 17 is a multiplicative inverse of 8 modulo 45.

Questions

3.1 Are the following true or false?
• 6 ≡ 3 (mod 3)
• 9 ≡ 18 (mod 8)
• 5x + 6 ≡ 2x (mod 3)

3.2 Prove all of the facts about congruences that were stated in this lecture (use the definition of congruence modulo n and the definition of divides).
3.3 Find an expression for all the integers x that satisfy 9x ≡ 36 (mod 60).

Lecture 4: Logic

The simplest and most commonly used part of logic is the logic of “and”, “or” and “not”, which is known as propositional logic.

A proposition is any sentence which has a definite truth value (true = T or false = F), such as

1 + 1 = 2, or 10 is a prime number.

but not

What is your name? or This sentence is false.

Propositions are denoted by letters such as p, q, r, . . . , and they are combined into compound propositions by connectives such as ∧ (and), ∨ (or) and ¬ (not).

4.1 Connectives ∧, ∨ and ¬

∧, ∨ and ¬ are called “connectives” because they can be used to connect two sentences p and q into one. These particular connectives are defined so that they agree with the most common interpretations of the words “and”, “or” and “not”.

To define p ∧ q, for example, we only have to say that p ∧ q is true only when p is true and q is true. We define ∧ by the following truth table:

p  q  p ∧ q
T  T    T
T  F    F
F  T    F
F  F    F

Similarly, p ∨ q is true when p is true or q is true, but now we have to be more precise, because “or” has at least two meanings in ordinary speech. We define ∨ by the truth table

p  q  p ∨ q
T  T    T
T  F    T
F  T    T
F  F    F

This is the inclusive sense of “p or q” (often written “p and/or q” and meaning at least one of p, q is true).

Finally, “not” ¬ (also called negation) is defined as follows. We define ¬ by the truth table

p  ¬p
T   F
F   T

The connectives ∧, ∨ and ¬ are functions of the propositional variables p and q, which can take the two values T and F. For this reason, ∧, ∨ and ¬ are also called truth functions.

4.2 Implication

Another important truth function is p → q, which corresponds to “if p then q” or “p implies q” in ordinary speech.

In ordinary speech the value of p → q depends only on what happens when p is true. For example to decide whether

MCG flooded → the cricket is off

it is enough to see what happens when the MCG is flooded.
Thus we agree that p → q is true when p is false. We define → by the truth table

p  q  p → q
T  T    T
T  F    F
F  T    T
F  F    T

4.3 Other connectives

Two other important connectives are ↔ (“if and only if”) and ⊕ (“exclusive or”).

The sentence p ↔ q is true exactly when the truth values of p and q agree. We define ↔ by the truth table

p  q  p ↔ q
T  T    T
T  F    F
F  T    F
F  F    T

We could also write p ↔ q as (p → q) ∧ (q → p). We’ll see how to prove this in the next lecture.

The sentence p ⊕ q is true exactly when the truth values of p and q disagree. We define ⊕ by the truth table

p  q  p ⊕ q
T  T    F
T  F    T
F  T    T
F  F    F

4.4 Remarks

1. The symbols ∧ and ∨ are intentionally similar to the symbols ∩ and ∪ for set intersection and union because

x ∈ A ∩ B ⇔ (x ∈ A) ∧ (x ∈ B)
x ∈ A ∪ B ⇔ (x ∈ A) ∨ (x ∈ B)

(We study sets later.)

2. The “exclusive or” function ⊕ is written XOR in some programming languages.

3. If we write 0 for F and 1 for T then ⊕ becomes the function

p  q  p ⊕ q
1  1    0
1  0    1
0  1    1
0  0    0

This is also known as the “mod 2 sum”, because 1 + 1 = 2 ≡ 0 (mod 2). (It could also be called the “mod 2 difference” because a + b is the same as a − b, mod 2.)

4. The mod 2 sum occurs in many homes where two switches p, q control the same light. The truth value of p ⊕ q tells whether the light is on or not, and the light can be switched to the opposite state by switching the value of either p or q.

Questions

4.1 Which of the following are propositions?

1 + 1 = 3,   1 + 1,   3 divides 7,   3 ÷ 7

4.2 Let f be the proposition “foo” and let b be the proposition “bar”. Write the following propositions in symbols, using f, b, → and ¬.
• if foo, then bar.
• bar if foo.
• bar only if foo.
• foo implies not bar.
• foo is sufficient for bar.
• foo is necessary for bar.

4.3 In the following examples, is the “or” intended to be inclusive or exclusive?
• Would you like coffee or tea?
• Oranges or lemons are a good source of vitamin C.
• He will arrive in a minute or two.
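Outside the notes, these truth functions can be mirrored directly in Python booleans (the helper names `implies`, `iff` and `xor` are my own):

```python
def implies(p, q):
    """p -> q: false only in the row p = T, q = F."""
    return (not p) or q

def iff(p, q):
    """p <-> q: true exactly when the truth values of p and q agree."""
    return p == q

def xor(p, q):
    """Exclusive or: true exactly when p and q disagree (the 'mod 2 sum')."""
    return p != q

# Print the truth table of ->, row by row, matching the table above.
for p in (True, False):
    for q in (True, False):
        print("T" if p else "F", "T" if q else "F", "T" if implies(p, q) else "F")

# With 0 for F and 1 for T, xor is addition mod 2: 1 + 1 = 2 == 0 (mod 2).
print(int(xor(True, True)), (1 + 1) % 2)  # 0 0
```

On booleans, Python’s `!=` is exactly the exclusive or, which is why no dedicated operator is needed.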
Lecture 5: Tautologies and logical equivalence

A major problem in logic is to recognise statements that are “always true” or “always false”.

5.1 Tautologies and contradictions

A sentence φ in propositional logic is a formula with variables p, q, r, . . . which can take the values T and F. The possible interpretations of φ are all possible assignments of values to its variables.

A sentence in propositional logic is
• a tautology if it has value T under all interpretations;
• a contradiction if it has value F under all interpretations.

We can check whether φ is a tautology, a contradiction, or neither by computing its value for all possible values of its variables.

Example. (¬p) ∨ p

For p = T we have (¬T) ∨ T = F ∨ T = T, using the known values of the ¬ and ∨ functions. For p = F we have (¬F) ∨ F = T ∨ F = T, hence (¬p) ∨ p is a tautology. (It is known as the law of the excluded middle.)

We can similarly compute the values of any truth function φ, so this is an algorithm for recognising tautologies. However, if φ has n variables, they have 2ⁿ sets of values, so the amount of computation grows rapidly with n.

One of the biggest unsolved problems of logic and computer science is to find an efficient algorithm for recognising tautologies.

5.2 Logical equivalence

Sentences φ and ψ are logically equivalent if they are the same truth function, which also means φ ↔ ψ is a tautology. This relation between sentences is written φ ⇔ ψ or φ ≡ ψ.

Example. p → q ≡ (¬p) ∨ q

We know p → q has the truth table

p  q  p → q
T  T    T
T  F    F
F  T    T
F  F    T

Now (¬p) ∨ q has the truth table

p  q  ¬p  (¬p) ∨ q
T  T   F      T
T  F   F      F
F  T   T      T
F  F   T      T

So p → q and (¬p) ∨ q have the same truth table (looking just at their columns). It follows from this that p → q can always be rewritten as (¬p) ∨ q.

In fact, all truth functions can be expressed in terms of ∧, ∨, and ¬. This is like finding identities in algebra – one uses known equivalences to rearrange, expand and simplify.
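The truth-table method above can be sketched as a brute-force check in Python (the function names are mine, not from the notes); `itertools.product` enumerates all 2ⁿ interpretations:

```python
from itertools import product

def is_tautology(f, n):
    """f is a truth function of n variables; test it under all 2**n interpretations."""
    return all(f(*values) for values in product([True, False], repeat=n))

def equivalent(f, g, n):
    """Logically equivalent: f and g agree under every interpretation."""
    return all(f(*v) == g(*v) for v in product([True, False], repeat=n))

implies = lambda p, q: (not p) or q

# Law of the excluded middle: (not p) or p is a tautology.
print(is_tautology(lambda p: (not p) or p, 1))                      # True

# The implication law: p -> q is equivalent to (not q) -> (not p).
print(equivalent(implies, lambda p, q: implies(not q, not p), 2))   # True

# p and q is not a tautology: it has value F when q = F.
print(is_tautology(lambda p, q: p and q, 2))                        # False
```

The exponential loop over `product([True, False], repeat=n)` is exactly why this method becomes impractical as n grows, as noted above.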
5.3 Useful equivalences

The following equivalences are the most frequently used in this “algebra of logic”.

Equivalence law        p ↔ q ≡ (p → q) ∧ (q → p)
Implication law        p → q ≡ (¬p) ∨ q
Double Negation law    ¬¬p ≡ p
Idempotent laws        p ∧ p ≡ p
                       p ∨ p ≡ p
Commutative laws       p ∧ q ≡ q ∧ p
                       p ∨ q ≡ q ∨ p
Associative laws       p ∧ (q ∧ r) ≡ (p ∧ q) ∧ r
                       p ∨ (q ∨ r) ≡ (p ∨ q) ∨ r
Distributive laws      p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)
                       p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r)
De Morgan’s laws       ¬(p ∧ q) ≡ (¬p) ∨ (¬q)
                       ¬(p ∨ q) ≡ (¬p) ∧ (¬q)
Identity laws          p ∧ T ≡ p
                       p ∨ F ≡ p
Annihilation laws      p ∧ F ≡ F
                       p ∨ T ≡ T
Inverse laws           p ∧ (¬p) ≡ F
                       p ∨ (¬p) ≡ T
Absorption laws        p ∧ (p ∨ q) ≡ p
                       p ∨ (p ∧ q) ≡ p

Remarks

1. The commutative laws are used to rearrange terms, as in ordinary algebra. The law p ∨ q ≡ q ∨ p is like p + q = q + p in ordinary algebra, and p ∧ q ≡ q ∧ p is like pq = qp.

2. The associative laws are used to remove brackets. Since p ∨ (q ∨ r) ≡ (p ∨ q) ∨ r, we can write either side as p ∨ q ∨ r. This is like p + (q + r) = (p + q) + r = p + q + r in ordinary algebra.

3. The distributive laws are used to “expand” combinations of ∧ and ∨. p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r) is like p(q + r) = pq + pr. The other distributive law is not like anything in ordinary algebra.

4. Some of these laws are redundant, in the sense that other laws imply them. For example, the absorption law p ∧ (p ∨ q) ≡ p follows from the distributive, idempotent, identity and annihilation laws:

p ∧ (p ∨ q) ≡ (p ∧ p) ∨ (p ∧ q)   by distributive law
            ≡ p ∨ (p ∧ q)         by idempotent law
            ≡ (p ∧ T) ∨ (p ∧ q)   by identity law
            ≡ p ∧ (T ∨ q)         by distributive law
            ≡ p ∧ T               by annihilation law
            ≡ p                   by identity law

Questions

5.1 Explain why there are 8 ways to assign truth values to variables p, q, r; 16 ways to assign truth values to variables p, q, r, s; and in general 2ⁿ ways to assign truth values to n variables.

5.2 Use truth tables to verify the de Morgan’s laws and absorption laws.
5.3 If p ⊕ q is the “exclusive or” discussed last lecture, see whether it satisfies the distributive laws

p ∧ (q ⊕ r) ≡ (p ∧ q) ⊕ (p ∧ r)
p ⊕ (q ∧ r) ≡ (p ⊕ q) ∧ (p ⊕ r)

Lecture 6: Rules of inference

Last time we saw how to recognise tautologies and logically equivalent sentences by computing their truth tables. Another way is to infer new sentences from old by rules of inference.

6.1 The rule of replacement

This rule says that any sentence may be replaced by an equivalent sentence. Any series of such replacements therefore leads to a sentence equivalent to the one we started with.

Using replacement is like the usual method of proving identities in algebra – make a series of replacements until the left hand side is found equal to the right hand side.

Examples

1. x → y ≡ (¬y) → (¬x)

Proof.
x → y ≡ (¬x) ∨ y        by implication law
      ≡ y ∨ (¬x)        by commutative law
      ≡ (¬¬y) ∨ (¬x)    by law of double negation
      ≡ (¬y) → (¬x)     by implication law

(¬y) → (¬x) is the contrapositive of x → y.

Example. The contrapositive of

MCG flooded → cricket is off

is

Cricket is on → MCG not flooded.

An implication and its contrapositive are equivalent: they mean the same thing!

2. p → (q → p) ≡ T

Proof.
p → (q → p) ≡ (¬p) ∨ (q → p)       by implication law
            ≡ (¬p) ∨ ((¬q) ∨ p)    by implication law
            ≡ (¬p) ∨ (p ∨ (¬q))    by commutative law
            ≡ ((¬p) ∨ p) ∨ (¬q)    by associative law
            ≡ (p ∨ (¬p)) ∨ (¬q)    by commutative law
            ≡ T ∨ (¬q)             by inverse law
            ≡ T                    by annihilation law

3. ((p → q) ∧ p) → q ≡ T

Proof.
((p → q) ∧ p) → q ≡ ¬((p → q) ∧ p) ∨ q       by implication law
                  ≡ (¬(p → q) ∨ (¬p)) ∨ q    by de Morgan’s law
                  ≡ ¬(p → q) ∨ ((¬p) ∨ q)    by associative law
                  ≡ ¬(p → q) ∨ (p → q)       by implication law
                  ≡ (p → q) ∨ ¬(p → q)       by commutative law
                  ≡ T                        by inverse law

This tautology says that “if p implies q and p is true then q is true”.

6.2 Modus ponens

The tautology ((p → q) ∧ p) → q also translates into a rule of inference known as modus ponens: from sentences p → q and p we can infer the sentence q.
6.3 Logical consequence

A sentence ψ is a logical consequence of a sentence φ if ψ = T whenever φ = T. We write this as φ ⇒ ψ.

It is the same to say that φ → ψ is a tautology, but φ ⇒ ψ makes it clearer that we are discussing a relation between the sentences φ and ψ.

Any sentence ψ logically equivalent to φ is a logical consequence of φ, but not all consequences of φ are equivalent to it.

Example. p ∧ q ⇒ p

p is a logical consequence of p ∧ q, because p = T whenever p ∧ q = T. However, we can have p ∧ q = F when p = T (namely, when q = F). Hence p ∧ q and p are not equivalent.

This example shows that ⇒ is not symmetric:

(p ∧ q) ⇒ p   but   p ⇏ (p ∧ q).

This is where ⇒ differs from ≡, because if φ ≡ ψ then ψ ≡ φ. In fact, we build the relation ≡ from ⇒ the same way ↔ is built from →:

φ ≡ ψ means (φ ⇒ ψ) and (ψ ⇒ φ).

Questions

6.1 The slogan “no pain, no gain” stands for an implication. What is it?

6.2 What is the contrapositive of “no pain, no gain”?

6.3 Write the following sentences as implications, and then write their contrapositives.
• You can’t make an omelette without breaking eggs.
• If n is even, so is n².
• Haste makes waste.

6.4 Show that p → (q → (r → p)) is a tautology using the laws of logic.

6.5 Find a tautology with n variables which is p → (q → p) for n = 2 and p → (q → (r → p)) for n = 3.

Lecture 7: Predicates and quantifiers

We get a more expressive language than propositional logic by admitting predicates like

P(n),  Q(x, y),  R(a, b, c)

These stand for properties or relations such as

P(n) : n is prime
Q(x, y) : x ≤ y
R(a, b, c) : a + b = c.

Those with one variable, such as “n is prime,” are usually called properties, while those with two or more variables, such as “x ≤ y,” are usually called relations.

7.1 Predicates

A predicate such as “n is prime” is not a sentence because it is neither true nor false. Rather, it is a function P(n) of n with the Boolean values T (true) or F (false).
In this case, P(n) is a function of natural numbers defined by

P(n) = T if n is prime
       F otherwise.

Similarly, the “x ≤ y” predicate is a function of pairs of real numbers, defined by

R(x, y) = T if x ≤ y
          F otherwise.

Since most of mathematics involves properties and relations such as these, only a language with predicates is adequate for mathematics (and computer science).

7.2 Building sentences from predicates

One way to create a sentence from a predicate is to replace its variables by constants. For example, when P(n) is the predicate “n is prime,” P(3) is the sentence “3 is prime.”

Another way is to use quantifiers:
• ∀ (meaning “for all”) and
• ∃ (meaning “there exists” or “there is”).

Example. ∃n P(n) is the (true) sentence

there exists an n such that n is prime.

∀n P(n) is the (false) sentence

for all n, n is prime.

Note that when ∃n is read “there exists an n” we also add a “such that.”

7.3 Quantifiers and connectives

We can also combine quantifiers with connectives from propositional logic.

Example. Let Sq(n) be the predicate “n is a square,” and let Pos(n) be the predicate “n is positive” as above. Then we can symbolise the following sentences:

There is a positive square: ∃n(Pos(n) ∧ Sq(n)).
There is a positive integer which is not a square: ∃n(Pos(n) ∧ ¬Sq(n)).
All squares are positive: ∀n(Sq(n) → Pos(n)).

Notice that the “All . . . are” combination in English actually involves an implication. This is needed because we are making a claim only about squares and the implication serves to “narrow down” the set we are examining.

7.4 Alternating quantifiers

Combinations of quantifiers like ∀x∃y . . . , “for all x there is a y . . .” are common in mathematics, and can be confusing. It helps to have some examples in mind to recall the difference between ∀x∃y . . . and ∃y∀x . . .
The relation x < y is convenient to illustrate such combinations; we write x < y as the predicate L(x, y). Then

∀x∃y L(x, y)

is the (true) sentence

for all x there is a y such that x < y,

which says that there is no greatest number. But with the opposite combination of quantifiers we have

∃y∀x L(x, y)

is the false sentence

there is a y such that for all x, x < y,

which says there is a number greater than all numbers.

Even though these statements are usually written without brackets they are effectively bracketed “from the centre”. So ∀x∃y L(x, y) means ∀x(∃y L(x, y)) and ∃y∀x L(x, y) means ∃y(∀x L(x, y)).

7.5 An example from Abraham Lincoln

You can fool all of the people some of the time and you can fool some of the people all of the time but you can’t fool all of the people all of the time.

Let F(p, t) be the predicate: person p can be fooled at time t. Then

∀p∃t F(p, t) says you can fool all of the people some of the time,
∃p∀t F(p, t) says you can fool some of the people all of the time,
¬∀p∀t F(p, t) says you can’t fool all of the people all of the time.

Hence Lincoln’s sentence in symbols is:

∀p∃t F(p, t) ∧ ∃p∀t F(p, t) ∧ ¬∀p∀t F(p, t)

Remark. Another way to say “you can’t fool all of the people all of the time” is ∃p∃t ¬F(p, t).

Questions

7.1 Write “roses are red” in the language of predicate logic, using

rose(x) for “x is a rose”
red(x) for “x is red.”

7.2 If P(n) stands for “n is prime” and E(n) stands for “n is even,” what does P(n) ∧ (¬E(n)) say about n?

7.3 Using the predicates

pol(x) for “x is a politician”
liar(x) for “x is a liar”

symbolise the following sentences:
• all politicians are liars
• some politicians are liars
• no politicians are liars
• some politicians are not liars.

Are any of these sentences logically equivalent?

Lecture 8: Predicate logic

8.1 Valid sentences

The language of predicate logic is based on predicate symbols, variables, constants, brackets, ∀, ∃ and connectives. The examples from last lecture illustrate how these ingredients are used to form sentences.
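Over a finite domain, such sentences can even be tested mechanically: Python’s `all` and `any` play the roles of ∀ and ∃. A sketch (not part of the notes), using last lecture’s predicate L(x, y) for x < y:

```python
# Interpret L(x, y) as x < y, with the variables ranging over a finite domain.
L = lambda x, y: x < y
domain = range(10)   # 0, 1, ..., 9

# "for all x there is a y with x < y": false here, since x = 9
# has no y in the domain with 9 < y.
print(all(any(L(x, y) for y in domain) for x in domain))   # False

# Letting y range over 0, ..., 10 instead makes the sentence true,
# illustrating that truth depends on the interpretation chosen.
wider = range(11)
print(all(any(L(x, y) for y in wider) for x in domain))    # True

# "there is a y such that for all x, x < y": false, since in
# particular L(y, y) is false for every y.
print(any(all(L(x, y) for x in domain) for y in domain))   # False
```

The nesting order of `all` and `any` mirrors the “bracketing from the centre” described above.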
A sentence in predicate logic is valid if it has value T under all interpretations.

This is similar to the definition of a tautology in propositional logic. But now “all interpretations” means all interpretations of the predicate symbols, which is more complicated. The interpretation of a symbol P(n), say, must include both the range of the variable n and a specification of those n for which P(n) is true.

8.2 Interpretations

For example, one interpretation of P(n) is “n is positive,” where n ranges over the real numbers. Under this interpretation, ∀n P(n) is false.

A different interpretation of P(n) is “n is positive,” where n ranges over the numbers > 2. Under this interpretation, ∀n P(n) is true.

Unlike in propositional logic, there are infinitely many different interpretations of each formula. Thus there is no truth table method for predicate logic. We cannot decide whether a formula is valid by testing all interpretations.

8.3 Recognising valid sentences

Nevertheless, in some cases, we can see that a sentence is true for all interpretations.

Example. ∀x P(x) → ∃x P(x) is true for all properties P, and hence is valid.

Likewise, we can sometimes see that a sentence is not valid by finding an interpretation which makes it false.

Example. The sentence

∀x∃y Q(x, y) → ∃x∀y Q(x, y)

is false if we interpret Q(x, y) as x ≤ y on the real numbers. With this interpretation ∀x∃y Q(x, y) is true (for any number there is a larger number), but ∃x∀y Q(x, y) is false (there is no number ≤ all numbers). Hence the implication is false.

8.4 Consequence and equivalence

As in propositional logic, a sentence ψ is a logical consequence of a sentence φ if any interpretation which makes φ true makes ψ true. Again we write φ ⇒ ψ if ψ is a consequence of φ, and this is the same as saying φ → ψ is valid.

Example. Any interpretation which makes ∀n P(n) true makes ∃n P(n) true, and this is why ∀x P(x) → ∃x P(x) is valid.
Similarly, sentences ψ and φ are equivalent, written ψ ≡ φ, if each is a consequence of the other. Some sentences are equivalent for "propositional logic reasons." Example. We have ∀x(P(x) ∧ Q(x)) ≡ ∀x(Q(x) ∧ P(x)) simply because P(x) ∧ Q(x) ≡ Q(x) ∧ P(x) for any x. However, there are also equivalences that genuinely involve quantifiers. 8.5 Useful equivalences Two important equivalences involving quantifiers are ¬∀xP(x) ≡ ∃x¬P(x) and ¬∃xP(x) ≡ ∀x¬P(x). These make sense intuitively. For example, ¬∀xP(x) means P(x) is not true for all x; hence there is an x for which P(x) is false, that is, ∃x¬P(x). They can also be viewed as "infinite De Morgan's laws." If x ranges over {1, 2, 3, ...}, for example, then ∀xP(x) ≡ P(1) ∧ P(2) ∧ P(3) ∧ ··· and ∃xP(x) ≡ P(1) ∨ P(2) ∨ P(3) ∨ ··· Hence ¬∀xP(x) ≡ ¬(P(1) ∧ P(2) ∧ P(3) ∧ ···) ≡ (¬P(1)) ∨ (¬P(2)) ∨ (¬P(3)) ∨ ··· by De Morgan's law ≡ ∃x¬P(x). And similarly ¬∃xP(x) ≡ ¬(P(1) ∨ P(2) ∨ P(3) ∨ ···) ≡ (¬P(1)) ∧ (¬P(2)) ∧ (¬P(3)) ∧ ··· by De Morgan's law ≡ ∀x¬P(x). 8.6 Simplification The infinite De Morgan's laws allow a certain simplification of predicate formulas by "pushing ¬ inside quantifiers." Example. ¬∀x∃yQ(x, y) ≡ ∃x¬∃yQ(x, y) ≡ ∃x∀y¬Q(x, y). It is in fact possible to transform any quantified statement in predicate logic into an equivalent one with all quantifiers at the front. 8.7* Completeness and undecidability In 1930, Gödel proved that there is a complete set of rules of inference for predicate logic. This means, in particular, that there is an algorithm to list the valid sentences. However, in 1936, Church and Turing proved that there is no algorithm to list the logically false sentences. This means, in particular, that predicate logic is undecidable: there is no algorithm which, for any sentence φ, decides whether φ is valid or not. 
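On a finite domain the infinite De Morgan's laws can be checked directly, since ∀ becomes `all` and ∃ becomes `any`. A minimal sketch, with an arbitrary illustrative predicate P:

```python
# Check the quantifier negation laws on a finite domain.
# P is an arbitrary example predicate (here: "x is even").
domain = range(1, 11)
P = lambda x: x % 2 == 0

# not-forall  ==  exists-not
assert (not all(P(x) for x in domain)) == any(not P(x) for x in domain)
# not-exists  ==  forall-not
assert (not any(P(x) for x in domain)) == all(not P(x) for x in domain)
print("both laws hold on this domain")
```

Of course, passing on one finite domain is evidence, not proof; the equivalences above hold for all interpretations.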
This negative result is due to the power of predicate logic: it can express all mathematical or computational problems, and it is known that some of these problems cannot be solved by algorithm. Questions 8.1 Give interpretations which make the following sentences false. ∃nP(n) → ∀nP(n) ∀x∀y (R(x, y) → R(y, x)) ∀m∃nS(m, n) 8.2 Give interpretations which show that the sentences ∃x (P(x) ∧ L(x)) and ∃x (P(x) ∧ ¬L(x)) are not equivalent. 8.3 Is ∃y∀xR(x, y) a logical consequence of ∀x∃yR(x, y)? If so, explain why. If not, give an interpretation which makes ∀x∃yR(x, y) true and ∃y∀xR(x, y) false. 8.4 Is ∀x∃yR(x, y) a logical consequence of ∃y∀xR(x, y)? If so, explain why. If not, give an interpretation which makes ∃y∀xR(x, y) true and ∀x∃yR(x, y) false. 8.5 Explain why ¬∀p∀tF(p, t) ≡ ∃p∃t¬F(p, t). Lecture 9: Mathematical induction Since the natural numbers 0, 1, 2, 3, ... are generated by a process which begins with 0 and repeatedly adds 1, we have the following. Property P is true for all natural numbers if 1. P(0) is true. 2. P(k) ⇒ P(k + 1) for all k ∈ N. This is called the principle of mathematical induction. It is used in a style of proof called proof by induction, which consists of two steps. Base step: Proof that the required property P is true for 0. Induction step: Proof that if P(k) is true then P(k + 1) is true, for each k ∈ N. 9.1 Examples Example 1. Prove that 3 divides n³ + 2n for all n ∈ N. Let P(n) be "3 divides n³ + 2n". Base step. 3 divides 0³ + 2 × 0 = 0, so P(0) is true. Induction step. We want to prove 3 divides k³ + 2k ⇒ 3 divides (k + 1)³ + 2(k + 1). Well, (k + 1)³ + 2(k + 1) = k³ + 3k² + 3k + 1 + 2k + 2 = k³ + 2k + 3k² + 3k + 3 = k³ + 2k + 3(k² + k + 1). Therefore, 3 divides k³ + 2k ⇒ 3 divides k³ + 2k + 3(k² + k + 1) ⇒ 3 divides (k + 1)³ + 2(k + 1) as required. This completes the induction step, and hence completes the proof. Example 2. Prove there are 2^n n-bit binary strings. Let P(n) be "there are 2^n n-bit binary strings". 
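Induction proves the divisibility claim in Example 1 for every n, but it never hurts to sanity-check a few cases first; a quick illustrative sketch:

```python
# Spot-check Example 1: 3 divides n^3 + 2n for small natural numbers n.
for n in range(100):
    assert (n**3 + 2*n) % 3 == 0
print("3 divides n^3 + 2n for n = 0, ..., 99")
```

This is exactly the "experimental part" mentioned in the questions for Lecture 9: checking small cases suggests what to prove, and induction then proves it for all n.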
Base step. There is 2^0 = 1 0-bit binary string (the empty string), so P(0) is true. Induction step. We want to prove that there are 2^k k-bit binary strings ⇒ there are 2^(k+1) (k + 1)-bit binary strings. Well, a (k + 1)-bit binary string is either W0 or W1, where W is any k-bit binary string. Thus if there are 2^k k-bit binary strings W, there are 2 × 2^k = 2^(k+1) (k + 1)-bit binary strings. This completes the induction step, and hence completes the proof. 9.2 Starting the base step higher It is not always appropriate to start the induction at 0. Some properties are true only from a certain positive integer upwards, in which case the induction starts at that integer. Example 3. Prove n! > 2^n for all integers n ≥ 4. Let P(n) be "n! > 2^n". Base step. 4! = 4 × 3 × 2 × 1 = 24 > 16 = 2^4, so P(4) is true. Induction step. We want to prove k! > 2^k ⇒ (k + 1)! > 2^(k+1) for all integers k ≥ 4. Now, for k ≥ 4, if k! > 2^k, (k + 1)! = (k + 1) × k! > (k + 1) × 2^k > 2 × 2^k = 2^(k+1). (The first > holds because we are assuming k! > 2^k and the second holds because k ≥ 4.) Thus k! > 2^k ⇒ (k + 1)! > 2^(k+1), as required to complete the induction. So n! > 2^n for all n ≥ 4. Example 4. Prove any integer value n ≥ 8 (in cents) is obtainable with 3c and 5c stamps. Let P(n) be "n cents is obtainable with 3c and 5c stamps". Base step. 8c can be obtained by a 3c plus a 5c stamp. So P(8) is true. Induction step. We have to show that if k cents is obtainable, so is (k + 1) cents, when k ≥ 8. Case 1. The k cents is obtained using a 5c stamp (among others). Replace the 5c stamp by two 3c stamps, thus obtaining k + 1 cents. Case 2. If the k cents is obtained using only 3c stamps, there are at least three of them (since k ≥ 8). In this case, replace three 3c stamps by two 5c stamps, again obtaining k + 1 cents. Thus in either case, when k ≥ 8, P(k) ⇒ P(k + 1). This completes the induction proof that n cents is obtainable from 3c and 5c stamps, for all integers n ≥ 8. 
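The induction proof in Example 4 is constructive: starting from the base case 8 = 3 + 5, the two replacement rules build up any larger amount one cent at a time. A sketch of that construction (the function name `stamps` is ours, not from the notes):

```python
# Constructive version of Example 4: build n cents from 3c and 5c stamps
# by following the induction proof's two replacement rules.
def stamps(n):
    assert n >= 8
    combo = [3, 5]                 # base case: 8 cents
    while sum(combo) < n:
        if 5 in combo:             # Case 1: swap a 5 for two 3s (+1 cent)
            combo.remove(5)
            combo += [3, 3]
        else:                      # Case 2: swap three 3s for two 5s (+1 cent)
            for _ in range(3):
                combo.remove(3)
            combo += [5, 5]
    return combo

print(stamps(11))  # a multiset of 3s and 5s summing to 11
```

Each loop iteration increases the total by exactly 1, mirroring the step from P(k) to P(k + 1).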
9.3 Sums of series Induction is often used to prove that sum formulas are correct. Example 5. Prove 1 + 2 + 3 + ··· + n = n(n + 1)/2 for all integers n ≥ 1. Let P(n) be "1 + 2 + 3 + ··· + n = n(n + 1)/2". Base step. When n = 1, the left hand side is 1, and the right hand side is 1(1 + 1)/2 = 2/2 = 1, so P(1) is true. Induction step. We have to prove that 1 + 2 + ··· + k = k(k + 1)/2 ⇒ 1 + 2 + ··· + k + (k + 1) = (k + 1)(k + 2)/2. Now, if P(k) is true, 1 + 2 + ··· + k + (k + 1) = (1 + 2 + ··· + k) + (k + 1) = k(k + 1)/2 + (k + 1) using P(k) = (k + 1)(k/2 + 1) = (k + 1)(k + 2)/2 as required. This completes the induction proof. Remark. Another proof is to write down 1 + 2 + 3 + ··· + (n − 1) + n n + (n − 1) + ··· + 3 + 2 + 1 and observe that each of the n columns sums to n + 1. Thus twice the sum of the series is n(n + 1), and hence the sum of the series itself is n(n + 1)/2. One could argue that this proof uses induction stealthily, to prove that the sum of each column is the same. Questions In most induction problems set for students we skip the experimental part, which is finding what to prove. Before trying to prove that 3 divides n³ + 2n, for example, someone needs to guess that it is true, perhaps by trying n = 1, 2, 3, 4. 9.1 In this question, try to guess what ? stands for, by trying a few values of n. • ? divides n² + n • The sum of the first n odd numbers is ? • 1/(1×2) + 1/(2×3) + 1/(3×4) + ··· + 1/(n(n+1)) = 1 − ? 9.2 If you correctly guessed the sum 1/(1×2) + 1/(2×3) + 1/(3×4) + ··· + 1/(n(n+1)), you might wonder why it is so simple. Here is a clue: 1/(1×2) = 1/1 − 1/2. What is 1/(2×3)? 1/(3×4)? How does this lead to a simple formula for 1/(1×2) + 1/(2×3) + 1/(3×4) + ··· + 1/(n(n+1))? OK, if we can guess formulas correctly, why bother proving them by induction? The reason is that a statement which fits many values of n can still be wrong. 9.3 Show that n² + n + 41 is a prime number for n = 1, 2, 3, 4 (and go further, if you like). Do you think n² + n + 41 is prime for all natural numbers n? 
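The "experimental part" described above is easy to automate. This sketch checks the sum formula of Example 5 for small n, and computes a few values of the series from Questions 9.1 and 9.2 exactly, to help guess the "?":

```python
# Experiment: check 1 + 2 + ... + n against n(n+1)/2, and compute the
# series 1/(1x2) + 1/(2x3) + ... + 1/(n(n+1)) with exact fractions.
from fractions import Fraction

for n in range(1, 20):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2

n = 5
series = sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))
print(series)  # 5/6, suggesting the sum is 1 - 1/(n+1)
```

Using `Fraction` rather than floating point keeps the arithmetic exact, so the pattern 1 − 1/(n + 1) is visible without rounding noise.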
Lecture 10: Induction and well-ordering In the previous lecture we were able to prove that a property P holds for 0, 1, 2, ... as follows: Base step. Prove P(0). Induction step. Prove P(k) ⇒ P(k + 1) for each natural number k. This is sufficient to prove that P(n) holds for all natural numbers n, but it may be difficult to prove that P(k + 1) follows from P(k). It may in fact be easier to prove the induction step P(0) ∧ P(1) ∧ ··· ∧ P(k) ⇒ P(k + 1). That is, it may help to assume P holds for all numbers before k + 1. Induction with this style of induction step is sometimes called the strong form of mathematical induction. 10.1 Examples of strong induction Example 1. Prove that every integer ≥ 2 is a product of primes. (Just a prime by itself counts as a "product".) Let P(n) be "n is a product of primes". Base step. 2 is a prime, hence a product of (one) prime. So P(2) is true. Induction step. Suppose 2, 3, ..., k are products of primes. We wish to prove that k + 1 is a product of primes. This is certainly true if k + 1 is a prime. If not, k + 1 = i × j for some natural numbers i and j less than k + 1 (and greater than 1). But then i and j are products of primes by our assumption, hence so is i × j = k + 1. This completes the induction proof. Example 2. Prove that every positive integer is a sum of distinct powers of 2. (Just a power of two by itself counts as a "sum".) The idea behind this proof is to repeatedly subtract the largest possible power of 2. We illustrate with the number 27. 27 − largest power of 2 less than 27 = 27 − 16 = 11 11 − largest power of 2 less than 11 = 11 − 8 = 3 3 − largest power of 2 less than 3 = 3 − 2 = 1 Hence 27 = 16 + 8 + 2 + 1 = 2^4 + 2^3 + 2^1 + 2^0. (It is only interesting to find distinct powers of 2, because of course each integer ≥ 1 is a sum of 1s, and 1 = 2^0.) The strong induction proof goes as follows. Let P(n) be "n is a sum of distinct powers of 2". Base step. 1 = 2^0, so 1 is a sum of (one) power of 2. Thus P(1) is true. Induction step. 
Suppose each of the numbers 1, 2, 3, ..., k is a sum of distinct powers of 2. We wish to prove that k + 1 is a sum of distinct powers of 2. This is certainly true if k + 1 is a power of 2. If not, let 2^j be the greatest power of 2 less than k + 1. Then i = k + 1 − 2^j is one of the numbers 1, 2, 3, ..., k, and hence it is a sum of distinct powers of 2. Also, the powers of 2 that sum to i are all less than 2^j; otherwise 2^j would be less than half of k + 1, contrary to the choice of 2^j as the largest power of 2 less than k + 1. Hence k + 1 = 2^j + (powers of 2 that sum to i) is a sum of distinct powers of 2. This completes the induction proof. 10.2 Well-ordering and descent Induction expresses the fact that each natural number n can be reached by starting at 0 and going upwards (e.g. adding 1) a finite number of times. Equivalent facts are that it is only a finite number of steps downwards from any natural number to 0, that any descending sequence of natural numbers is finite, and that any non-empty set of natural numbers has a least element. This property is called well-ordering of the natural numbers. It is often convenient to arrange a proof to "work downwards" and appeal to well-ordering by saying that the process of working downwards must eventually stop. Such proofs are equivalent to induction, though they are sometimes called "infinite descent" or similar. 10.3 Proofs by descent Example 1. Prove that any integer ≥ 2 has a prime divisor. If n is prime, then it is a prime divisor of itself. If not, let n₁ < n be a divisor of n (greater than 1). If n₁ is prime, it is a prime divisor of n. If not, let n₂ < n₁ be a divisor of n₁ (and hence of n). If n₂ is prime, it is a prime divisor of n. If not, let n₃ < n₂ be a divisor of n₂, etc. The sequence n > n₁ > n₂ > n₃ > ··· must eventually terminate, and this means we find a prime divisor of n. Example 2. Prove √2 is irrational. Suppose that √2 = m/n for natural numbers m and n. We will show this is impossible. 
Since the square of an odd number is odd, we can argue as follows: √2 = m/n ⇒ 2 = m²/n² (squaring both sides) ⇒ m² = 2n² ⇒ m² is even ⇒ m is even, since the square of an odd number is odd ⇒ m = 2m₁, say ⇒ 2n² = m² = 4m₁² ⇒ n² = 2m₁² ⇒ n is even, = 2n₁ say. But then √2 = m₁/n₁, and we can repeat the argument to show that m₁ and n₁ are both even, so m₁ = 2m₂ and n₁ = 2n₂, and so on. Since the argument can be repeated indefinitely, we get an infinite descending sequence of natural numbers m > m₁ > m₂ > ··· which is impossible. Hence there are no natural numbers m and n with √2 = m/n. Questions 10.1 For each of the following statements, say which is likely to require strong induction for its proof. • 1 + a + a² + ··· + aⁿ = (a^(n+1) − 1)/(a − 1) • ¬(p₁ ∨ p₂ ∨ p₃ ∨ ··· ∨ pₙ) ≡ (¬p₁) ∧ (¬p₂) ∧ (¬p₃) ∧ ··· ∧ (¬pₙ) • Each fraction n/m < 1 is a sum of distinct fractions with numerator 1 (for example, 11/12 = 1/2 + 1/3 + 1/12). 10.2 There is something else which tells you every integer ≥ 1 is a sum of distinct powers of 2. What is it? 10.3 Is every integer ≥ 1 a sum of distinct powers of 3? Lecture 11: Sets Sets are vital in expressing mathematics formally and are also very important data structures in computer science. A set is basically just an unordered collection of distinct objects, which we call its elements or members. Note that there is no notion of order for a set, even though we often write down its elements in some order for convenience. Also, there is no notion of multiplicity: an object is either in a set or not; it cannot be in the set multiple times. Sets A and B are equal when every element of A is an element of B and vice-versa. 11.1 Set notation • x ∈ S means x is an element of set S. • {x₁, x₂, x₃, ...} is the set with elements x₁, x₂, x₃, .... • {x : P(x)} is the set of all x with property P. Example. 17 ∈ {x : x is prime} = {2, 3, 5, 7, 11, 13, ...} {1, 2, 3} = {3, 1, 2} {1, 1, 1} = {1} For a finite set S, we write |S| for the number of elements of S. 11.2 Universal set The idea of a "set of all sets" leads to logical difficulties. Difficulties are avoided by always working within a local "universal set" which includes only those objects under consideration. For example, when discussing arithmetic it might be sufficient to work just with the numbers 0, 1, 2, 3, .... Our universal set could then be taken as N = {0, 1, 2, 3, ...}, and other sets of interest, e.g. {x : x is prime}, are parts of N. 11.3 Subsets We say that A is a subset of B, and write A ⊆ B, when each element of A is an element of B. Example. The set of primes forms a subset of N, that is, {x : x is prime} ⊆ N. 11.4 Characteristic functions A subset A of B can be specified by its characteristic function χ_A, which tells which elements of B are in A and which are not. χ_A(x) = 1 if x ∈ A, and χ_A(x) = 0 if x ∉ A. Example. The subset A = {a, c} of B = {a, b, c} has the characteristic function χ_A with χ_A(a) = 1, χ_A(b) = 0, χ_A(c) = 1. We also write this function more simply as a b c 1 0 1 In fact we can list all characteristic functions on {a, b, c}, and hence all subsets of {a, b, c}, by listing all sequences of three binary digits: characteristic function (a b c) subset 0 0 0 {} 0 0 1 {c} 0 1 0 {b} 0 1 1 {b, c} 1 0 0 {a} 1 0 1 {a, c} 1 1 0 {a, b} 1 1 1 {a, b, c} We could similarly list all the subsets of a four-element set, and there would be 2^4 = 16 of them, corresponding to the 2^4 sequences of 0s and 1s. In the same way, we find that an n-element set has 2^n subsets, because there are 2^n binary sequences of length n. (Each of the n places in the sequence can be filled in two ways.) 11.5 Power set The set of all subsets of a set U is called the power set P(U) of U. Example. We see from the previous table that P({a, b, c}) is the set {{}, {c}, {b}, {b, c}, {a}, {a, c}, {a, b}, {a, b, c}}. If U has n elements, then P(U) has 2^n elements. 
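The table of characteristic functions above is easy to generate mechanically: each length-3 binary sequence picks out one subset. An illustrative sketch:

```python
# Enumerate the subsets of {a, b, c} via characteristic functions:
# each binary sequence of length 3 corresponds to exactly one subset.
from itertools import product

elements = ['a', 'b', 'c']
subsets = []
for bits in product([0, 1], repeat=len(elements)):
    subsets.append({x for x, bit in zip(elements, bits) if bit == 1})

print(len(subsets))  # 8 = 2^3 subsets, i.e. the power set of {a, b, c}
print(subsets)
```

The same loop with `repeat=n` lists all 2^n subsets of an n-element set, which is the counting argument of Section 11.4 in executable form.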
(The reason P(U) is called the "power" set is probably that the number of its elements is this power of 2. In fact, the power set of U is sometimes written 2^U.) 11.6 Sets and properties We mentioned at the beginning that {x : P(x)} stands for the set of objects x with property P. Thus sets correspond to properties. Properties of the natural numbers 0, 1, 2, 3, ..., for example, correspond to subsets of the set N = {0, 1, 2, 3, ...}. Thus the subset {0, 2, 4, 6, ...} = {n ∈ N : n is even} corresponds to the property of being even. Similarly, the set {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, ...} corresponds to the property of being prime. The power set P(N) corresponds to all possible properties of natural numbers. 11.7* What are numbers? The most common approach to building mathematics up from logical foundations considers all mathematical objects to be fundamentally made of sets. One simple way to define numbers using sets (due to von Neumann) is the following. 0 = {} 1 = {0} 2 = {0, 1} ... n + 1 = {0, 1, 2, ..., n} We are not going to use this definition in this course. Still, it is interesting that numbers can be defined in such a simple way. Questions 11.1 Suppose E(x) stands for "x is even" and F(x) stands for "5 divides x." • What is the set {x : E(x) ∧ F(x)}? • Write a formula using E(x) and F(x) which describes the set {5, 15, 25, 35, ...}. 11.2 How many subsets does the set {2, 5, 10, 20} have? 11.3 Consider the infinitely many sets {x : 0 < x < 1} {x : 0 < x < 1/2} {x : 0 < x < 1/3} {x : 0 < x < 1/4} ... Do they have any element in common? Lecture 12: Operations on sets There is an "arithmetic" of sets similar to ordinary arithmetic. There are operations similar to addition, subtraction and multiplication. 12.1 Venn diagrams The simple operations on sets can be visualised with the help of Venn diagrams, which show sets A, B, C, ... as disks within a rectangle representing the universal set U. 
[Venn diagram: sets A and B inside the universal set U] 12.2 Union A ∪ B The union A ∪ B of sets A and B consists of the elements in A or B, and is indicated by the shaded region in the following Venn diagram. [Venn diagram: A ∪ B shaded] 12.3 Intersection A ∩ B The intersection A ∩ B of sets A and B consists of the elements in A and B, indicated by the shaded region in the following Venn diagram. [Venn diagram: A ∩ B shaded] 12.4 Difference A − B The difference A − B of sets A and B consists of the elements in A and not in B, indicated by the shaded region in the following Venn diagram. [Venn diagram: A − B shaded] The difference U − B relative to the universal set U is called the complement B̄ of B. Here is the Venn diagram of B̄. [Venn diagram: B̄ shaded] 12.5 Symmetric difference A △ B The union of A − B and B − A is called the symmetric difference A △ B of A and B. [Venn diagram: A △ B shaded] A △ B consists of the elements of one of A, B but not the other. It is clear from the diagram that we have not only A △ B = (A − B) ∪ (B − A), but also A △ B = (A ∪ B) − (A ∩ B). 12.6 Ordered pairs Sometimes we do want order to be important. In computer science, arrays are ubiquitous examples of ordered data structures. In maths, ordered pairs are frequently used. An ordered pair (a, b) consists simply of a first object a and a second object b. The objects a and b are sometimes called the entries or coordinates of the ordered pair. Two ordered pairs (a, b) and (c, d) are equal if and only if a = c and b = d. Example. {0, 1} = {1, 0} but (0, 1) ≠ (1, 0). There's no reason we need to stop with pairs. We can similarly define ordered triples, quadruples, and so on. When there are k coordinates, we call the object an ordered k-tuple. Two ordered k-tuples are equal if and only if their ith coordinates are equal for i = 1, 2, ..., k. 12.7 Cartesian product A × B The set of ordered pairs A × B = {(a, b) : a ∈ A and b ∈ B} is the cartesian product of sets A and B. The commonest example is where A = B = R (the set of real numbers, or the number line). Then the pairs (a, b) are points in the plane, so R × R is the plane. 
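The two descriptions of the symmetric difference can be checked on concrete sets. A sketch using Python's built-in set operations (the example sets are ours; `^` is Python's symmetric difference operator):

```python
# Check the two identities for symmetric difference on example sets.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# (A - B) u (B - A)  ==  (A u B) - (A n B)  ==  A symmetric-difference B
assert (A - B) | (B - A) == (A | B) - (A & B) == A ^ B
print(A ^ B)  # {1, 2, 5, 6}
```

Python's `set` type supports union (`|`), intersection (`&`), difference (`-`) and symmetric difference (`^`) directly, mirroring the operations of this lecture.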
[Plot: the point (a, b) in the plane, with coordinates a and b and origin O] Because Descartes used this idea in geometry, the cartesian product is named after him. 12.8 A × B and multiplication If A has |A| elements and B has |B| elements, then A × B has |A| × |B| elements. Similarly, if L is a line of length l, and W is a line of length w, then L × W is a rectangle of area l × w. In fact, we call it an "l × w rectangle." This is probably the reason for using the × sign, and for calling A × B a "product." Questions 12.1 Draw a Venn diagram for A ∩ B̄. What is another name for this set? 12.2 Check the de Morgan laws by drawing Venn diagrams for the complements of A ∪ B and A ∩ B, and for Ā ∩ B̄ and Ā ∪ B̄. 12.3 Find which of the following is true by drawing suitable Venn diagrams. A ∩ (B △ C) = (A ∩ B) △ (A ∩ C)? A △ (B ∩ C) = (A △ B) ∩ (A △ C)? 12.4 If plane = line × line, what do you think line × circle is? What about circle × circle? Lecture 13: Functions A function can be thought of as a "black box" which accepts inputs and, for each input, produces a single output. 13.1 Defining functions via sets Formally we represent a function f as a set X of possible inputs, a set Y so that every output of f is guaranteed to be in Y, and a set of (input, output) pairs from X × Y. The vital property of a function is that each input gives exactly one output. A function f consists of a domain X, a codomain Y, and a set of ordered pairs from X × Y which has exactly one ordered pair (x, y) for each x ∈ X. When (a, b) is in this set we write f(a) = b. The set of y values occurring in these pairs is the image of f. Note that the image of a function is always a subset of its codomain, but they may or may not be equal. If the image of a function is equal to its codomain, we say the function is onto. Examples. 1. The squaring function square(x) = x² with domain R, codomain R, and pairs {(x, x²) : x ∈ R}, which form what we usually call the plot of the squaring function. [Plot of y = x²] The image of this function (the set of y values) is the set R≥0 of real numbers ≥ 0. 2. 
The square root function sqrt(x) = √x with domain R≥0, codomain R, and pairs {(x, √x) : x ∈ R and x ≥ 0}. [Plot of y = √x] The image of this function (the set of y values) is the set R≥0. 3. The cubing function cube(x) = x³ with domain R, codomain R, and pairs {(x, x³) : x ∈ R}. [Plot of y = x³] The image of this function is the whole of the codomain R, so it is onto. 13.2 Arrow notation If f is a function with domain A and codomain B we write f : A → B, and we say that f is from A to B. For example, we could define square : R → R. We could also define square : R → R≥0. Likewise, we could define cube : R → R. However we could not define cube : R → R≥0, because for some x ∈ R, cube(x) is negative. For example, cube(−1) = −1. 13.3 One-to-one functions A function f : X → Y is one-to-one if for each y in the image of f there is only one x ∈ X such that f(x) = y. For example, the function cube(x) is one-to-one because each real number y is the cube of exactly one real number x. The function square : R → R is not one-to-one because the real number 1 is the square of two different real numbers, 1 and −1. (In fact each real y > 0 is the square of two different real numbers, √y and −√y.) On the other hand, square : R≥0 → R is one-to-one because each real number y in the image is the square of only one real number in R≥0, namely √y. The last example shows that the domain of a function is an important part of its description, because changing the domain can change the properties of the function. 13.4 Proving a function is one-to-one There is an equivalent way of phrasing the definition of one-to-one: a function f : X → Y is one-to-one when, for all x₁, x₂ ∈ X, f(x₁) = f(x₂) ⇒ x₁ = x₂. This can be useful for proving that some functions are or are not one-to-one. Example. The function f : R → R given by f(x) = 6x + 2 is one-to-one because f(x₁) = f(x₂) ⇒ 6x₁ + 2 = 6x₂ + 2 ⇒ 6x₁ = 6x₂ ⇒ x₁ = x₂. Example. 
The function f : R → R given by f(x) = x² + 1 is not one-to-one because f(−1) = 2 and f(1) = 2, so f(−1) = f(1). Questions 13.1 Some of the following "rules" do not define genuine functions. Which are they? • For each set S of natural numbers, let f(S) be the least element of S. • For each set X of real numbers between 0 and 1, let g(X) be the least element of X. • For each circle C in the (x, y) plane, let h(C) be the minimum distance from C to the x-axis. • For each pair A, B of sets of real numbers, let s(A, B) be the smallest set containing both A and B. • For each pair A, B of sets of real numbers, let t(A, B) be the largest set contained in both A and B. 13.2 For each of the following, say which can be defined with domain R and codomain R. x², 1/x, log x, √x, ∛x Lecture 14: Examples of functions The functions discussed in the last lecture were familiar functions of real numbers. Many other examples occur elsewhere, however. 14.1 Functions of several variables We might define a function sum : R × R → R by sum(x, y) = x + y. Because the domain of this function is R × R, the inputs to this function are ordered pairs (x, y) of real numbers. Because its codomain is R, we are guaranteed that each output will be a real number. This function can be thought of as a function of two variables x and y. Similarly we might define a function binomial : R × R × N → R by binomial(a, b, n) = (a + b)ⁿ. Here the inputs are ordered triples (a, b, n) such that a and b are real numbers and n is a natural number. We can think of this as a function of three variables. 14.2 Sequences An infinite sequence of numbers, such as 1, 1/2, 1/4, 1/8, 1/16, ..., can be viewed as the function f : N → R defined by f(n) = 2^(−n). In this case, the inputs to f are natural numbers, and its outputs are real numbers. Any infinite sequence a₀, a₁, a₂, a₃, ... can be viewed as a function g(n) = aₙ from N to some set containing the values aₙ. 
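A finite search can disprove one-to-one-ness by exhibiting a collision, though it can never prove it for an infinite domain like R. A sketch (the helper `find_collision` and the sample grid are ours, for illustration only):

```python
# Search a finite sample of inputs for two that give the same output.
# Finding a collision disproves one-to-one-ness; finding none proves nothing
# about the full (infinite) domain.
def find_collision(f, domain):
    seen = {}
    for x in domain:
        y = f(x)
        if y in seen and seen[y] != x:
            return (seen[y], x)   # two different inputs, same output
        seen[y] = x
    return None

xs = [x / 10 for x in range(-50, 51)]
print(find_collision(lambda x: 6 * x + 2, xs))  # None: no collision found
print(find_collision(lambda x: x * x + 1, xs))  # a pair (-x, x), same square
```

This matches the examples above: 6x + 2 is one-to-one, while x² + 1 sends x and −x to the same value.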
14.3 Characteristic functions A subset of N = {0, 1, 2, 3, ...} can be represented by its characteristic function. For example, the set of squares is represented by the function χ : N → {0, 1} defined by χ(n) = 1 if n is a square, and χ(n) = 0 if n is not a square, which has the following sequence of values 110010000100000010000000010000000000100... (with 1s at the positions of the squares 0, 1, 4, 9, 16, 25, 36, ...). Any property of natural numbers can likewise be represented by a characteristic function. For example, the function χ above represents the property of being a square. Thus any set or property of natural numbers is represented by a function χ : N → {0, 1}. Characteristic functions of two or more variables represent relations between two or more objects. For example, the relation x ≤ y between real numbers x and y has the characteristic function χ : R × R → {0, 1} defined by χ(x, y) = 1 if x ≤ y, and χ(x, y) = 0 otherwise. 14.4 Boolean functions The connectives ∧, ∨ and ¬ are functions of variables whose values come from the set B = {T, F} of Boolean values (named after George Boole). ¬ is a function of one variable, so ¬ : B → B, and it is completely defined by giving its values on T and F, namely ¬T = F and ¬F = T. This is what we previously did by giving the truth table of ¬. ∧ and ∨ are functions of two variables, so ∧ : B × B → B and ∨ : B × B → B. They are completely defined by giving their values on the pairs (T, T), (T, F), (F, T), (F, F) in B × B, which is what their truth tables do. 14.5* Characteristic functions and subsets of N Mathematicians say that two (possibly infinite) sets A and B have the same cardinality (size) if there is a one-to-one and onto function from A to B. This function associates each element of A with a unique element of B and vice-versa. With this definition, it is not too hard to show that, for example, N and Z have the same cardinality (they are both "countably infinite"). 
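The 0/1 sequence for the set of squares can be generated directly from its characteristic function. An illustrative sketch (the name `chi` is ours):

```python
# The characteristic function of the set of squares, printed as the
# 0/1 sequence of its values chi(0), chi(1), chi(2), ...
import math

def chi(n):
    r = math.isqrt(n)              # integer square root
    return 1 if r * r == n else 0

print(''.join(str(chi(n)) for n in range(39)))
# prints 110010000100000010000000010000000000100
# with 1s at the square positions 0, 1, 4, 9, 16, 25, 36
```

Using `math.isqrt` keeps the perfect-square test exact for arbitrarily large n, where floating-point `sqrt` could round incorrectly.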
It turns out, though, that P(N) has a strictly greater cardinality than N. We can prove this by showing: no sequence f₀, f₁, f₂, f₃, ... includes all characteristic functions for subsets of N. (This shows that there are more characteristic functions than natural numbers.) In fact, for any infinite list f₀, f₁, f₂, f₃, ... of characteristic functions, we can define a characteristic function f which is not on the list. Imagine each function given as the infinite sequence of its values, so the list might look like this: f₀ values 0101010101... f₁ values 0000011101... f₂ values 1111111111... f₃ values 0000000000... f₄ values 1001001001... ... Now if we switch each of the diagonal values (the value of fₙ at the number n) to its opposite, we get a characteristic function f(n) = 1 if fₙ(n) = 0, and f(n) = 0 if fₙ(n) = 1, which is different from each function on the list. In fact, it has a different value from fₙ on the number n. For the given example, f has values 11011... The construction of f is sometimes called a "diagonalisation argument", because we get its values by switching values along the diagonal in the table of values of f₀, f₁, f₂, f₃, .... Questions 14.1 Suggest domains and codomains for the following functions, writing the domain as a cartesian product where applicable. gcd, reciprocal, remainder, ∩, ∪ 14.2 If A and B are subsets of N with characteristic functions χ_A and χ_B respectively, what set does the function χ_A(n)χ_B(n) represent? 14.3 How many Boolean functions of n variables are there? Lecture 15: Composition and inversion Complicated functions are often built from simple parts. For example, the function f : R → R defined by f(x) = (x² + 1)³ is computed by doing the following steps in succession: • square, • add 1, • cube. We say that f(x) = (x² + 1)³ is the composite of the functions (from R to R) • square(x) = x², • successor(x) = x + 1, • cube(x) = x³. 
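The three steps above can be composed in code. A minimal sketch (the helper `compose` is ours, matching the ◦ notation introduced in the next section):

```python
# f(x) = (x^2 + 1)^3 built from three simple parts:
# f = cube o successor o square, i.e. f(x) = cube(successor(square(x))).
square = lambda x: x ** 2
successor = lambda x: x + 1
cube = lambda x: x ** 3

def compose(g, h):
    """Return the composite g o h, the function x -> g(h(x))."""
    return lambda x: g(h(x))

f = compose(cube, compose(successor, square))
print(f(2))  # (2^2 + 1)^3 = 125
```

Note the order: `compose(cube, ...)` applies the inner function first, so the rightmost function in g ◦ h is the first step performed.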
15.1 Notation for composite functions In the present example we write f(x) = cube(successor(square(x))), or f = cube ◦ successor ◦ square. Let h : X → Y and g : Y → Z be functions. The function g ◦ h : X → Z is defined by g ◦ h(x) = g(h(x)) and is called the composite of g and h. Warning: Remember that g ◦ h means "do h first, then g." g ◦ h is usually different from h ◦ g. Example. square(successor(x)) = (x + 1)² = x² + 2x + 1 successor(square(x)) = x² + 1 15.2 Conditions for composition Composite functions do not always exist. Example. If reciprocal : R − {0} → R is defined by reciprocal(x) = 1/x, and predecessor : R → R is defined by predecessor(x) = x − 1, then reciprocal ◦ predecessor does not exist, because predecessor(1) = 0 is not a legal input for reciprocal. To avoid this problem, we demand that the codomain of h be equal to the domain of g for g ◦ h to exist. This ensures that each output of h will be a legal input for g. Let h : A → B and g : C → D be functions. Then g ◦ h : A → D exists if and only if B = C. 15.3 The identity function On each set A the function i_A : A → A defined by i_A(x) = x is called the identity function (on A). 15.4 Inverse functions Functions f : A → A and g : A → A are said to be inverses (of each other) if f ◦ g = g ◦ f = i_A. Example. square and sqrt are inverses of each other on the set R≥0 of reals ≥ 0: sqrt(square(x)) = x and square(sqrt(x)) = x. In fact, this is exactly what sqrt is supposed to do: reverse the process of squaring. However, this works only if we restrict the domain to R≥0. On R we do not have sqrt(square(x)) = x because, for example, sqrt(square(−1)) = sqrt(1) = 1. This problem arises whenever we seek an inverse for a function which is not one-to-one. The squaring function on R sends both 1 and −1 to 1, but we want a single value 1 for sqrt(1). Thus we have to restrict the squaring function to R≥0. 
15.5 Conditions for inversion A function f can have an inverse without its domain and codomain being equal. The inverse of a function f : A → B is a function f⁻¹ : B → A such that f⁻¹ ◦ f = i_A and f ◦ f⁻¹ = i_B. Note that f⁻¹ ◦ f and f ◦ f⁻¹ are both identity functions, but they have different domains. Not every function has an inverse, but we can neatly classify the ones that do. Let f : A → B be a function. Then f⁻¹ : B → A exists if and only if f is one-to-one and onto. Example: eˣ and log Consider f : R → R≥0 − {0} defined by f(x) = eˣ. We know that eˣ is one-to-one (e.g. because it is strictly increasing) and onto. So it has an inverse f⁻¹ on R≥0 − {0}. [Plot of y = eˣ] In fact, f⁻¹(y) = log(y), where log : R≥0 − {0} → R. Now e^(log x) = x and log(eˣ) = x, so e^(log x) and log(eˣ) are both identity functions, but they have different domains. The domain of e^(log x) is R≥0 − {0} (note log is defined only for reals > 0). The domain of log(eˣ) is R. 15.6 Operations An operation is a particular type of function, with domain A × A × ... × A and codomain A, for some set A. For example, the addition function f(a, b) = a + b is called an operation on R, because f : R × R → R. (That is, addition is a function of two real variables, which takes real values.) An operation with one variable is called unary, an operation with two variables is called binary, an operation with three variables is called ternary, and so on. Examples 1. Addition is a binary operation on R. 2. Successor is a unary operation on N. 3. Intersection is a binary operation on P(A) for any set A. 4. Complementation is a unary operation on P(A) for any set A. Questions 15.1 Suppose f, m and s are the following functions on the set of people. m(x) = mother of x f(x) = father of x s(x) = spouse of x What are the English terms for the following composite functions? 
m ∘ s, f ∘ s, m ∘ m, f ∘ m, s ∘ s

15.2 Write the following functions as composites of square(x), sqrt(x), successor(x) and cube(x).

sqrt(1 + x^3), x^(3/2), (1 + x)^3, (1 + x^3)^2

15.3 What interesting feature do the following functions have in common? (Hint: consider their inverses.)

• ¬ on B
• The reciprocal, f(x) = 1/x, on R − {0}
• The function g(x) = x/(x − 1), on R − {1}.

Lecture 16: Relations

Mathematical objects can be related in various ways, and any particular way of relating objects is called a relation on the set of objects in question. (This also applies to relations in the everyday sense. For example, "parent of" is a relation on the set of people.)

A binary relation R on a set A consists of A and a set of ordered pairs from A × A. When (a, b) is in this set we write aRb.

Similarly, a ternary relation on A would be defined by a set of ordered triples from A × A × A, and so on. (A unary relation on A is just a subset of A.)

16.1 Relations and functions

Any function f : X → Y can be viewed as a relation R on X ∪ Y. The relation is defined by xRy if and only if y = f(x).

However, not every relation is a function. Remember that a function must have exactly one output y for each input x in its domain. In a relation, on the other hand, an element x may be related to many elements y, or to none at all.

16.2 Examples

1. Equality on R. This is the relation consisting of the pairs (x, x) for x ∈ R. Thus it is the following subset of the plane.

[Plot: the line y = x.]

This relation is also a function (the identity function on R), since there is exactly one pair for each x ∈ R.

2. The < relation on R. This relation consists of all the pairs (x, y) with x < y. It is the following shaded subset of the plane.

[Plot: the shaded region above the line y = x.]

(The dashed line indicates that the points where x = y are omitted.)

3. Algebraic curves. An algebraic curve consists of the points (x, y) satisfying an equation p(x, y) = 0, where p is a polynomial. E.g. the unit circle x^2 + y^2 − 1 = 0.
[Plot: the unit circle.]

Notice that this relation is not a function, because there are two pairs with the same x, e.g. (0, 1) and (0, −1). Likewise, the curve y^2 = x^2(x + 1) is not a function.

[Plot: the curve y^2 = x^2(x + 1).]

4. The subset relation ⊆. This consists of the ordered pairs of sets (A, B) such that A ⊆ B. A and B must both be subsets of some universal set U.

5. Congruence modulo n. For a fixed n, congruence modulo n is a binary relation. It consists of all the ordered pairs of integers (a, b) such that n divides a − b.

16.3 Properties of congruence

As the symbol ≡ suggests, congruence mod n is a lot like equality. Numbers a and b which are congruent mod n are not necessarily equal, but they are "equal up to multiples of n," because they have equal remainders when divided by n.

Because congruence is like equality, congruences a ≡ b (mod n) behave a lot like equations. In particular, they have the following three properties.

1. Reflexive property. a ≡ a (mod n) for any number a.
2. Symmetric property. a ≡ b (mod n) ⇒ b ≡ a (mod n) for any numbers a and b.
3. Transitive property. a ≡ b (mod n) and b ≡ c (mod n) ⇒ a ≡ c (mod n) for any numbers a, b and c.

These properties are clear if one remembers that a ≡ b (mod n) means a and b have the same remainder on division by n.

Questions

16.1 Which of the following relations R(x, y) satisfy ∀x∃y R(x, y)?

• x ∧ y = T (for propositions x, y)
• x ⊆ y (for sets x, y of natural numbers)
• x > y (for real numbers x, y)
• x divides y (for natural numbers x, y)

16.2 Use logic symbols and the ≤ relation to write a relation between real numbers x, y which says that the point (x, y) lies in the square with corners (0,0), (1,0), (0,1) and (1,1).

Lecture 17: Equivalence relations

An equivalence relation R on a set A is a binary relation with the following three properties.

1. Reflexivity. aRa for all a ∈ A.
2. Symmetry. aRb ⇒ bRa for all a, b ∈ A.
3. Transitivity. aRb and bRc ⇒ aRc for all a, b, c ∈ A.
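On a finite set, the three properties in this definition can be checked mechanically by brute force. The sketch below (Python, not part of the original notes) tests a relation given as a true/false function of two arguments:

```python
def is_equivalence(A, related):
    """Brute-force check of reflexivity, symmetry and transitivity on a finite set A."""
    A = list(A)
    reflexive = all(related(a, a) for a in A)
    symmetric = all(related(b, a)
                    for a in A for b in A if related(a, b))
    transitive = all(related(a, c)
                     for a in A for b in A for c in A
                     if related(a, b) and related(b, c))
    return reflexive and symmetric and transitive

A = range(-10, 11)
print(is_equivalence(A, lambda a, b: (a - b) % 3 == 0))  # congruence mod 3: True
print(is_equivalence(A, lambda a, b: a < b))             # < fails reflexivity: False
```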
Equality and congruence mod n (for fixed n) are examples of equivalence relations.

17.1 Other equivalence relations

1. Equivalence of fractions. Two fractions are equivalent if they reduce to the same fraction when the numerator and denominator of each are divided by their gcd. E.g. 2/4 and 3/6 are equivalent because both reduce to 1/2.

2. Congruence of triangles. Triangles ABC and A′B′C′ are congruent if AB = A′B′, BC = B′C′ and CA = C′A′.

[Diagram: two congruent triangles ABC and A′B′C′.]

3. Similarity of triangles. Triangles ABC and A′B′C′ are similar if AB/A′B′ = BC/B′C′ = CA/C′A′.

[Diagram: two similar triangles ABC and A′B′C′.]

4. Parallelism of lines. The relation L ∥ M (L is parallel to M) is an equivalence relation.

Remark. In all these cases the relation is an equivalence because it says that objects are the same in some respect.

1. Equivalent fractions have the same reduced form.
2. Congruent triangles have the same side lengths.
3. Similar triangles have the same shape.
4. Parallel lines have the same direction.

Sameness is always reflexive (a is the same as a), symmetric (if a is the same as b, then b is the same as a) and transitive (if a is the same as b and b is the same as c, then a is the same as c).

17.2 Equivalence classes

Conversely, we can show that if R is a reflexive, symmetric and transitive relation then aRb says that a and b are the same in some respect: they have the same R-equivalence class.

If R is an equivalence relation we define the R-equivalence class of a to be [a] = {s : sRa}.

Thus [a] consists of all the elements related to a. It can also be defined as {s : aRs}, because sRa if and only if aRs, by symmetry of R.

Examples

• The parallel equivalence class of a line L consists of all lines parallel to L.
• The equivalence class of 1 for congruence mod 2 is the set of all odd numbers.

17.3 Equivalence class properties

Claim.
If two elements are related by an equivalence relation R on a set A, their equivalence classes are equal.

Proof. Suppose a, b ∈ A and aRb. Now

s ∈ [a] ⇒ sRa       (by definition of [a])
        ⇒ sRb       (by transitivity of R, since sRa and aRb)
        ⇒ s ∈ [b]   (by definition of [b]).

Thus all elements of [a] belong to [b]. Similarly, all elements of [b] belong to [a], hence [a] = [b].

Claim. If R is an equivalence relation on a set A, each element of A belongs to exactly one equivalence class.

Proof. Suppose a, b, c ∈ A, and c ∈ [a] ∩ [b]. Then

c ∈ [a] and c ∈ [b]
⇒ cRa and cRb   (by definition of [a] and [b])
⇒ aRc and cRb   (by symmetry)
⇒ aRb           (by transitivity)
⇒ [a] = [b]     (by the previous claim).

17.4 Partitions and equivalence classes

A partition of a set S is a set of subsets of S such that each element of S is in exactly one of the subsets.

Using what we showed in the last section, we have the following.

If R is an equivalence relation on a set A, then the equivalence classes of R form a partition of A. Two elements of A are related if and only if they are in the same equivalence class.

Example. Let R be the relation on Z defined by aRb if and only if a ≡ b (mod 3). The three equivalence classes of R are

{x : x ≡ 0 (mod 3)} = {3k : k ∈ Z}
{x : x ≡ 1 (mod 3)} = {3k + 1 : k ∈ Z}
{x : x ≡ 2 (mod 3)} = {3k + 2 : k ∈ Z}.

These partition the set Z.

Questions

17.1 Which of the following relations between integers x and y are equivalence relations?

• |x| = |y|
• x^3 − y^3 = 0
• x^3 − y^3 = 1
• x divides y
• 5 divides x − y

17.2 For those relations in Question 17.1 that are not equivalence relations, say which properties of equivalence they fail to satisfy.

17.3 For those that are equivalence relations, say what is the "same" about the related objects.

17.4 Also, for those relations that are equivalence relations, describe their equivalence classes.

Lecture 18: Order relations

18.1 Partial order relations

A partial order relation R on a set A is a binary relation with the following three properties.

1.
Reflexivity. aRa for all a ∈ A.
2. Antisymmetry. aRb and bRa ⇒ a = b for all a, b ∈ A.
3. Transitivity. aRb and bRc ⇒ aRc for all a, b, c ∈ A.

Examples.

1. ≤ on R.
Reflexive: a ≤ a for all a ∈ R.
Antisymmetric: a ≤ b and b ≤ a ⇒ a = b for all a, b ∈ R.
Transitive: a ≤ b and b ≤ c ⇒ a ≤ c for all a, b, c ∈ R.

2. ⊆ on P(N).
Reflexive: A ⊆ A for all A ∈ P(N).
Antisymmetric: A ⊆ B and B ⊆ A ⇒ A = B for all A, B ∈ P(N).
Transitive: A ⊆ B and B ⊆ C ⇒ A ⊆ C for all A, B, C ∈ P(N).

3. Divisibility on N. The relation "a divides b" on natural numbers is reflexive, antisymmetric and transitive. We leave checking this as an exercise.

4. Alphabetical order of words. Words on the English alphabet are alphabetically ordered by comparing the leftmost letter at which they differ. We leave checking that this relation is reflexive, antisymmetric and transitive as an exercise.

18.2 Total order relations

A total order relation is a special kind of partial order relation that "puts everything in order".

A total order relation R on a set A is a partial order relation that also has the property that aRb or bRa for all a, b ∈ A.

Examples.

1. ≤ on R. This is a total order relation because for all real numbers a and b we have a ≤ b or b ≤ a.

2. ⊆ on P(N). This is not a total order because, for example, {1, 2} ⊈ {1, 3} and {1, 3} ⊈ {1, 2}.

3. Divisibility on N. This is not a total order because, for example, 2 does not divide 3 and 3 does not divide 2.

4. Alphabetical order of words. This is a total order because given any two different words, one will appear before the other in alphabetical order.

18.3 Hasse diagrams

A partial order relation R on a finite set A can be represented as a Hasse diagram. The elements of A are written on the page and connected by lines so that, for any a, b ∈ A, aRb exactly when b can be reached from a by travelling upward along the lines.

Example. A Hasse diagram for the relation ⊆ on the set P({1, 2}) can be drawn as follows.

[Hasse diagram: ∅ at the bottom, {1} and {2} above it, and {1, 2} at the top.]

Example.
A Hasse diagram for the relation "divides" on the set {1, 2, 3, 5, 6, 10, 15, 30} can be drawn as follows.

[Hasse diagram: 1 at the bottom; 2, 3 and 5 above it; 6, 10 and 15 above those; 30 at the top.]

Example. A Hasse diagram for the relation ≤ on the set {1, 2, 3, 4, 5} can be drawn as follows.

[Hasse diagram: the chain 1, 2, 3, 4, 5 drawn vertically.]

Notice how this last Hasse diagram can be drawn simply as a vertical chain, whereas the previous two are "wider" and more complicated. This corresponds to the fact that the last example was of a total order relation but the previous two were not of total order relations.

18.4 Well-ordering

A well-order relation on a set is a total order relation that also has the property that each nonempty set of its elements contains a least element.

A well-order relation R on a set A is a total order relation such that, for all nonempty S ⊆ A, there exists an ℓ ∈ S such that ℓRa for all a ∈ S.

Example. The relation ≤ on N is a well-order relation because every nonempty subset of N has a least element. The well-ordering of N is the basis of proofs by induction.

Example. The relation ≤ on Z is not a well-order relation. For example, Z itself has no least element.

Example. The relation ≤ on {x : x ∈ R, x > 0} is not a well-order relation. For example, the subset {x : x ∈ R, x > 3} has no least element.

Questions

18.1 Explain why "antisymmetric" does not mean "not symmetric". Give an example of a relation which is neither symmetric nor antisymmetric.

18.2 Draw a diagram of the positive divisors of 42 under the relation "divides." Why does it resemble the diagram for the positive divisors of 30?

18.3 Invent a partial order relation on N × N. Is your ordering a total ordering? Is your ordering a well-ordering?

Lecture 19: Selections and Arrangements

19.1 Ordered selections without repetition

A reviewer is going to compare ten phones and list, in order, a top three. In how many ways can she do this? More generally, how many ways are there to arrange r objects chosen from a set of n objects?
In our example, the reviewer has 10 options for her favourite, but then only 9 for her second-favourite, and 8 for her third-favourite. So there are 10 × 9 × 8 ways she could make her list.

For an ordered selection without repetition of r elements from a set of n elements there are

n options for the 1st element
n − 1 options for the 2nd element
n − 2 options for the 3rd element
...
n − r + 1 options for the rth element.

So we have the following formula.

The number of ordered selections without repetition of r elements from a set of n elements (0 ≤ r ≤ n) is n(n − 1) · · · (n − r + 1) = n!/(n − r)!.

When r = n and all the elements of a set S are ordered, we just say that this is a permutation of S. Our formula tells us there are n! such permutations. For example, there are 3! = 6 permutations of the set {a, b, c}:

(a, b, c), (a, c, b), (b, a, c), (b, c, a), (c, a, b), (c, b, a).

19.2 Unordered selections without repetition

What if our reviewer instead chose an unordered top three? In how many ways could she do that? More generally, how many ways are there to choose (without order) r objects from a set of n objects?

A combination of r elements from a set S is a subset of S with r elements.

For every unordered list our reviewer could make there are 3! = 6 corresponding possible ordered lists. And we've seen that she could make 10 × 9 × 8 ordered lists. So the number of unordered lists she could make is (10 × 9 × 8)/6.

For every combination of r elements from a set of n elements there are r! corresponding permutations. So, using our formula for the number of permutations, we have the following.

The number of combinations of r elements from a set of n elements (0 ≤ r ≤ n) is n(n − 1) · · · (n − r + 1)/r! = n!/(r!(n − r)!) = C(n, r).

Notice that the notation C(n, r), read "n choose r", is used for n!/(r!(n − r)!). Expressions like this are called binomial coefficients. We'll see why they are called this in the next lecture.
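Both counting formulas are easy to check against brute-force enumeration. The sketch below (Python, not part of the original notes) uses the standard itertools module for the reviewer's n = 10, r = 3 example:

```python
from itertools import permutations, combinations
from math import factorial

n, r = 10, 3

# Ordered selections without repetition: n!/(n - r)!
ordered = factorial(n) // factorial(n - r)
assert ordered == len(list(permutations(range(n), r)))  # 10 * 9 * 8 = 720

# Unordered selections (combinations): n!/(r!(n - r)!)
unordered = factorial(n) // (factorial(r) * factorial(n - r))
assert unordered == len(list(combinations(range(n), r)))  # 720 / 3! = 120

print(ordered, unordered)  # 720 120
```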
19.3 Ordered selections with repetition

An ordered selection of r elements from a set X is really just a sequence of length r with each term in X. If X has n elements, then there are n possibilities for each term and so:

The number of sequences of r terms, each from some set of n elements, is n × n × · · · × n (r factors) = n^r.

19.4 Unordered selections with repetition

A shop has a special deal on any four cans of soft drink. Cola, lemonade and sarsaparilla flavours are available. In how many ways can you select four cans?

We can write a selection in a table, for example,

C | L  | S
• | •• | •

and

C | L | S
  | • | •••

We can change a table like this into a string of zeroes and ones, by moving from left to right reading a "•" as a 0 and a column separator as a 1. The tables above would be converted into

0 1 0 0 1 0 and 1 0 1 0 0 0.

Notice that each string has four zeroes (one for each can selected) and two ones (one fewer than the number of flavours). We can choose a string like this by beginning with a string of six ones and then choosing four ones to change to zeroes. There are C(6, 4) ways to do this and so there are C(6, 4) possible can selections.

An unordered selection of r elements, with repetition allowed, from a set X of n elements can be thought of as a multiset with r elements, each in X. As in the example, we can represent each such multiset with a string of r zeroes and n − 1 ones. We can choose a string like this by beginning with a string of n + r − 1 ones and then choosing r ones to change to zeroes.

The number of multisets of r elements, each from a set of n elements, is C(n + r − 1, r) = (n + r − 1)!/(r!(n − 1)!).

19.5 The pigeonhole principle

The pigeonhole principle is a reasonably obvious statement, but can still be very useful.

If n items are placed in m containers with n > m, then at least one container has at least two items.

Example.
If a drawer contains only blue, black and white socks and you take out four socks without looking at them, then you are guaranteed to have two of the same colour.

We can generalise the pigeonhole principle as follows.

If n items are placed in m containers, then at least one container has at least ⌈n/m⌉ items.

In the above, ⌈n/m⌉ means the smallest integer greater than or equal to n/m (that is, n/m "rounded up").

Example. If 21 tasks have been distributed between four processor cores, the busiest core must have been assigned at least ⌈21/4⌉ = 6 tasks.

Questions

19.1 A bank requires a PIN that is a string of four decimal digits. How many such PINs are there? How many are made of four different digits?

19.2 How many binary strings of length 5 are there? How many of these contain exactly two 1s?

19.3 In a game, each of ten players holds red, blue and green marbles, and places one marble in a bag. How many possibilities are there for the colours of marbles in the bag? If each player chooses their colour at random, are all of these possibilities equally likely?

19.4 How many ways are there to partition (that is, split up) a set with 10 elements into one class of 5 elements, one class of 3 elements and one class of 2 elements? How many ways are there to partition a set with 10 elements into one class of 6 elements and two classes of 2 elements each?

Lecture 20: Pascal's triangle

20.1 Pascal's triangle

We can write the binomial coefficients in an (infinite) triangular array, writing C(n, r) for the binomial coefficient "n choose r":

C(0,0)
C(1,0) C(1,1)
C(2,0) C(2,1) C(2,2)
C(3,0) C(3,1) C(3,2) C(3,3)
C(4,0) C(4,1) C(4,2) C(4,3) C(4,4)
C(5,0) C(5,1) C(5,2) C(5,3) C(5,4) C(5,5)
C(6,0) C(6,1) C(6,2) C(6,3) C(6,4) C(6,5) C(6,6)
...
Here are the first rows with the entries as integers:

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1
1 10 45 120 210 252 210 120 45 10 1

This triangular array is often called Pascal's triangle (although Pascal was nowhere near the first to discover it).

20.2 Patterns

Writing the binomial coefficients this way reveals a lot of different patterns in them. Perhaps the most obvious is that every row reads the same left-to-right and right-to-left. Choosing r elements from a set of n elements to be in a combination is equivalent to choosing n − r elements from the same set to not be in the combination. So:

C(n, r) = C(n, n − r) for 0 ≤ r ≤ n.

This shows that every row reads the same left-to-right and right-to-left.

Another pattern is that every "internal" entry in the triangle is the sum of the two entries above it. To see why this is, we'll begin with an example.

Example. Why is C(6, 2) = C(5, 2) + C(5, 1)?

There are C(6, 2) combinations of 2 elements of {1, 2, 3, 4, 5, 6}. Every such combination either

• does not contain a 6, in which case it is one of the C(5, 2) combinations of 2 elements of {1, 2, 3, 4, 5}; or
• does contain a 6, in which case the rest of the combination is one of the C(5, 1) combinations of 1 element from {1, 2, 3, 4, 5}.

So C(6, 2) = C(5, 2) + C(5, 1).

We can make a similar argument in general. Let X be a set of n elements and let x be a fixed element of X. For any r ∈ {1, . . . , n}, there are C(n − 1, r) combinations of r elements of X that do not contain x and there are C(n − 1, r − 1) combinations of r elements of X that do contain x. So:

C(n, r) = C(n − 1, r) + C(n − 1, r − 1) for 1 ≤ r ≤ n.

This shows that every internal entry in Pascal's triangle is the sum of the two above it.
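This recurrence, together with the 1s on the boundary, is all that is needed to generate the triangle. A sketch (Python, not part of the original notes):

```python
def pascal_rows(n):
    """First n+1 rows of Pascal's triangle, built from the recurrence
    C(n, r) = C(n-1, r) + C(n-1, r-1), with 1s on the boundary."""
    rows = [[1]]
    for _ in range(n):
        prev = rows[-1]
        # Each internal entry is the sum of the two entries above it.
        rows.append([1] + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [1])
    return rows

for row in pascal_rows(5):
    print(row)
# The row for n = 5 is [1, 5, 10, 10, 5, 1], matching the triangle above.
```

Note that each row also reads the same in both directions, as the symmetry pattern C(n, r) = C(n, n − r) predicts.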
20.3 The binomial theorem

(x + y)^0 = 1
(x + y)^1 = x + y
(x + y)^2 = x^2 + 2xy + y^2
(x + y)^3 = x^3 + 3x^2 y + 3xy^2 + y^3
(x + y)^4 = x^4 + 4x^3 y + 6x^2 y^2 + 4xy^3 + y^4
(x + y)^5 = x^5 + 5x^4 y + 10x^3 y^2 + 10x^2 y^3 + 5xy^4 + y^5

Notice that the coefficients on the right are exactly the same as the entries in Pascal's triangle. Why does this happen?

Think about expanding (x + y)^3 and finding the coefficient of xy^2, for example.

(x + y)(x + y)(x + y) = xxx + xxy + xyx + xyy + yxx + yxy + yyx + yyy = x^3 + 3x^2 y + 3xy^2 + y^3

The coefficient of xy^2 is 3 because we have three terms in the sum above that contain two y's (xyy, yxy and yyx). This is because there are C(3, 2) ways to choose two of the three factors in a term to be y's.

The same logic holds in general. The coefficient of x^(n−r) y^r in (x + y)^n will be C(n, r) because there will be C(n, r) ways to choose r of the n factors in a term to be y's. This fact is called the binomial theorem.

Binomial theorem. For any n ∈ N,

(x + y)^n = C(n, 0) x^n y^0 + C(n, 1) x^(n−1) y^1 + C(n, 2) x^(n−2) y^2 + · · · + C(n, n−1) x^1 y^(n−1) + C(n, n) x^0 y^n.

20.4 Inclusion-exclusion

A school gives out prizes to its best ten students in music and its best eight students in art. If three students receive prizes in both, how many students get a prize? If we try to calculate this as 10 + 8 then we have counted the three over-achievers twice. To compensate we need to subtract three and calculate 10 + 8 − 3 = 15.

In general, if A and B are finite sets then we have

|A ∪ B| = |A| + |B| − |A ∩ B|.

With a bit more care we can see that if A, B and C are sets then we have

|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|.

This is part of a more general law called the inclusion-exclusion principle.

Let X1, X2, . . . , Xt be finite sets. To calculate |X1 ∪ X2 ∪ · · · ∪ Xt|:

• add the sizes of the sets;
• subtract the sizes of the 2-way intersections;
• add the sizes of the 3-way intersections;
• subtract the sizes of the 4-way intersections;
...
• add/subtract the size of the t-way intersection.
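The alternating procedure above can be written directly as a short program. This sketch (Python, not part of the original notes) computes the union size by inclusion-exclusion, using the divisible-by-2-or-3-or-5 sets from Question 20.3 as a test case:

```python
from itertools import combinations

def union_size_by_inclusion_exclusion(sets):
    """|X1 ∪ ... ∪ Xt| via the alternating sum over all k-way intersections."""
    total = 0
    t = len(sets)
    for k in range(1, t + 1):
        sign = 1 if k % 2 == 1 else -1  # add for odd k, subtract for even k
        for group in combinations(sets, k):
            total += sign * len(set.intersection(*group))
    return total

# Numbers in {1, ..., 100} divisible by 2, 3 or 5:
X = [{n for n in range(1, 101) if n % d == 0} for d in (2, 3, 5)]
print(union_size_by_inclusion_exclusion(X))  # 74, same as len(X[0] | X[1] | X[2])
```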
To see why this works, think of an element x that is in n of the sets X1, X2, . . . , Xt. It is counted

C(n, 1) − C(n, 2) + C(n, 3) − C(n, 4) + · · · ± C(n, n)

times. By the binomial theorem with x = 1 and y = −1 (see Question 20.1), this is equal to 1. So each element is counted exactly once.

Questions

20.1 Substitute x = 1 and y = −1 into the statement of the binomial theorem. What does this tell you about the rows of Pascal's triangle?

20.2 Find a pattern in the sums of the rows in Pascal's triangle. Prove your pattern holds using the binomial theorem. Also prove it holds by considering the powerset of a set.

20.3 Use inclusion-exclusion to work out how many numbers in the set {1, . . . , 100} are divisible by 2 or 3 or 5.

Lecture 21: Probability

Probability gives us a way to model random processes mathematically. These processes could be anything from the rolling of dice, to radioactive decay of atoms, to the performance of a stock market index. The mathematical environment we work in when dealing with probabilities is called a probability space.

21.1 Probability spaces

We'll start with a formal definition and then look at some examples of how the definition is used.

A probability space consists of
• a set S called a sample space which contains all the possible outcomes of the random process; and
• a probability function Pr : S → [0, 1] such that the sum of the probabilities of the outcomes in S is 1.

Each time the process occurs it should produce exactly one outcome (never zero or more than one). The probability of an outcome is a measure of the likeliness that it will occur. It is given as a real number between 0 and 1 inclusive, where 0 indicates that the outcome cannot occur and 1 indicates that the outcome must occur.

Example.

[Diagram: a spinner with four unequal regions labelled 1, 2, 3 and 4.]

The spinner above might be modeled by a probability space with sample space S = {1, 2, 3, 4} and probability function given as follows.

Pr(s) = 1/2 for s = 1
        1/4 for s = 2
        1/8 for s = 3
        1/8 for s = 4.
It can be convenient to give this as a table:

s     | 1   | 2   | 3   | 4
Pr(s) | 1/2 | 1/4 | 1/8 | 1/8

Example. Rolling a fair six-sided die could be modeled by a probability space with sample space S = {1, 2, 3, 4, 5, 6} and probability function Pr given as follows.

s     | 1   | 2   | 3   | 4   | 5   | 6
Pr(s) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6

A sample space like this one where every outcome has an equal probability is sometimes called a uniform sample space. Outcomes from a uniform sample space are said to have been taken uniformly at random.

21.2 Events

An event is a subset of the sample space.

An event is just a collection of outcomes we are interested in for some reason.

Example. In the die rolling example with S = {1, 2, 3, 4, 5, 6}, we could define the event of rolling at least a 3. Formally, this would be the set {3, 4, 5, 6}. We could also define the event of rolling an odd number as the set {1, 3, 5}.

The probability of an event A is the sum of the probabilities of the outcomes in A.

Example. In the spinner example, for the event A = {1, 2, 4}, we have Pr(A) = Pr(1) + Pr(2) + Pr(4) = 1/2 + 1/4 + 1/8 = 7/8.

In a uniform sample space (where all outcomes are equally likely) the probability of an event A can be calculated as:

Pr(A) = (number of outcomes in A)/(number of outcomes) = |A|/|S|.

21.3 Operations on events

Because events are defined as sets we can perform set operations on them.

If A and B are events for a sample space S, then
• A ∪ B is the event "A or B,"
• A ∩ B is the event "A and B,"
• Ā is the event "not A."

We always take the sample space as our universal set, so Ā means S − A.

21.4 Probabilities of unions

We saw in the section on the inclusion-exclusion principle that |A ∪ B| = |A| + |B| − |A ∩ B| for finite sets A and B. We have a similar law in probability.

For any two events A and B, Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).

Example. In our die rolling example, let A = {1, 2} and B = {2, 3, 4} be events. Then

Pr(A ∪ B) = |A|/|S| + |B|/|S| − |A ∩ B|/|S| = 2/6 + 3/6 − 1/6 = 2/3.
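Since events in a uniform sample space are just subsets, this union law can be checked with exact arithmetic. A sketch (Python, not part of the original notes) for the die example:

```python
from fractions import Fraction

# Uniform sample space for one roll of a fair die.
S = {1, 2, 3, 4, 5, 6}

def pr(event):
    """Probability of an event (a subset of S) in the uniform space: |A|/|S|."""
    return Fraction(len(event), len(S))

A = {1, 2}
B = {2, 3, 4}

# Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
assert pr(A | B) == pr(A) + pr(B) - pr(A & B)
print(pr(A | B))  # 2/3
```

Using Fraction rather than floats keeps the probabilities exact, matching the hand calculations in the notes.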
Two events A and B are mutually exclusive if Pr(A ∩ B) = 0, that is, if A and B cannot occur together. For mutually exclusive events, we have

Pr(A ∪ B) = Pr(A) + Pr(B).

21.5 Independent events

We say that two events are independent when the occurrence or non-occurrence of one event does not affect the likelihood of the other occurring.

Two events A and B are independent if Pr(A ∩ B) = Pr(A)Pr(B).

Example. A binary string of length 3 is generated uniformly at random. The event A that the first bit is a 1 is independent of the event B that the second bit is a 1. But A is not independent of the event C that the string contains exactly two 1s.

Formally, the sample space is S = {000, 001, 010, 011, 100, 101, 110, 111} and Pr(s) = 1/8 for any s ∈ S. So,

A = {100, 101, 110, 111}, so Pr(A) = 1/2
B = {010, 011, 110, 111}, so Pr(B) = 1/2
C = {011, 101, 110}, so Pr(C) = 3/8
A ∩ B = {110, 111}, so Pr(A ∩ B) = 1/4
A ∩ C = {101, 110}, so Pr(A ∩ C) = 1/4

So Pr(A ∩ B) = Pr(A)Pr(B) but Pr(A ∩ C) ≠ Pr(A)Pr(C).

21.6 Warning

Random processes can occur in both discrete and continuous settings, and probability theory can be applied in either setting. In this lecture, and in the next four lectures, we are discussing only the discrete case. Many of the definitions and results we state apply only in this case. Our definition of a probability space, for example, is actually the definition of a discrete probability space, and so on.

The discrete setting provides a good environment to learn most of the vital concepts and intuitions of probability theory. What you learn here is very useful in itself, and will act as a good base if you go on to study continuous probability.

Questions

An integer is chosen uniformly at random from the set {1, 2, . . . , 30}. Let A be the event that the integer is at most 20. Let B be the event that the integer is divisible by 6. Let C be the event that the integer's last digit is a 5.

21.1 Write A, B and C as sets, and find their probabilities.
21.2 Find the probabilities of A ∪ B, A ∪ C and B ∪ C. Which pairs of A, B, C are mutually exclusive?

21.3 Find the probabilities of A ∩ B, A ∩ C and B ∩ C. Which pairs of A, B, C are independent?

Lecture 22: Conditional probability and Bayes' theorem

Your friend believes that Python coding has become more popular than AFL in Melbourne. She bets you $10 that the next person to pass you on the street will be a Python programmer. You feel confident about this bet. However, when you see a man in a "Hello, world!" t-shirt approaching, you don't feel so confident any more. Why is this?

We can think about this with a diagram. The rectangle represents the set of people in Melbourne, the circle P is the set of Python coders, and the circle T is the set of "Hello, world!" t-shirt owners.

[Diagram: a rectangle containing a small circle P which covers most of an overlapping circle T.]

Initially, you feel confident because the circle P takes up a small proportion of the rectangle. But when you learn that your randomly selected person is in the circle T, you feel bad because the circle P covers almost all of T. In mathematical language, the probability that a random Melbournian is a Python coder is low, but the probability that a random Melbournian is a Python coder given that they own a "Hello, world!" t-shirt is high.

22.1 Conditional probability

Conditional probabilities measure the likelihood of an event, given that some other event occurs.

For events A and B, the conditional probability of A given B is Pr(A|B) = Pr(A ∩ B)/Pr(B).

This definition also implies that Pr(A ∩ B) = Pr(A|B)Pr(B).

Example. The spinner from the last lecture is spun. Let A be the event that the result was at least 3 and B be the event that the result was even. What is Pr(A|B)?

Pr(A ∩ B) = Pr(4) = 1/8
Pr(B) = Pr(2) + Pr(4) = 1/4 + 1/8 = 3/8

Thus, Pr(A|B) = Pr(A ∩ B)/Pr(B) = (1/8)/(3/8) = 1/3.

Example. A binary string of length 6 is generated uniformly at random. Let A be the event that the first bit is a 1 and B be the event that the string contains two 1s. What is Pr(A|B)?
There are 2^6 strings in our sample space. Now A ∩ B occurs when the first bit is 1 and the rest of the string contains exactly one 1. There are C(5, 1) such strings and so Pr(A ∩ B) = C(5, 1)/2^6. Also, there are C(6, 2) strings containing two 1s and so Pr(B) = C(6, 2)/2^6. Thus,

Pr(A|B) = Pr(A ∩ B)/Pr(B) = C(5, 1)/C(6, 2) = 1/3.

22.2 Independence again

Our definition of conditional probability gives us another way of defining independence. We can say that events A and B are independent if

Pr(A) = Pr(A|B).

This makes sense intuitively: it is a formal way of saying that the likelihood of A does not depend on whether or not B occurs.

22.3 Independent repeated trials

Generally if we perform exactly the same action multiple times, the results for each trial will be independent of the others. For example, if we roll a die twice, then the result of the first roll will be independent of the result of the second.

For two independent repeated trials, each from a sample space S, our overall sample space is S × S and our probability function is given by Pr((s1, s2)) = Pr(s1)Pr(s2). For three independent repeated trials the sample space is S × S × S and the probability function is Pr((s1, s2, s3)) = Pr(s1)Pr(s2)Pr(s3), and so on.

Example. The spinner from the previous example is spun twice. What is the probability that the results add to 5?

A total of 5 can be obtained as (1, 4), (4, 1), (2, 3) or (3, 2). Because the spins are independent:

Pr((1, 4)) = Pr((4, 1)) = 1/2 × 1/8 = 1/16
Pr((2, 3)) = Pr((3, 2)) = 1/4 × 1/8 = 1/32

So, because (1, 4), (4, 1), (2, 3) and (3, 2) are mutually exclusive, the probability of the total being 5 is 1/16 + 1/16 + 1/32 + 1/32 = 3/16.

22.4 Bayes' theorem

Bayes' theorem gives a way of calculating the conditional probability of an event A given an event B when we already know the probabilities of A, of B given A, and of B given Ā.

Bayes' theorem. For events A and B,

Pr(A|B) = Pr(B|A)Pr(A) / (Pr(B|A)Pr(A) + Pr(B|Ā)Pr(Ā)).
Note that the denominator above is simply an expression for Pr(B). The fact that

Pr(B) = Pr(B|A)Pr(A) + Pr(B|Ā)Pr(Ā)

is due to the law of total probability.

22.5 Bayes' theorem examples

Example. Luke Skywalker discovers that some porgs have an extremely rare genetic mutation that makes them powerful force users. He develops a test for this mutation that is right 99% of the time and decides to test all the porgs on Ahch-To.

Suppose there are 100 mutant porgs in the population of 24 million. We would guess that the test would come up positive for 99 of the 100 mutants, but also for 239 999 non-mutants.

We are assuming that the conditional probability of a porg testing positive given that it's a mutant is 0.99. But what is the conditional probability of it being a mutant given that it tested positive? From our guesses, we would expect this to be 99/(99 + 239 999) ≈ 0.0004.

Bayes' theorem gives us a way to formalise this. Let M be the event that a porg is a mutant and P be the event that it tests positive:

Pr(M|P) = Pr(P|M)Pr(M) / (Pr(P|M)Pr(M) + Pr(P|M̄)Pr(M̄))
        = (100/24 000 000 × 0.99) / (100/24 000 000 × 0.99 + (1 − 100/24 000 000) × 0.01)
        = 99/(99 + 239 999)
        ≈ 0.0004.

Example. A binary string is created so that the first bit is a 0 with probability 1/3 and then each subsequent bit is the same as the preceding one with probability 3/4. What is the probability that the first bit is 0, given that the second bit is 0?

Let F be the event that the first bit is 0 and let S be the event that the second bit is 0. So Pr(F) = 1/3. If F occurs then the second bit will be 0 with probability 3/4 and so Pr(S|F) = 3/4. If F does not occur then the second bit will be 0 with probability 1/4 and so Pr(S|F̄) = 1/4. So, by Bayes' theorem,

Pr(F|S) = Pr(F)Pr(S|F) / (Pr(F)Pr(S|F) + Pr(F̄)Pr(S|F̄))
        = (1/3 × 3/4) / (1/3 × 3/4 + 2/3 × 1/4)
        = (1/4)/(5/12)
        = 3/5.

Questions

22.1 An integer is selected uniformly at random from the set {1, 2, . . . , 15}. What is the probability that it is divisible by 5, given that it is odd?

22.2 A standard die is rolled twice.
What is the probability that the first roll is a 1, given that the sum of the rolls is 6?

22.3 A bag contains three black marbles and two white marbles and they are randomly selected and removed, one at a time until the bag is empty. Use Bayes' theorem to calculate the probability that the first marble selected is black, given that the second marble selected is black.

Lecture 23: Random variables

In a game, three standard dice will be rolled and the number of sixes will be recorded. We could let X stand for the number of sixes rolled. Then X is a special kind of variable whose value is based on a random process. These are called random variables.

Because the value of X is random, it doesn't make sense to ask whether X = 0, for example. But we can ask what the probability is that X = 0 or that X > 2. This is because "X = 0" and "X > 2" correspond to events from our sample space.

23.1 Formal definition

Formally, a random variable is defined as a function from the sample space to R. In the example above, X is a function from the process's sample space that maps every outcome to the number of sixes in that outcome.

Example. Let X be the number of 1s in a binary string of length 2 chosen uniformly at random. Formally, X is a function from {00, 01, 10, 11} to {0, 1, 2} such that X(00) = 0, X(01) = 1, X(10) = 1, X(11) = 2. For most purposes, however, we can think of X as simply a special kind of variable.

23.2 Probability distribution

We can describe the behaviour of a random variable X by listing, for each value x that X can take, the probability that X = x. This gives the probability distribution of the random variable. Again, formally this listing is a function from the values of X to their probabilities.

Example. Continuing with the last example, the probability distribution of X is given by

Pr(X = x) = 1/4 if x = 0
            1/2 if x = 1
            1/4 if x = 2.

It can be convenient to give this as a table:

x          0    1    2
Pr(X = x)  1/4  1/2  1/4

Example.
A standard die is rolled three times. Let X be the number of sixes rolled. What is the probability distribution of X?

Obviously X can only take values in {0, 1, 2, 3}. Each roll there is a six with probability 1/6 and not a six with probability 5/6. The rolls are independent.

Pr(X = 0) = 5/6 × 5/6 × 5/6
Pr(X = 1) = (1/6)(5/6)(5/6) + (5/6)(1/6)(5/6) + (5/6)(5/6)(1/6)
Pr(X = 2) = (1/6)(1/6)(5/6) + (1/6)(5/6)(1/6) + (5/6)(1/6)(1/6)
Pr(X = 3) = 1/6 × 1/6 × 1/6

So the probability distribution of X is

x          0        1       2       3
Pr(X = x)  125/216  75/216  15/216  1/216

23.3 Independence

We have seen that two events are independent when the occurrence or non-occurrence of one event does not affect the likelihood of the other occurring. Similarly two random variables are independent if the value of one does not affect the likelihood that the other will take a certain value.

Random variables X and Y are independent if, for all x and y,
Pr(X = x ∧ Y = y) = Pr(X = x)Pr(Y = y).

Example. An integer is generated uniformly at random from the set {10, 11, . . . , 29}. Let X and Y be its first and second (decimal) digits. Then X and Y are independent random variables because, for x ∈ {1, 2} and y ∈ {0, 1, . . . , 9},
Pr(X = x ∧ Y = y) = 1/20 = 1/2 × 1/10 = Pr(X = x)Pr(Y = y).

23.4 Operations

From a random variable X, we can create new random variables such as X + 1, 2X and X². These variables work as you would expect them to.

Example. If X is the random variable with distribution

x          −1   0    1
Pr(X = x)  1/6  1/3  1/2

then the distributions of X + 1, 2X and X² are

y              0    1    2
Pr(X + 1 = y)  1/6  1/3  1/2

y           −2   0    2
Pr(2X = y)  1/6  1/3  1/2

y           0    1
Pr(X² = y)  1/3  2/3

23.5 Sums and products

From random variables X and Y we can define a new random variable Z = X + Y. Working out the distribution of Z can be complicated, however. We give an example below of doing this when X and Y are independent.

Example.
Let X and Y be independent random variables with

x          0    1    2
Pr(X = x)  1/4  1/2  1/4

y          0    1    2    3
Pr(Y = y)  1/6  1/3  1/3  1/6

Let Z = X + Y. To find Pr(Z = z) for some value of z, we must consider all the ways that X + Y could equal z. For example, X + Y = 3 could occur as (X, Y) = (0, 3), (X, Y) = (1, 2) or (X, Y) = (2, 1). Because X and Y are independent, we can find the probability that each of these occurs:

Pr(X = 0 ∧ Y = 3) = 1/4 × 1/6 = 1/24,
Pr(X = 1 ∧ Y = 2) = 1/2 × 1/3 = 1/6,
Pr(X = 2 ∧ Y = 1) = 1/4 × 1/3 = 1/12.

So, because the three are mutually exclusive,
Pr(Z = 3) = 1/24 + 1/6 + 1/12 = 7/24.

Doing similar calculations for each possible value, we see that the distribution of Z is

z          0     1    2     3     4    5
Pr(Z = z)  1/24  1/6  7/24  7/24  1/6  1/24

The distribution of a product of two independent random variables can be found in a similar way. Finding the distribution of sums or products of dependent random variables is even more complicated. In general, this requires knowing the probability of each possible combination of values the variables can take.

Questions

23.1 An elevator is malfunctioning. Every minute it is equally likely to ascend one floor, descend one floor, or stay where it is. When it begins malfunctioning it is on level 5. Let X be the level it is on three minutes later. Find the probability distribution for X.

23.2 An integer is generated uniformly at random from the set {11, 12, . . . , 30}. Let X and Y be its first and second (decimal) digits. Are the random variables X and Y independent?

23.3 Let X and Y be independent random variables with distributions

x          0    2
Pr(X = x)  1/4  3/4

y          0    1    2    3
Pr(Y = y)  1/4  1/4  1/4  1/4

Find the probability distribution of Z = X + Y.

Lecture 24: Expectation and variance

A standard die is rolled some number of times and the average of the rolls is calculated. If the die is rolled only once this average is just the value rolled and is equally likely to be 1, 2, 3, 4, 5 or 6.
If the die is rolled ten times, then the average might be between 1 and 2 but this is pretty unlikely – it's much more likely to be between 3 and 4. If the die is rolled ten thousand times, then we can be almost certain that the average will be very close to 3.5. We will see that 3.5 is the expected value of a random variable representing the die roll.

24.1 Expected value

When we said "average" above, we really meant "mean". Remember that the mean of a collection of numbers is the sum of the numbers divided by how many of them there are. So the mean of x1, . . . , xt is (x1 + · · · + xt)/t. The mean of 2, 2, 3 and 11 is (2 + 2 + 3 + 11)/4 = 4.5, for example.

The expected value of a random variable is calculated as a weighted average of its possible values.

If X is a random variable with distribution

x          x1  x2  · · ·  xt
Pr(X = x)  p1  p2  · · ·  pt

then the expected value of X is
E[X] = p1x1 + p2x2 + · · · + ptxt.

Example. If X is a random variable representing a die roll, then
E[X] = 1/6 × 1 + 1/6 × 2 + · · · + 1/6 × 6 = 3.5.

Example. Someone estimates that each year the share price of Acme Corporation has a 10% chance of increasing by $10, a 50% chance of increasing by $4, and a 40% chance of falling by $10. Assuming that this estimate is good, are Acme shares likely to increase in value over the long term?

We can represent the change in the Acme share price by a random variable X with distribution

x          −10  4    10
Pr(X = x)  2/5  1/2  1/10

Then
E[X] = 2/5 × (−10) + 1/2 × 4 + 1/10 × 10 = −1.

Because this value is negative, Acme shares will almost certainly decrease in value over the long term. Notice that it was important that we weighted our average using the probabilities here. If we had just taken the average of −10, 4 and 10 we would have gotten the wrong answer by ignoring the fact that some values were more likely than others.
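The weighted average above is easy to compute for any finite distribution. The Python sketch below (illustrative, not part of the notes) encodes the die and Acme distributions as dictionaries from value to probability:

```python
from fractions import Fraction as F

def expected_value(dist):
    """E[X] = sum of p * x over a distribution {value: probability}."""
    return sum(p * x for x, p in dist.items())

die = {x: F(1, 6) for x in range(1, 7)}          # fair die roll
acme = {-10: F(2, 5), 4: F(1, 2), 10: F(1, 10)}  # Acme price change

print(expected_value(die))   # 7/2, i.e. 3.5
print(expected_value(acme))  # -1
```

Using exact fractions avoids any floating-point rounding in the weighted sum.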
24.2 Law of large numbers

Our initial die-rolling example hinted that the average of a large number of independent trials will get very close to the expected value. This is mathematically guaranteed by a famous theorem called the law of large numbers.

Let X1, X2, . . . be independent random variables, all with the same distribution and expected value µ. Then
lim_{n→∞} (1/n)(X1 + · · · + Xn) = µ.

24.3 Linearity of expectation

We saw in the last lecture that adding random variables can be difficult. Finding the expected value of a sum of random variables is easy if we know the expected values of the variables.

If X and Y are random variables, then
E[X + Y] = E[X] + E[Y].

This works even if X and Y are not independent. Similarly, finding the expected value of a scalar multiple of a random variable is easy if we know the expected value of the variable.

If X is a random variable and s ∈ R, then
E[sX] = sE[X].

Example. Two standard dice are rolled. What is the expected total?

Let X1 and X2 be random variables representing the first and second die rolls. From the earlier example E[X1] = E[X2] = 3.5 and so
E[X1 + X2] = E[X1] + E[X2] = 3.5 + 3.5 = 7.

Example. What is the expected number of '11' substrings in a binary string of length 5 chosen uniformly at random?

For i = 1, . . . , 4, let Xi be a random variable that is equal to 1 if the ith and (i + 1)th bits of the string are both 1 and is equal to 0 otherwise. Then X1 + · · · + X4 is the number of '11' substrings in the string. Because the bits are independent, Pr(Xi = 1) = 1/2 × 1/2 = 1/4 and E[Xi] = 1/4 for i = 1, . . . , 4. So,
E[X1 + · · · + X4] = E[X1] + · · · + E[X4] = 4/4 = 1.

Note that the variables X1, . . . , X4 in the above example were not independent, but we were still allowed to use linearity of expectation.

24.4 Variance

Think of the random variables X, Y and Z whose distributions are given below.
x          −1      99
Pr(X = x)  99/100  1/100

y          −1   1
Pr(Y = y)  1/2  1/2

z          −50  50
Pr(Z = z)  1/2  1/2

These variables are very different. Perhaps X corresponds to buying a raffle ticket, Y to making a small bet on a coin flip, and Z to making a large bet on a coin flip. However, if you only consider expected value, all of these variables look the same – they each have expected value 0.

To give a bit more information about a random variable we can define its variance, which measures how "spread out" its distribution is.

If X is a random variable with E[X] = µ, then
Var[X] = E[(X − µ)²].

So the variance is a measure of how much we expect the variable to differ from its expected value.

Example. The variable X above will be 1 smaller than its expected value with probability 99/100 and will be 99 larger than its expected value with probability 1/100. So
Var[X] = 99/100 × (−1)² + 1/100 × 99² = 99.

Similarly,
Var[Y] = 1/2 × (−1)² + 1/2 × 1² = 1
Var[Z] = 1/2 × (−50)² + 1/2 × 50² = 2500.

Notice that the variance of X is much smaller than the variance of Z because X is very likely to be close to its expected value whereas Z will certainly be far from its expected value.

Example. Let X be a random variable with distribution given by

x          0    2    6
Pr(X = x)  1/6  1/2  1/3

Then the expected value of X is
E[X] = 1/6 × 0 + 1/2 × 2 + 1/3 × 6 = 3.

So, the variance of X is
Var[X] = 1/6 × (0 − 3)² + 1/2 × (2 − 3)² + 1/3 × (6 − 3)² = 5.

Questions

24.1 Do you agree or disagree with the following statement? "The expected value of a random variable is the value it is most likely to take."

24.2 Let X be the sum of 1000 spins of the spinner from Lecture 21, and let Y be 1000 times the result of a single spin. Find E[X] and E[Y]. Which of X and Y do you think would have greater variance?

24.3 Let X be the number of heads occurring when three fair coins are flipped. Find E[X] and Var[X].

Lecture 25: Discrete distributions

In this lecture we'll introduce some of the most common and useful (discrete) probability distributions.
These arise in various different real-world situations.

25.1 Discrete uniform distribution

This type of distribution arises when we choose one of a set of consecutive integers so that all choices are equally likely.

The discrete uniform distribution with parameters a, b ∈ Z (a ≤ b) is given by
Pr(X = k) = 1/(b − a + 1) for k ∈ {a, a + 1, . . . , b}.
We have E[X] = (a + b)/2 and Var[X] = ((b − a + 1)² − 1)/12.

[Figure: uniform distribution with a = 3, b = 8]

25.2 Bernoulli distribution

This type of distribution arises when we have a single process that succeeds with probability p and fails otherwise. Such a process is called a Bernoulli trial.

The Bernoulli distribution with parameter p ∈ [0, 1] is given by
Pr(X = k) = p for k = 1, and
Pr(X = k) = 1 − p for k = 0.
We have E[X] = p and Var[X] = p(1 − p).

25.3 Geometric distribution

This distribution gives the probability that, in a sequence of independent Bernoulli trials, we see exactly k failures before the first success.

The geometric distribution with parameter p ∈ [0, 1] is given by
Pr(X = k) = p(1 − p)^k for k ∈ N.
We have E[X] = (1 − p)/p and Var[X] = (1 − p)/p².

[Figure: geometric distribution with p = 0.5]

Example. If every minute there is a 1% chance that your internet connection fails then the probability of staying online for exactly x consecutive minutes is approximated by a geometric distribution with p = 0.01. It follows that the expected value is (1 − 0.01)/0.01 = 99 minutes and the variance is (1 − 0.01)/(0.01)² = 9900.

25.4 Binomial distribution

This distribution gives the probability that, in a sequence of n independent Bernoulli trials, we see exactly k successes.

The binomial distribution with parameters n ∈ Z⁺ and p ∈ [0, 1] is given by
Pr(X = k) = C(n, k) p^k (1 − p)^(n−k) for k ∈ {0, . . . , n}.
We have E[X] = np and Var[X] = np(1 − p).

[Figure: binomial distribution with n = 20, p = 0.5]

Example.
If 1000 people search a term on a certain day and each of them has a 10% chance of clicking a sponsored link, then the number of clicks on that link is approximated by a binomial distribution with n = 1000 and p = 0.1. It follows that the expected value is 1000 × 0.1 = 100 clicks and the variance is 1000 × 0.1 × 0.9 = 90.

25.5 Poisson distribution

In many situations where we know that an average of λ events occur per time period, this distribution gives a good model of the probability that k events occur in a time period.

The Poisson distribution with parameter λ ∈ R (where λ > 0) is given by
Pr(X = k) = λ^k e^(−λ)/k! for k ∈ N.
We have E[X] = λ and Var[X] = λ.

[Figure: Poisson distribution with λ = 4]

Example. If a call centre usually receives 6 calls per minute, then a Poisson distribution with λ = 6 approximates the probability it receives k calls in a certain minute. It follows that the expected value is 6 calls and the variance is 6.

Questions

25.1 There is a 95% chance of a packet being received after being sent down a noisy line, and the packet is resent until it is received. What is the probability that the packet is received within the first three attempts?

25.2 A factory aims to have at most 2% of the components it makes be faulty. What is the probability of a quality control test of 20 random components finding that 2 or more are faulty, if the factory is exactly meeting its 2% target?

25.3 The number of times a machine needs adjusting during a day approximates a Poisson distribution, and on average the machine needs to be adjusted three times per day. What is the probability it does not need adjusting on a particular day?

Lecture 26: Recursion

Just as the structure of the natural numbers supports induction as a method of proof, it supports induction as a method of definition or of computation.
When used in this way, induction is usually called recursion, and one speaks of a recursive definition or a recursive algorithm.

26.1 Recursive Definitions

Many well known functions f(n) are most easily defined in the "base step, induction step" format, because f(n + 1) depends in some simple way on f(n). The induction step in the definition is more commonly called the recurrence relation for f, and the base step the initial value.

Example. The factorial f(n) = n!
Initial value. 0! = 1.
Recurrence relation. (k + 1)! = (k + 1) × k!

Many programming languages allow this style of definition, and the value of the function is then computed by a descent to the initial value. For example, to compute 4!, the machine successively computes
4! = 4 × 3!
   = 4 × (3 × 2!)
   = 4 × (3 × (2 × 1!))
   = 4 × (3 × (2 × (1 × 0!)))
which can finally be evaluated since 0! = 1.

Remark. The numbers 4, 3, 2, 1 have to be stored on a "stack" before the program reaches the initial value 0! = 1, which finally enables it to evaluate 4!. Thus a recursive program, though short, may run slowly and even cause "stack overflow."

Example. The Fibonacci sequence 0, 1, 1, 2, 3, 5, 8, . . .
The nth number F(n) in this sequence is defined by
Initial values. F(0) = 0, F(1) = 1.
Recurrence relation. F(k + 1) = F(k) + F(k − 1).

Remark. Using a recursive program to compute Fibonacci numbers can easily lead to stack overflow, because each value depends on two previous values (each of which depends on another two, and so on).

A more efficient way to use the recursive definition is to use three variables to store F(k + 1), F(k) and F(k − 1). The new values of these variables, as k increases by 1, depend only on the three stored values, not on all the previous values.

26.2 Properties of recursively defined functions

These are naturally proved by induction, using a base step and induction step which parallel those in the definition of the function.

Example. For n ≥ 5, 10 divides n!

Proof
Base step. 5!
= 5 × 4 × 3 × 2 × 1 = 10 × 4 × 3, hence 10 divides 5!.

Induction step. We have to show
10 divides k! ⟹ 10 divides (k + 1)!
Since (k + 1)! = (k + 1) × k! by the recurrence relation for factorial, the induction step is clear, and hence the induction is complete.

Example. F(0) + F(1) + · · · + F(n) = F(n + 2) − 1.

Proof
Base step. F(0) = 0 = F(2) − 1, because F(2) = 1.

Induction step. We have to show
F(0) + F(1) + · · · + F(k) = F(k + 2) − 1
⟹ F(0) + F(1) + · · · + F(k + 1) = F(k + 3) − 1.
Well,
F(0) + F(1) + · · · + F(k) = F(k + 2) − 1
⟹ F(0) + F(1) + · · · + F(k + 1) = F(k + 2) + F(k + 1) − 1, by adding F(k + 1) to both sides
⟹ F(0) + F(1) + · · · + F(k + 1) = F(k + 3) − 1, since F(k + 2) + F(k + 1) = F(k + 3) by the Fibonacci recurrence relation.
This completes the induction.

26.3 Problems with recursive solutions

Sometimes a problem about n reduces to a problem about n − 1 (or smaller numbers), in which case the solution may be a known recursively defined function.

Example. Find how many n-bit binary strings contain no two consecutive 0s.

We can divide this problem into two cases.
1. Strings which end in 1, e.g. 1101101. In this case, the string before the 1 (110110 here) can be any (n − 1)-bit string with no consecutive 0s.
2. Strings which end in 0, e.g. 1011010. In this case, the string must actually end in 10, to avoid consecutive 0s, but the string before the 10 (10110 here) can be any (n − 2)-bit string with no consecutive 0s.

Thus among strings with no consecutive 0s we find
1. Those with n bits ending in 1 correspond to those with n − 1 bits.
2. Those with n bits ending in 0 correspond to those with n − 2 bits.

Hence if we let f(n) be the number of such strings with n bits we have
f(n) = f(n − 1) + f(n − 2).
This is the Fibonacci recurrence relation. It can also be checked that f(1) = 2 = F(3) and f(2) = 3 = F(4), hence it follows (by induction) that
f(n) = number of n-bit strings with no consecutive 0s = F(n + 2).
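This identity is easy to check by brute force. The Python sketch below (illustrative, not part of the notes) counts the qualifying n-bit strings directly and compares the count with F(n + 2):

```python
from itertools import product

def fib(n):
    """F(0) = 0, F(1) = 1, F(k+1) = F(k) + F(k-1), computed iteratively."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def f(n):
    """Count n-bit binary strings with no two consecutive 0s, by enumeration."""
    return sum("00" not in "".join(bits)
               for bits in product("01", repeat=n))

print([f(n) for n in range(1, 8)])  # [2, 3, 5, 8, 13, 21, 34]
assert all(f(n) == fib(n + 2) for n in range(1, 12))
```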
Questions

26.1 A function s(n) is defined recursively by
Initial value: s(0) = 0
Recurrence relation: s(n + 1) = s(n) + 2n + 1
Write down the first few values of s(n), and guess what function s is.

26.2 Check that the function s you guessed in Question 26.1 satisfies s(0) = 0 and s(n + 1) = s(n) + 2n + 1. (This proves by induction that your guess is correct.)

26.3 If a sequence satisfies the Fibonacci recurrence relation, f(n) = f(n − 1) + f(n − 2), must it agree with the Fibonacci sequence from some point onward?

Lecture 27: Recursive Algorithms

Recursion may be used to define functions whose definition normally involves "· · ·", to give algorithms for computing these functions, and to prove some of their properties.

27.1 Sums

Example. 1 + 2 + 3 + · · · + n
This is the function f(n) defined by
Initial value. f(1) = 1
Recurrence relation. f(k + 1) = f(k) + (k + 1)

Example. 1 + a + a² + · · · + aⁿ
This is the function g(n) defined by
Initial value. g(0) = 1
Recurrence relation. g(k + 1) = g(k) + a^(k+1)

We can use this relation to prove by induction that g(n) = (a^(n+1) − 1)/(a − 1) (a formula for the sum of a geometric series), provided a ≠ 1.

Proof
Base step. For n = 0, 1 = g(0) = (a^(0+1) − 1)/(a − 1), as required.

Induction step. We want to prove
g(k) = (a^(k+1) − 1)/(a − 1) ⟹ g(k + 1) = (a^(k+2) − 1)/(a − 1).
Well,
g(k) = (a^(k+1) − 1)/(a − 1)
⟹ g(k + 1) = (a^(k+1) − 1)/(a − 1) + a^(k+1)
⟹ g(k + 1) = (a^(k+1) − 1 + (a − 1)a^(k+1))/(a − 1)
            = (a^(k+2) + a^(k+1) − a^(k+1) − 1)/(a − 1)
            = (a^(k+2) − 1)/(a − 1)
as required. This completes the induction.

27.2 Products

Example. 1 × 2 × 3 × · · · × n
This is the function n! defined recursively by
Initial value. 0! = 1
Recurrence relation. (k + 1)! = (k + 1) × k!

27.3 Sum and product notation

1 + 2 + 3 + · · · + n is written ∑_{k=1}^{n} k,
1 + a + a² + · · · + aⁿ is written ∑_{k=0}^{n} a^k.
Σ is capital sigma, standing for "sum."
1 × 2 × 3 × · · · × n is written ∏_{k=1}^{n} k.
Π is capital pi, standing for "product."

27.4 Binary search algorithm

Given a list of n numbers in order x1 < x2 < · · · < xn, we can find whether a given number a is in the list by repeatedly "halving" the list. The algorithm binary search is specified recursively by a base step and a recursive step.

Base step. If the list is empty, report 'a is not in the list.'
Recursive step. If the list is not empty, see whether its middle element is a. If so, report 'a found.' Otherwise, if the middle element m > a, binary search the list of elements < m. And if the middle element m < a, binary search the list of elements > m.

27.5 Correctness

We prove that the algorithm works on a list of n items by strong induction on n.

Base step. The algorithm works correctly on a list of 0 numbers, by reporting that a is not in the list.

Induction step. Assuming the algorithm works correctly on any list of < k + 1 numbers, suppose we have a list of k + 1 numbers. The recursive step either finds a as the middle number in the list, or else produces a list of < k + 1 numbers to search, which by assumption it will do correctly. This completes the induction.

Remark. This example shows how easy it is to prove correctness of recursive algorithms, which may be why they are popular despite the practical difficulties in implementing them.

27.6 Running time

log2 n is the number x such that n = 2^x. For example, 1024 = 2^10, and therefore log2 1024 = 10. Similarly log2 512 = 9, and hence log2 1000 is between 9 and 10.

Repeatedly dividing 1000 by 2 (and discarding remainders of 1) runs for 9 steps:
500, 250, 125, 62, 31, 15, 7, 3, 1
The 10 halving steps for 1024 are
512, 256, 128, 64, 32, 16, 8, 4, 2, 1

This means that the binary search algorithm would do at most 9 "halvings" in searching a list of 1000 numbers and at most 10 "halvings" for 1024 numbers.
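The recursive specification in 27.4 translates almost directly into code. The Python sketch below (illustrative, not part of the notes) also counts the halvings performed, so the bound just described can be observed on a successful search:

```python
def binary_search(lst, a, halvings=0):
    """Recursive binary search on a sorted list.
    Returns (found, number of halvings performed)."""
    if not lst:                       # base step: empty list, a not present
        return False, halvings
    mid = len(lst) // 2
    if lst[mid] == a:                 # middle element is a: report found
        return True, halvings
    if lst[mid] > a:                  # recursive step: search elements < middle
        return binary_search(lst[:mid], a, halvings + 1)
    return binary_search(lst[mid + 1:], a, halvings + 1)

found, h = binary_search(list(range(1, 1001)), 1000)
print(found, h <= 9)  # True True: at most 9 halvings to find 1000 among 1000 numbers
```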
More generally, binary search needs at most ⌊log2 n⌋ "halvings" to search a list of n numbers, where ⌊log2 n⌋ is the floor of log2 n, the greatest integer ≤ log2 n.

Remark. In a telephone book with 1,000,000 names, which is just under 2^20, it takes at most 20 halvings (using alphabetical order) to find whether a given name is present.

27.7 20 questions

A mathematically ideal way to play 20 questions would be to divide the number of possibilities in half with each question. E.g. if the answer is an integer, do binary search on the list of possible answers. If the answer is a word, do binary search on the list of possible answers (ordered alphabetically). If this is done, then 20 questions suffice to find the correct answer out of 2^20 = 1,048,576 possibilities.

Questions

27.1 Rewrite the following sums using ∑ notation.
• 1 + 4 + 9 + 16 + · · · + n²
• 1 − 2 + 3 − 4 + · · · − 2n

27.2 Which of the proofs in this lecture uses strong induction?

27.3 Imagine a game where the object is to identify a natural number between 1 and 2^20 using 20 questions with YES-NO answers. The lecture explains why 20 questions are sufficient to identify any such number. Explain why fewer than 20 YES-NO questions are not always sufficient.

Lecture 28: Recursion, lists and sequences

A list or sequence of objects from a set X is a function f from {1, 2, . . . , n} to X, or (if infinite) from {1, 2, 3, . . .} to X. We usually write f(k) as xk and the list as 〈x1, x2, . . . , xn〉, or 〈x1, x2, x3, . . .〉. Thus
f(1) = x1 = first item on list
f(2) = x2 = second item on list
...
The empty list is written 〈〉.

Example. 〈m, a, t, h, s〉 is a function f from {1, 2, 3, 4, 5} into the English alphabet, with f(1) = m, f(2) = a, etc.

28.1 Sequences

A sequence is also a list, but when we use the term sequence we are usually interested in the rule by which the successive terms t1, t2, . . . are defined. Often, the rule is a recurrence relation.

Example. Arithmetic sequence a, a + d, a + 2d, a + 3d, . . .
This is defined by
Initial value. t1 = a
Recurrence relation. t_{k+1} = t_k + d
"Unfolding" this recurrence relation from t_n back to t_1, we see that d gets added n − 1 times, hence t_n = a + (n − 1)d.

Example. Geometric sequence a, ar, ar², ar³, . . .
Initial value. t1 = a
Recurrence relation. t_{k+1} = r t_k
"Unfolding" t_n, we see that multiplication by r is done n − 1 times, hence t_n = a r^(n−1).

The above recurrence relations are called first order because t_{k+1} depends on only the previous value, t_k. (Or, because the values of all terms follow from one initial value.) A second order recurrence relation requires two initial values, and is usually harder to unfold.

Example. A simple sequence in disguise
Initial values. t0 = 1, t1 = 2
Recurrence relation. t_{k+1} = 2t_k − t_{k−1}
Calculating the first values, we find
t2 = 2t1 − t0 = 2 × 2 − 1 = 3,
t3 = 2t2 − t1 = 2 × 3 − 2 = 4,
t4 = 2t3 − t2 = 2 × 4 − 3 = 5.
It looks like t_n = n + 1, and indeed we can prove this by induction. For the base step we have the initial values t0 = 1 = 0 + 1 and t1 = 2 = 1 + 1. We do the induction step by strong induction: assuming t_n = n + 1 for all n ≤ k, we deduce that t_{k+1} = k + 2. In fact we have
t_{k+1} = 2t_k − t_{k−1} by the recurrence relation
        = 2(k + 1) − k by our assumption
        = 2k + 2 − k
        = k + 2
as required. This completes the induction.

Example. Fibonacci sequence
Initial values. t0 = 0, t1 = 1
Recurrence relation. t_{k+1} = t_k + t_{k−1}
It is possible to write t_n directly as a function of n. The function is not at all obvious, because it involves √5:
t_n = (1/√5) × (((1 + √5)/2)^n − ((1 − √5)/2)^n).
We do not go into how to find such a formula in this unit. However, if someone gave you such a formula you could prove it is correct by induction.

28.2 Relations – homogeneous and inhomogeneous

Recurrence relations such as t_{k+1} = 2t_k and t_{k+1} = t_k + t_{k−1}, in which each term is a multiple of some t_j, are called homogeneous.
The characteristic property of any homogeneous equation is that if t_n = f(n) is a solution, then so is t_n = cf(n), for any constant c. E.g. t_n = 2^n is a solution of t_{k+1} = 2t_k, and so is t_n = c × 2^n, for any constant c.

Relations like t_{k+1} = t_k + 3, in which there is a term other than the t_j terms, are called inhomogeneous.

Homogeneous recurrence relations are usually easier to solve, and in fact there is a general method for solving them (which we will not cover in this unit). There is no general method for solving inhomogeneous recurrence relations, though they can often be solved if the term other than the t_j terms is simple.

Questions

28.1 Find the next four values of each of the following recurrence relations. What order is each recurrence relation? Which are homogeneous and which are inhomogeneous?
(a) r_{k+1} = r_k + k², r0 = 0.
(b) s_{k+1} = 3s_k − 2s_{k−1}, s0 = 1, s1 = 2.
(c) t_{k+1} = t_k + t_{k−2} + 1, t0 = 1, t1 = 1, t2 = 1.

28.2 Let T_n be the number of ways of tiling a 2 × n strip with 2 × 1 tiles (which may be rotated so they are 1 × 2). Find T_n for n = 1, 2, 3, 4. Find a recurrence relation for T_n.

28.3 Let C_n be the number of ways of writing down n pairs of brackets so that they are correctly matched. Find C_n for n = 1, 2, 3, 4. Can you explain why C_{k+1} = ∑_{i=0}^{k} C_i C_{k−i}?

Lecture 29: Graphs

A graph consists of a set of objects called vertices and a list of pairs of vertices, called edges.

Graphs are normally represented by pictures, with vertex A represented by a dot labelled A and each edge AB represented by a curve joining A and B. Such pictures are helpful for displaying data or relationships, and they make it easy to recognise properties which might otherwise not be noticed.

The description by lists of vertices and edges is useful when graphs have to be manipulated by computer. It is also a useful starting point for precise definitions of graph concepts.
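For instance, a vertex-and-edge-list description is easy to store and manipulate in a program. The small Python sketch below (illustrative, not part of the notes) takes such a description of a triangle graph and builds, for each vertex, the list of vertices joined to it:

```python
# A graph described the way the notes suggest: a set of vertices
# and a list of edges (pairs of vertices).
vertices = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C"), ("C", "A")]

# Build an adjacency list: each edge contributes each endpoint
# to the other endpoint's list.
adjacent = {v: [] for v in vertices}
for u, w in edges:
    adjacent[u].append(w)
    adjacent[w].append(u)

print(adjacent)  # {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['B', 'A']}
```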
29.1 Examples of graphs

Vertices: A, B, C
Edges: AB, BC, CA
[Picture: a triangle on the vertices A, B, C]
Such a graph, with at most one edge between each pair of vertices, and no vertex joined to itself, is called a simple graph.

Vertices: A, B, C, D
Edges: AB, AB, BC, BC, AD, BD, CD
[Picture: the corresponding multigraph]
The two edges AB which join the same pair of vertices are called parallel edges.

Vertices: A, B
Edges: AA, AB, AB
[Picture: a loop at A and two parallel edges joining A and B]
The edge joining A to A is called a loop. The name multigraph is used when loops and/or parallel edges are allowed.

Warning: A graph can be represented by pictures that look very different. This last example could be redrawn as:
[Picture: an alternative drawing of the same graph]

29.2 Problems given by graphs

Many problems require vertices to be connected by a "path" of successive edges. We shall define paths (and related concepts) next lecture, but the following examples illustrate the idea and show how often it comes up. They also show how helpful it is to have graph pictures when searching for paths.

1. Gray codes
The binary strings of length n are taken as the vertices, with an edge joining any two vertices that differ in only one digit. This graph is called the n-cube. E.g. the 2-digit binary strings form a square (a "2-cube") on the vertices 00, 01, 11, 10, and the 3-digit binary strings form an ordinary cube (a "3-cube") on the vertices 000, 001, 010, 011, 100, 101, 110, 111.

A Gray code of length n is a path which includes each vertex of the n-cube exactly once. E.g. here is a path in the 3-cube which gives the Gray code
000, 001, 011, 010, 110, 111, 101, 100.

Remark. The n-cube has been popular as a computer architecture in recent years. Processors are placed at the vertices of an n-cube (for n = 15 or so) and connected along its edges.

2. Travelling salesman problem
Vertices are towns. Two towns are joined by an edge labelled ℓ if there is a road of length ℓ between them.
[Picture: four towns A, B, C, D joined by roads with labelled lengths]
The problem is to find a path of minimal length which includes all towns, in this case BADC.

3.
Jug problem
Suppose we have three jugs, which hold exactly 3, 4 and 6 litres respectively. We fill the 6-litre jug, and then pour from one jug to another, always stopping when the jug being poured to becomes full or when the jug being poured from becomes empty. Is it possible to reach a state where one jug contains 1 litre and another contains 5 litres?

We represent each possible state by a triple (a, b, c), where
a = number of litres in the 3-litre jug
b = number of litres in the 4-litre jug
c = number of litres in the 6-litre jug
Each state is a vertex, and if state (a′, b′, c′) can be reached from (a, b, c) by pouring as described above, we put a directed edge from (a, b, c) to (a′, b′, c′) in the graph. If (a, b, c) can also be reached from (a′, b′, c′), we join them by an ordinary edge. Then, listing the states that can be reached from (0, 0, 6), we find a graph on the vertices
(0, 4, 2), (3, 1, 2), (3, 3, 0), (0, 0, 6), (3, 0, 3), (0, 3, 3), (0, 1, 5), (2, 4, 0).

A graph like this, with some directed edges, is called a directed graph or digraph. This particular digraph shows that
(0, 0, 6) → (0, 4, 2) → (3, 1, 2) → (0, 1, 5)
hence we can start with a full 6-litre jug and in three pourings get 1 litre in the 4-litre jug and 5 litres in the 6-litre jug.

Questions

29.1 Write down descriptions of the following graphs.
[Pictures: three graphs on the vertices A, B, C, D]

29.2 Draw pictures of graphs with the following descriptions.
Vertices: A, B, C; Edges: AA, BB, CC
Vertices: A, B, C, D; Edges: AB, AC, AD

29.3 Use the following picture of the 4-cube to find a Gray code of length 4.
[Picture: the 4-cube]

Lecture 30: Walks, paths and trails

There are several ways of "travelling" around the edges of a graph.

A walk is a sequence
V1, e1, V2, e2, V3, e3, . . . , e_{n−1}, Vn,
where each e_i is an edge joining vertex V_i to vertex V_{i+1}. (In a simple graph, where at most one edge joins V_i and V_{i+1}, it is sufficient to list the vertices alone.) If Vn = V1 the walk is said to be closed.

A path is a walk with no repeated vertices.
A trail is a walk with no repeated edges.

30.1 Examples

In these pictures, a walk is indicated by a directed curve running alongside the actual edges in the walk. [Three pictures: a walk which is not a trail or a path (repeated edge, repeated vertex); a trail which is not a path (repeated vertex); a nonclosed walk and a closed walk.]

30.2 Adjacency matrix

If two vertices are joined by an edge we say that they are adjacent. A simple graph G with vertices V1, V2, ..., Vn is described by an adjacency matrix which has (i, j) entry (ith row and jth column) aij given by

aij = 1 if Vi is adjacent to Vj in G, and 0 otherwise.

For example, the graph with vertices V1, V2, V3 and edges V1V3, V2V3 has adjacency matrix

    0 0 1
    0 0 1
    1 1 0

30.3 Adjacency matrix powers

The product of two matrices A = (aij) and B = (bij) is the matrix whose (i, j) entry is

    ai1 b1j + ai2 b2j + ai3 b3j + ···,

the "dot product" of the ith row of the matrix on the left with the jth column of the matrix on the right.

The (i, j) entry in the kth power of the adjacency matrix gives the number of walks of length k between Vi and Vj. The length of a walk is the number of "steps" (edges) in it.

For example, suppose we want the number of walks of length 2 from V3 to V3 in the graph above. The adjacency matrix M tells us that the edges V1V3 and V2V3 exist: row 3 of M is (1 1 0), recording the edges from V3 to V1 and V2, and column 3 of M is (1, 1, 0), recording the edges from V1 and V2 to V3. So when we square this matrix, the (3, 3) entry in M²,

    (1 1 0) · (1, 1, 0) = 1×1 + 1×1 + 0×0 = 2,

counts the walks from V3 to V3, namely V3 → V1 → V3 and V3 → V2 → V3.

Similarly, the (i, j) entry in M² is the number of walks of length 2 from Vi to Vj. The (i, j) entry in M³ is the number of walks of length 3 from Vi to Vj, and so on. In fact,

    M² × M =  1 1 0     0 0 1
              1 1 0  ×  0 0 1
              0 0 2     1 1 0

has (3, 2) entry (0 0 2) · (0, 0, 1) = 2. Hence the number of walks of length 3 from V3 to V2 is 2.
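The walk-counting rule above can be checked with a short computation. The following is a minimal sketch in plain Python (no libraries), using the 3-vertex graph from Section 30.2; `mat_mult` is a hypothetical helper written for this illustration.

```python
# Adjacency matrix of the graph with edges V1V3 and V2V3 (Section 30.2).
M = [[0, 0, 1],
     [0, 0, 1],
     [1, 1, 0]]

def mat_mult(A, B):
    """Multiply two square matrices: the (i, j) entry is the dot product
    of row i of A with column j of B."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

M2 = mat_mult(M, M)   # (i, j) entry counts walks of length 2 from Vi to Vj
M3 = mat_mult(M2, M)  # (i, j) entry counts walks of length 3

print(M2[2][2])  # walks of length 2 from V3 to V3 -> 2
print(M3[2][1])  # walks of length 3 from V3 to V2 -> 2
```

Note that Python indexes from 0, so vertex Vi corresponds to row and column i − 1.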
30.4 General adjacency matrix

The adjacency matrix can be generalised to multigraphs by making the (i, j) entry the number of edges from Vi to Vj (special case: count each loop twice). For example, the graph with vertices V1, V2, a loop at V1 and two parallel edges V1V2 has adjacency matrix

    N =  2 2      and so      N² =  8 4
         2 0                        4 4

The (1, 1) entry 2×2 + 2×2 in N², for example, indicates that there are 8 walks of length 2 from V1 to V1: 4 walks twice around the loop, and 4 walks from V1 to V2 (2 ways) then V2 to V1 (2 ways). This count distinguishes between different directions around the loop. It may help to regard the loop as a pair of opposite directed loops.

We can generalise the adjacency matrix M to directed graphs by letting the (i, j) entry be the number of directed edges (which include ordinary edges) from Vi to Vj. With this definition, the (i, j) entry of M^k gives the number of directed walks of length k from Vi to Vj (i.e. walks that obey the directions of edges).

Questions

30.1 Draw the graph/digraph/multigraph with adjacency matrix

    M =  1 0 1
         0 1 0
         1 0 1

using V1, V2 and V3 as names for the vertices corresponding to columns 1, 2 and 3 respectively.

30.2 Calculate M², and use it to find the number of walks of length 2 from V1 to V3. Does this calculation give the number you would expect from the graph?

30.3 Without any calculation, show that the middle row of any power M^k is (0 1 0).

Lecture 31: Degree

The degree of a vertex A in a graph G is the number of times A occurs in the list of edges of G. For example, if G is the graph with vertices A, B and edges AA, AB, AB, then degree(A) = 4.

Intuitively speaking, degree(A) is the number of "ends" of edges occurring at A. In particular, a loop at A contributes 2 to the degree of A.

31.1 The handshaking lemma

In any graph, sum of degrees = 2 × number of edges.

The reason for the name is that if each edge is viewed as a handshake,
then at each vertex V, degree(V) = number of hands.
Hence

sum of degrees = total number of hands = 2 × number of handshakes.

An important consequence. The handshaking lemma implies that in any graph the sum of degrees is even (being 2 × something). Thus it is impossible, e.g., for a graph to have degrees 1, 2, 3, 4, 5.

31.2 The seven bridges of Königsberg

In 18th century Königsberg there were seven bridges connecting islands in the river to the banks as follows. [Picture: the river, its islands and the seven bridges.]

The question came up: is it possible for a walk to cross all seven bridges without crossing the same bridge twice? An equivalent question is whether there is a trail which includes all edges in the following graph. [Picture: a multigraph on vertices A, B, C, D with seven edges.]

31.3 Euler's solution

Euler (1737) observed that the answer is no, because

1. Each time a walk enters and leaves a vertex it "uses up" 2 from the degree.
2. Hence if all edges are used by the walk, all vertices except the first and last must have even degree.
3. The seven bridges graph in fact has four vertices of odd degree.

31.4 Euler's theorem

The same argument shows in general that

A graph with > 2 odd degree vertices has no trail using all its edges.

And a similar argument shows

A graph with odd degree vertices has no closed trail using all its edges.

(Because in this case the first and last vertex are the same, and its degree is "used up" by a closed trail as follows: 1 at the start, 2 each time through, 1 at the end.)

31.5 The converse theorem

If, conversely, we have a graph G whose vertices all have even degree, is there a closed trail using all the edges of G? Not necessarily. For example, G might be disconnected. [Picture: a graph with two separate components.]

We say a graph is connected if any two of its vertices are connected by a walk (or equivalently, by a trail or a path). We call a trail using all edges of a graph an Euler trail. Then we have

A connected graph with no odd degree vertices has a closed Euler trail.

Such a closed trail can be constructed as follows:

1. Starting at any vertex V1, follow a trail t1 as long as possible.

2.
The trail t1 eventually returns to V1, because it can leave any other vertex it enters. (Immediately after the start, V1 has one "used" edge, and hence an odd number of "unused" edges. Any other vertex has an even number of "unused" edges.)

3. If t1 does not use all edges, retrace it to the first vertex V2 where t1 meets an edge not in t1.

4. At V2 add a "detour" to t1 by following a trail out of V2 as long as possible, not using edges in t1. As before, this trail eventually returns to its starting point V2, where we resume the trail t1. Let t2 be the trail t1 plus the detour from V2.

5. If t2 does not use all the edges, retrace t2 to the first vertex V3 where t2 meets an edge not in t2. Add a detour at V3, and so on.

6. Since a graph has only a finite number of edges, this process eventually halts. The result will be a closed trail which uses all the edges (this requires the graph to be connected, since any unused edge would be connected to used ones, and thus would have eventually been used).

31.6 Bridges

A bridge in a connected graph G is an edge whose removal disconnects G. E.g. the edge B is a bridge in the following graph. [Picture: a graph in which the edge labelled B joins two otherwise separate parts.]

The construction of an Euler trail is improved by doing the following (Fleury's algorithm).

• Erase each edge as soon as it is used.
• Use a bridge in the remaining graph only if there is no alternative.

It turns out, when this algorithm is used, that it is not necessary to make any detours. The improvement, however, comes at the cost of needing an algorithm to recognise bridges.

Questions

31.1 For each of the following sequences, construct a graph whose vertices have those degrees, or explain why no such graph exists.

• 1, 2, 3, 4
• 1, 2, 1, 2, 1
• 1, 2, 2, 2, 1

31.2 A graph G has adjacency matrix

    0 1 1 0 0
    1 0 0 0 0
    1 0 0 0 0
    0 0 0 2 1
    0 0 0 1 2

Decide, without drawing the graph, whether G is connected or not.
31.3 A graph H has adjacency matrix

    2 0 1 0 1
    0 2 0 1 0
    1 0 2 0 1
    0 1 0 2 0
    1 0 1 0 2

What are the degrees of its vertices? Does H have a closed Euler trail?

Lecture 32: Trees

A cycle is a closed trail (with at least one edge) that doesn't repeat any vertex except that it ends where it started. A tree is a connected graph with no cycles. [Picture: an example of a tree.]

32.1 The number of edges in a tree

A tree with n vertices has n − 1 edges.

The proof is by strong induction on n.

Base step. A tree with 1 vertex has 0 edges (since any loop would itself be a cycle).

Induction step. Supposing any tree with j ≤ k vertices has j − 1 edges, we have to deduce that a tree with k + 1 vertices has k edges. Well, given a tree Tk+1 with k + 1 vertices, we consider any edge e in Tk+1, with ends A and B say. Removing e disconnects the ends A and B of e. (If they were still connected, by some path p, then p and e together would form a cycle in Tk+1, contrary to its being a tree.) Thus Tk+1 − {e} consists of two trees, say Ti and Tj with i and j vertices respectively. We have i + j = k + 1 but both i, j ≤ k, so our induction assumption gives

Ti has i − 1 edges, Tj has j − 1 edges.

But then Tk+1 = Ti ∪ Tj ∪ {e} has

(i − 1) + (j − 1) + 1 = (i + j) − 1 = k

edges, as required.

Remarks

1. This proof also shows that any edge in a tree is a bridge.
2. Since a tree has one more vertex than edge, it follows that m trees have m more vertices than edges.
3. The theorem also shows that adding any edge to a tree (without adding a vertex) creates a cycle. (Since the graph remains connected, but has too many edges to be a tree.)

These remarks can be used to come up with several equivalent definitions of tree. Next we see how any connected graph can be related to trees.

32.2 Spanning trees

A spanning tree of a graph G is a tree contained in G which includes all vertices of G. [Picture: a graph and one of its spanning trees.]

Any connected graph G contains a spanning tree.

This is proved by induction on the number of edges.

Base step.
If G has no edge but is connected then it consists of a single vertex. Hence G itself is a spanning tree of G.

Induction step. Suppose any connected graph with ≤ k edges has a spanning tree, and we have to find a spanning tree of a connected graph Gk+1 with k + 1 edges. If Gk+1 has no cycle then Gk+1 is itself a tree, hence a spanning tree of itself. If Gk+1 has a cycle p, we can remove any edge e from p, and Gk+1 − {e} is connected (because vertices previously connected via e are still connected via the rest of p). Since Gk+1 − {e} has one edge less, it contains a spanning tree T by induction, and T is also a spanning tree of Gk+1.

Remark

It follows from these two theorems that a graph G with n vertices and n − 2 edges (or fewer) is not connected. If it were, G would have a spanning tree T, with the same n vertices. But then T would have n − 1 edges, which is impossible, since it is more than the number of edges of G.

32.3 The greedy algorithm

Given a connected graph G with weighted edges, a minimal weight spanning tree T of G may be constructed as follows.

1. Start with T empty.
2. While T is not a spanning tree for G, add to T an edge ek+1 of minimal weight among those which do not close a cycle in T, together with the vertices of ek+1.

This is also known as Kruskal's algorithm.

Remarks

1. T is not necessarily a tree at all steps of the algorithm, but it is at the end.
2. For a graph with n vertices, the algorithm runs for n − 1 steps, because this is the number of edges in a tree with n vertices.
3. The algorithm is called "greedy" because it always takes the cheapest step available, without considering how this affects future steps. For example, an edge of weight 4 may be chosen even though this prevents an edge of weight 5 being chosen at the next step.
4. The algorithm always works, though this is not obvious, and the proof is not required for this course. (You can find it, e.g. in Chartrand's Introductory Graph Theory.)

5.
Another problem which can be solved by a "greedy" algorithm is splitting a natural number n into powers of 2. Begin by subtracting the largest such power 2^m ≤ n from n, then repeat the process with n − 2^m, etc.

Questions

32.1 Which of the following graphs are trees? In each case we insist that m ≠ n.

• vertices 1, 2, 3, 5, 7; an edge between m and n if m divides n or n divides m
• vertices 1, 2, 3, 4, 5; an edge between m and n if m divides n or n divides m
• vertices 2, 3, 4, 5, 6; an edge between m and n if m divides n or n divides m

32.2 Find spanning trees of the following graphs (cube and dodecahedron). [Pictures of the cube and dodecahedron graphs.]

32.3 Also find spanning trees of the cube and dodecahedron which are paths.

Lecture 33: Trees, queues and stacks

To search a graph G systematically, it helps to have a spanning tree T, together with an ordering of the vertices of T.

33.1 Breadth first ordering

The easiest ordering to understand is called breadth first, because it orders vertices "across" the tree in "levels." Level 0 is a given "root" vertex. Level 1 is the vertices one edge away from the root. Level 2 is the vertices two edges away from the root, ... and so on.

Example. A, B, C, D, E, F, G is a breadth first ordering of the vertices of a tree rooted at A. [Picture of the tree.]

33.2 Queues

Breadth first ordering amounts to putting vertices in a queue: a list processed on a "first come, first served" or "first in, first out" basis.

• The root vertex is first in the queue (hence first out).
• Vertices adjacent to the head vertex v in the queue go to the tail of the queue (hence they come out after v), if they are not already in it.
• The head vertex v does not come out of the queue until all vertices adjacent to v have gone in.

33.3 Breadth first algorithm

For any connected graph G, this algorithm not only orders the vertices of G in a queue Q, it also builds a spanning tree T of G by attaching each vertex v to a "predecessor" among the adjacent vertices of v already in T. An arbitrary vertex is chosen as the root V0 of T.

1.
Initially, T = tree with just one vertex V0, Q = the queue containing only V0.

2. While Q is nonempty
   2.1. Let V be the vertex at the head of Q.
   2.2. If there is an edge e = VW in G where W is not in T:
        2.2.1. Add e and W to T.
        2.2.2. Insert W in Q (at the tail).
   2.3. Else remove V from Q.

Remarks

1. If the graph G is not connected, the algorithm gives a spanning tree of the connected component containing the root vertex A, the part of G containing all vertices connected to A.
2. Thus we can recognise whether G is connected by seeing whether all its vertices are included when the algorithm terminates.
3. Being able to recognise connectedness enables us, e.g., to recognise bridges.

Example. Take G to be the graph on vertices A, B, C, D, E shown here, with root vertex A. [Picture of G.] Then Q and T grow as follows (T gains each vertex as it enters the tail of Q, attached to the vertex then at the head):

Step  Q
1     A
2     AB
3     ABC
4     BC
5     BCD
6     BCDE
7     CDE
8     DE
9     E

33.4 Depth first algorithm

This is the same except it has a stack S instead of a queue Q. S is "last in, first out," so we insert and remove vertices from the same end of S (called the top of the stack).

1. Initially, T = tree with just one vertex V0, S = the stack containing only V0.
2. While S is nonempty
   2.1. Let V be the vertex at the top of S.
   2.2. If there is an edge e = VW in G where W is not in T:
        2.2.1. Add e and W to T.
        2.2.2. Insert W in S (at the top).
   2.3. Else remove V from S.

Remark. The breadth first and depth first algorithms give two ways to construct a spanning tree of a connected graph.

Example. We use the same G, and take the top of S to be its right hand end.

Step  S
1     A
2     AB
3     ABC
4     ABCE
5     ABCED
6     ABCE
7     ABC
8     AB
9     A

Questions

33.1 The following list gives the state, at successive stages, of either a queue or a stack.

A, AB, ABC, BC, BCD, CD, D

Which is it: a queue or a stack?

33.2 Construct a breadth first spanning tree for the graph on vertices A, B, C, D, E shown here. [Picture of the graph.]

33.3 Construct a depth first spanning tree for the graph in Question 33.2.
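The two algorithms above can be sketched in Python. This is a minimal illustration, not the course's required implementation; the edge set below is an assumption chosen to be consistent with the queue and stack tables in the worked examples (the original pictures are not reproduced here).

```python
from collections import deque

# Hypothetical edge set for a small connected graph on A..E, chosen to
# match the example tables above; the pictured graph may differ.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D", "E"],
    "C": ["A", "B", "E"],
    "D": ["B", "E"],
    "E": ["B", "C", "D"],
}

def bfs_spanning_tree(graph, root):
    """Breadth first algorithm (33.3): attach each new vertex to the
    vertex at the head of the queue Q."""
    tree_edges, in_tree = [], {root}
    q = deque([root])
    while q:
        v = q[0]  # head of the queue
        w = next((u for u in graph[v] if u not in in_tree), None)
        if w is not None:
            tree_edges.append((v, w))  # add edge vw and vertex w to T
            in_tree.add(w)
            q.append(w)                # insert w at the tail
        else:
            q.popleft()                # remove v from the queue
    return tree_edges

def dfs_spanning_tree(graph, root):
    """Depth first algorithm (33.4): the same, but with a stack S."""
    tree_edges, in_tree = [], {root}
    s = [root]
    while s:
        v = s[-1]  # top of the stack
        w = next((u for u in graph[v] if u not in in_tree), None)
        if w is not None:
            tree_edges.append((v, w))
            in_tree.add(w)
            s.append(w)                # insert w at the top
        else:
            s.pop()                    # remove v from the stack
    return tree_edges

print(bfs_spanning_tree(graph, "A"))
print(dfs_spanning_tree(graph, "A"))
```

Both trees have 4 edges, as a spanning tree of a 5-vertex graph must; the breadth first tree attaches vertices close to the root, while the depth first tree here is a single long path.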
Useful notation

Sets of numbers
N : the set of natural numbers {0, 1, 2, 3, ...}
Z : the set of integers {..., −2, −1, 0, 1, 2, ...}
Q : the set of rational numbers {a/b : a, b ∈ Z, b ≠ 0}
R : the set of real numbers

Number theory
a | b : a divides b, i.e. b = qa for some q ∈ Z
gcd(a, b) : greatest common divisor of a and b
a ≡ b (mod n) : a and b are congruent modulo n, i.e. n | (a − b)

Logic
¬p : not p
p ∧ q : p and q
p ∨ q : p or q
p → q : p implies q
∀x : for all x
∃x : there exists x

Sets
x ∈ A : x is an element of A
{x : P(x)} : the set of x such that P(x)
|A| : the number of elements in A
A ⊆ B : A is a subset of B
A ∩ B : A intersect B, {x : x ∈ A ∧ x ∈ B}
A ∪ B : A union B, {x : x ∈ A ∨ x ∈ B}
A − B : set difference A minus B, {x : x ∈ A ∧ x ∉ B}

Functions
f : A → B : f is a function from A to B

Probability
Pr(E) : probability of E
Pr(A|B) : conditional probability of A given B
E[X] : expected value of X
Var[X] : variance of X

Sums and products
∑ from i = a to b of f(i) : f(a) + f(a+1) + ··· + f(b)
∏ from i = a to b of f(i) : f(a) × f(a+1) × ··· × f(b)

Useful formulas

Logic laws
p ↔ q ≡ (p → q) ∧ (q → p)
p → q ≡ (¬p) ∨ q
¬¬p ≡ p
p ∧ p ≡ p          p ∨ p ≡ p
p ∧ q ≡ q ∧ p      p ∨ q ≡ q ∨ p
p ∧ (q ∧ r) ≡ (p ∧ q) ∧ r      p ∨ (q ∨ r) ≡ (p ∨ q) ∨ r
p ∧ (q ∨ r) ≡ (p ∧ q) ∨ (p ∧ r)      p ∨ (q ∧ r) ≡ (p ∨ q) ∧ (p ∨ r)
¬(p ∧ q) ≡ (¬p) ∨ (¬q)      ¬(p ∨ q) ≡ (¬p) ∧ (¬q)
p ∧ T ≡ p          p ∨ F ≡ p
p ∧ F ≡ F          p ∨ T ≡ T
p ∧ (¬p) ≡ F       p ∨ (¬p) ≡ T
p ∧ (p ∨ q) ≡ p    p ∨ (p ∧ q) ≡ p
¬∀x P(x) ≡ ∃x ¬P(x)      ¬∃x P(x) ≡ ∀x ¬P(x)

Selections (writing C(n, r) for the binomial coefficient "n choose r")
Ordered selections without repetition: n(n−1)···(n−r+1) = n!/(n−r)!
Unordered selections without repetition: n(n−1)···(n−r+1)/r! = n!/(r!(n−r)!) = C(n, r)
Ordered selections with repetition: n^r
Unordered selections with repetition: C(n+r−1, r) = (n+r−1)!/(r!(n−1)!)
Binomial theorem
(x + y)^n = C(n, 0) x^n y^0 + C(n, 1) x^(n−1) y^1 + C(n, 2) x^(n−2) y^2 + ··· + C(n, n−1) x^1 y^(n−1) + C(n, n) x^0 y^n

Conditional probability
Pr(A|B) = Pr(A ∩ B) / Pr(B)

Bayes' theorem (writing Ā for the complement of A)
Pr(A|B) = Pr(B|A) Pr(A) / (Pr(B|A) Pr(A) + Pr(B|Ā) Pr(Ā))

Discrete uniform distribution
Pr(X = k) = 1/(b − a + 1) for k ∈ {a, a+1, ..., b}
E[X] = (a + b)/2, Var[X] = ((b − a + 1)² − 1)/12

Bernoulli distribution
Pr(X = k) = p for k = 1, and 1 − p for k = 0
E[X] = p, Var[X] = p(1 − p)

Geometric distribution
Pr(X = k) = p(1 − p)^k for k ∈ N
E[X] = (1 − p)/p, Var[X] = (1 − p)/p²

Binomial distribution
Pr(X = k) = C(n, k) p^k (1 − p)^(n−k) for k ∈ {0, ..., n}
E[X] = np, Var[X] = np(1 − p)

Poisson distribution
Pr(X = k) = λ^k e^(−λ) / k! for k ∈ N
E[X] = λ, Var[X] = λ
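Formulas like these can be checked numerically. The following is a minimal sketch in plain Python verifying E[X] = np and Var[X] = np(1 − p) for the binomial distribution by summing over the probability mass function directly; n = 10 and p = 0.3 are arbitrary choices for the illustration.

```python
from math import comb, isclose

# Binomial pmf: Pr(X = k) = C(n, k) p^k (1-p)^(n-k), for k = 0..n.
n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Expectation and variance computed from the definitions.
mean = sum(k * pmf[k] for k in range(n + 1))
var = sum((k - mean) ** 2 * pmf[k] for k in range(n + 1))

assert isclose(sum(pmf), 1.0)         # probabilities sum to 1
assert isclose(mean, n * p)           # E[X] = np
assert isclose(var, n * p * (1 - p))  # Var[X] = np(1 - p)
print(mean, var)
```

The same direct-summation check works for the discrete uniform and (truncated) geometric and Poisson distributions on this sheet.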