MATH1231 Mathematics 1B
MATH1241 Higher Mathematics 1B

ALGEBRA NOTES

Copyright ©2020 School of Mathematics and Statistics, UNSW Sydney

Preface

Please read carefully. These Notes form the basis for the algebra strand of MATH1231 and MATH1241. However, not all of the material in these Notes is included in the MATH1231 or MATH1241 algebra syllabuses. A detailed syllabus will be uploaded to Moodle. In using these Notes, you should remember the following points:

1. It is essential that you start working right from the beginning of the session and continue to work steadily throughout the session. Make every effort to keep up with the lectures and to do problems relevant to the current lectures.

2. These Notes are not intended to be a substitute for attending lectures or tutorials. The lectures will expand on the material in the Notes and help you to understand it.

3. These Notes may seem to contain a lot of material, but not all of it is equally important. One aim of the lectures is to give you a clearer idea of the relative importance of the topics covered in the Notes.

4. Use the tutorials for the purpose for which they are intended, that is, to ask questions about both the theory and the problems being covered in the current lectures.

5. The theory (i.e. the theorems and proofs) is regarded as an essential part of the Algebra course. A list of the theory that you should know is given in the "Theory in the Algebra Course" section below.

6. Some of the material in these Notes is more difficult than the rest. This harder material is marked with the symbol [H]. Material marked with an [X] is intended for students in MATH1241.

7. It is essential that you do the problems given at the end of each chapter, in addition to the online tutorials that can be found on Moodle. If you find that you do not have time to attempt all of the problems, you should at least attempt a representative selection of them.
The problems set in tests and exams will be similar to the problems given in these Notes.

8. You will probably find some of the ideas in Chapters 6 and 7 quite difficult at first because they are expressed in a general and abstract manner. However, as you work through the examples in the chapters and the problems at the ends of the chapters, you should find that the ideas become much clearer.

9. You will be expected to use the computer algebra package Maple in tests, and to understand Maple syntax and output in the end-of-term examination.

10. You should keep these Notes for use in second-year subjects on linear algebra.

Note. These Notes have been prepared by a number of members of the University of New South Wales. The main contributors include Peter Blennerhassett, Peter Brown, Shaun Disney, Ian Doust, William Dunsmuir, Peter Donovan, David Hunt, Elvin Moore and Colin Sutherland. Chapter 9 was written by Dr Thomas Britz based on the notes of Prof. William Dunsmuir. The original problems for this chapter came from MATH1151; they were reorganised and expanded by Dr Chi Mak. Copyright is vested in The University of New South Wales, ©2020.

Contents

Preface
Algebra Syllabus
  Syllabus and lecture timetable
  Problem schedule
  Theory in the algebra component
Revision
  Important facts from MATH1131/1141
  Revision problems
6 VECTOR SPACES
  6.1 Definitions and examples of vector spaces
  6.2 Vector arithmetic
  6.3 Subspaces
  6.4 Linear combinations and spans
    6.4.1 Matrices and spans in R^m
    6.4.2 Solving problems about spans
  6.5 Linear independence
    6.5.1 Solving problems about linear independence
    6.5.2 Uniqueness and linear independence
    6.5.3 Spans and linear independence
  6.6 Basis and dimension
    6.6.1 Bases
    6.6.2 Dimension
    6.6.3 Existence and construction of bases
  6.7 [X] Coordinate vectors
  6.8 [X] Further important examples of vector spaces
    6.8.1 Vector spaces of matrices
    6.8.2 Vector spaces associated with real-valued functions
    6.8.3 Vector spaces associated with polynomials
  6.9 A brief review of set and function notation
    6.9.1 Set notation
    6.9.2 Function notation
  6.10 Vector spaces and MAPLE
  Problems for Chapter 6
7 LINEAR TRANSFORMATIONS
  7.1 Introduction to linear maps
  7.2 Linear maps from R^n to R^m and m × n matrices
  7.3 Geometric examples of linear transformations
  7.4 Subspaces associated with linear maps
    7.4.1 The kernel of a map
    7.4.2 Image
    7.4.3 Rank, nullity and solutions of Ax = b
  7.5 Further applications and examples of linear maps
  7.6 [X] Representation of linear maps by matrices
  7.7 [X] Matrix arithmetic and linear maps
  7.8 [X] One-to-one, onto and invertible linear maps and matrices
    7.8.1 Linear maps
  7.9 [X] Proof of the Rank-Nullity Theorem
  7.10 One-to-one, onto and inverses for functions
  7.11 Linear transformations and MAPLE
  Problems for Chapter 7
8 EIGENVALUES AND EIGENVECTORS
  8.1 Definitions and examples
    8.1.1 Some fundamental results
    8.1.2 Calculation of eigenvalues and eigenvectors
  8.2 Eigenvectors, bases, and diagonalisation
  8.3 Applications of eigenvalues and eigenvectors
    8.3.1 Powers of A
    8.3.2 Solution of first-order linear differential equations
    8.3.3 [X] Markov chains
  8.4 Eigenvalues and MAPLE
  Problems for Chapter 8
9 INTRODUCTION TO PROBABILITY AND STATISTICS
  9.1 Some Preliminary Set Theory
  9.2 Probability
    9.2.1 Sample Space and Probability Axioms
    9.2.2 Rules for Probabilities
    9.2.3 Conditional Probabilities
    9.2.4 Statistical Independence
  9.3 Random Variables
    9.3.1 Discrete Random Variables
    9.3.2 The Mean and Variance of a Discrete Random Variable
  9.4 Special Distributions
    9.4.1 The Binomial Distribution
    9.4.2 Geometric Distribution
    9.4.3 Sign Tests
  9.5 Continuous random variables
    9.5.1 The mean and variance of a continuous random variable
  9.6 Special Continuous Distributions
    9.6.1 The Normal Distribution
    9.6.2 [X] The Exponential Distribution
    9.6.3 Useful Web Applets to Illustrate Probability Reasoning
  9.7 Probability and MAPLE
  Problems for Chapter 9
ANSWERS TO SELECTED PROBLEMS (Chapters 6, 7, 8 and 9)
INDEX

ALGEBRA SYLLABUS AND LECTURE TIMETABLE

The algebra course for both MATH1231 and MATH1241 is based on Chapters 6 to 9 of the Algebra Notes. Lecturers will not cover all of the material in these Notes in their lectures, as some sections are intended for reference and for background reading. A detailed syllabus and lecture schedule will be uploaded to Moodle.

PROBLEM SETS

At the end of each chapter there is a set of problems. Some of the problems are very easy, some are less easy but still routine, and some are quite hard.
To help you decide which problems to try first, each problem is marked with an [R], an [H] or an [X]. The problems marked [R] form a basic set of problems which you should try first. Problems marked [H] are harder and can be left until you have done the problems marked [R]. You do need to make an attempt at the [H] problems, because problems of this type will occur in tests and in the exam. If you have difficulty with the [H] problems, ask for help in your tutorial. The problems marked [X] are intended for students in MATH1241; they relate to topics which are only covered in MATH1241. Extra problem sheets for MATH1241 may be issued in lectures. A number of questions are marked [M], indicating that Maple is required in the solution of the problem. Questions marked with a [V] have a video solution available from the Moodle course page.

ALGEBRA PROBLEM SCHEDULE

Solving problems and writing mathematics clearly are two separate skills that need to be developed through practice. We recommend that you keep a workbook to practise writing solutions to mathematical problems. The range of questions suitable for each week will be provided on Moodle, along with a suggestion of specific recommended problems to do before your classroom tutorials. The Online Tutorials will develop your problem-solving skills and give you examples of mathematical writing. Online Tutorials help build your understanding from lectures towards solving problems on your own.

THEORY IN THE ALGEBRA COURSE

The theory is regarded as an essential part of this course, and it will be examined both in class tests and in the end-of-year examination. You should make sure that you can give DEFINITIONS of the following ideas:

Chapter 6.
Subspace of a vector space, linear combination of a set of vectors, span of a set of vectors, linear independence of a set of vectors, spanning set for a vector space, basis for a vector space, dimension of a vector space.

Chapter 7. Linear function, kernel and nullity of a linear function, image and rank of a linear function.

Chapter 8. Eigenvalue and eigenvector, diagonalisable matrix.

Chapter 9. Probability, statistical independence, conditional probability, discrete random variable, expected value (mean) of a random variable, variance of a random variable, binomial distribution, geometric distribution.

You should be able to give STATEMENTS of the following theorems and propositions.

Chapter 6. Theorem 1 of §6.3; Propositions 1 and 3 and Theorem 2 of §6.4; Proposition 1 and Theorems 2, 3, 4, 5 and 6 of §6.5; Theorems 1, 2, 3, 4, 5, 6 and 7 of §6.6.
Chapter 7. Theorems 2, 3 and 4 of §7.1; Theorems 1 and 2 of §7.2; Proposition 7 and Theorems 1, 5, 8, 9 and 10 of §7.4.
Chapter 8. Theorems 1, 2 and 3 of §8.1; Theorems 1 and 2 of §8.2.

You should be able to give PROOFS of the following theorems and propositions.

Chapter 6. Theorem 2 of §6.4; Theorems 2 and 3 of §6.5; Theorem 2 of §6.6.
Chapter 7. Theorem 2 of §7.1; Theorem 1 of §7.2; Theorems 1, 5 and 8 of §7.4.
Chapter 8. Theorem 1 of §8.1.

Revision

Some important facts from MATH1131/1141

In the next couple of chapters we shall frequently refer to some subsets of R^n, such as lines and planes. As well as the two operations of addition and multiplication by a scalar, we shall also refer to some other operations, such as dot and cross products. For ease of reading, some definitions are restated below. When necessary, you should refer to the MATH1131/1141 Algebra Notes.

• Suppose that n > 1.
A parametric vector equation of a line in R^n through a point A and parallel to a non-zero vector v is given by

    x = a + λv,  λ ∈ R,

where a is the position vector of A (with respect to the origin O) and x is the position vector of a variable point on the line.

• Suppose that n > 2. A parametric vector equation of a plane in R^n through a point A and parallel to two non-zero, non-parallel vectors u and v is given by

    x = a + λu + µv,  λ, µ ∈ R,

where a is the position vector of A and x is the position vector of a variable point on the plane.

• Suppose that a and b are two vectors in R^n, n > 1. The dot product is defined by

    a · b = a1 b1 + · · · + an bn.

The length of a vector a is defined to be √(a · a). (This definition is equivalent to the one given in the MATH1131/1141 Algebra Notes.) The vectors a and b are said to be orthogonal if a · b = 0. A set of vectors is said to be an orthonormal set if the length of each vector is 1 and the vectors are mutually orthogonal. The projection of a vector a on the vector b is

    proj_b a = ((a · b) / |b|^2) b.

• Suppose that a = (a1, a2, a3) and b = (b1, b2, b3) are two vectors in R^3, written here as column vectors. An equivalent definition of the cross product, in determinant form, is given by

            | e1  e2  e3 |
    a × b = | a1  a2  a3 | .
            | b1  b2  b3 |

• The following is a point-normal form of a plane in R^3 which passes through a point A and has a normal vector n:

    n · (x − a) = 0,

where a is the position vector of A and x is the position vector of a variable point on the plane.

Revision problems

1. [R] Let A, B, P be points in R^3 with position vectors (written as triples)

    a = (7, −2, 3),  b = (1, −5, 0)  and  p = (1, −1, 2).

Let Q be the point on the segment between A and B such that AQ = (2/3) AB.
 i) Find q, the position vector of Q.
 ii) Find a parametric vector equation of the line that passes through P and Q.

2. [R] Consider the three points A(1, 1, 1), B(2, 0, 3) and C(3, −1, 1).
 i) Find the vectors AB and AC (from A to B, and from A to C).
 ii) Find a parametric vector form of the line through A and B.
 iii) Find a parametric vector form of the plane through A, B and C.
 iv) Find AB × AC.
 v) Find a point-normal form of the plane through A, B and C.
 vi) Find a Cartesian equation of the plane through A, B and C.

3. [R] Given the vectors p = (1, 1, −1) and q = (2, 1, −1), find |p|, |q| and p · q, and hence the cosine of the angle between p and q.

4. [R] Consider the equation

        | x − 1   y − 2   z + 1 |
    det |   1       0       2   | = 0.
        |   2      −1       0   |

 i) Show that the equation represents the Cartesian equation of a plane.
 ii) Write the equation in point-normal form.

5. [R] For the points P(1, 2, 0), Q(1, 3, −1) and R(2, 1, 1), find PQ × PR and the area of the triangle with vertices P, Q and R.

6. [R] Suppose that A is the point (2, −1, 3) and Π is the plane

    x = λ(1, 0, 1) + µ(1, −2, 2)  for λ, µ ∈ R.

 i) Find a vector n which is normal to Π.
 ii) Find the projection of OA on the direction n.
 iii) Hence find the shortest distance of A from Π.

7. [R] Find the intersection (if any) of the line

    x = (0, 18, 1) + µ(2, −3, 1)  for µ ∈ R

and the plane

    x = (1, 0, 4) + λ1(1, 4, 1) + λ2(3, 1, −2)  for λ1, λ2 ∈ R.

8. [R] Are the planes

    x = (1, −4, 2, 3) + λ1(2, 1, −2, 7) + λ2(−3, 1, 5, 2)  for λ1, λ2 ∈ R

and

    x = (2, −4, 1, 3) + µ1(3, −1, 2, 4) + µ2(−1, 4, 2, 6)  for µ1, µ2 ∈ R

parallel?

9. [R] Consider the following system of linear equations, where k is a real number.

     x1 +    x2 + 2k x3 =  3
     x1 + 3k x2 +    x3 = −2
    2x1 + 6k x2 +  k x3 =  1

Find the values of k for which the system has (I) no solutions, (II) a unique solution or (III) infinitely many solutions.

Answers to the revision problems

1 i) q = (3, −4, 1).  ii) x = (1, −1, 2) + λ(2, −3, −1) for λ ∈ R.
2 i) AB = (1, −1, 2), AC = (2, −2, 0).  ii) x = (1, 1, 1) + λ(1, −1, 2) for λ ∈ R.  iii) x = (1, 1, 1) + λ(1, −1, 2) + µ(2, −2, 0) for λ, µ ∈ R.  iv) (4, 4, 0).  v) (4, 4, 0) · (x − (1, 1, 1)) = 0.  vi) x + y − 2 = 0.
3. √3, √6, 4, 2√2/3.
4 i) 2x + 4y − z = 11.  ii) (2, 4, −1) · (x − (0, 0, −11)) = 0, where x = (x, y, z).
5. (0, −1, −1), √2/2.
6 i) (2, −1, −2).  ii) (−1/9)(2, −1, −2).  iii) 1/3.
7. They meet at (6, 9, 4).
8.
The planes are not parallel, since

    λ1(2, 1, −2, 7) + λ2(−3, 1, 5, 2) = µ1(3, −1, 2, 4) + µ2(−1, 4, 2, 6)

only when λ1 = λ2 = µ1 = µ2 = 0.
9. (I) If either k = 2 or k = 1/3, then the system has no solutions. (II) If k ≠ 2 and k ≠ 1/3, then the system has a unique solution. (III) There are no values of k which give a system with infinitely many solutions.

Chapter 6

VECTOR SPACES

    But, keeping still the end in view
    To which I hope to come,
    I strove to prove the matter true
    By putting everything I knew
    Into an Axiom

    Lewis Carroll, Phantasmagoria.

We have studied geometric vectors and column vectors in Chapter 1 and matrices in Chapter 5. What do the following sets have in common?

• The set of geometric vectors in a three-dimensional space.
• The set of column vectors of n real components, i.e. R^n.
• The set of m × n matrices with real entries, i.e. M_mn(R).

In each of these sets, we can add two elements, and we can multiply an element of the set by a scalar (in this case, a scalar is a real number), and remain inside the set we started in. We say that each set is closed under the two operations of addition and multiplication by a scalar. Such a set, together with the scalars and the two operations, satisfies some fundamental properties. As we have seen in Chapters 1 and 5, addition of vectors and matrices satisfies the associative and commutative laws. There is a special element 0 in each set such that 0 + v = v + 0 = v for all v in the set. For each v in the set, there is a negative −v such that v + (−v) = (−v) + v = 0. The associative law of scalar multiplication and the distributive laws also hold. These are examples of vector spaces, which we are going to study in this chapter. In a vector space, each element of the set is called a vector. The set of scalars must be a field, generally the set of real or complex numbers.
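As a quick numerical illustration (not part of the original Notes), the shared rules described above can be checked on sample elements of two of these sets, using NumPy arrays both for column vectors in R^3 and for 2 × 2 real matrices. This is only a sketch, assuming NumPy is available.

```python
import numpy as np

# Sample "vectors" from two of the sets above:
# column vectors in R^3, and 2x2 real matrices.
u, v = np.array([1.0, -2.0, 3.0]), np.array([0.5, 4.0, -1.0])
A, B = np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([[0.0, -1.0], [2.0, 5.0]])

for x, y in [(u, v), (A, B)]:
    zero = np.zeros_like(x)
    lam = 2.5
    # Commutative law of addition: x + y = y + x.
    assert np.allclose(x + y, y + x)
    # Existence of zero: x + 0 = x.
    assert np.allclose(x + zero, x)
    # Existence of a negative: x + (-x) = 0.
    assert np.allclose(x + (-x), zero)
    # Distributive law: lam(x + y) = lam x + lam y.
    assert np.allclose(lam * (x + y), lam * x + lam * y)

print("The sampled vector-space rules hold for both sets.")
```

Of course, numerical checks on particular elements only illustrate the rules; the proofs in Section 6.1 establish them for all elements at once.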
Besides the vector spaces mentioned above, there are many other examples, such as the set of polynomials with real or complex coefficients, the set of real-valued functions on a given interval, and the set of differentiable functions defined on an interval, each of which forms a vector space. It is perhaps a remarkable fact that each of these quite different kinds of objects obeys similar rules for addition and multiplication by a scalar.

In the present chapter our main objectives are to develop a general theory of vector spaces and to show how this general theory can be applied to the study of particular vector spaces. Although all of the theorems and propositions stated in this chapter are true for all vector spaces, we will concentrate on giving examples and applications of the theory for the vector space R^n. The main reason for this is that in R^n the theoretical results can be more easily understood, as they can often be given a geometric interpretation. Also, R^n is the most commonly used vector space in practical applications.

As the mathematics developed in this chapter applies to all vector spaces, it is more abstract and theoretical than that of Chapter 1, where there was always an immediate geometric picture available for all results. Another reason for the difficulty is that an appreciable part of the language of vector spaces will be new to you. Therefore, as with any new language, it is absolutely essential that you make a special effort to learn the definitions of any new words. You will find that many of the fundamental vector space ideas discussed in this chapter, such as linear combination, span, linear independence, basis, dimension, and coordinate vector, are generalisations of ideas that we have already discussed in an informal geometric manner in Chapter 1 for vectors in R^n.
You will also find that the solutions of most of the problems in this chapter can be obtained by solving systems of linear equations using the Gaussian elimination methods developed in Chapter 4. In most cases the details are suppressed, but the reader should check them before attempting the exercises. Keep in mind that in this chapter correct setting out, rather than just computation, is essential. When you write down a solution to a question, you must make sure it reads correctly, both logically and mathematically.

6.1 Definitions and examples of vector spaces

We start from a mathematical system which consists of the following four things.

1. A non-empty set V of elements called "vectors".

2. A "vector-addition" rule (usually represented by +) for combining pairs of vectors from V. For vectors v, w ∈ V, the vector formed by adding w to v is denoted by v + w.

3. A field F of "scalars". For example, F could be the rational numbers Q, the real numbers R or the complex numbers C. There are also other important, but less common, examples of fields which can be used.

4. A "multiplication by a scalar" rule for combining a vector from V and a scalar from F to form a vector. If λ is a scalar and v is an element of V, then λ ∗ v means the result of multiplying v by the scalar λ.

The system is then denoted by (V, +, ∗, F). However, if there is no problem in distinguishing the product of two scalars λµ from the multiplication of a vector by a scalar λ ∗ v, we shall omit the symbol ∗. Just as we write 2x instead of 2 × x, we shall write λv instead of λ ∗ v. We can now give a formal definition of a vector space.

Definition 1. A vector space V over the field F is a non-empty set V of vectors on which addition of vectors is defined and multiplication by a scalar is defined in such a way that the following ten fundamental properties are satisfied:

1.
Closure under Addition. If u, v ∈ V, then u + v ∈ V.

2. Associative Law of Addition. If u, v, w ∈ V, then (u + v) + w = u + (v + w).

3. Commutative Law of Addition. If u, v ∈ V, then u + v = v + u.

4. Existence of Zero. There exists an element 0 ∈ V such that, for all v ∈ V, v + 0 = v.

5. Existence of Negative. For each v ∈ V there exists an element w ∈ V (usually written as −v) such that v + w = 0.

6. Closure under Multiplication by a Scalar. If v ∈ V and λ ∈ F, then λv ∈ V.

7. Associative Law of Multiplication by a Scalar. If λ, µ ∈ F and v ∈ V, then λ(µv) = (λµ)v.

8. If v ∈ V and 1 ∈ F is the scalar one, then 1v = v.

9. Scalar Distributive Law. If λ, µ ∈ F and v ∈ V, then (λ + µ)v = λv + µv.

10. Vector Distributive Law. If λ ∈ F and u, v ∈ V, then λ(u + v) = λu + λv.

Note.
1. Each of the basic rules is called an axiom.
2. In axiom 5, −v is a symbol for the negative of v. The vector −v and the vector formed by multiplying v by the scalar −1 are not, by definition, the same. We shall prove later that they are the same.
3. Formally, axiom 7 says λ ∗ (µ ∗ v) = (λµ) ∗ v.
4. In axiom 9, the addition on the left is the addition of two scalars, while the addition on the right is the addition of two vectors. They are different additions.
5. Two systems are the same only when all four things are the same. However, we seldom discuss different vector spaces with the same set of vectors, but we often discuss different sets of vectors with the same set of scalars and the same operations. When there is no confusion, we shall simply call (V, +, ∗, F) the vector space V.

Example 1 (The Vector Space R^n). The set of vectors is the set of all n-vectors of real numbers,

    R^n = { x = (x1, ..., xn) : x1, ..., xn ∈ R },

where (x1, ..., xn) denotes a column vector. The set of scalars is R. Vector addition is defined by

    (x1, ..., xn) + (y1, ..., yn) = (x1 + y1, ..., xn + yn),

and the multiplication of a vector by a scalar λ ∈ R is defined by

    λ(x1, ..., xn) = (λx1, ..., λxn).
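The componentwise definitions above translate directly into code. The sketch below is not from the Notes and the helper names are invented; it represents a vector in R^n as a tuple and implements the two operations.

```python
# A vector in R^n is represented as a tuple of floats (illustrative helpers).
def vec_add(x, y):
    """Componentwise addition: (x1, ..., xn) + (y1, ..., yn)."""
    assert len(x) == len(y), "vectors must have the same number of components"
    return tuple(xi + yi for xi, yi in zip(x, y))

def scalar_mult(lam, x):
    """Multiplication by a scalar: lam(x1, ..., xn) = (lam*x1, ..., lam*xn)."""
    return tuple(lam * xi for xi in x)

u = (1.0, 2.0, 3.0)
v = (4.0, -1.0, 0.5)

# Instances of axioms 3 and 9 for these particular vectors:
assert vec_add(u, v) == vec_add(v, u)                       # commutative law
assert scalar_mult(2 + 3, u) == vec_add(scalar_mult(2, u),
                                        scalar_mult(3, u))  # scalar distributive law
```

The assertions check single instances only; the general proofs for R^n follow below.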
To prove that this system is a vector space, it is necessary to show that all ten axioms listed in the definition are satisfied. All the axioms are general statements about arbitrary vectors and scalars, so we must prove that they are satisfied by any

    u = (u1, ..., un),  v = (v1, ..., vn),  w = (w1, ..., wn) in R^n

and any λ, µ ∈ R.

1. Closure under addition. If u, v ∈ R^n, then u1 + v1, ..., un + vn ∈ R because R is closed under addition. Hence u + v = (u1 + v1, ..., un + vn) ∈ R^n.

2. Associative law of addition. If u, v, w ∈ R^n, then (u1 + v1) + w1 = u1 + (v1 + w1), ..., (un + vn) + wn = un + (vn + wn) because addition in R is associative. Hence

    (u + v) + w = ((u1 + v1) + w1, ..., (un + vn) + wn)
                = (u1 + (v1 + w1), ..., un + (vn + wn))
                = u + (v + w).

3. Commutative law of addition. If u, v ∈ R^n, then u1 + v1 = v1 + u1, ..., un + vn = vn + un because addition in R is commutative. Hence

    u + v = (u1 + v1, ..., un + vn) = (v1 + u1, ..., vn + un) = v + u.

4. Existence of zero. There is a special element 0 = (0, ..., 0) ∈ R^n, called the zero vector, which has the property that

    v + 0 = (v1 + 0, ..., vn + 0) = (v1, ..., vn) = v  for all v ∈ R^n.

5. Existence of negative. For each v ∈ R^n there exists an element (−v1, ..., −vn) ∈ R^n, the negative of v, such that

    (v1, ..., vn) + (−v1, ..., −vn) = (v1 − v1, ..., vn − vn) = 0.

6. Closure under scalar multiplication. If v ∈ R^n and λ ∈ R, then λv1, ..., λvn ∈ R because R is closed under multiplication. Hence λv ∈ R^n.

7. Associative law of multiplication by a scalar. If λ, µ ∈ R and v ∈ R^n, then λ(µv1) = (λµ)v1, ..., λ(µvn) = (λµ)vn because multiplication in R is associative. Hence

    λ(µv) = (λ(µv1), ..., λ(µvn)) = ((λµ)v1, ..., (λµ)vn) = (λµ)v.

8. If v ∈ R^n, then 1v = (1v1, ..., 1vn) = (v1, ..., vn) = v.

9. Scalar distributive law.
If λ, µ ∈ R and v ∈ R^n, then (λ + µ)v1 = λv1 + µv1, ..., (λ + µ)vn = λvn + µvn because of the distributive law in R. We then have

    (λ + µ)(v1, ..., vn) = ((λ + µ)v1, ..., (λ + µ)vn)
                         = (λv1 + µv1, ..., λvn + µvn)
                         = (λv1, ..., λvn) + (µv1, ..., µvn)
                         = λ(v1, ..., vn) + µ(v1, ..., vn).

Hence (λ + µ)v = λv + µv.

10. Vector distributive law. If λ ∈ R and u, v ∈ R^n, then λ(u1 + v1) = λu1 + λv1, ..., λ(un + vn) = λun + λvn because of the distributive law in R. We then have

    λ(u1 + v1, ..., un + vn) = (λ(u1 + v1), ..., λ(un + vn))
                             = (λu1 + λv1, ..., λun + λvn)
                             = λ(u1, ..., un) + λ(v1, ..., vn).

Hence λ(u + v) = λu + λv.

After we have checked that all ten axioms are satisfied, we can conclude that the system is a vector space, or simply that R^n is a vector space over R. ♦

Note. As special cases of R^n, the real number line R, the plane R^2 and three-dimensional space R^3 are all vector spaces over the real numbers.

Example 2 (The Vector Space C^n). The set of vectors is the set of all column vectors with n complex components,

    C^n = { x = (x1, ..., xn) : x1, ..., xn ∈ C },

and the set of scalars is C. Addition is defined by

    (x1, ..., xn) + (y1, ..., yn) = (x1 + y1, ..., xn + yn),

and multiplication of a vector by a scalar λ ∈ C is defined by

    λ(x1, ..., xn) = (λx1, ..., λxn).

To prove that this is a vector space, it is necessary to show that the ten vector space axioms are satisfied. The proof is formally identical to that for R^n over R, since only basic field operations are involved. ♦

Example 3 (The Vector Space M_mn = M_mn(R) of Real Matrices). M_mn is a vector space over R. There is a natural and straightforward generalisation to M_mn(F), where the entries come from the field F. In the most important case of F = R we often suppress the R. Note that here we are thinking of matrices as vectors!

    M_mn = M_mn(R) = { A : A is an m × n matrix with entries aij ∈ R for 1 ≤ i ≤ m, 1 ≤ j ≤ n }.
Using the notation introduced in Chapter 5, the ij-th entry (i-th row, j-th column entry) of A is denoted by [A]ij. We define "addition of vectors" to be matrix addition, where

    [A + B]ij = [A]ij + [B]ij  for all i, j.

Similarly, we define the "multiplication (of the vector A) by a scalar λ ∈ R" in terms of the multiplication of a matrix by a scalar. That is,

    [λA]ij = λ[A]ij  for all i, j,

as in Chapter 5. To check that M_mn is a vector space is routine. All of the properties are included amongst the properties developed for matrices. For example, A + B = B + A for matrices of the same size; hence the commutative law holds for the set M_mn regarded as a vector space. The details are left for the reader to check. ♦

Note. M_mn(R) and M_mn(C) are widely used in both quantum physics and chemistry.

Example 4 (The Vector Space of Polynomials). One of the most important aspects of vector space theory is that it applies in many quite different situations. The set of all real-valued functions on R forms a vector space, as does the set of all continuous functions. A simpler example, perhaps, is the set P(R) of all real polynomials. Suppose that p is the polynomial given by

    p(x) = a0 + a1 x + · · · + an x^n = Σ_{k=0}^{n} ak x^k

and q is the polynomial given by

    q(x) = Σ_{k=0}^{m} bk x^k,

where the ak and bk are real. Note that p is a real-valued function, while p(x) is the value of the function at x. (You might like to quickly read the brief review of function notation given in Appendix 6.9.)

We all know how to add and subtract these polynomials, and how to multiply a polynomial by a real number. Their sum is the polynomial p + q, whose value at x is

    (p + q)(x) = p(x) + q(x) = Σ_{k=0}^{max(n,m)} (ak + bk) x^k,  x ∈ R.

(Of course, we just set any missing coefficient equal to zero to do this sum.) The scalar multiple is the polynomial λp, where

    (λp)(x) = λ(p(x)) = Σ_{k=0}^{n} λ ak x^k,  x ∈ R.
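The zero-padding trick for the sum can be made concrete. Below is a small sketch (not part of the Notes; the helper names are invented) that represents a polynomial by its coefficient list [a0, a1, ..., an] and implements p + q and λp.

```python
from itertools import zip_longest

# A polynomial a0 + a1*x + ... + an*x^n is stored as the list [a0, a1, ..., an].
def poly_add(p, q):
    """Add coefficient-wise, padding the shorter polynomial with zeros."""
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]

def poly_scale(lam, p):
    """Multiply every coefficient by the scalar lam."""
    return [lam * a for a in p]

p = [1, 0, 2]             # 1 + 2x^2
q = [3, 4]                # 3 + 4x
print(poly_add(p, q))     # [4, 4, 2], i.e. 4 + 4x + 2x^2
print(poly_scale(3, q))   # [9, 12], i.e. 9 + 12x
```

Closure under addition is visible here: the sum of two coefficient lists is again a coefficient list, matching the formula above with the missing coefficients taken to be zero.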
The proof that P(R) is a vector space over R is straightforward. For example, if p and q are polynomials, then p + q is also a polynomial (Closure under Addition). The zero element of P(R) is just the polynomial p such that p(x) = 0 for all x ∈ R. It is important when working in P(R) to remember that saying that two polynomials p and q are equal means that p(x) = q(x) for all x ∈ R, or equivalently, that the corresponding coefficients of p and q are equal. Please check the details yourself. ♦

Note. The set P(F) of all polynomials over a field F, with addition and multiplication by a scalar defined similarly, is also a vector space. For any non-negative integer n, the subset of all polynomials of degree n or less, together with the zero polynomial, is again a vector space. That is, Pn(F) = {p : p is a polynomial over F, degree of p ≤ n or p = 0}, with the same addition and multiplication by a scalar as in P(F), is a vector space over F.

As a summary, we know that the following are vector spaces.
• Rn over R, where n is a positive integer.
• Cn over C, where n is a positive integer.
• P(F), Pn(F) over F, where F is a field. Usually F is either Q, R or C.
• Mmn(F) over F, where m, n are positive integers and F is a field.
Furthermore, the following set, its subset of all continuous functions and its subset of all differentiable functions are vector spaces over R.
• R[X], the set of all possible real-valued functions with domain X.
When we refer to these vector spaces, we assume that the additions and multiplications by scalars are as defined above. However, there are systems with the same set of vectors but different operations. If we want to emphasise that the operations are the ones defined above, we shall use the terms usual addition and usual multiplication by a scalar. We shall concentrate on the vector space Rn. Those who want to see other examples of vector spaces can look at Section 6.8, where other vector spaces are covered in rather more depth.
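Because all of the spaces summarised above have entrywise (or coefficientwise) operations, the axioms can be spot-checked numerically for particular vectors. The sketch below, in Python (used here purely for illustration; the course software is Maple), represents vectors in Rn as lists and checks the two distributive laws, axioms 9 and 10, for sample vectors and scalars. Such a check is not a proof, since the axioms must hold for all vectors and scalars, but it is a quick way to catch a wrongly defined operation.

```python
# Spot-check of axioms 9 and 10 in R^n, with vectors as plain Python lists.

def add(u, v):
    # entrywise vector addition
    return [ui + vi for ui, vi in zip(u, v)]

def scale(lam, v):
    # entrywise multiplication by a scalar
    return [lam * vi for vi in v]

u = [1.0, -2.0, 3.0]
v = [4.0, 0.5, -1.0]
lam, mu = 2.0, -3.0

# Axiom 9 (scalar distributive law): (lam + mu)v = lam v + mu v
assert scale(lam + mu, v) == add(scale(lam, v), scale(mu, v))

# Axiom 10 (vector distributive law): lam(u + v) = lam u + lam v
assert scale(lam, add(u, v)) == add(scale(lam, u), scale(lam, v))
```

The same two helper functions could be reused to test a proposed non-standard operation, which is how a failed axiom shows up in practice.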
We are going to end this section with an example of a system which is not a vector space. To prove that a system is not a vector space, we need to prove that one of the axioms is not satisfied. To disprove a general statement, we only need to give a counterexample which illustrates that the statement is false.

Example 5. The system (R2, +, ∗, R) with the usual multiplication by a scalar, but with the addition of any u = (u1, u2) and v = (v1, v2) in R2 defined by u + v = (u1 + v1, 2u2 + 2v2), is not a vector space.

Solution. We give a counterexample to axiom 9. Let λ = µ = 1 and v = (0, 1). We have (λ + µ)v = 2(0, 1) = (0, 2), whereas λv + µv = (0, 1) + (0, 1) = (0 + 0, 2 × 1 + 2 × 1) = (0, 4). Since (0, 2) ≠ (0, 4), axiom 9 is not satisfied by this system, and so the system is not a vector space. ♦

6.2 Vector arithmetic

The axioms give a minimal set of rules needed to define a vector space. There are several other useful rules which can be proved directly from the axioms and which are therefore true in all vector spaces. In this section we shall discuss some of these properties and give examples of them for Rn. The first five vector space axioms apply to vector addition, and they are in fact identical to the five basic axioms of addition for integers, real numbers and complex numbers. This means that all the arithmetic properties of vector addition are identical to corresponding properties of addition of numbers. In particular, we have:

Proposition 1. In any vector space V, the following properties hold for addition.
1. Uniqueness of Zero. There is one and only one zero vector.
2. Cancellation Property. If u, v, w ∈ V satisfy u + v = u + w, then v = w.
3. Uniqueness of Negatives. For all v ∈ V, there exists only one w ∈ V such that v + w = 0.

Proof. For property 1, Axiom 4 ensures the existence of a zero vector in V. Now assume that two vectors 0 and 0′ are both zero vectors in V.
Then, for the reasons given in brackets, we have 0 = 0+ 0′ (axiom 4 applied to the zero vector 0′) = 0′ + 0 (axiom 3) = 0′ (axiom 4 applied to the zero vector 0) Hence, 0 = 0′, and there is only one zero vector in V . For property 2, Axiom 5 ensures the existence of the negative −u. Hence, we have (−u) + (u+ v) = (−u) + (u+w) (axiom 5) [(−u) + u] + v = [(−u) + u] +w (axiom 2) 0+ v = 0+w (axiom 5) v = w (axiom 4) For property 3, assume u andw are both negatives of v. By axiom 5, we have v+u = 0 = v+w. By property 2, we can conclude that v = w. Hence the inverse is unique. c©2020 School of Mathematics and Statistics, UNSW Sydney 10 CHAPTER 6. VECTOR SPACES Example 1. i) The unique zero vector in Rn is 0 = 0... 0 . ii) The negative of a vector is used in solving an equation such as 1 2 3 4 + v = −2 5 1 7 to obtain v = − 1 2 3 4 + −2 5 1 7 = −3 3 −2 3 . ♦ A comparison of the axioms for multiplication of a vector by a scalar with the multiplication properties for fields of numbers (see Chapter 3) also shows strong similarities — the main difference being that in a field, two numbers of the same kind are being multiplied, whereas for vectors the objects being multiplied are of different kinds. As a result, some of the fundamental properties of multiplication of numbers also hold for multiplication of a vector by a scalar. In particular, we have: Proposition 2. Suppose that V is a vector space over a field F, λ ∈ F, v ∈ V , 0 is the zero scalar in F and 0 is the zero vector in V . Then the following properties hold for multiplication by a scalar: 1. Multiplication by the zero scalar. 0v = 0, 2. Multiplication of the zero vector. λ0 = 0. 3. Multiplication by −1. (−1)v = −v (the additive inverse of v). 4. Zero products. If λv = 0, then either λ = 0 or v = 0. 5. Cancellation Property. If λv = µv and v 6= 0 then λ = µ. Proof. We shall prove properties 1 and 3. The readers should write out the proofs for the others as exercises. 
For property 1, v+ 0 = v = 1v = (1 + 0)v = 1v + 0v = v + 0v. You should check carefully which axioms are required. Finally, by Cancellation Property of vector addition, we have 0v = 0. For property 3, v + (−1)v = 1v + (−1)v = (1 + (−1))v = 0v = 0. Hence by Uniqueness of Negatives (−1)v = −v. Example 2. The properties in the previous proposition are true for all vector spaces. In particular, for vectors in Rn, the results can be easily proved by definitions of the operations and properties of scalars. Such as a) 0 x1... xn = 0... 0 = 0, c©2020 School of Mathematics and Statistics, UNSW Sydney 6.3. SUBSPACES 11 b) (−1) x1... xn = −x1... −xn = − x1... xn , c) If λ x1... xn = 0... 0 , then either λ = 0 or x1... xn = 0... 0 . ♦ 6.3 Subspaces Before reading this section, you should quickly read the brief review of sets given in Appendix 6.9. Although all examples in this section are subsets of Rn, the definitions, theorems and corollaries apply to all vector spaces as stated. In practice, many problems about vectors involve subsets of some vector space. For example, the points on a line in Rn form a subset of Rn, the points on a plane in R3 form a subset of R3, the solutions of a system of m linear equations in n unknowns form a subset of Rn. It is an important problem to determine the conditions under which some subset of a vector space is itself a vector space. It is convenient to begin by looking at some examples. Example 1. The real-number line is a vector space. The question arises whether some subset of the real-number line is a vector space. For example, we might ask if some interval, for example the interval S = [−5, 5] = {x ∈ R : −5 6 x 6 5} is a vector space. Geometrically, the set S represents the line segment shown in Figure 1(b). Solution. The given system is not a vector space, since it is not closed under scalar multiplication. A counterexample — 5 ∈ S, but 5 + 5 = 10 is not an element of S. A picture is given in Figure 1(b). 
♦

Figure 1. (a) The real number line is a vector space. (b) The interval S = [−5, 5] is not a vector space, as 5 ∈ S but 5 + 5 = 10 ∉ S.

Example 2. The plane R2 is a vector space. Show that the subset S of R2 given by S = {x = (x1, x2) ∈ R2 : x1 ≥ 0} is not a vector space.

Solution. There are several ways to solve this problem since there are several axioms which are not satisfied. One method is to note that (1, 0) ∈ S, whereas (−1)(1, 0) = (−1, 0) ∉ S. Hence the set S is not closed under scalar multiplication and so S is not a vector space. ♦

Note. Geometrically, the subset S contains the position vectors of all the points in the right half-plane, as shown in Figure 2.

Figure 2. [The right half-plane x1 ≥ 0 in the (x1, x2)-plane.]

Example 3. Show that the subset of R2 given by S1 = {x = (x1, x2) ∈ R2 : x1 + x2 = 4} is not a vector space, whereas the subset given by S2 = {x = (x1, x2) ∈ R2 : x1 + x2 = 0} is a vector space. (Geometrically, S1 represents a line in R2 which does not pass through the origin, whereas S2 represents a line which does pass through the origin; see Figure 3.)

Solution. The vector v = (0, 0) is not in S1, since 0 + 0 ≠ 4, so S1 is not a vector space. It is possible to show that S2 is a vector space by the usual time-consuming and tedious process of checking that it satisfies all ten of the vector space axioms. ♦

Figure 3(a): The line x1 + x2 = 4 is not a vector space. Figure 3(b): The line x1 + x2 = 0 is a vector space.

Note. Points in an n-dimensional space are represented by vectors in Rn. When we say that a set of vectors S in Rn represents a line, or simply that S is a line in Rn, we mean that S is the set of the position vectors of all the points on the line in the n-dimensional space.
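The sets S1 and S2 of Example 3 can be probed numerically before any proof is attempted. The short Python sketch below (illustrative only) encodes the two membership conditions and confirms that S1 already fails the zero-vector test, while sample sums and scalar multiples of members of S2 remain in S2. Passing such spot checks is evidence, not a proof: closure must be verified for all vectors, which is what the Subspace Theorem of the next section makes precise.

```python
# Membership tests for the two lines of Example 3, with vectors as pairs.

def in_S1(x):
    # S1 = {x in R^2 : x1 + x2 = 4}, a line missing the origin
    return x[0] + x[1] == 4

def in_S2(x):
    # S2 = {x in R^2 : x1 + x2 = 0}, a line through the origin
    return x[0] + x[1] == 0

# S1 fails immediately: it does not contain the zero vector.
assert not in_S1((0, 0))

# For S2, sums and scalar multiples of sample members stay in the set.
u, v, lam = (3, -3), (-1, 1), 5
assert in_S2(u) and in_S2(v)
assert in_S2((u[0] + v[0], u[1] + v[1]))   # closure under addition (sample)
assert in_S2((lam * u[0], lam * u[1]))     # closure under scalar multiplication (sample)
```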
In Examples 1, 2 and 3 we have asked whether certain subsets of the vector spaces R and R2 are themselves vector spaces. Of course, the operations of these systems are the usual addition and the usual multiplication by a scalar. We have seen that it is usually fairly simple to show that a subset is not a vector space, but it would be time-consuming and tedious to show that a given subset is a vector space by checking all ten axioms. We shall now develop a simple general test for determining whether a given subset of a vector space is itself a vector space. We first make the following definitions.

Definition 1. A subset S of a vector space V is called a subspace of V if S is itself a vector space over the same field of scalars as V and under the same rules for addition and multiplication by scalars. If, in addition, there is at least one vector in V which is not contained in S, the subspace S is called a proper subspace of V.

A simple test for a subspace is given by the following theorem.

Theorem 1 (Subspace Theorem). A subset S of a vector space V over a field F, under the same rules for addition and multiplication by scalars, is a subspace of V if and only if
i) the vector 0 in V also belongs to S,
ii) S is closed under vector addition, and
iii) S is closed under multiplication by scalars from F.

Proof. We first note that if S is a subspace then it is a vector space, and so the three conditions hold. Conversely, suppose that S contains the zero vector and that the two closure axioms 1 and 6 are satisfied by the elements of S. Every element of S is an element of V because S is a subset of V. Furthermore, since S and V are under the same operations, the vector space axioms 2, 3, 7, 8, 9 and 10 are automatically satisfied by all elements of S. Since S contains the zero vector, if v ∈ S then 0 + v = v (since this is true in V and hence in S), so axiom 4 follows. Finally, if v ∈ S then v ∈ V. Hence, from part 3 of Proposition 2 of Section 6.2, we have −v = (−1)v.
But, as S is closed under multiplication by a scalar, we have (−1)v ∈ S, and hence −v ∈ S. Thus, axiom 5 is satisfied for all vectors in S. The proof is complete.

If we want to check whether S is a subspace of V, we should first check whether the zero vector of V is in S. If the zero vector is in S, we can proceed to verify the two closure axioms. Otherwise, we can conclude immediately that S is not a subspace of V.

Example 4. Prove that the set S = {(x1, x2, x3) ∈ R3 : 2x1 − x2 + 4x3 = 0} is a vector subspace of R3.

Solution. As 2(0) − 0 + 4(0) = 0, the zero vector of R3 is in S. We now proceed to show the two closure axioms. For any vectors u = (u1, u2, u3), v = (v1, v2, v3) ∈ S and λ ∈ R, we have
2u1 − u2 + 4u3 = 0   (1)
2v1 − v2 + 4v3 = 0   (2)
If we add (1) and (2), we have (2u1 − u2 + 4u3) + (2v1 − v2 + 4v3) = 0 + 0. Hence we obtain 2(u1 + v1) − (u2 + v2) + 4(u3 + v3) = 0, and so u + v = (u1 + v1, u2 + v2, u3 + v3) ∈ S. Thus S is closed under addition. Now, multiplying both sides of (2) by λ, we have λ(2v1 − v2 + 4v3) = λ0, i.e. 2(λv1) − (λv2) + 4(λv3) = 0. Hence λv = (λv1, λv2, λv3) ∈ S, and so S is closed under multiplication by a scalar. By the Subspace Theorem, the set S is a vector subspace of R3. ♦

Example 5. Prove that a line in Rn is a subspace of Rn if and only if it passes through the origin.

Solution. Suppose that S represents a line in Rn. If 0 ∉ S, then S is not a subspace. Hence, a line which does not pass through the origin is not a vector subspace. If S is a line through the origin, we can write S = {x ∈ Rn : x = tv, t ∈ R}, where v is a fixed non-zero vector in Rn. To check that S is a subspace we check the two closure axioms.

Closure under addition. If x1, x2 ∈ S then x1 = t1v and x2 = t2v for some t1, t2 ∈ R. Hence x1 + x2 = (t1 + t2)v = t′v, where t′ = t1 + t2 ∈ R. Thus x1 + x2 ∈ S, and hence S is closed under addition.
Closure under multiplication by a scalar. We have x = tv for some t ∈ R, and hence, if λ ∈ R, λx = λ(tv) = (λt)v = t″v, where t″ = λt ∈ R. Hence λx ∈ S, and thus S is closed under multiplication by a scalar. Therefore, by the Subspace Theorem, the line S is a subspace of Rn if it passes through the origin. The proof is complete. ♦

A similar result to that given for lines in Example 5 also holds for planes.

Example 6. Prove that a plane in Rn is a subspace of Rn if and only if it passes through the origin.

Solution. If the plane does not pass through the origin, then it does not contain the zero vector, and hence is not a subspace. If the plane passes through the origin, then it is represented by S = {x ∈ Rn : x = s1v1 + s2v2, s1, s2 ∈ R}, where v1 and v2 are fixed, non-parallel vectors in Rn. We now check closure under addition and under multiplication by a scalar.

Closure under addition. If x1, x2 ∈ S, then x1 = s1v1 + s2v2 and x2 = t1v1 + t2v2 for some s1, s2, t1, t2 ∈ R. Therefore x1 + x2 = (s1 + t1)v1 + (s2 + t2)v2 = sv1 + tv2, where s = s1 + t1 and t = s2 + t2 are both real numbers. Thus x1 + x2 ∈ S, and hence S is closed under addition.

The proof that S is closed under multiplication by a scalar is similar and is left as an exercise. ♦

In practice, some of the most important subspaces of Rn are connected with systems of linear equations, that is, with the matrix equation Ax = b. Here is an important example of this.

Example 7. Let A be an m × n matrix with real entries. Show that the subset S of Rn which consists of all solutions of the matrix equation Ax = b for given b ∈ Rm is a subspace of Rn if and only if b = 0. Formally, the set of all solutions of Ax = b is given by S = {x ∈ Rn : Ax = b}.

Solution. We first consider the case b ≠ 0. Then 0 ∈ Rn is not a solution of Ax = b, as A0 = 0 ≠ b, and hence S does not contain the zero vector. Thus S is not a subspace. We next examine the case b = 0.
Then S is the set of solutions of Ax = 0. We use the Subspace Theorem to show that S is a subspace. c©2020 School of Mathematics and Statistics, UNSW Sydney 16 CHAPTER 6. VECTOR SPACES Closure under addition. If x ∈ S and y ∈ S, then Ax = 0 and Ay = 0, and hence A(x+ y) = Ax+Ay = 0+ 0 = 0. Thus x+ y ∈ S and S is closed under addition. Closure under multiplication by a scalar. If x ∈ S, we have Ax = 0, and hence for all λ ∈ R, A(λx) = λ(Ax) = λ0 = 0. Thus λx ∈ S and S is closed under scalar multiplication. The result is proved. ♦ Note. Example 7 shows that the set of solutions of the matrix equation Ax = 0 is a subspace of Rn. This subspace, which is of considerable practical importance, is called the kernel of the matrix A (see Section 7.4.1). An important theoretical and practical problem concerning vector spaces is that of finding all their subspaces. For example, it can be shown that the only subspaces of R2 are (1) the origin, (2) lines through the origin, and (3) R2 itself. Similarly, for R3 the only subspaces (see Example 10 of Section 6.6) are (1) the origin, (2) lines through the origin, (3) planes through the origin, and (4) R3 itself. A listing of subspaces can be given for any vector space. However, before we can investigate this problem satisfactorily, we require further machinery. This machinery will be developed in Sections 6.4 and 6.5. In vector spaces other than Rn it may be difficult to get a good geometric feel for which subsets are subspaces. Nonetheless, the Subspace Theorem allows one a simple way to check whether a certain set is a subspace or not. Example 8. Let V = P(R), the set of all real polynomials. Let P2(R) denote the set of all real polynomials of degree less than or equal to 2. That is P2(R) = {p ∈ P(R) : p(x) = a0 + a1x+ a2x2 for some a0, a1, a2 ∈ R}. Show that P2(R) is a subspace of P(R). Solution. Clearly P2(R) contains the zero polynomial. Suppose then that p, q ∈ P2(R). 
Then there exist coefficients a0, a1, a2, b0, b1, b2 ∈ R such that p(x) = a0 + a1x+ a2x 2, q(x) = b0 + b1x+ b2x 2. Now (p+ q)(x) = (a0+ b0)+ (a1+ b1)x+(a2+ b2)x 2 which is another polynomial of degree less than or equal to 2, i.e. p+ q ∈ P2(R). Suppose that p ∈ P2(R) as above and that λ ∈ R. Then (λp)(x) = (λa0) + (λa1)x + (λa2)x2, and so λp ∈ P2. Thus, by the Subspace Theorem, P2(R) is a subspace of P(R). ♦ Suppose that P(F) is the set of all polynomials over F. We shall show in Section 6.8 that for any n, the set Pn(F) consisting of all polynomials over F of degree less than or equal to n is also a subspace of P(F). Example 9. Let S denote the set of all real polynomials of degree exactly 3. Show that S is not a subspace of P(R). Solution. This set is not closed under either addition or scalar multiplication! For example, the polynomial p given by p(x) = x3 is in S, but 0p = 0 which does not lie in S. Also, for example, (x3 + x2 + x) + (−x3 + x+ 3) 6∈ S. ♦ c©2020 School of Mathematics and Statistics, UNSW Sydney 6.4. LINEAR COMBINATIONS AND SPANS 17 6.4 Linear combinations and spans The two fundamental vector space operations are addition and multiplication by a scalar. By combining these two operations we arrive at the important idea of a sum of scalar multiples of vectors. This leads to the ideas of “linear combination” and “span”: a linear combination of a given set of vectors is a sum of scalar multiples of the vectors and the span of a given set of vectors is the set of all linear combinations of the vectors. We defined linear combinations and span of two vectors in Chapter 1. These ideas are used to develop the parametric vector forms for planes. In particular, the span of two non-parallel vectors is a plane through the origin. In this section, we generalise the ideas to a finite set of vectors. The formal definition of ‘linear combination’ is as follows. Definition 1. Let S = {v1, . . . ,vn} be a finite set of vectors in a vector space V over a field F. 
Then a linear combination of S is a sum of scalar multiples of the form λ1v1 + · · ·+ λnvn with λ1, . . . , λn ∈ F. Example 1. The vector ( 3 −4 ) is a linear combination of the vectors in the set{( 1 1 ) , ( 2 3 ) , ( 1 −1 )} in R2 because ( 3 −4 ) = 2 ( 1 1 ) + (−1) ( 2 3 ) + 3 ( 1 −1 ) . ♦ We know that a vector space (and therefore any subspace of a vector space) is closed under addition and multiplication by scalars, so we would expect that it would also be closed under the operation of forming linear combinations. This is confirmed by the following theorem. The proof of the theorem (which uses induction) is left as an exercise (Problem 36). Proposition 1 (Closure under Linear Combinations). If S is a finite set of vectors in a vector space V , then every linear combination of S is also a vector in V . The formal definition of ‘span’ is as follows. Definition 2. Let S = {v1, . . . ,vn} be a finite set of vectors in a vector space V over a field F. Then the span of the set S is the set of all linear combinations of S, that is, span (S) = span (v1, . . . ,vn) = {v ∈ V : v = λ1v1 + · · · + λnvn for some λ1, . . . , λn ∈ F}. Example 2. The span of a single non-zero vector v in Rn is a line through the origin. In Chapter 1 we defined “the line in Rn spanned by v” to mean the set S = {x ∈ Rn : x = λv, for some λ ∈ R}. This set is just span (v) . ♦ c©2020 School of Mathematics and Statistics, UNSW Sydney 18 CHAPTER 6. VECTOR SPACES Example 3. If {v,w} is a pair of non-zero, non-parallel vectors in Rn then span (v,w) is a plane containing the origin. ♦ The following important theorem tells us that the span of a finite non-empty set of vectors in a vector space V is not only a subset of V , it is always a subspace of V . Theorem 2 (A span is a subspace). If S is a finite, non-empty set of vectors in a vector space V , then span(S) is a subspace of V . 
Further, span(S) is the smallest subspace containing S (in the sense that span(S) is a subspace of every subspace which contains S).

Proof. We first note that 0 ∈ span (S), since we may take every scalar in a linear combination to be zero. Proposition 1 tells us that every linear combination of S is a vector in V, so span(S) is a subset of V. To prove that span(S) is a subspace we will use the Subspace Theorem, so we set out to prove that span(S) is closed under addition and under multiplication by scalars. Let S be the set S = {v1, . . . , vn}, where all vj belong to V.

To show closure under addition, suppose u, w ∈ span (S). Then u = λ1v1 + · · · + λnvn for some λ1, . . . , λn ∈ F and w = µ1v1 + · · · + µnvn for some µ1, . . . , µn ∈ F, so u + w = (λ1 + µ1)v1 + · · · + (λn + µn)vn with λ1 + µ1, . . . , λn + µn ∈ F. This shows that u + w belongs to span (S), so span (S) is closed under addition.

To prove closure under multiplication by a scalar, suppose u ∈ span (S) and λ ∈ F. Then λu = λ(λ1v1 + · · · + λnvn) = (λλ1)v1 + · · · + (λλn)vn, where λλ1, . . . , λλn ∈ F. This shows that λu belongs to span (S), so span (S) is closed under multiplication by scalars. We have now proved that span (S) is a subspace of V.

To show that it is the smallest subspace of V containing S, suppose W is any subspace of V containing S. Then W is itself a vector space containing S and, by what we have just proved, span (S) is a subspace of W. This completes the proof by showing that span (S) is a subspace of every subspace of V containing S.

Example 4. If v is a non-zero vector in R3, then the line span (v) is a subspace of every vector space which contains v. In particular, it is a subspace of R3 and of every plane through the origin parallel to v. Further, there is no proper subset of the line span (v) which both contains v and is a vector space. Thus, for example, a line segment cannot be a vector space. We have already seen a special case of this result in Example 1 of Section 6.3 and Figure 1.
♦ We often need to find a set S in a vector space V such that span (S) is the whole of V . Definition 3. A finite set S of vectors in a vector space V is called a spanning set for V if span (S) = V or equivalently, if every vector in V can be expressed as a linear combination of vectors in S. c©2020 School of Mathematics and Statistics, UNSW Sydney 6.4. LINEAR COMBINATIONS AND SPANS 19 Note also that we often say that “S spans V ” instead of “S is a spanning set for V ”. Example 5. As shown in Chapter 1, every geometric vector a in three dimensions can be written as a linear combination a = a1i+ a2j+ a3k for a1, a2, a3 ∈ R, where i, j and k are the unit vectors along the directions of the three coordinate axes. Therefore {i, j,k} is a spanning set for the vector space of all geometric vectors in three dimensions. ♦ Example 6. The set S = 11 3 , −12 −2 is a spanning set of the vector space x ∈ R3 : x = λ 11 3 + µ −12 −2 , λ, µ ∈ R . The set S′ = 11 3 , −12 −2 , 03 1 also spans the above vector space. Obviously, the third vector in S′ is the sum of the other two, so span (S) = span (S′). Thus the third vector in S′ is somewhat redundant. ♦ Example 7. Let v be a fixed non-zero vector in Rn. The spanning set of the vector space {x ∈ Rn : x = λv, λ ∈ R} is {v}. ♦ Example 8. Every vector x1... xn ∈ Rn can be written as x = x1e1 + · · ·+ xnen. This expresses x as a linear combination of the set {e1, . . . , en}, where e1 = 1 0 ... 0 , e2 = 0 1 ... 0 , . . . , en = 0 ... 0 1 . Thus Rn = span (e1, . . . , en) and the set {e1, . . . , en} spans Rn. ♦ Example 9. Let Pn denote the space of polynomials of degree less than or equal to n. Every polynomial p ∈ Pn can be written as a linear combination of the polynomials {1, x, x2, . . . , xn}, so Pn = span ( 1, x, x2, . . . , xn ) . We shall see later that there is no finite set of vectors whose span is all of P (the vector space of all polynomials). ♦ c©2020 School of Mathematics and Statistics, UNSW Sydney 20 CHAPTER 6. 
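The identity of Example 8, x = x1e1 + · · · + xnen, is easy to verify mechanically. The Python sketch below (illustrative only) builds the standard basis vectors of Rn as lists and reconstructs a vector from its coordinates as a linear combination, which is exactly the sense in which {e1, . . . , en} spans Rn.

```python
# Reconstructing x in R^n from the standard basis, as in Example 8.

def standard_basis(n):
    # e_j has a 1 in position j and 0 elsewhere
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]

def linear_combination(coeffs, vectors):
    # returns coeffs[0]*vectors[0] + ... + coeffs[-1]*vectors[-1], entrywise
    n = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(n)]

x = [7, -2, 5]
e = standard_basis(3)
assert linear_combination(x, e) == x   # x = 7 e1 - 2 e2 + 5 e3
```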
6.4.1 Matrices and spans in Rm

We want to have an effective way to tell whether or not a given vector b in Rm belongs to the span of a set S = {v1, . . . , vn}. From the definition of span, we know that b belongs to span (S) if and only if there are λ1, . . . , λn ∈ R such that b = λ1v1 + · · · + λnvn. This is equivalent to the condition that there is at least one solution to the vector equation x1v1 + · · · + xnvn = b, where x1, . . . , xn are the unknowns. This vector equation represents a set of m simultaneous linear equations in n unknowns. Therefore the question of whether b belongs to span (S) is a question of whether or not a particular set of linear equations has a solution. This is the sort of question which we studied in detail in MATH1131/41.

Furthermore, suppose that each vj is the column vector with entries a1j, . . . , amj and that x is the column vector with entries x1, . . . , xn. If A is the m × n matrix whose columns are the vectors v1, . . . , vn, then the ith entry of Ax is ai1x1 + · · · + ainxn, and hence Ax = x1v1 + · · · + xnvn. As a result, we have the following proposition.

Proposition 3 (Matrices, Linear Combinations and Spans). If S = {v1, . . . , vn} is a set of vectors in Rm and A is the m × n matrix whose columns are the vectors v1, . . . , vn, then
a) a vector b in Rm can be expressed as a linear combination of S if and only if it can be expressed in the form Ax for some x in Rn,
b) a vector b in Rm belongs to span (S) if and only if the equation Ax = b has a solution x in Rn.

Example 10. For the set of three vectors v1 = (0, 5, 3, 6), v2 = (1, 3, 4, 5), v3 = (−2, −3, −5, −6) in R4 (written as columns), we let A be the 4 × 3 matrix with columns v1, v2, v3, that is, the matrix with rows (0, 1, −2), (5, 3, −3), (3, 4, −5), (6, 5, −6), and let x = (x1, x2, x3). By expanding each side, it can easily be checked that Ax = (x2 − 2x3, 5x1 + 3x2 − 3x3, 3x1 + 4x2 − 5x3, 6x1 + 5x2 − 6x3) = x1v1 + x2v2 + x3v3.
In particular, for x = (1, 1, 3) we have Ax = 1v1 + 1v2 + 3v3 = (−5, −1, −8, −7). Let us denote the vector (−5, −1, −8, −7) by b. Hence x = (1, 1, 3) is a solution of Ax = b; equivalently, b can be written as the linear combination v1 + v2 + 3v3. ♦

Example 11. If A is an m × n matrix and ej is the jth standard basis vector in Rn then Aej = aj, where aj is the jth column of A. In the case of the matrix A of the last example, we find by direct matrix multiplication that Ae1 = (0, 5, 3, 6) = a1, Ae2 = (1, 3, 4, 5) = a2 and Ae3 = (−2, −3, −5, −6) = a3. ♦

When applying the results of Proposition 3, it is convenient to have a special name for the subspace of Rm spanned by the columns of a given m × n matrix.

Definition 4. The subspace of Rm spanned by the columns of an m × n matrix A is called the column space of A and is denoted by col(A).

6.4.2 Solving problems about spans

By Proposition 3, a vector b in Rm lies in the span of a set S = {a1, . . . , an} in Rm if and only if the equation Ax = b has a solution, where A is the matrix with columns a1, . . . , an. The following examples show how to apply this knowledge to problems about spans in Rm.

Example 12. Is the vector b = (1, 4, 1, 2) in the span of the set S = {(1, 3, 4, 2), (−4, −8, −12, 6)}? In geometric terms, the question is asking whether the point (1, 4, 1, 2) lies on the plane through the origin parallel to (1, 3, 4, 2) and (−4, −8, −12, 6).

Solution. Let A be the matrix whose columns are the members of S, so A is the 4 × 2 matrix with rows (1, −4), (3, −8), (4, −12), (2, 6). As a consequence of Proposition 3, we know that b belongs to span (S) if and only if the equation Ax = b has a solution. We form the augmented matrix (A|b) for this system and reduce it to row-echelon form.
Starting from (A|b) with rows (1, −4 | 1), (3, −8 | 4), (4, −12 | 1), (2, 6 | 2), the operations R2 = R2 − 3R1, R3 = R3 − 4R1 and R4 = R4 − 2R1 give rows (1, −4 | 1), (0, 4 | 1), (0, 4 | −3), (0, 14 | 0). Then R3 = R3 − R2 and R4 = R4 − (7/2)R2 give (1, −4 | 1), (0, 4 | 1), (0, 0 | −4), (0, 0 | −7/2), and finally R4 = R4 − (7/8)R3 gives the row-echelon form with rows (1, −4 | 1), (0, 4 | 1), (0, 0 | −4), (0, 0 | 0). Since the right-hand column is a leading column, the system has no solution. Therefore b does not belong to the span of S. ♦

Example 13. Find conditions which are necessary and sufficient to ensure that a vector b in R3 belongs to the span of the set S = {v1, v2, v3}, where v1 = (1, 2, 3), v2 = (1, 1, −1), v3 = (−1, 0, 5). Hence determine whether the vector v = (2, 1, −1) belongs to span (S). Then give a geometric interpretation of the span.

Solution. By Proposition 3, the vector b belongs to span (S) if and only if there is a solution to the system of equations Ax = b, where the three columns of A are the vectors v1, v2, v3. We reduce the augmented matrix (A|b), with rows (1, 1, −1 | b1), (2, 1, 0 | b2), (3, −1, 5 | b3), to row-echelon form. The operations R2 = R2 − 2R1 and R3 = R3 − 3R1 give rows (1, 1, −1 | b1), (0, −1, 2 | b2 − 2b1), (0, −4, 8 | b3 − 3b1), and then R3 = R3 − 4R2 gives (1, 1, −1 | b1), (0, −1, 2 | b2 − 2b1), (0, 0, 0 | 5b1 − 4b2 + b3). The system represented by this augmented matrix has a solution if and only if 5b1 − 4b2 + b3 = 0. Therefore b belongs to span (S) if and only if this condition is satisfied.

To check whether v is in the span, we substitute the components of v into the condition. Since 5(2) − 4(1) + (−1) = 5 ≠ 0, v is not in span (S). To get a geometric interpretation of this result, note that a vector b is in the span if and only if its components satisfy the Cartesian equation 5x1 − 4x2 + x3 = 0, which is a plane through the origin with normal (5, −4, 1). Therefore span (S) is this plane. ♦

Note. As a check, note that each of the vectors v1, v2, v3 belongs to span (v1, v2, v3) and so should satisfy the above condition. On substituting v1 = (1, 2, 3) for (b1, b2, b3) we find 5(1) − 4(2) + 3 = 0, so v1 does satisfy the above condition.
You can check for yourself that v2 and v3 also satisfy this condition.

Example 14. Determine whether or not the set S = {v1, v2, v3, v4} is a spanning set for R3, where v1 = (1, 2, 3), v2 = (1, 1, −1), v3 = (−1, 0, 5) and v4 = (2, 3, 5).

Solution. S is a spanning set for R3 if and only if every vector b ∈ R3 belongs to span (S). By Proposition 3, every vector b ∈ R3 belongs to span (S) if and only if the system Ax = b has a solution for every b in R3, where A is the matrix whose columns are the members of S. By row operations we can reduce the augmented matrix of the system, with rows (1, 1, −1, 2 | b1), (2, 1, 0, 3 | b2), (3, −1, 5, 5 | b3), to row-echelon form: the operations R2 = R2 − 2R1 and R3 = R3 − 3R1 give rows (1, 1, −1, 2 | b1), (0, −1, 2, −1 | b2 − 2b1), (0, −4, 8, −1 | b3 − 3b1), and then R3 = R3 − 4R2 gives (1, 1, −1, 2 | b1), (0, −1, 2, −1 | b2 − 2b1), (0, 0, 0, 3 | b3 − 4b2 + 5b1). For every b ∈ R3, the right-hand column is non-leading, which means that this system has a solution. This implies that every vector b ∈ R3 belongs to span (S). Hence S is a spanning set for R3. ♦

Note. The equations would still have a solution for all b ∈ R3 if the non-leading column (column 3) were dropped from the row-echelon form matrix. This means that the vector v3 can be dropped from S and the set {v1, v2, v4} will still span R3. Thus, in this case, span (v1, v2, v3, v4) = span (v1, v2, v4) = R3. We shall see in Example 6 of Section 6.5 that in place of v3 we could drop either v1 or v2 from S and obtain the same span. That is, span (v1, v2, v3, v4) = span (v2, v3, v4) = span (v1, v3, v4) = R3.

However, the removal of the vector corresponding to a non-leading column in a row-echelon form matrix gives us a simple criterion for obtaining a subset of S which spans the same subspace span (S). In general, we have the following result. Suppose that S = {v1, v2, . . . , vn} is a subset of Rm and that A, the matrix with the n vectors in S as columns, reduces to a row-echelon form matrix U.
If the ith column of U is non-leading, then {v1, . . . , vi−1, vi+1, . . . , vn} spans the same set as S.

The following example shows that matrix methods can also be used to solve problems about spans in some vector spaces other than Rn.

Example 15. Find conditions on the coefficients of p ∈ P3(R) so that p ∈ span(1 + x, 1 − x²).

Solution. Let p(x) = b0 + b1x + b2x² + b3x³ be a polynomial in P3(R). From the definition of span, we know that p ∈ span(1 + x, 1 − x²) if and only if there exist λ1, λ2 ∈ R such that, for all x ∈ R,
p(x) = λ1(1 + x) + λ2(1 − x²) = (λ1 + λ2) + λ1x − λ2x².
By comparing coefficients, we must have
λ1 + λ2 = b0
λ1 = b1
−λ2 = b2
0 = b3.
This is a system of linear equations in the variables λ1 and λ2, and we have to find out which choices of b0, b1, b2, b3 make it a system which does have a solution. The augmented matrix for the system is

[ 1  1 | b0 ]
[ 1  0 | b1 ]
[ 0 −1 | b2 ]
[ 0  0 | b3 ]

This augmented matrix can be reduced to the row-echelon form

[ 1  1 | b0           ]
[ 0 −1 | b1 − b0      ]
[ 0  0 | b2 − b1 + b0 ]
[ 0  0 | b3           ]

This system has a solution if and only if b2 − b1 + b0 = 0 and b3 = 0, so these are the conditions under which p belongs to span(1 + x, 1 − x²). ♦

6.5 Linear independence

Suppose that v1, v2 are non-zero vectors. In Chapter 1 we saw that span(v1, v2) represents a plane if v1 and v2 are not parallel to each other, but only a line if they are parallel. Similarly, if v1, v2 and v3 are given non-zero vectors in R3 then span(v1, v2, v3) represents
i) a line if the three vectors are all parallel to each other,
ii) a plane if they are coplanar, or
iii) the whole of R3 otherwise.
In this section we shall show how these results can be understood through the ideas of linear independence and linear dependence of a set of vectors.

Definition 1. Suppose that S = {v1, . . . , vn} is a subset of a vector space. The set S is a linearly independent set if the only values of the scalars λ1, λ2, . . .
, λn for which
λ1v1 + · · · + λnvn = 0
are λ1 = λ2 = · · · = λn = 0.

Definition 2. Suppose that S = {v1, . . . , vn} is a subset of a vector space. The set S is a linearly dependent set if it is not a linearly independent set, that is, if there exist scalars λ1, . . . , λn, not all zero, such that
λ1v1 + · · · + λnvn = 0.

Note. The linear combination λ1v1 + · · · + λnvn is certainly equal to 0 when all the scalars are zero. The essential point of the definition of linear independence is that the only way this linear combination can be 0 is for all the scalars to be zero.

Example 1. Show that the vectors (1, 2, 3, 4)^T and (−3, −6, −9, 5)^T form a linearly independent set.

Solution. Applying the definition, we look for scalars λ1, λ2 such that
λ1 (1, 2, 3, 4)^T + λ2 (−3, −6, −9, 5)^T = (0, 0, 0, 0)^T.
In order to satisfy this vector equation the scalars must satisfy the four equations
λ1 − 3λ2 = 0,  2λ1 − 6λ2 = 0,  3λ1 − 9λ2 = 0,  4λ1 + 5λ2 = 0.
Each of the first three equations is satisfied if and only if λ1 = 3λ2. Substituting this formula for λ1 into the fourth equation gives 17λ2 = 0. Thus the only solution is λ1 = λ2 = 0, and this shows that the two given vectors form a linearly independent set. ♦

The vectors in the above example are not parallel because neither is a scalar multiple of the other. This is a special case of the following geometric interpretation of linear dependence for pairs of vectors.

Example 2. Show that two non-zero vectors in Rn are parallel if and only if they form a linearly dependent set.

Solution. We first show that two parallel vectors form a linearly dependent set. Two non-zero vectors v1, v2 are parallel if one is a (non-zero) scalar multiple of the other, that is, v2 = λv1 for some λ ∈ R. We can rewrite this equation as
λv1 − v2 = 0.
The coefficient of v2 in this expression is −1 (which is non-zero), so this equation proves that {v1, v2} is a linearly dependent set.
We next show that two linearly dependent non-zero vectors are parallel. If {v1, v2} is a linearly dependent set then there exist λ1 and λ2, not both zero, such that λ1v1 + λ2v2 = 0. Without loss of generality, we can assume that λ1 ≠ 0. Dividing by λ1 and rearranging gives
v1 = −(λ2/λ1) v2.
This shows that v1 is a scalar multiple of v2. It also implies that λ2 ≠ 0 (otherwise v1 would be 0). Hence the two vectors are parallel. ♦

Example 3. It is easy to verify that
3 (1, 2, 1)^T + 2 (1, −1, 2)^T + (−1) (5, 4, 7)^T = (0, 0, 0)^T.
By Definition 2, the set {(1, 2, 1)^T, (1, −1, 2)^T, (5, 4, 7)^T} is a linearly dependent set. ♦

Note that no two of the three vectors in the previous example are parallel. The following example gives a geometric interpretation (in terms of coplanarity) for linear dependence of sets of three vectors.

Example 4. Show that three non-zero vectors in Rn are coplanar if and only if they form a linearly dependent set.

Solution. Suppose that v1, v2, v3 are three non-zero vectors in Rn.
We first show that three coplanar vectors form a linearly dependent set. Suppose that v1, v2, v3 are coplanar. Consider first the case in which two of the three vectors are parallel. Without loss of generality, we can assume that v2 and v3 are parallel. By Example 2, there exist λ2, λ3, not both zero, such that λ2v2 + λ3v3 = 0. Hence we have
0v1 + λ2v2 + λ3v3 = 0.
Since λ2 and λ3 are not both zero, the vectors v1, v2, v3 form a linearly dependent set.
Otherwise, we may assume that v2 is not parallel to v3. Hence v1 lies on the plane through the origin parallel to v2 and v3. This means that there are scalars λ2, λ3 such that
v1 = λ2v2 + λ3v3.
We can rearrange this to get
−v1 + λ2v2 + λ3v3 = 0.
At least one coefficient is non-zero (the coefficient of v1 is −1), so we have shown that the set is linearly dependent.
We now show that if three vectors form a linearly dependent set then they must be coplanar.
If the set {v1, v2, v3} is linearly dependent then there exist λ1, λ2, λ3, not all zero, such that
λ1v1 + λ2v2 + λ3v3 = 0.
Without loss of generality we can assume that λ1 ≠ 0. Then the above equation can be rearranged as
v1 = −(λ2/λ1) v2 − (λ3/λ1) v3.
This shows that v1 satisfies the parametric vector form x = λv2 + µv3, which represents either a line or a plane through the origin. In both cases, the three vectors are coplanar. ♦

6.5.1 Solving problems about linear independence

We have seen that questions about spans in Rm can be answered by relating them to questions about the existence of solutions for systems of linear equations. The same is true for questions about linear dependence in Rm.

Proposition 1. If S = {a1, . . . , an} is a set of vectors in Rm and A is the m × n matrix whose columns are the vectors a1, . . . , an, then the set S is linearly dependent if and only if the system Ax = 0 has at least one non-zero solution x ∈ Rn.

Proof. As on page 20, Ax = x1a1 + · · · + xnan for any vector x = (x1, . . . , xn)^T ∈ Rn. Therefore Ax = 0 has at least one non-zero solution x = (λ1, . . . , λn)^T if and only if there are scalars λ1, . . . , λn, not all zero, such that
λ1a1 + · · · + λnan = 0;
in other words, if and only if the set {a1, . . . , an} is linearly dependent.

Example 5. Is the set S = {(1, 3, 2, 4)^T, (−2, −1, 0, 2)^T, (0, 0, 1, 2)^T} a linearly independent set?

Solution. Let A be the matrix whose columns are the vectors in S. By Proposition 1, the set S is linearly dependent if and only if the system Ax = 0 has at least one non-zero solution. We then reduce the augmented matrix (A|0) to row-echelon form (U|0).
[ 1 −2 0 | 0 ]                    [ 1 −2 0 | 0 ]
[ 3 −1 0 | 0 ]   R2 = R2 − 3R1    [ 0  5 0 | 0 ]   R2 = (1/5)R2
[ 2  0 1 | 0 ]   R3 = R3 − 2R1 →  [ 0  4 1 | 0 ]   R4 = (1/2)R4 →
[ 4  2 2 | 0 ]   R4 = R4 − 4R1    [ 0 10 2 | 0 ]

[ 1 −2 0 | 0 ]                    [ 1 −2 0 | 0 ]                  [ 1 −2 0 | 0 ]
[ 0  1 0 | 0 ]   R3 = R3 − 4R2    [ 0  1 0 | 0 ]   R4 = R4 − R3   [ 0  1 0 | 0 ]
[ 0  4 1 | 0 ]   R4 = R4 − 5R2 →  [ 0  0 1 | 0 ]   →              [ 0  0 1 | 0 ]
[ 0  5 1 | 0 ]                    [ 0  0 1 | 0 ]                  [ 0  0 0 | 0 ]

There are no non-leading columns in U, so the system has a unique solution, namely x = 0. Since there are no non-zero solutions, the set S is linearly independent. ♦

Example 6. Suppose that v1 = (1, 2, 3)^T, v2 = (1, 1, −1)^T, v3 = (−1, 0, 5)^T and v4 = (2, 3, 5)^T.
a) Prove that the set S = {v1, v2, v3, v4} is a linearly dependent set.
b) Find all possible ways of writing 0 as a linear combination of the vectors in S.
c) Find a linearly independent subset of S with the same span as S.

Solution.
a) Let A be the matrix with v1, v2, v3, v4 as columns. As seen in the previous example, elementary row operations do not affect the zero right-hand column. To see whether or not there is any non-zero solution of the equation Ax = 0, we can simply reduce the matrix A to an equivalent row-echelon form matrix U:

[ 1  1 −1 2 ]                    [ 1  1 −1  2 ]                   [ 1  1 −1  2 ]
[ 2  1  0 3 ]   R2 = R2 − 2R1    [ 0 −1  2 −1 ]   R3 = R3 − 4R2   [ 0 −1  2 −1 ]
[ 3 −1  5 5 ]   R3 = R3 − 3R1 →  [ 0 −4  8 −1 ]   →               [ 0  0  0  3 ]

Since the row-echelon form matrix U has a non-leading column, the homogeneous system Ax = 0 has infinitely many solutions. In particular, there must be solutions other than the zero solution. Therefore the given set S is linearly dependent.
b) The complete solution of Ax = 0 can be found by back substitution: it is x4 = 0, x3 = λ, x2 = 2λ and x1 = −λ, where λ ∈ R. Substituting for x1, x2, x3, x4 in the original linear combination gives
λ(−v1 + 2v2 + v3) + 0v4 = 0,
which gives all possible ways of writing 0 as a linear combination of the vectors in S.
c) Choosing λ = 1, we have −v1 + 2v2 + v3 = 0, i.e. v3 = v1 − 2v2. This shows that span(v1, v2, v4) = span(S).
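The dependency relation used here can also be read off from a null-space computation. A sketch, assuming sympy is available (the matrix A is the one from Example 6):

```python
import sympy as sp

# Columns are v1, v2, v3, v4 from Example 6.
A = sp.Matrix([[1, 1, -1, 2],
               [2, 1, 0, 3],
               [3, -1, 5, 5]])

# A basis for the solution space of Ax = 0; a single basis vector
# means a one-parameter family of solutions.
ns = A.nullspace()
print(ns[0].T)  # proportional to (-1, 2, 1, 0), i.e. -v1 + 2v2 + v3 = 0
```

The single null-space vector (−1, 2, 1, 0) reproduces the relation −v1 + 2v2 + v3 + 0v4 = 0 found by back substitution.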
On the other hand, if we remove the third column from A, we can reduce the matrix

[ 1  1 2 ]        [ 1  1  2 ]
[ 2  1 3 ]   to   [ 0 −1 −1 ]
[ 3 −1 5 ]        [ 0  0  3 ]

Hence the set {v1, v2, v4} is a linearly independent subset of S with the same span as S.
Note also that v1 = 2v2 + v3 and v2 = (1/2)v1 − (1/2)v3, so both sets {v2, v3, v4} and {v1, v3, v4} have the same span as S. It is not difficult to check that these two sets are also linearly independent. ♦

The next two examples illustrate the fact that matrix methods can be used in vector spaces other than Rm.

Example 7. Show that the set {1 + x, 2 − x} is linearly independent in the vector space P(R) of all polynomials.

Solution. Suppose that λ1(1 + x) + λ2(2 − x) = 0 for all x ∈ R. Expanding the left-hand side gives
(λ1 + 2λ2) + (λ1 − λ2)x = 0.
Comparing coefficients shows that we must have
λ1 + 2λ2 = 0
λ1 − λ2 = 0.
The augmented matrix for this system is

[ 1  2 | 0 ]
[ 1 −1 | 0 ]

which can be reduced to the row-echelon form

[ 1  2 | 0 ]
[ 0 −3 | 0 ]

There are no non-leading columns here, so the only solution is λ1 = λ2 = 0. Therefore this set of polynomials is linearly independent. ♦

Example 8. Is the set {1 + x, 2 − x, −1 + 2x} a linearly independent subset of the vector space P(R)?

Solution. Suppose that λ1(1 + x) + λ2(2 − x) + λ3(−1 + 2x) = 0 for all x ∈ R. Expanding the left-hand side gives
(λ1 + 2λ2 − λ3) + (λ1 − λ2 + 2λ3)x = 0.
Comparing coefficients shows that we must have
λ1 + 2λ2 − λ3 = 0
λ1 − λ2 + 2λ3 = 0.
The augmented matrix for this system is

[ 1  2 −1 | 0 ]
[ 1 −1  2 | 0 ]

which can be reduced to the row-echelon form

(U|0) = [ 1  2 −1 | 0 ]
        [ 0 −3  3 | 0 ]

The third column of U is non-leading, so there must be some solutions with λ1, λ2 and λ3 not all zero. Therefore the given set of polynomials is linearly dependent. ♦

Note. Although it is not required here, it is a good idea to find a specific non-zero solution and check that the corresponding linear combination really is zero.
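Such a check can be sketched with sympy (assumed available): the null space of the coefficient matrix supplies a non-trivial choice of scalars, and substituting them back should give the zero polynomial.

```python
import sympy as sp

# Coefficient matrix from Example 8: the columns hold the coefficients of
# 1 + x, 2 - x and -1 + 2x (constant term in row 0, x-coefficient in row 1).
A = sp.Matrix([[1, 2, -1],
               [1, -1, 2]])

lam = A.nullspace()[0]          # a non-zero solution of A*lambda = 0
x = sp.symbols('x')
polys = [1 + x, 2 - x, -1 + 2*x]
combo = sp.expand(sum(l * p for l, p in zip(lam, polys)))
print(combo)                    # 0: a non-trivial combination vanishes
```

The combination vanishes identically, confirming the dependence.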
In this example we could use back substitution to get a solution λ1 = −1, λ2 = 1, λ3 = 1. Checking shows that indeed
−1(1 + x) + 1(2 − x) + 1(−1 + 2x) = 0 for all x ∈ R.

6.5.2 Uniqueness and linear independence

The following theorem gives one reason why the idea of linear independence is important.

Theorem 2 (Uniqueness of Linear Combinations). Let S be a finite, non-empty set of vectors in a vector space and let v be a vector which can be written as a linear combination of S. Then the values of the scalars in the linear combination for v are unique if and only if S is a linearly independent set.

Proof. It turns out to be easier to prove the theorem by proving the equivalent result that: the values of the scalars in the linear combination are non-unique if and only if S is a linearly dependent set.
Let S = {v1, . . . , vn}. Suppose first that v has two expressions as a linear combination of S, namely
v = λ1v1 + · · · + λnvn  and  v = µ1v1 + · · · + µnvn.
Subtracting the second equation from the first gives
(λ1 − µ1)v1 + · · · + (λn − µn)vn = 0.   (#)
If the two sets of scalars are not identical then there must be at least one value of j (with 1 ≤ j ≤ n) such that λj − µj ≠ 0. This means that there is at least one non-zero scalar coefficient in (#), and therefore the set S is a linearly dependent set.
Conversely, if S is a linearly dependent set then there are scalars α1, . . . , αn, not all zero, such that
α1v1 + · · · + αnvn = 0.
If v = λ1v1 + · · · + λnvn is any expression for v as a linear combination of S then we can obtain a second expression by writing
v = v + 0 = (λ1v1 + · · · + λnvn) + (α1v1 + · · · + αnvn) = (λ1 + α1)v1 + · · · + (λn + αn)vn.
The coefficients in this expression are not all the same as the coefficients in the first expression, because α1, . . . , αn are not all zero. Therefore the coefficients in an expression for v as a linear combination of v1, . . .
, vn are not unique.

The following is a numerical example of the result of Theorem 2.

Example 9. As shown in Example 6, the set of vectors S = {v1, v2, v3, v4} is a linearly dependent set, where
v1 = (1, 2, 3)^T,  v2 = (1, 1, −1)^T,  v3 = (−1, 0, 5)^T  and  v4 = (2, 3, 5)^T.
Show that b = (7, 7, −4)^T belongs to span(S) and then check that the linear combination for b in terms of S is not unique.

Solution. We know that b ∈ span(S) if and only if the system Ax = b has a solution, where A is the matrix with the vectors in S as columns. We then reduce (A|b) to a row-echelon form (U|y):

[ 1  1 −1 2 |  7 ]                    [ 1  1 −1  2 |   7 ]                   [ 1  1 −1  2 |  7 ]
[ 2  1  0 3 |  7 ]   R2 = R2 − 2R1    [ 0 −1  2 −1 |  −7 ]   R3 = R3 − 4R2   [ 0 −1  2 −1 | −7 ]
[ 3 −1  5 5 | −4 ]   R3 = R3 − 3R1 →  [ 0 −4  8 −1 | −25 ]   →               [ 0  0  0  3 |  3 ]

The right-hand column is not a leading column, so the system does have a solution and b must be in span(S). The third column in the row-echelon form is non-leading, so the system has infinitely many solutions and hence there are infinitely many expressions for b as a linear combination of S. ♦

Note. If we want to find these expressions explicitly, we can apply back substitution to the row-echelon form and find the general solution x4 = 1, x3 = λ, x2 = 6 + 2λ and x1 = −1 − λ, where λ is an arbitrary real parameter. Therefore
(7, 7, −4)^T = (−1 − λ)(1, 2, 3)^T + (6 + 2λ)(1, 1, −1)^T + λ(−1, 0, 5)^T + (2, 3, 5)^T.
We found in Example 6 that if v3 (which corresponds to a non-leading column of this system) is dropped, then the resulting set {v1, v2, v4} is linearly independent. In this case we find that (7, 7, −4)^T is a unique linear combination of v1, v2 and v4. In fact, putting x3 = λ = 0 in the above expression gives
(7, 7, −4)^T = −1(1, 2, 3)^T + 6(1, 1, −1)^T + 1(2, 3, 5)^T.

6.5.3 Spans and linear independence

Suppose that v1 and v2 are non-zero vectors in Rn.
We have seen in Chapter 1 that span(v1, v2) represents a plane if the vectors are not parallel (in other words, if they are linearly independent), whereas the span represents a line if they are parallel (in other words, linearly dependent). An equivalent way of expressing this result is to say that if {v1, v2} is a linearly dependent set then
span(v1, v2) = span(v1) = span(v2).
For three non-zero vectors in Rn, we have also seen that span(v1, v2, v3) reduces to either a plane or a line when the three vectors form a linearly dependent set. In either of these cases it is possible to drop at least one vector from {v1, v2, v3} without changing the span of the set.
In this section we show that similar results are true for the span of any number of vectors in any vector space. To do this, we need two important results. The first, which is a generalisation of the results in Examples 2 and 4, is as follows.

Theorem 3. A set of vectors S is a linearly independent set if and only if no vector in S can be written as a linear combination of the other vectors in S, that is, if and only if no vector in S is in the span of the other vectors in S.

Note. The theorem is equivalent to: a set of vectors S is a linearly dependent set if and only if at least one vector in S is in the span of the other vectors in S.

Proof. It is easier to prove the alternative statement, because in that case the method of proof can closely follow the solution given for Example 4.
If some vector vi in S = {v1, . . . , vn} is in the span of the other vectors in S then there are scalars µ1, . . . , µi−1, µi+1, . . . , µn such that
vi = µ1v1 + · · · + µi−1vi−1 + µi+1vi+1 + · · · + µnvn.
An obvious rearrangement then gives
µ1v1 + · · · − vi + · · · + µnvn = 0.
At least one coefficient in this expression (the coefficient −1 of vi) is non-zero, so the set is linearly dependent.
Conversely, if S is a linearly dependent set then there are scalars λ1, . . . , λn, not all zero, such that
λ1v1 + · · · + λnvn = 0.
Let i be such that λi ≠ 0. Then we can solve for vi in the preceding equation to obtain
vi = −(1/λi)(λ1v1 + · · · + λi−1vi−1 + λi+1vi+1 + · · · + λnvn).
This shows that vi is in the span of the other vectors {v1, . . . , vi−1, vi+1, . . . , vn}, so the proof is complete.

Example 10. The set {v1, v2, v3, v4} in Example 6 is a linearly dependent set, and we found that −v1 + 2v2 + v3 = 0. We saw that v3 = v1 − 2v2. Hence v3 ∈ span(v1, v2), which implies v3 ∈ span(v1, v2, v4). Furthermore, we have v2 ∈ span(v1, v3, v4) and v1 ∈ span(v2, v3, v4). However, v4 is not in the span of the other three. A geometric interpretation of this result is that v1, v2, v3 lie in the same plane, whereas v4 does not lie in this plane. ♦

The second key result that we need is as follows. If a vector is added to a set then the span of the new set is equal to the span of the original set if and only if the additional vector is in the span of the original set. Formally, we have the following theorem.

Theorem 4. If S is a finite subset of a vector space V and the vector v is in V, then span(S ∪ {v}) = span(S) if and only if v ∈ span(S).

[X] Proof. Let S = {v1, . . . , vn}, so that S ∪ {v} = {v1, . . . , vn, v}. Obviously v ∈ span(S ∪ {v}), so if span(S ∪ {v}) = span(S) then v ∈ span(S).
To prove the converse, we assume that v ∈ span(S) and prove span(S) = span(S ∪ {v}) by proving firstly that if a vector u ∈ span(S) then u ∈ span(S ∪ {v}), and secondly that if a vector u ∈ span(S ∪ {v}) then u ∈ span(S).
For the first part, suppose that u ∈ span(S). Then u = λ1v1 + · · · + λnvn for some λ1, . . . , λn ∈ F, and hence u = λ1v1 + · · · + λnvn + 0v. Thus u ∈ span(S ∪ {v}).
For the second part, suppose that u ∈ span(S ∪ {v}). Then u = λv + λ1v1 + · · · + λnvn for some λ, λ1, . . .
, λn ∈ F. But v ∈ span(S), and hence v = µ1v1 + · · · + µnvn for some µ1, . . . , µn ∈ F. On substituting this linear combination for v into the previous linear combination for u, we find that u is a linear combination of v1, . . . , vn. Hence u ∈ span(S), and the proof is complete.

Example 11. For the linearly dependent set of four vectors {v1, v2, v3, v4} of Example 6, we have from Example 10 that
span(v1, v2, v3, v4) = span(v1, v2, v4) = span(v1, v3, v4) = span(v2, v3, v4),
which agrees with Theorem 4. ♦

By combining Theorems 3 and 4 we get the following. If S is a linearly dependent set of vectors then it is possible to drop at least one vector from S to obtain a new set with the same span as S, whereas if S is a linearly independent set then dropping any vector from S results in a new set with a smaller span than span(S). In formal terms, we have the following theorem.

Theorem 5. Suppose that S is a finite subset of a vector space. The span of every proper subset of S is a proper subspace of span(S) if and only if S is a linearly independent set.

Example 12. For the linearly dependent set S = {v1, v2, v3, v4} considered in Example 6, we have seen that span(v1, v2, v4) = span(v1, v2, v3, v4) = R3, and the set {v1, v2, v4} is a linearly independent set. We have also seen in Example 14 of Section 6.4 that S spans R3. If we now drop any vector from the linearly independent set {v1, v2, v4}, we obtain a set whose span is only a plane in R3 and not all of R3. This illustrates Theorem 5. ♦

For use in the next section, we need one more result involving spanning and linear independence.

Theorem 6. If S is a finite linearly independent subset of a vector space V and v is in V but not in span(S), then S ∪ {v} is a linearly independent set.

Proof. Let S = {v1, . . . , vn} and S′ = S ∪ {v}. If S′ is not linearly independent then there exist scalars λ, λ1, . . .
, λn, not all zero, such that
λv + λ1v1 + · · · + λnvn = 0.   (*)
If λ = 0 then we must have λi ≠ 0 for some 1 ≤ i ≤ n, and this would contradict the assumption that S is linearly independent. Therefore we must have λ ≠ 0, and this means that equation (*) can be rearranged to express v as a linear combination of the set S. This contradicts the fact that v is not in span(S), so the supposition that S′ is linearly dependent must be false and hence S′ is linearly independent.

6.6 Basis and dimension

In Chapter 1 we called the set of vectors {e1, . . . , en} the standard "basis" for Rn. We have also talked of a plane as being "two-dimensional" and ordinary space as being "three-dimensional". The main aims of this section are to give precise definitions of the ideas of basis and dimension and to show that these ideas apply to every vector space.
Recall that S is a spanning set for a vector space V when span(S) = V, that is, when every vector in V can be written as a linear combination of the vectors in S. We have also shown that if S is a linearly independent set then all linear combinations formed from S are unique. Hence, if S is a linearly independent spanning set for V then every vector in V can be written as a unique linear combination of S.
We have also shown (Theorem 5 of Section 6.5) that a proper subset of a linearly independent set S spans a proper subspace of span(S). This means, in particular, that if S is a linearly independent spanning set for a vector space V then dropping any vector from S gives a set which is not a spanning set for V.
Because of these two properties, linearly independent spanning sets are of fundamental importance in both the theoretical development and the practical applications of vector spaces.

6.6.1 Bases

Definition 1. A set of vectors B in a vector space V is called a basis for V if:
1. B is a linearly independent set, and
2. B is a spanning set for V (that is, span(B) = V).

Note.
We exclude from our discussion the vector space consisting of only the zero vector.

Example 1. The set {e1, e2, . . . , en}, where e1 = (1, 0, . . . , 0)^T, e2 = (0, 1, . . . , 0)^T, . . . , en = (0, 0, . . . , 1)^T, is a linearly independent spanning set for Rn, so this set is a basis for Rn. It is called the standard basis for Rn. Each vector a = (a1, . . . , an)^T can be written as the unique linear combination a1e1 + · · · + anen. ♦

Example 2. Show that the set S = {(2, 1, 0)^T, (−1, 0, 1)^T, (0, 1, −1)^T} is a basis for R3.

Solution. To show that S is a spanning set for R3, we need to show that every vector b ∈ R3 belongs to span(S). This will be true if and only if the system Ax = b, where

A = [ 2 −1  0 ]
    [ 1  0  1 ]
    [ 0  1 −1 ]

has a solution for every right-hand side b ∈ R3. The augmented matrix (A|b) can be reduced to row-echelon form:

[ 2 −1  0 | b1 ]              [ 1  0  1 | b2 ]
[ 1  0  1 | b2 ]   R1 ↔ R2 →  [ 2 −1  0 | b1 ]   R2 = R2 − 2R1 →
[ 0  1 −1 | b3 ]              [ 0  1 −1 | b3 ]

[ 1  0  1 | b2       ]                  [ 1  0  1 | b2            ]
[ 0 −1 −2 | b1 − 2b2 ]   R3 = R3 + R2 → [ 0 −1 −2 | b1 − 2b2      ]
[ 0  1 −1 | b3       ]                  [ 0  0 −3 | b1 − 2b2 + b3 ]

For all b ∈ R3, the row-echelon matrix has a non-leading right-hand column, and hence the equation Ax = b has a solution. Therefore span(S) = R3. Moreover, the left side of the row-echelon matrix has no non-leading columns, so the only solution for a zero right-hand side is x1 = x2 = x3 = 0. This shows that S is a linearly independent set. We have now proved that S is a linearly independent spanning set for R3 and is therefore a basis for R3. ♦

Example 3. Let v1 = (1, −1, 2)^T, v2 = (2, 1, 3)^T, v3 = (2, 4, 2)^T, v4 = (1, 5, 0)^T and S = {v1, v2, v3, v4}. Find a subset of S which is a basis for span(S).

Solution. From the result on page 24, we need only reduce the matrix with the vectors in S as columns to row-echelon form; then the vectors in S corresponding to the leading columns will span the same set as S.
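This leading-column criterion is straightforward to automate. A sketch using sympy (assumed available), whose rref routine reports the indices of the leading (pivot) columns, applied to the matrix of Example 3:

```python
import sympy as sp

# Columns are v1, v2, v3, v4 from Example 3.
A = sp.Matrix([[1, 2, 2, 1],
               [-1, 1, 4, 5],
               [2, 3, 2, 0]])

# rref returns the reduced row-echelon form and the pivot column indices.
_, pivots = A.rref()
print(pivots)  # (0, 1): the first two columns lead, so {v1, v2} is a basis for span(S)
```

The pivot indices (0, 1) correspond to v1 and v2, matching the hand reduction that follows.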
[ 1 2 2 1 ]                    [ 1  2  2  1 ]                       [ 1 2 2 1 ]
[−1 1 4 5 ]   R2 = R2 + R1     [ 0  3  6  6 ]   R3 = R3 + (1/3)R2 → [ 0 3 6 6 ]
[ 2 3 2 0 ]   R3 = R3 − 2R1 →  [ 0 −1 −2 −2 ]                       [ 0 0 0 0 ]

The vectors corresponding to the leading columns are v1 and v2. Hence {v1, v2} is a spanning set for span(S). Furthermore, if we remove the third and fourth columns from the matrices in the above reduction, we can see that {v1, v2} is a linearly independent set. Therefore this set is a basis for span(S). ♦

Suppose that B is a basis for a finite-dimensional vector space V. Since B is a spanning set, every vector in V can be written as a linear combination of B. Since B is also linearly independent, the linear combination is unique. In formal terms:

Let B = {v1, . . . , vn} be a basis for a vector space V over F. Every vector v ∈ V can be written uniquely as v = λ1v1 + · · · + λnvn, where λ1, . . . , λn ∈ F.

Example 4. Write the vector b = (−1, 0, 5)^T as the unique linear combination of the ordered basis {v1 = (1, 2, 3)^T, v2 = (1, 1, −1)^T} of span(v1, v2).

Solution. We write b as a linear combination of v1 and v2. From Proposition 3 of Section 6.4 we know that the required scalars are the components of a solution to the system Ax = b, where A is the matrix whose columns are the given vectors v1, v2. The augmented matrix (A|b) can be reduced to row-echelon form:

[ 1  1 | −1 ]                    [ 1  1 | −1 ]                   [ 1  1 | −1 ]
[ 2  1 |  0 ]   R2 = R2 − 2R1    [ 0 −1 |  2 ]   R3 = R3 − 4R2   [ 0 −1 |  2 ]
[ 3 −1 |  5 ]   R3 = R3 − 3R1 →  [ 0 −4 |  8 ]   →               [ 0  0 |  0 ]

This system has the unique solution x = (1, −2)^T, so
b = 1 (1, 2, 3)^T − 2 (1, 1, −1)^T. ♦

Example 5. Show that the set {i, j, k} is a basis for R3.

Solution. We have seen in Section 6.4, Example 5, that {i, j, k} is a spanning set. To prove that the set is linearly independent, we use the fact that {i, j, k} is an orthonormal set of vectors; that is,
i · i = j · j = k · k = 1 and i · j = j · k = i · k = 0.
If we assume that
0 = a1i + a2j + a3k   (#)
then by taking the dot product of (#) with i we get
0 = i · 0 = i · (a1i + a2j + a3k) = a1(i · i) + a2(i · j) + a3(i · k) = a1,
and hence a1 = 0. Similarly, by taking the dot products of (#) with j and k in turn, we find that a2 = 0 and a3 = 0. Therefore {i, j, k} is a linearly independent set.
We have now proved that {i, j, k} is a linearly independent spanning set for, and hence a basis for, R3. ♦

The set {i, j, k} is an example of an orthonormal basis. An orthonormal basis is a basis whose elements are all of length 1 and are mutually orthogonal. The advantage of using an orthonormal basis is that we can easily write any vector as the unique linear combination of the basis by taking dot products.

Example 6. The set of vectors B = {u1, u2, u3}, where
u1 = (1/√2, 0, −1/√2)^T,  u2 = (1/√2, 0, 1/√2)^T,  u3 = (0, −1, 0)^T,
is an orthonormal basis for R3. Write a = (a1, a2, a3)^T as the unique linear combination of this basis.

Solution. Suppose that a = x1u1 + x2u2 + x3u3. We could find x1, x2, x3 in the same way as in Example 4, but there is a simpler method which uses the orthonormality properties of the basis. Since B is orthonormal, we have
u1 · u1 = u2 · u2 = u3 · u3 = 1 and u1 · u2 = u1 · u3 = u2 · u3 = 0.
Hence
x1 = u1 · (x1u1 + x2u2 + x3u3) = u1 · a = (1/√2)a1 − (1/√2)a3,
and similarly
x2 = u2 · a = (1/√2)a1 + (1/√2)a3 and x3 = u3 · a = −a2.
Therefore the unique linear combination is given by
a = (1/√2)(a1 − a3)u1 + (1/√2)(a1 + a3)u2 − a2u3. ♦

Example 7. Show that the set {1, x, x², . . . , xⁿ} is a basis for Pn(R). (This is called the standard basis for Pn(R).)

Solution. As we have seen, {1, x, x², . . . , xⁿ} is a spanning set for Pn(R). The set is also linearly independent, because if λ1·1 + λ2x + · · · + λn+1xⁿ = 0 for all x ∈ R then λ1 = λ2 = · · · = λn+1 = 0.
This follows from the theorem proved in Chapter 3 which states that two polynomials agree for all values of x ∈ R if and only if their corresponding coefficients are equal. ♦

6.6.2 Dimension

In this section we show that "dimension" can be defined for every vector space which is spanned by a finite set of vectors. To do this we need two results.

Theorem 1. The number of vectors in any spanning set for a vector space V is always greater than or equal to the number of vectors in any linearly independent set in V.

[X] Sketch Proof. Suppose I = {v1, . . . , vm} is a linearly independent set and S = {w1, . . . , wn} is a spanning set. Then we can write
v1 = a11w1 + · · · + a1nwn
⋮
vm = am1w1 + · · · + amnwn.   (1)
Suppose m > n and consider the matrix A = (aij). A has more rows than columns, so if we reduce A to row-echelon form by the Gaussian elimination algorithm then we must end up with at least one row of zeros. Apart from a row swap to get it into the right position, this row of zeros will have been obtained from some row of our original matrix A, say row Rk, by subtracting from it multiples of other rows or linear combinations of other rows of A. Therefore there are scalars α1, . . . , αk−1, αk+1, . . . , αm such that
Rk − (α1R1 + · · · + αk−1Rk−1 + αk+1Rk+1 + · · · + αmRm)   (2)
is an all-zero row. If we use the equations in (1) to express the vector
vk − (α1v1 + · · · + αk−1vk−1 + αk+1vk+1 + · · · + αmvm)   (3)
as a linear combination of w1, . . . , wn, then the coefficient of each wi will be the same as the ith entry in the row defined by (2). But we know that this row is all zeros, so the vector defined by (3) must be the zero vector. We now have a linear combination of v1, . . . , vm which equals zero, and at least one coefficient in this combination (the coefficient of vk) is non-zero. This is not compatible with the fact that the set {v1, . . .
, vm} is linearly independent. By assuming that m is greater than n we have reached a contradiction; therefore m must be less than or equal to n.

The second important theorem which guarantees the existence of a dimension for a vector space is as follows.

Theorem 2. If a vector space V has a finite basis then every set of basis vectors for V contains the same number of vectors; that is, if B1 = {u1, . . . , um} and B2 = {v1, . . . , vn} are two bases for the same vector space V then m = n.

Proof. Using the result of Theorem 1 and the fact that a basis is a linearly independent spanning set, we have m ≥ n, since B1 spans V and B2 is linearly independent, and n ≥ m, since B2 spans V and B1 is linearly independent. Therefore m = n and the proof is complete.

Since every basis for a vector space with a finite basis contains exactly the same number of vectors, the following definition makes sense for every vector space with a finite basis.

Definition 2. If V is a vector space with a finite basis, then the dimension of V, denoted by dim(V), is the number of vectors in any basis for V. Such a V is called a finite dimensional vector space.

Example 8.
a) Rn has a basis {e1, . . . , en} of n vectors, and hence dim(Rn) = n.
b) The space of geometric vectors in ordinary physical space has a basis {i, j, k} of three vectors, and therefore its dimension is 3.
c) The subspace span(S) in Example 3 has a basis of two vectors. The dimension of span(S) is therefore 2.
d) We define the dimension of the vector space consisting only of the zero vector to be 0.
e) The space Pn of polynomials of degree less than or equal to n has a basis {1, x, x², . . . , xⁿ}, so dim(Pn) = n + 1.

The following theorem summarises some useful results connecting spanning sets, linearly independent sets and dimension.

Theorem 3. Suppose that V is a finite dimensional vector space.
1.
the number of vectors in any spanning set for V is greater than or equal to the dimension of V ; 2. the number of vectors in any linearly independent set in V is less than or equal to the dimension of V ; 3. if the number of vectors in a spanning set is equal to the dimension then the set is also a linearly independent set and hence a basis for V ; 4. if the number of vectors in a linearly independent set is equal to the dimension then the set is also a spanning set and hence a basis for V . Proof. Assume that V is a vector space of dimension n. The dimension of a vector space is equal to the number of vectors in a basis and a basis is a linearly independent spanning set. Therefore there is a linearly independent set in V which contains n vectors and there is also a spanning set for V which contains n vectors. Parts 1 and 2 then follow from Theorem 1. 3. Assume that a spanning set S contains n vectors. Then, as no spanning set for V can contain fewer than n vectors, there is no proper subset of S which is a spanning set for V . Hence no proper subset of S has the same span as S. Thus, by Theorem 5 of Section 6.5, S is a linearly independent set, and is a basis of V . 4. Assume that I = {v1, . . . ,vn} is a linearly independent set of n vectors in V and let v be any vector in V . If v does not belong to span (I) then, by Theorem 6 of Section 6.5, the set I ∪ {v} is linearly independent. This implies that V contains a linearly independent set with n+1 > dim(V ) vectors. This would contradict the result of part 2, so we must have v ∈ span (I) for all v ∈ V . Therefore I is a spanning set for V and hence a basis for V . Some of the uses of Theorem 3 are illustrated in the next example. Example 9. a) Obviously, the two vectors (1, −1)T and (4, 5)T are non-parallel and so they are linearly independent.
Hence, the set of two vectors {(1, −1)T, (4, 5)T} is a basis for the two-dimensional space R2. b) The set of 3 linearly independent vectors (2, 1, 0)T, (−1, 0, 1)T, (0, 1, −1)T in Example 2 is a basis for the three-dimensional space R3. c) A set of three vectors is not a spanning set for R4 as dim(R4) = 4 > 3. d) Any set of 10 vectors which spans R10 is a basis for R10 since dim(R10) = 10. e) Any linearly independent set of 325 vectors in R325 is a basis for R325 since dim(R325) = 325. f) A set of 1200 vectors in R1209 is not a spanning set as dim(R1209) = 1209 > 1200. ♦ Example 10. Show that the only subspaces of R3 are (1) the origin, (2) lines through the origin, (3) planes through the origin, and (4) R3 itself. Solution. By part 2 of Theorem 3, no subspace of R3 can have dimension greater than 3, otherwise a basis for the subspace would be a linearly independent set with more than three members. It follows that the only possible dimensions for subspaces (other than the subspace {0}) are 1, 2 and 3. A subspace of dimension 1 must be of the form span (v), where v is non-zero. We know that this represents a line through the origin. A subspace of dimension 2 must be of the form span (v1,v2), where the set {v1,v2} is linearly independent. We know that this represents a plane through the origin. If a subspace is of dimension 3, it must be the whole of R3 because a basis for it will be a set of 3 linearly independent vectors in R3 and hence a basis for R3 itself. ♦ 6.6.3 Existence and construction of bases In this section we examine the following two problems. Firstly, is it always possible to find a basis for a given vector space V ? Secondly, if we know that a basis does exist for a given vector space V , how can we find a basis for V ? For the existence of a basis, we have seen in Example 3 that a finite set S of vectors in R3 contains a subset which is a basis for span (S). This is generally true for any vector space spanned by a finite set of vectors. Theorem 4.
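The counting arguments of Theorem 3 and Example 9 can be checked numerically: a set of n vectors in Rn is a basis exactly when the matrix formed from them has rank n. The following sketch (plain Python, not part of the notes; the helper name `rank` is ours) computes rank by exact Gaussian elimination over the rationals, so there are no rounding issues.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # index of the next pivot row
    for c in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue  # no pivot in this column
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Example 9(a): two non-parallel vectors in R^2 (rank is unchanged by transposing,
# so we may store each vector as a row)
vecs2 = [[1, -1], [4, 5]]
print(rank(vecs2))   # 2, so the set is a basis for R^2

# Example 9(b): three vectors in R^3
vecs3 = [[2, 1, 0], [-1, 0, 1], [0, 1, -1]]
print(rank(vecs3))   # 3, so the set is a basis for R^3
```

By Theorem 3 part 4, full rank for n vectors in Rn certifies both linear independence and spanning at once.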
If S is a finite non-empty subset of a vector space then S contains a subset which is a basis for span (S). In particular, if V is any non-zero vector space which can be spanned by a finite set of vectors then V has a basis. c©2020 School of Mathematics and Statistics, UNSW Sydney 42 CHAPTER 6. VECTOR SPACES Proof. Let S be a finite non-empty set of vectors. If S is linearly independent then it is a basis for span (S) and there is nothing more to be done. If not, then (by Theorem 5 of Section 6.5) there must be a vector which can be dropped from S without changing the span. This gives a new set with fewer vectors which still spans span (S). If this new set is linearly independent, we have a basis. If not, we can again remove a vector and get a smaller set which still spans the same subspace. If we continue in this way we must eventually get a set which is linearly independent and spans span (S) (the process cannot continue indefinitely because the original set S had only a finite number, say n, of members and after n steps we would have no vectors left in the set). A non-zero vector space V contains the zero vector, so it cannot be spanned by an empty set. Suppose that S is a non-empty finite spanning set for V . By the above result, there exists a subset of S which is a basis for span (S) = V . This theorem shows that a spanning set can always be converted into a basis by removing some vectors from it. The next theorem proves a result about going in the opposite direction. Theorem 5. Suppose that V is a vector space which can be spanned by a finite set of vectors. If S is a linearly independent subset of V then there exists a basis for V which contains S as a subset. In other words, every linearly independent subset of V can be extended to a basis for V . Proof. Suppose S is a linearly independent set in V and that V can be spanned by a set of n vectors. If S spans V then there is nothing more to be done. 
If not, then there is a vector v ∈ V which is not in span (S). If we add v to S then we get a new set S ∪ {v} which is still linearly independent (by Theorem 6 of Section 6.5). If this new set spans V then we can stop. Otherwise, we can repeat the previous step and add another vector to get a larger linearly independent set. This process cannot continue beyond n steps, otherwise we would have a linearly independent set with more than n members (more than the number of members in a spanning set), and this would contradict Theorem 1. But the process only ends when we get a set which does span V . So eventually we must have a linearly independent spanning set or, in other words, a basis. Note carefully that the last two theorems only apply to vector spaces which can be spanned by a finite set of vectors. Any vector space which cannot be spanned by any finite set of vectors is said to be an infinite-dimensional vector space. The vector space P of all polynomials is an example of an infinite-dimensional vector space (see Example 22 of Section 6.8). In the proofs of the last two theorems we used step-by-step procedures to reduce a spanning set to a basis and to extend a linearly independent set to a basis. These procedures could be translated into algorithms for finding a basis but they would not be efficient because they involve re-testing (for linear independence or spanning) each time a vector is added or deleted. In practice, at least in Rm, we can do all the adding or all the deleting at the same time. We form a suitable matrix, reduce it to echelon form and examine the echelon form to see which vectors we should add or delete. The details of the procedures are given in the next two theorems. Theorem 6 (Reducing a spanning set to a basis in Rm). Suppose that S = {v1, . . . ,vn} is any subset of Rm and A is the matrix whose columns are the members of S.
If U is a row-echelon form for A and S′ is created from S by deleting those vectors which correspond to non-leading columns in U then S′ is a basis for span (S). Proof. Let U ′ be the matrix created by deleting any non-leading columns from U and let A′ be created by deleting the same-numbered columns from A (so that the columns of A′ are the members of S′). The matrix U ′ has no non-leading columns, so the homogeneous system A′y = 0 has no solutions other than the zero solution. This implies (by Proposition 1 of Section 6.5) that the set S′ is linearly independent. In removing non-leading columns from U , we cannot create any new all-zero rows, so the system A′y = b has a solution whenever Ax = b has a solution. This implies (by Proposition 3 of Section 6.4) that S′ spans the same subspace as S. This completes the proof that S′ is a basis for span (S). Example 11. Find a basis and the dimension for the subspace of R4 spanned by the set S = {(1, 1, 2, 2)T, (2, 3, 4, 5)T, (−3, 1, −6, −2)T, (1, 3, 3, 6)T, (−2, −1, −4, −3)T}. Solution. As an exercise, show that the matrix A with the members of S as its columns can be reduced by elementary row operations to the echelon form
U =
( 1 2 −3 1 −2 )
( 0 1  4 2  1 )
( 0 0  0 1  0 )
( 0 0  0 0  0 ).
The third and fifth columns of U are non-leading, so we remove the third and fifth members from S and get S′ = {(1, 1, 2, 2)T, (2, 3, 4, 5)T, (1, 3, 3, 6)T} as a basis for span (S), which is 3-dimensional. ♦ Note. Do not confuse the dimension of a subspace with the dimension of the vector space it lies in. In the above example, span (S) is a 3-dimensional subspace of R4. This has nothing to do with R3. Example 12. Show that the vectors v1 = (0, 1, 2)T, v2 = (2, −1, −2)T, v3 = (3, 2, 4)T, v4 = (5, 4, 2)T span R3 and find a basis for R3 which is a subset of S = {v1,v2,v3,v4}. Solution. Suppose that A is the matrix whose columns are the four given vectors.
One way to show that S spans R3 is to reduce the augmented matrix (A|b) to echelon form as in Section 6.4.1; the augmented matrix involves a column of indeterminates b. By Theorem 6, however, we can instead find a basis for span (S) simply by reducing A to a row-echelon form: if the dimension of the span is 3 then the result follows.
( 0  2 3 5 )
( 1 −1 2 4 )
( 2 −2 4 2 )
Applying R1 ↔ R2 gives
( 1 −1 2 4 )
( 0  2 3 5 )
( 2 −2 4 2 ),
and then R3 = R3 − 2R1 gives
U =
( 1 −1 2  4 )
( 0  2 3  5 )
( 0  0 0 −6 ).
The matrix U has one non-leading column, the third, so we delete the third member from the given set and find that the subset B = {v1,v2,v4} is a basis for span (S). However B is a linearly independent set of 3 vectors in R3, so it is also a basis for R3. ♦ Theorem 7 (Extending a linearly independent set to a basis in Rm). Suppose that S = {v1, . . . ,vn} is a linearly independent subset of Rm and A is the matrix whose columns are the members of S followed by the members of the standard basis for Rm. If U is a row-echelon form for A and S′ is created by choosing those columns of A which correspond to leading columns in U then S′ is a basis for Rm containing S as a subset. Proof. Let S′′ be the set of n+m vectors formed by the columns of A. Since S′′ includes all the standard basis vectors for Rm, we have Rm = span (S′′). By Theorem 6, the set S′ is a basis for Rm. To see that S′ contains S we need to prove that the first n columns of U (which correspond to the members of S in A) are leading columns. Let B be the submatrix formed from the first n columns of A and P be the submatrix formed from the first n columns of U . Since A row-reduces to U , B row-reduces to P . By Proposition 1 of Section 6.5, the linear independence of S implies that Bx = 0 has the unique solution x = 0. Hence all columns of P , i.e. the first n columns of U , are leading, and the result follows. Example 13.
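The procedure of Theorem 6 is mechanical enough to automate: row-reduce the matrix whose columns are the given vectors and keep the vectors at the leading-column positions. Here is a sketch in plain Python (not from the notes; `leading_columns` is our own helper), using exact rational arithmetic and the data of Example 11.

```python
from fractions import Fraction

def leading_columns(cols):
    """Given vectors (the columns of A), return the indices of the leading
    columns of a row-echelon form of A, found by Gaussian elimination."""
    n_rows, n_cols = len(cols[0]), len(cols)
    # build A with the given vectors as columns
    m = [[Fraction(cols[j][i]) for j in range(n_cols)] for i in range(n_rows)]
    leads, r = [], 0
    for c in range(n_cols):
        piv = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if piv is None:
            continue  # non-leading column: the vector is dropped
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, n_rows):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        leads.append(c)
        r += 1
    return leads

# Example 11: five vectors spanning a subspace of R^4
S = [[1, 1, 2, 2], [2, 3, 4, 5], [-3, 1, -6, -2], [1, 3, 3, 6], [-2, -1, -4, -3]]
keep = leading_columns(S)
print(keep)                   # [0, 1, 3]: keep the 1st, 2nd and 4th vectors
print([S[j] for j in keep])   # a basis for span(S); dim span(S) = 3
```

The number of leading columns is the dimension of span (S), which here agrees with the value 3 found by hand.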
Find a basis for R4 containing the members of the linearly independent set S = {(1, 2, 4, −2)T, (2, 5, 10, −5)T}. Solution. We form the matrix A whose columns are the members of S followed by the standard basis vectors for R4 and reduce it to row-echelon form:
A =
(  1  2 1 0 0 0 )
(  2  5 0 1 0 0 )
(  4 10 0 0 1 0 )
( −2 −5 0 0 0 1 ).
Applying R2 = R2 − 2R1, R3 = R3 − 4R1, R4 = R4 + 2R1 gives
( 1  2  1 0 0 0 )
( 0  1 −2 1 0 0 )
( 0  2 −4 0 1 0 )
( 0 −1  2 0 0 1 ),
then R3 = R3 − 2R2, R4 = R4 + R2 gives
( 1 2  1  0 0 0 )
( 0 1 −2  1 0 0 )
( 0 0  0 −2 1 0 )
( 0 0  0  1 0 1 ),
and finally R4 = R4 + (1/2)R3 gives
U =
( 1 2  1  0   0 0 )
( 0 1 −2  1   0 0 )
( 0 0  0 −2   1 0 )
( 0 0  0  0 1/2 1 ).
The first, second, fourth and fifth columns are the leading columns in U , so we take the corresponding columns in A and get a basis S′ = {(1, 2, 4, −2)T, (2, 5, 10, −5)T, (0, 1, 0, 0)T, (0, 0, 1, 0)T} for R4. ♦ Note. The procedure stated in the last theorem can also be used in situations where we have a set S which is neither linearly independent nor spanning and we want to find a basis which contains as many members of S as possible. We still form a matrix A from the members of S followed by the standard basis vectors, reduce to echelon form U and pick out from A the columns which correspond to leading columns in U . The only difference is that the first n columns are not necessarily all leading columns (because S is not necessarily linearly independent), so the new set S′ does not necessarily include all the members of S. Example 14. Find a basis for R4 containing as many as possible of the members of the set S = {(1, 2, 4, −2)T, (2, 5, 1, 4)T, (1, 3, −3, 6)T}. Solution. We form the matrix A whose columns are the members of S followed by the standard basis vectors for R4 and reduce it to row-echelon form.
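Theorem 7's extension procedure can be sketched in the same way: append the standard basis vectors as extra columns, row-reduce, and keep the columns that turn out to be leading. The code below is a plain-Python illustration (not from the notes; the names `leading_cols` and `extend_to_basis` are ours), reproducing Example 13.

```python
from fractions import Fraction

def leading_cols(matrix_cols):
    """Indices of the leading columns of the matrix with these columns,
    found by exact Gaussian elimination."""
    rows = len(matrix_cols[0])
    m = [[Fraction(c[i]) for c in matrix_cols] for i in range(rows)]
    leads, r = [], 0
    for c in range(len(matrix_cols)):
        piv = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, rows):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        leads.append(c)
        r += 1
    return leads

def extend_to_basis(S, dim):
    """Theorem 7: append the standard basis vectors of R^dim to S,
    row-reduce, and keep the columns that are leading."""
    e = [[1 if i == j else 0 for i in range(dim)] for j in range(dim)]
    cols = S + e
    return [cols[j] for j in leading_cols(cols)]

# Example 13: extend two independent vectors to a basis of R^4
S = [[1, 2, 4, -2], [2, 5, 10, -5]]
basis = extend_to_basis(S, 4)
print(basis)   # the two given vectors followed by e2 and e3
```

Because the first columns of the matrix are the (independent) members of S, they are guaranteed to be leading, so S always survives into the answer, exactly as the proof of Theorem 7 shows.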
A =
(  1 2  1 1 0 0 0 )
(  2 5  3 0 1 0 0 )
(  4 1 −3 0 0 1 0 )
( −2 4  6 0 0 0 1 ).
Applying R2 = R2 − 2R1, R3 = R3 − 4R1, R4 = R4 + 2R1 gives
( 1  2  1  1 0 0 0 )
( 0  1  1 −2 1 0 0 )
( 0 −7 −7 −4 0 1 0 )
( 0  8  8  2 0 0 1 ),
then R3 = R3 + 7R2, R4 = R4 − 8R2 gives
( 1 2 1   1  0 0 0 )
( 0 1 1  −2  1 0 0 )
( 0 0 0 −18  7 1 0 )
( 0 0 0  18 −8 0 1 ),
and finally R4 = R4 + R3 gives
( 1 2 1   1  0 0 0 )
( 0 1 1  −2  1 0 0 )
( 0 0 0 −18  7 1 0 )
( 0 0 0   0 −1 1 1 ).
The first, second, fourth and fifth columns are the leading columns in the row-echelon form matrix, so we take the corresponding columns in A and get a basis S′ = {(1, 2, 4, −2)T, (2, 5, 1, 4)T, (1, 0, 0, 0)T, (0, 1, 0, 0)T} for R4. Note that this basis only contains two of the three vectors in the original set because S was not linearly independent. (Question: what is dim(span (S))?) ♦ In Chapter 7 we shall need the following proposition, which follows from Theorems 3 and 4. Proposition 8. If V is a finite-dimensional space, W is a subspace of V and dim(W ) = dim(V ), then W = V . Proof. By Theorem 4, there exists a basis B for W . So B is a linearly independent set in V . By Theorem 3 part 4, B is also a basis for V and V = span (B) = W . 6.7 [X] Coordinate vectors In Chapter 1, we saw that the position vector a of a point in an n-dimensional space could be represented by a (column) coordinate vector (a1, . . . , an)T ∈ Rn, where the coordinates are the scalars in the linear combination which expresses a in terms of the standard basis vectors {e1, . . . , en}. As remarked in the last section, any basis B for a finite-dimensional vector space V over F has the property that every vector v ∈ V can be written as a unique linear combination of B. If we specify a fixed ordering of the basis B = {v1, . . . ,vn} then we can associate with every vector v the unique n-vector (x1, . . . , xn)T ∈ Fn, where v = x1v1 + · · ·+ xnvn.
Note that the order of vectors in B is important because changing the order of the vectors also changes the order of the coefficients and therefore changes the n-vector corresponding to v. This generalises the notion of coordinates of a point in an n-dimensional real space to coordinates of a vector in any finite dimensional vector space: we can represent a vector in any such vector space by a coordinate vector which is an n-vector in Fn. Consequently, we can use all the techniques which we have learnt in the previous sections to study vector spaces by matrices over F. We now introduce the definition of coordinate vectors. Definition 1. Let V be an n-dimensional vector space and let the ordered set of vectors B = {v1, . . . ,vn} be a basis for V . If v = x1v1 + · · ·+ xnvn then the vector [v]B = x = (x1, . . . , xn)T is called the coordinate vector of v with respect to the ordered basis B. Example 1. With respect to the ordered basis B = {(0, 1, 3, −1)T, (2, 5, −3, 1)T, (4, −1, 0, 2)T, (−6, 2, 1, 4)T} of R4, a vector v ∈ R4 has the coordinate vector [v]B = (1, −3, 4, 2)T. Find v. Solution. v = 1 (0, 1, 3, −1)T − 3 (2, 5, −3, 1)T + 4 (4, −1, 0, 2)T + 2 (−6, 2, 1, 4)T = (−2, −14, 14, 12)T. ♦ Example 2. Find the coordinate vector for the vector b = (−1, 0, 5)T with respect to the ordered basis v1 = (1, 2, 3)T, v2 = (1, 1, −1)T of span (v1,v2). Solution. From the result of Example 4 in Section 6.6, the coordinate vector for b with respect to the ordered basis {v1,v2} is (1, −2)T. ♦ Example 3. The set of vectors B = {u1,u2,u3}, where u1 = (1/√2, 0, −1/√2)T, u2 = (1/√2, 0, 1/√2)T, u3 = (0, −1, 0)T, is an orthonormal basis for R3. Find the coordinate vector of a = (a1, a2, a3)T with respect to this basis. Solution. From the result of Example 6 in Section 6.6, the required coordinate vector is [a]B = ( (a1 − a3)/√2, (a1 + a3)/√2, −a2 )T.
♦ One of the most important aspects of coordinate vectors is that by using them we can reduce problems in any finite-dimensional real vector space to problems in Rn, where of course we have powerful matrix techniques available. Example 4. Find the coordinate vector of p(x) = a0 + a1x+ · · ·+ anxn ∈ Pn(R) with respect to the ordered basis B = {1, x, x2, . . . , xn}. Solution. Of course there is no need to do any calculation in this case — we already know how to write p as a linear combination of elements of B. The coordinate vector of p with respect to B is [p]B = (a0, a1, . . . , an)T ∈ Rn+1. ♦ Things are more difficult if we are given a nonstandard basis for Pn(R). Example 5. As a special case of Example 17 in Section 6.8.3, the set P2(R) of all real polynomials which have degree two or less is a vector space. You are given an ordered basis B = {1+x, x+x2, 1+x2} for P2(R). Find the coordinate vector for p(x) = 1− x2 with respect to the ordered basis B. Solution. We need to find scalars α1, α2, α3 ∈ R such that p(x) = 1− x2 = α1(1 + x) + α2(x+ x2) + α3(1 + x2). Expanding the right-hand side and comparing coefficients shows that we must have
α1 + α3 = 1
α1 + α2 = 0
α2 + α3 = −1.
This reduces the problem to that of solving a system of linear equations in the unknowns α1, α2 and α3. Solving these equations by Gaussian elimination gives the unique solution α1 = 1, α2 = −1, α3 = 0. Therefore [p]B = (1, −1, 0)T. ♦ Because coordinate vectors can be obtained in any finite-dimensional vector space, we can turn problems involving vectors in the vector space into problems involving coordinate vectors in Fn. There are three important results which are fundamental to this approach. Theorem 1. If B is an ordered basis for a vector space V over a field F and u,v ∈ V and λ ∈ F, then (a) u = v if and only if [u]B = [v]B , that is, two vectors are equal if and only if the corresponding coordinate vectors are equal.
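Example 5's reduction of a coordinate-vector problem to a linear system can be carried out mechanically. The sketch below (plain Python, not part of the notes; `solve` is our own helper, and it assumes the system is square with a unique solution) stores each basis polynomial as a column of coefficients and solves for the coordinates of p(x) = 1 − x².

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination.
    Assumes A is invertible (a basis gives an invertible matrix)."""
    n = len(A)
    m = [[Fraction(A[i][j]) for j in range(n)] + [Fraction(b[i])] for i in range(n)]
    for c in range(n):
        piv = next(i for i in range(c, n) if m[i][c] != 0)
        m[c], m[piv] = m[piv], m[c]
        m[c] = [x / m[c][c] for x in m[c]]  # scale the pivot row
        for i in range(n):
            if i != c and m[i][c] != 0:
                m[i] = [a - m[i][c] * p for a, p in zip(m[i], m[c])]
    return [m[i][n] for i in range(n)]

# Basis B = {1 + x, x + x^2, 1 + x^2}: each column lists the coefficients
# (constant, x, x^2) of one basis polynomial.
B = [[1, 0, 1],    # constant terms
     [1, 1, 0],    # x terms
     [0, 1, 1]]    # x^2 terms
p = [1, 0, -1]     # p(x) = 1 - x^2

print([int(a) for a in solve(B, p)])   # [1, -1, 0], i.e. [p]_B = (1, -1, 0)^T
```

This is exactly the system α1 + α3 = 1, α1 + α2 = 0, α2 + α3 = −1 from the worked solution, with the basis polynomials' coordinate vectors as the columns of the coefficient matrix.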
(b) [u+v]B = [u]B + [v]B , that is, the coordinate vector of the sum of two vectors is equal to the sum of the two corresponding coordinate vectors. (c) [λv]B = λ[v]B , that is, the coordinate vector of a scalar multiple of a vector is equal to the same scalar multiple of the corresponding coordinate vector. Proof. Equality. If u and v have the same coordinates then they are equal to the same linear combination of B and must therefore be equal to each other. Conversely, if u = v then, because B is a basis and (by Theorem 2 of Section 6.5) no vector can have two different expressions as a linear combination of a given basis, any expressions for u and v as linear combinations of B must be the same. The coefficients in these linear combinations are the coordinates of u and v, so [u]B = [v]B . Addition. Let B = {b1, . . . , bn} and [u]B = λ1... λn and [v]B = µ1... µn . By the definition of coordinate vectors, we have u = λ1b1 + · · ·+ λnbn and v = µ1b1 + · · ·+ µnbn. By adding these two equations we get u+ v = (λ1 + µ1)b1 + · · ·+ (λn + µn)bn c©2020 School of Mathematics and Statistics, UNSW Sydney 6.8. [X] FURTHER IMPORTANT EXAMPLES OF VECTOR SPACES 49 and this implies (by the definition of coordinate vectors) that [u+ v]B = λ1 + µ1... λn + µn = λ1... λn + µ1... µn = [u]B + [v]B . Multiplication by a scalar. The proof is similar to that for addition, and is omitted. 6.8 [X] Further important examples of vector spaces All of the material in this section is regarded as being more difficult than the material in previous sections of this chapter. In the preceding sections we have developed a general theory of vector spaces. However, for simplicity, we have concentrated on examples which illustrate the applications of the theory to the particular vector space Rn. In this section we shall examine three other important examples of vector spaces, namely the vector spaces of matrices, of real-valued functions, and of polynomials. 
We shall show how the vector space ideas of subspace, linear combination and span, linear independence, basis, dimension, and coordinate vector apply to these spaces. Before the discussion of these vector spaces, we introduce an alternative way of proving a subset to be a subspace. Theorem 1 (Alternative Subspace Theorem). A subset S of a vector space V over a field F is a subspace of V if and only if S contains the zero vector and it satisfies the closure condition: If v1,v2 ∈ S, then λ1v1 + λ2v2 ∈ S for all λ1, λ2 ∈ F. (#) Proof. We prove that the closure condition (#) is satisfied if and only if both closure under addition and closure under scalar multiplication are also satisfied. We first assume that (#) is satisfied. Then, for the special case of λ1 = λ2 = 1, condition (#) becomes v1 + v2 ∈ S for all v1,v2 ∈ S, and hence S is closed under addition. Further, for the special case of λ2 = 0, condition (#) becomes λ1v1 ∈ S for all v1 ∈ S and for all λ1 ∈ F, and hence S is closed under multiplication by a scalar. Thus, if (#) is satisfied then closure under addition and closure under scalar multiplication are also satisfied. We now assume that both closure conditions are satisfied. Then, from closure under multipli- cation by a scalar, we have, for all v1,v2 ∈ S and for all λ1, λ2 ∈ F , that λ1v1 ∈ S and λ2v2 ∈ S. Then, on adding these scalar multiples and using closure under addition, we have that λ1v1 + λ2v2 ∈ S, and hence (#) is satisfied. Thus, (#) is satisfied if and only if closure under addition and under scalar multiplication are both satisfied. We then use the Subspace Theorem of Section 6.3 to complete the proof. c©2020 School of Mathematics and Statistics, UNSW Sydney 50 CHAPTER 6. VECTOR SPACES 6.8.1 Vector spaces of matrices We have seen in Example 3 in Section 6.1 that Mmn(R) the set of all m×n real matrices is a vector space over R and Mmn(C) the set of m × n complex matrices is a vector space over C. 
In this section, we are going to see some examples of their subspaces, their bases, and coordinate vectors with respect to these bases. Example 1. Show that the set S = {A ∈M22(R) : [A]11 = [A]22 = 1} is not a vector subspace of M22(R). Solution. Since the zero matrix is not in S, vector space axiom 4 (Existence of Zero) is not satisfied. Therefore the set S is not a vector subspace. ♦ Example 2. Prove that the set of n× n real symmetric matrices is a subspace of Mnn(R). Solution. Recall that a square matrix A is called symmetric if A = AT . Let S be the set of n× n symmetric matrices. Obviously, the zero matrix is symmetric, and so belongs to S. Suppose that A, B ∈ S and λ, µ ∈ R. By a property of the transpose from Chapter 5 and the fact that A, B are symmetric, we have (λA+ µB)T = λAT + µBT = λA+ µB. Therefore, λA+ µB is symmetric and so it is in S. Hence, by the Alternative Subspace Theorem, S is a subspace. ♦ Example 3. For 1 ≤ i ≤ m, 1 ≤ j ≤ n, let Eij be the m× n matrix with all entries 0 except that the ijth entry is 1. Show that the set B = {Eij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis for Mmn(C). Solution. For any A = (aij) ∈Mmn(C), we have A = a11E11 + · · ·+ a1nE1n + a21E21 + · · ·+ amnEmn = ∑_{i=1}^{m} ∑_{j=1}^{n} aijEij. Thus, B is a spanning set for Mmn(C). Suppose that ∑_{i=1}^{m} ∑_{j=1}^{n} λijEij = 0, where 0 is the m× n zero matrix. Note that ∑_{i=1}^{m} ∑_{j=1}^{n} λijEij is the m× n matrix whose ijth entry is λij. Hence λ11 = · · · = λmn = 0 and so B is a linearly independent set. Therefore B is a basis for Mmn(C). ♦ Note. In general, the set B is a basis, called the standard basis, for Mmn =Mmn(F) for F = Q, R or C. As a result the dimension of Mmn is mn. Example 4. The coordinate vector of the matrix
( a b )
( c d )
with respect to the standard basis of M22 is (a, b, c, d)T. We can also solve problems about linearly independent sets, spanning sets and bases by Gaussian elimination.
Be careful not to mix up the augmented matrix formed from the linear combinations and the elements of Mmn. Example 5. Show that the set of matrices
( 1 1 )   ( 1 1 )   ( 1 0 )   ( 0 1 )
( 1 0 ) , ( 0 2 ) , ( 1 2 ) , ( 1 2 )
is a basis for M22. Solution. Note that the dimension of M22 is 4 and the number of vectors in the given set is also 4. To prove that this set is a basis, by Theorem 3 part 4 in Section 6.6.2 we only need to show that the set is linearly independent. Suppose that
λ1 ( 1 1 ) + λ2 ( 1 1 ) + λ3 ( 1 0 ) + λ4 ( 0 1 ) = ( 0 0 )
   ( 1 0 )      ( 0 2 )      ( 1 2 )      ( 1 2 )     ( 0 0 ).
By equating the corresponding entries on both sides, we have
λ1 + λ2 + λ3 + 0 = 0
λ1 + λ2 + 0 + λ4 = 0
λ1 + 0 + λ3 + λ4 = 0
0 + 2λ2 + 2λ3 + 2λ4 = 0.
Since the right hand sides are all zeros, we can simply reduce the coefficient matrix to row-echelon form:
( 1 1 1 0 )
( 1 1 0 1 )
( 1 0 1 1 )
( 0 2 2 2 )
Applying R2 = R2 − R1, R3 = R3 − R1 gives
( 1  1  1 0 )
( 0  0 −1 1 )
( 0 −1  0 1 )
( 0  2  2 2 ),
then R2 ↔ R3 gives
( 1  1  1 0 )
( 0 −1  0 1 )
( 0  0 −1 1 )
( 0  2  2 2 ),
and finally R4 = R4 + 2R2 + 2R3 gives
( 1  1  1 0 )
( 0 −1  0 1 )
( 0  0 −1 1 )
( 0  0  0 6 ).
Thus the system of equations has the unique solution λ1 = λ2 = λ3 = λ4 = 0. Hence the set of matrices is a basis. ♦ Note. The four columns of the coefficient matrix are the coordinate vectors of the four matrices with respect to the standard basis. 6.8.2 Vector spaces associated with real-valued functions Before reading this section, you might like to quickly read the brief review of function notation given in Appendix 6.9. We know how to add two functions and how to multiply a function by a real number, so it is natural to ask whether or not these operations satisfy the axioms of a vector space. In this section we shall see that they do. The Vector Space of Real-Valued Functions. Let X be a non-empty set. Consider the set (which we call R[X]) of all possible real-valued functions with domain X, that is, R[X] = {f : X → R}.
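The note above, that the coefficient matrix's columns are the coordinate vectors of the matrices, means the basis test in M22 reduces to a rank computation in R4. A plain-Python sketch (not from the notes; the helper name `rank` is ours): flatten each 2 × 2 matrix to its coordinate vector (a, b, c, d) and check that the four vectors have rank 4.

```python
from fractions import Fraction

def rank(rows):
    """Exact rank of a matrix (list of rows) via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# The four matrices of Example 5, written as coordinate vectors (a, b, c, d)
# with respect to the standard basis {E11, E12, E21, E22} of M22.
mats = [[1, 1, 1, 0], [1, 1, 0, 2], [1, 0, 1, 2], [0, 1, 1, 2]]
print(rank(mats))   # 4 = dim M22, so the set is a basis
```

This is the general pattern of Section 6.7: fix a basis, pass to coordinate vectors, and use matrix techniques in Fn.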
We also let + be the usual rule for adding real functions, i.e., f + g is the function given by (f + g)(x) = f(x) + g(x) for all x ∈ X, and we let ∗ represent the usual rule for multiplying a real function by a real number, i.e., λ ∗ f is the function given by (λ ∗ f)(x) = (λf)(x) = λf(x) for all x ∈ X. We then have the following result: Proposition 2. The system (R[X],+, ∗,R) is a vector space over the real-number field R. Proof. The proof of this proposition follows the usual procedure of proving that all ten of the vector space axioms are satisfied. We give proofs of two of the axioms and leave the proofs of the other ones as exercises. Closure under Addition. If f, g ∈ R[X], then f(x) and g(x) are defined and are real numbers for all x ∈ X. Then, using the usual rule for function addition, we have (f + g)(x) = f(x) + g(x) for all x ∈ X. Thus, (f + g)(x) is also defined and is a real number for all x ∈ X. Hence f + g : X → R and therefore f + g ∈ R[X]. Closure under Multiplication by Scalars. If f ∈ R[X] and λ ∈ R, then, from the usual rule for multiplication of a function by a real number, we have (λf)(x) = λf(x) for all x ∈ X. Thus (λf)(x) is defined and is a real number for all x ∈ X. Hence, λf : X → R and therefore λf ∈ R[X]. In the next examples we briefly mention some subspaces of the vector space of real-valued functions. These subspaces are of importance in many areas of modern mathematics, science and engineering. The most important one, the subspace of polynomials will be discussed in the next subsection. Example 6. Let (a, b) be an interval of R, and let C[(a, b)] be the set of all continuous real-valued functions on (a, b). Show that C[(a, b)] is a subspace of the vector space R[(a, b)] of all real-valued functions with domain (a, b). c©2020 School of Mathematics and Statistics, UNSW Sydney 6.8. [X] FURTHER IMPORTANT EXAMPLES OF VECTOR SPACES 53 Solution. We use the Alternative Subspace Theorem. 
The set C[(a, b)] contains the zero function, since the zero function is continuous. From calculus, we know that if f and g are continuous on an interval, then λ1f + λ2g is also continuous on the same interval for all real λ1 and λ2. Also, λ1f + λ2g is a real function, and hence C[(a, b)] is a subset of R[(a, b)] which satisfies the condition (#) of the Alternative Subspace Theorem. Thus C[(a, b)] is a subspace of R[(a, b)]. ♦ Calculus provides a very rich source of subspaces of the vector space of real-valued functions. Example 7. Let C(1)[(a, b)] be the set of all real-valued functions whose first derivative exists and is continuous on an interval (a, b) of R. Show that C(1)[(a, b)] is a subspace of the vector space R[(a, b)]. Solution. We use the Alternative Subspace Theorem. The set C(1)[(a, b)] contains the zero function, since the zero function has a continuous first derivative. From calculus, we know that if the first derivatives of the real functions f and g exist and are continuous on an interval, then, for all λ1, λ2 ∈ R, the function λ1f + λ2g is also a real-valued function whose first derivative exists and is continuous on the same interval. Thus, C(1)[(a, b)] is a subset of R[(a, b)] which satisfies condition (#) of the Alternative Subspace Theorem, and hence it is a subspace of R[(a, b)]. Note that C(1)[(a, b)] is also a subspace of the vector space C[(a, b)] of real-valued, continuous functions on (a, b) given in Example 6. Can you see why? ♦ An important class of function subspaces is defined by the solutions of homogeneous linear differential equations. Example 8. Let S be the subset of the vector space R[R] of real-valued functions on R defined by S = { f ∈ R[R] : d2f/dx2 − 6 df/dx + 5f = 0 }. Show that S is a subspace of R[R]. Solution. As the zero function satisfies the equation, it belongs to S.
For f1, f2 ∈ S and λ1, λ2 ∈ R, we have, on using the properties of derivatives, that
d2/dx2 (λ1f1 + λ2f2) − 6 d/dx (λ1f1 + λ2f2) + 5(λ1f1 + λ2f2)
= λ1 ( d2f1/dx2 − 6 df1/dx + 5f1 ) + λ2 ( d2f2/dx2 − 6 df2/dx + 5f2 )
= λ1 0 + λ2 0 = 0.
Hence, λ1f1 + λ2f2 ∈ S, and therefore S is a subspace. ♦ Subspaces can also be defined by integration. Example 9. Let S be the subset of the vector space C[[−π, π]] of all real-valued continuous functions on the interval [−π, π] defined by S = { f ∈ C[[−π, π]] : ∫_{−π}^{π} f(x)g(x) dx = 0 }, where g is a fixed continuous function. Clearly the product f(x)g(x) is integrable on [−π, π], since f and g are continuous. Prove that S is a subspace of C[[−π, π]]. Solution. The zero function is in S. For all f1, f2 ∈ S and for all λ1, λ2 ∈ R, we have λ1f1 + λ2f2 ∈ C[[−π, π]] and
∫_{−π}^{π} ( λ1f1(x) + λ2f2(x) ) g(x) dx = λ1 ∫_{−π}^{π} f1(x)g(x) dx + λ2 ∫_{−π}^{π} f2(x)g(x) dx = λ1 0 + λ2 0 = 0.
Hence λ1f1 + λ2f2 ∈ S, and thus S is a subspace of C[[−π, π]]. ♦ Example 10. As shown in courses on differential equations, the homogeneous linear differential equation in Example 8 has the solution f(x) = λ1e^{5x} + λ2e^{x} for λ1, λ2 ∈ R. Hence the set S of solutions is given by S = span (e^{5x}, e^{x}), and thus {e^{5x}, e^{x}} is a spanning set for S. ♦ Example 11. Show that the set S = {sin(x), cos(x)} is a linearly independent subset of the vector space R[[−π, π]] of all real-valued functions on the interval [−π, π]. Solution. We have to show that if f(x) = λ1 sin(x) + λ2 cos(x) = 0 for all x ∈ [−π, π] then λ1 and λ2 are zero. We first note that if the linear combination f(x) is zero for all x ∈ [−π, π] then f(x) must also be zero for the values x = 0 and x = π/2. We then obtain 0 = f(0) = λ2 and 0 = f(π/2) = λ1. Thus the scalars are zero, and hence the set is linearly independent. ♦ Example 12. Show that the subset Sn of the vector space R[R] defined by Sn = {sin(kx) : k = 1, . .
. , n and x ∈ R} is a linearly independent set. c©2020 School of Mathematics and Statistics, UNSW Sydney 6.8. [X] FURTHER IMPORTANT EXAMPLES OF VECTOR SPACES 55 Solution. We will use a trick which is based on an extension to the vector space of real functions of the idea of orthogonality of geometric vectors and vectors in Rn (see Example 5 and 6 of Section 6.6). We first prove that if k and m are integers then∫ pi 0 sin(kx) sin(mx)dx = { 0 for k 6= m pi 2 for k = m. From trigonometry, we have sin(kx) sin(mx) = 1 2 (cos(k −m)x− cos(k +m)x) . Then, for k 6= m,∫ pi 0 sin(kx) sin(mx)dx = 1 2 ∫ pi 0 cos(k −m)x dx− 1 2 ∫ pi 0 cos(k +m)x dx = [ 1 2(k −m) sin(k −m)x ]pi 0 − [ 1 2(k +m) sin(k +m)x ]pi 0 = 0 as sin(0) = 0, sin(k −m)π = 0, and sin(k +m)π = 0 for integers k and m. We leave the proof of the result for k = m as a simple exercise. Now suppose that n∑ k=1 λk sin(kx) = 0. On multiplying this expression by sin(mx) and integrating from 0 to π, we have 0 = n∑ k=1 λk ∫ pi 0 sin(kx) sin(mx) dx = λm π 2 Hence, λm = 0 for 1 6 m 6 n, and thus the set Sn is linearly independent. ♦ Example 13. Use the previous example, we can show that the vector space R[R] cannot be spanned by a finite set and hence it is an infinite dimensional vector space. If it can be spanned by a finite set of m elements, by Theorem 1 in Section 6.6, all independent set has at most m elements. We can choose an integer n such that n > m, by the previous example, Sn will be a linearly independent set of more than m elements. Hence R[R] cannot be spanned by a finite set.♦ 6.8.3 Vector spaces associated with polynomials Polynomials can be added, subtracted and multiplied. From the point of view of vector spaces, only addition (and subtraction) and multiplication of a polynomial by a scalar are relevant. We will be concerned with polynomials defined over either the real or complex fields. 
Although it is possible to generalise all of the results in this section to the rational field Q, and to generalise many of the results to the finite fields Zp (p a prime), we will not do so here. In the following, therefore, the field F should be taken as either the real numbers R or the complex numbers C.

We begin by quickly reviewing the definitions of polynomial function, polynomial addition, multiplication of a polynomial by a scalar, and equality of polynomials.

Definition 1. A function p : F → F is called a polynomial function over F if there is a natural number n ∈ N and numbers a0, a1, . . . , an ∈ F such that

  p(z) = a0 + a1z + · · · + an z^n = Σ_{k=0}^{n} ak z^k  for all z ∈ F.

For brevity we will usually refer to a polynomial function as a polynomial, even though it is important in advanced mathematics courses to distinguish between the two.

Polynomials may be added and multiplied by scalars in such a way as to produce other polynomials. The formal definitions follow the usual definitions of addition and multiplication by a scalar for functions (see Appendix 6.9).

Definition 2. If p and q are polynomials over the same field F given by

  p(z) = Σ_{k=0}^{n} ak z^k  and  q(z) = Σ_{k=0}^{m} bk z^k  for all z ∈ F,

then the sum polynomial is the polynomial p + q given by

  (p + q)(z) = p(z) + q(z) = Σ_{k=0}^{max(n,m)} (ak + bk) z^k  for all z ∈ F,

where any missing coefficients are taken to be zero. That is, the rule is to add corresponding coefficients. The rule for subtraction of polynomials follows immediately from the addition rule, and it is to subtract corresponding coefficients.

Definition 3. If λ ∈ F and p is a polynomial over F given by p(z) = Σ_{k=0}^{n} ak z^k for all z ∈ F, then the scalar multiple λp of p is the polynomial given by

  (λp)(z) = λ(p(z)) = Σ_{k=0}^{n} (λak) z^k  for all z ∈ F.

That is, the rule is to multiply each coefficient by the scalar.
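Definitions 2 and 3 say that, in terms of coefficient lists, addition and scalar multiplication in P(F) are carried out coefficient by coefficient. The following is a minimal sketch in Python (purely illustrative — the course software is Maple), representing a polynomial by its coefficient list [a0, a1, . . . , an]:

```python
from itertools import zip_longest

def poly_add(p, q):
    # Add corresponding coefficients; missing coefficients count as zero.
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]

def poly_scale(lam, p):
    # Multiply each coefficient by the scalar lam.
    return [lam * a for a in p]

# (1 + 2z) + (3 + z + 4z^2) = 4 + 3z + 4z^2
print(poly_add([1, 2], [3, 1, 4]))  # [4, 3, 4]
# 3(1 + 2z) = 3 + 6z
print(poly_scale(3, [1, 2]))        # [3, 6]
```

Note that the two polynomials may have different degrees; `zip_longest` pads the shorter coefficient list with zeros, exactly as in Definition 2.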
The last main property of polynomials that we need is given by the following Uniqueness Proposition (see Chapter 3 for a proof for complex polynomials).

Proposition 3 (Uniqueness Proposition for Real and Complex Polynomials). Let p and q be polynomials over F given by

  p(z) = Σ_{k=0}^{n} ak z^k  and  q(z) = Σ_{k=0}^{n} bk z^k  for all z ∈ F.

Then, if the field F is either R or C, we have that p(z) = q(z) for all z ∈ F if and only if ak = bk for all k = 0, 1, 2, . . . , n.

An immediate consequence of Proposition 3 is the following important result.

Corollary 4. If F is the field R or C, then a polynomial p over F has the property that p(z) = 0 for all z ∈ F if and only if all of its coefficients are zero.

Note that this is not true for other fields such as Zp, p a prime.

The unique polynomial whose function values are all zero and which has all of its coefficients equal to zero is called the zero polynomial.

The fundamental vector space associated with polynomials is defined in the following example.

Example 14 (The Vector Space of Polynomials over F). The vector space of polynomials over a field F is the system (P(F), +, ∗, F) defined as follows. The set of “vectors” is the set P(F) of all possible polynomials over the field F, i.e.,

  P(F) = {p : p(z) = a0 + a1z + · · · + an z^n for some n ∈ N, with aj ∈ F for 0 ≤ j ≤ n, and all z ∈ F}.

The rule of “vector addition” is the polynomial addition rule given in Definition 2, and the rule of multiplication by a scalar is the scalar multiplication rule for polynomials given in Definition 3. To prove that this system is a vector space it is necessary to check that all ten of the vector space axioms are satisfied. We leave the checking of the axioms as an exercise. ♦

Example 15.
As special cases of Example 14, we have the vector space of real polynomials (P(R), +, ∗, R) (seen already in Example 4 of Section 6.1) and the vector space of complex polynomials (P(C), +, ∗, C). ♦

Notation. We will usually talk of the polynomial vector space P instead of the more formal (P(F), +, ∗, F) when there can be no possibility of confusion over the field (R or C) being used. When necessary, the vector space of real polynomials will be referred to as P(R) and the vector space of complex polynomials as P(C).

As shown in the following example, the field for the polynomials and the field for the scalars must be, in some sense, compatible.

Example 16. Show that the system (P(C), +, ∗, R) of complex polynomials and real scalars is a vector space, whereas the system (P(R), +, ∗, C) of real polynomials and complex scalars is not a vector space.

Solution. It can be checked without too much difficulty that both systems satisfy nine of the ten vector space axioms: the five axioms of addition; the three scalar-multiplication axioms of associativity, commutativity and multiplication by 1; and the scalar and vector distributive axioms. The remaining axiom to check is closure under scalar multiplication.

For the system of complex polynomials and real scalars, we have that if p ∈ P(C) is a complex polynomial and λ ∈ R, then λp is also a complex polynomial. The closure under scalar multiplication axiom is therefore satisfied, and hence the system of complex polynomials and real scalars is a vector space.

For the system of real polynomials and complex scalars, we note that x ∈ P(R) and i ∈ C, but ix ∉ P(R). The closure under scalar multiplication axiom is therefore not satisfied, and hence the system of real polynomials with complex scalars is not a vector space. ♦

We now consider some of the subspaces of the polynomial vector space P = P(F).
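The closure failure in Example 16 can be made concrete: scaling the real polynomial x by the scalar i produces a coefficient outside R. A small check in Python (illustrative only; the notes themselves use Maple):

```python
p = [0, 1]                      # coefficient list of p(x) = x, a real polynomial
lam = 1j                        # the complex scalar i
scaled = [lam * a for a in p]   # coefficient list of (i*p)(x) = ix
has_real_coeffs = all(c.imag == 0 for c in scaled)
print(has_real_coeffs)          # False: ix has a non-real coefficient, so ix is not in P(R)
```

By contrast, scaling any complex coefficient list by a real λ clearly yields another complex coefficient list, which is the closure argument for (P(C), +, ∗, R).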
The most important subspaces of P are the vector spaces of polynomials of degree less than or equal to n, for some n.

Example 17. Let P be the vector space of polynomials over F, and let Pn be the subset of P consisting of all polynomials of degree less than or equal to some fixed integer n ≥ 0, that is,

  Pn = {p ∈ P : deg(p) ≤ n}.

Show that Pn is a subspace of P.

Solution. If p, q ∈ Pn, then there exist coefficients a0, . . . , an and b0, . . . , bn such that

  p(z) = a0 + a1z + · · · + an z^n  and  q(z) = b0 + b1z + · · · + bn z^n  for all z ∈ F.

Then, for λp + μq with λ, μ ∈ F, we have

  (λp + μq)(z) = λp(z) + μq(z) = (λa0 + μb0) + · · · + (λan + μbn)z^n  for all z ∈ F.

But the coefficients λaj + μbj, 0 ≤ j ≤ n, are also scalars in F, and hence λp + μq ∈ Pn. Thus the condition in the Alternative Subspace Theorem is satisfied, and Pn is a subspace of P. ♦

Note that the subset of all polynomials of degree exactly n is not a subspace, since this subset does not contain the zero polynomial.

Subspaces of the space of all polynomials can also be formed by selecting all polynomials which have their roots at given points.

Example 18. Let Pn be the vector space of polynomials of degree less than or equal to n over F. Show that the subset S of Pn given by S = {p ∈ Pn : p(5) = α} is a subspace of Pn if and only if α = 0.

Solution. We use the Alternative Subspace Theorem. If α ≠ 0, then the zero polynomial is not in S, and so S is not a subspace. If α = 0, then the zero polynomial is in S, and so S is not empty. If p, q ∈ S and λ1, λ2 ∈ F, then p(5) = 0 and q(5) = 0, so

  (λ1p + λ2q)(5) = λ1p(5) + λ2q(5) = 0.

Hence λ1p + λ2q ∈ S, and S is a subspace. Therefore S is a subspace of Pn if and only if α = 0. Note that this subspace is the set of all polynomials of degree less than or equal to n over F which have a root at z = 5. ♦

As the ideas of linear combination, span and linear independence apply to all vector spaces, they apply to spaces of polynomials.
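The key computation in Example 18 — that a linear combination of polynomials vanishing at z = 5 also vanishes at z = 5 — can be spot-checked numerically. A small sketch in Python (the helper eval_poly is ours, not from the notes; it evaluates a coefficient list by Horner's rule):

```python
def eval_poly(coeffs, z):
    # Evaluate a0 + a1*z + ... + an*z^n by Horner's rule; coeffs = [a0, a1, ...].
    result = 0
    for a in reversed(coeffs):
        result = result * z + a
    return result

# p(z) = z - 5 and q(z) = z^2 - 25 both have a root at z = 5.
p = [-5, 1, 0]    # padded to equal length
q = [-25, 0, 1]
lam1, lam2 = 3, -2
combo = [lam1 * a + lam2 * b for a, b in zip(p, q)]
print(eval_poly(combo, 5))  # 0, as the subspace argument predicts
```

Of course a numerical check for particular λ1, λ2 is no substitute for the proof; it merely illustrates the linearity of evaluation at z = 5.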
The methods used in Sections 6.4 and 6.5 for Rn can be used to solve problems about spanning sets and independent sets.

Example 19. Does the complex polynomial p belong to span(p1, p2), where p, p1, p2 are complex polynomials defined by p(z) = 4 + z + 2z², p1(z) = 1 + z − z² and p2(z) = 2 − z for all z ∈ C?

Solution. p ∈ span(p1, p2) if and only if p is a linear combination of p1 and p2, that is, if and only if there are scalars x1, x2 such that p(z) = x1p1(z) + x2p2(z) for all z ∈ C. That is,

  4 + z + 2z² = x1(1 + z − z²) + x2(2 − z) = (x1 + 2x2) + (x1 − x2)z + (−x1)z²  for all z ∈ C.

From the Uniqueness Proposition for Polynomials (Section 6.8.3), we know that polynomials are equal if and only if coefficients of corresponding powers of z are equal. Equating coefficients of equal powers then gives the system of linear equations

  x1 + 2x2 = 4,  x1 − x2 = 1,  −x1 = 2,

with augmented matrix

  (A|b) = ( 1 2 | 4 ; 1 −1 | 1 ; −1 0 | 2 ).

Then p is in the span of p1 and p2 if and only if these equations have a solution. It is easy to see that these equations have no solution, and hence p is not in span(p1, p2). ♦

Example 20. Is the set {p1, p2, p3} of polynomials, where

  p1(z) = 1 + 2z − z²,  p2(z) = −3 − z + 2z²,  p3(z) = 2 + 3z + z²,

a linearly independent subset of P2?

Solution. Following the usual test for linear independence, we look for scalars x1, x2, x3 such that x1p1 + x2p2 + x3p3 = 0, that is, such that

  x1(1 + 2z − z²) + x2(−3 − z + 2z²) + x3(2 + 3z + z²) = 0  for all z ∈ C.

Thus the polynomial on the left is the zero polynomial, and hence the coefficient of each power of z is zero, that is,

  x1 − 3x2 + 2x3 = 0;  2x1 − x2 + 3x3 = 0;  −x1 + 2x2 + x3 = 0.

This system of equations corresponds to the homogeneous system Ax = 0, where the matrix A and an equivalent row-echelon form U are

  A = ( 1 −3 2 ; 2 −1 3 ; −1 2 1 ),  U = ( 1 −3 2 ; 0 5 −1 ; 0 0 14/5 ).
Then, as all columns of U are leading columns, the only solution for the scalars is x1 = x2 = x3 = 0, and hence the set is linearly independent. ♦

As we can discuss spans and independence of sets of polynomials, we also have the notion of dimension for subspaces of polynomials. In the next example, we construct a standard basis for Pn.

Example 21. Show that the set {1, z, . . . , z^n} is a basis for Pn(F), where F = R or C.

Solution. A polynomial p of degree less than or equal to n over a field F is a function of the form p(z) = a0 + a1z + · · · + an z^n with aj ∈ F for 0 ≤ j ≤ n and for z ∈ F, where any or all of the coefficients may be zero. Hence span(1, z, . . . , z^n) = Pn(F). Furthermore, suppose that a0 + a1z + · · · + an z^n = 0 for all z ∈ F. Then this linear combination is the zero polynomial, and hence, from the Uniqueness Proposition for Polynomials of Section 6.8.3, all of the coefficients are zero. Hence {1, z, . . . , z^n} is independent. Therefore, this set is a basis for Pn(F). ♦

An important result which follows immediately from Example 21 is the following.

Proposition 5. The vector space Pn of polynomials of degree less than or equal to n has dimension n + 1.

As a consequence, P is not a finite-dimensional space.

Example 22. The vector space P of all polynomials cannot be spanned by a finite set.

Solution. We could use the same argument as for R[R] in Example 13; instead we use another approach. Assume, to the contrary, that some set S containing a finite number of polynomials is a spanning set for P. Then, since the number of polynomials in S is finite, there must be a polynomial of highest degree in S; let N be the degree of this polynomial. Then no polynomial p with deg(p) > N is in span(S). Hence P is not spanned by any finite set of polynomials. ♦

Example 23. Show that the set S = {2 + z, −1 + z², z − z²} is a basis for P2.

Solution.
If p ∈ P2, then p is given by p(z) = a0 + a1z + a2z² for all z ∈ C. Then p ∈ span(S) if there exist scalars x1, x2, x3 such that

  a0 + a1z + a2z² = x1(2 + z) + x2(−1 + z²) + x3(z − z²)  for all z ∈ C.

On equating coefficients of powers of z, we obtain the system of linear equations with augmented matrix

  (A|b) = ( 2 −1 0 | a0 ; 1 0 1 | a1 ; 0 1 −1 | a2 ).

On using Gaussian elimination, we obtain the augmented matrix

  (U|y) = ( 2 −1 0 | a0 ; 0 1/2 1 | −a0/2 + a1 ; 0 0 −3 | a0 − 2a1 + a2 ).

Then, as there are no zero rows in U, there is a solution for all right-hand sides. Thus, for every polynomial p ∈ P2, we have p ∈ span(S), and hence S is a spanning set for P2. Further, as U has no non-leading columns, the only solution for a zero right-hand side is x1 = x2 = x3 = 0, and hence S is linearly independent. S is therefore a basis for P2. ♦

The coordinate-vector idea applies immediately to the finite-dimensional vector space Pn.

Example 24. For the standard basis of Pn consisting of powers of z, that is, {1, z, . . . , z^n}, the coordinate vector consists of the coefficients of the polynomial. For example, the polynomial p ∈ Pn defined by p(z) = a0 + a1z + · · · + an z^n has the coordinate vector (a0, a1, . . . , an)ᵀ with respect to the standard basis.

Note. The order is important. For example, the coordinate vector of p with respect to the ordered basis {z², z, 1, z³, . . . , z^n} would be (a2, a1, a0, a3, . . . , an)ᵀ. ♦

Example 25. Find the coordinate vector for the polynomial p3(z) = −1 + 5z² with respect to the ordered basis {p1, p2} of span(p1, p2), where p1(z) = 1 + 2z + 3z² and p2(z) = 1 + z − z².

Solution. We must find the scalars in the expression for p3 as a linear combination of p1 and p2. On writing p3 = x1p1 + x2p2 and equating coefficients of equal powers of z, we get the system of equations with augmented matrix

  (A|b) = ( 1 1 | −1 ; 2 1 | 0 ; 3 −1 | 5 ).

The solution is x1 = 1, x2 = −2.
Hence the coordinate vector of p3 with respect to the ordered basis {p1, p2} is (1, −2)ᵀ, i.e.,

  p3(z) = −1 + 5z² = 1·p1(z) − 2·p2(z) = (1 + 2z + 3z²) − 2(1 + z − z²). ♦

Example 26. A polynomial p has coordinate vector (2, −1, 4)ᵀ with respect to the ordered basis {p1 = 1, p2 = 1 + z, p3 = 2 − z + z²} of P2. Find p.

Solution. p = 2p1 − p2 + 4p3 = 9 − 5z + 4z². ♦

6.9 A brief review of set and function notation

6.9.1 Set notation

A set is any collection of elements. Sets are usually defined either by listing their elements or by giving a rule for selection of the elements. The elements of a set are usually enclosed in braces {}.

Example 1. S = {1, 4, −7} is the set whose elements are 1, 4 and −7. ♦

Common notation for a set defined by a rule is shown in the following example.

Example 2. The notation S = {x ∈ Rn : x1 ≥ 0, x3 ≤ 4} is read as: the set S of vectors x in Rn such that x1 is greater than or equal to zero and x3 is less than or equal to 4. Note that the colon (:) is read as “such that” and the comma (,) is read as “and”. ♦

Definition 1. Two sets A and B are equal (notation A = B) if every element of A is an element of B, and if every element of B is an element of A.

To prove that A = B it is necessary to prove that the two conditions
 1. if x ∈ A then x ∈ B, and
 2. if x ∈ B then x ∈ A
are both satisfied.

Definition 2. A set A is a subset of another set B (notation A ⊆ B) if every element of A is also an element of B.

To prove that A ⊆ B it is necessary to prove that the condition: if x ∈ A then x ∈ B is satisfied.

Definition 3. A is said to be a proper subset of B if A is a subset of B and at least one element of B is not an element of A.

To prove that A is a proper subset of B it is necessary to prove that the two conditions
 1. if x ∈ A then x ∈ B, and
 2. for some x ∈ B, x is not an element of A
are both satisfied.
Definition 4. The intersection of two sets A and B (notation: A ∩ B) is the set of elements which are common to both sets. That is, A ∩ B = {x : x ∈ A and x ∈ B}.

Definition 5. The union of two sets A and B (notation: A ∪ B) is the set of all elements which are in either or both sets. That is, A ∪ B = {x : x ∈ A or x ∈ B}.

6.9.2 Function notation

The notation f : X → Y (which is read as “f is a function (or map) from the set X to the set Y”) means that f is a rule which associates exactly one element y ∈ Y with each element x ∈ X. The y associated with x is written as y = f(x) and is called the “value of the function f at x” or “the image of x under f”. The set X is often called the domain of the function f, and the set Y is often called the codomain of the function f.

Equality of Functions. Two functions f : X → Y and g : X → Y are defined to be equal if and only if f(x) = g(x) for all x ∈ X.

Addition of Functions. If f : X → Y and g : X → Y, and if elements of Y can be added, then the sum function f + g is defined by (f + g)(x) = f(x) + g(x) for all x ∈ X.

Multiplication by a Scalar. If f : X → Y and λ ∈ F, where F is a field, and if elements of Y can be multiplied by elements of F, then the function λf is defined by (λf)(x) = λ(f(x)) for all x ∈ X.

Multiplication of Functions. If f : X → Y and g : X → Y, and if elements of Y can be multiplied, then the product function fg is defined by (fg)(x) = f(x)g(x) for all x ∈ X.

Composition of Functions. If g : X → W and f : W → Y, then the composition function f ◦ g : X → Y is defined by (f ◦ g)(x) = f(g(x)) for all x ∈ X.

6.10 Vector spaces and MAPLE

Most of the problems in this chapter can be solved using matrix methods, which of course means that Maple can be very helpful. You might look at your session 1 notes to refresh your memory as to how Maple handles vectors and matrices.
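If you do not have Maple to hand, the same two standard checks can be sketched in Python with NumPy (an illustration only — the course itself uses Maple): a set of n vectors in Rn is linearly independent exactly when the matrix with those vectors as columns has full rank, and a coordinate vector is found by solving a linear system.

```python
import numpy as np

# The vectors v1, v2, v3 used in the Maple example of this section
A = np.column_stack(([1, 2, 3], [0, -1, 2], [3, -1, 3]))

# Full column rank <=> the columns are linearly independent
print(np.linalg.matrix_rank(A))   # 3, so {v1, v2, v3} is a basis for R^3

# Coordinate vector of b with respect to this ordered basis
b = np.array([1, 1, 1])
coords = np.linalg.solve(A, b)
print(np.allclose(A @ coords, b))  # True: A * coords reproduces b
```

This mirrors the GaussianElimination and LinearSolve steps described below; for hand calculation the row-echelon form itself is of course more instructive than a rank count.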
As usual, you should type

  with(LinearAlgebra);

in order to load the linear algebra package. The following commands, for example, can be used to put the vectors v1, v2 and v3 as the columns in a 3 × 3 matrix.

  v1:=<1,2,3>; v2:=<0,-1,2>; v3:=<3,-1,3>;
  A:=<v1|v2|v3>;

We could now test whether these three vectors are linearly independent by performing Gaussian elimination on A:

  GaussianElimination(A);

In this particular example, there are no non-leading columns, so the vectors are linearly independent (and hence form a basis for R3). Most of the other problems concerning problems in Rn can be solved using similar methods.

Actually, the LinearAlgebra package contains many ready-made commands for performing the standard calculations. For example,

  Basis({v1,v2,v3});

returns a subset of {v1, v2, v3} which forms a basis for span(v1, v2, v3).

Problems in other vector spaces can often be solved by using coordinate vectors to convert the problem to one in Rn. Finding the coordinate vector for a vector b ∈ Rn with respect to a new basis is quite simple. For example, with the ordered basis {v1, v2, v3}, the commands

  b:=<1,1,1>;
  coordvect:=LinearSolve(A,b);

find the coordinate vector of b.

The firstyear package contains commands to find the coordinate vector of a polynomial in Pn with respect to the standard basis {1, x, x², . . . , x^n} (be careful with the capital letters).

  with(firstyear);
  p:=3*x^2-x+4;
  v:=polytoVect(p,2);
  Vecttopoly(v,x);

Problems for Chapter 6

Questions marked with [R] are routine, [H] harder, [X] extra for MATH1241, [M] Maple. You should try to solve some of the questions in Sections 6.4 to 6.7 with Maple.

Problems 6.1 : Definitions and examples of vector spaces

1. [R] Show that the set S = {x ∈ R3 : x1 ≤ 0, x2 ≥ 0}, with the usual rules for addition and multiplication by a scalar in R3, is not a vector space by showing that at least one of the vector space axioms is not satisfied. Give a geometric interpretation of this result.

2.
[R] Show that the system S, with the usual rules for addition and multiplication by a scalar in R3, where S = {x ∈ R3 : 2x1 + 3x2³ − 4x3² = 0}, is not a vector space by showing that at least one of the vector space axioms is not satisfied.

3. [R] Let S = {(a, b, c)ᵀ ∈ R3 : (a − b)c = 0}.
 a) Write down two non-zero elements of S.
 b) Show that S is not closed under vector addition.

4. [H] The set Cn is a vector space over C (see Example 2 of Section 6.1). Check that axioms 1, 2, 6, 9 are satisfied by this system.

5. [X] Let Mmn(C) be the set of all m × n matrices with complex entries, with addition the usual rule for addition of complex matrices, and multiplication by a scalar the usual rule for multiplication of a complex matrix by a complex scalar. Prove that the vector space Mmn(C) satisfies axioms 1, 3, 6 and 10.

6. [H] Prove that the system (Cn, +, ∗, R) with “natural” definitions of + and ∗ is a vector space, whereas the system (Rn, +, ∗, C) with “natural” definitions of + and ∗ is not a vector space.

7. [X] Consider the system (R2, +′, ∗′, R) in which the usual operations of “addition” and “multiplication by a scalar” are replaced by the new definitions

  (a1, a2)ᵀ +′ (b1, b2)ᵀ = (a1 + b1, a2 − 3b2)ᵀ,  λ ∗′ (a1, a2)ᵀ = (4λa1, λa2)ᵀ.

Give a list of the vector space axioms satisfied by this system, and a list of any which are not satisfied. Is this system a vector space?

Problems 6.2 : Vector arithmetic

8. [H] Prove that the following properties are true for every vector space.
 a) 2v = v + v.
 b) nv = v + · · · + v, where there are n terms on the right.

9. [H] Prove parts 2, 4 and 5 of Proposition 2 of Section 6.2.

Problems 6.3 : Subspaces

10. [R] Suppose v = (1, 2, 3)ᵀ. Show that the line segment defined by S = {x ∈ R3 : x = αv, for 0 ≤ α ≤ 10} is not a subspace of R3.

11. [R] Show that the set S = {x ∈ R3 : 2x1 + 3x2 − 4x3 = 6} is not a subspace of R3.
Give a geometric interpretation of this result.

12. [R] Let S be the set S = {x ∈ R3 : 2x1 + 3x2 − 4x3 = 0}.
 a) Find three distinct members of S.
 b) Show that S is a subspace of R3.
 c) Give a geometric interpretation of this latter result.

13. [R] Show that T = {(x, y, z)ᵀ : −1 ≤ x + y + z ≤ 1} is not a vector subspace of R3.

14. [R] Show that the set S = {x ∈ R3 : 2x1 + 3x2 − 4x3 = 4x1 − 2x2 + 3x3 = 0} is a subspace of R3.

15. [R] Show that the set S = {x ∈ R3 : 2x1 + 3x2 − 4x3 = 0 or 4x1 − 2x2 + 3x3 = 0} is not a subspace of R3.

16. [R] Show that the set S = {b ∈ R2 : b = Ax for some x ∈ R3}, where

  A = ( 2 −3 1 ; 4 5 −3 ),

is a subspace of R2. Explain why each column of the matrix belongs to the set S.

17. [R] For each of the following subsets of R3, either prove that the given subset is a subspace of R3 or explain why it is not a subspace.
 a) S = {(x1, 0, x3)ᵀ ∈ R3 : x1 + 2x3 ≥ 0}.
 b) T = {Σ_{i=1}^{4} λi vi : λi ∈ R, 1 ≤ i ≤ 4}, where v1, v2, v3, v4 are given fixed vectors in R3.
 c) U = {Ax : x ∈ R5}, where A is a fixed 3 × 5 matrix.

18. [H] Suppose that u = (1, 0, −2)ᵀ and v = (1, 2, 3)ᵀ. Show, by the Subspace Theorem, that the set S = {x ∈ R3 : x = λu + μv, for λ, μ ∈ R} is a subspace of R3.

19. [H] Prove that the set S in Example 6 on page 15 is closed under multiplication by a scalar.

20. [H] Let a and b be two fixed non-zero vectors in R5. Show that W = {x ∈ R5 : x · a = x · b = 0} is a subspace of R5. If a = e1 = (1, 0, 0, 0, 0)ᵀ and b = e2 = (0, 1, 0, 0, 0)ᵀ, describe W.

21. [R] Show that the set S = {p ∈ P2 : p(0) = 1} is NOT a subspace of P2.

22. [R] Show that the set S = {p ∈ P3 : p′′(x) = 0 for all x ∈ R} is a subspace of P3.

23. [H] Is the set S = {p ∈ P3 : p′(x) + x + 1 = 0 for all x ∈ R} a subspace of P3?

24. [H] Consider the set S = {p ∈ P3 : (x + 1)p′(x) − 3p(x) = 0 for all x ∈ R}.
 a) Show that S is a subspace of P3 (the set of all real polynomials of degree ≤ 3).
 b) Find a polynomial in S where not all the coefficients are zero.

25. [H] By constructing a counterexample, show that the union of two subspaces is not, in general, a subspace.

26. [H] Let W1 and W2 be two subspaces of a vector space V over the field F. Prove that the intersection of W1 and W2 (i.e., the set W1 ∩ W2) is a subspace of V.

27. [X] Let V be a vector space over the field F.
 a) Let {Wk : 1 ≤ k ≤ m} be m subspaces of V, and let W be the intersection of these m subspaces. Prove that W is a subspace of V.
 b) Let S be any set of vectors in V, and let W be the intersection of all subspaces of V which contain S (that is, x ∈ W if and only if x lies in every subspace which contains S). Prove that W is the set of finite linear combinations of vectors from S.

Problems 6.4 : Linear combinations and spans

28. [R] Let a = (10, 11, 4)ᵀ, v1 = (2, 1, 4)ᵀ, v2 = (−1, −2, 1)ᵀ and v3 = (3, 3, −1)ᵀ.
 a) Is a ∈ span(v1, v2, v3)? If so, express a as a linear combination of v1, v2 and v3.
 b) Do the vectors v1, v2, v3 span R3? If not, find condition(s) on b = (b1, b2, b3)ᵀ for b to belong to span(v1, v2, v3), and interpret your answer geometrically.

29. [R] Repeat the preceding question using a = (9, −2, −4)ᵀ, v1 = (0, 2, 3)ᵀ, v2 = (5, −2, −3)ᵀ and v3 = (15, −4, −6)ᵀ.

30. [R] Repeat using a = (1, 1, −9, 1)ᵀ, v1 = (1, 3, 0, 5)ᵀ, v2 = (2, 2, 1, 2)ᵀ, v3 = (−1, 0, 4, 1)ᵀ and b = (b1, b2, b3, b4)ᵀ. [Replace R3 by R4, of course.]

31. [R] Is the vector b = (−2, −6, −4)ᵀ ∈ span(v1, v2, v3, v4), where v1 = (1, 3, 0)ᵀ, v2 = (2, 2, 1)ᵀ, v3 = (−1, 0, −1)ᵀ, v4 = (1, −2, 1)ᵀ?

32. [R] Is the set of vectors v1 = (1, 2, 3)ᵀ, v2 = (1, 1, −1)ᵀ, v3 = (−1, 0, 5)ᵀ a spanning set for R3?

33. [R] Does v belong to the column space of A, col(A), where

  v = (2, −5, 19, −13)ᵀ  and  A = ( 1 3 −1 ; 0 1 2 ; 2 −3 −5 ; 1 2 7 )?

If so, write v as a linear combination of the columns of A.

34. [R] Is the polynomial p(x) = 1 + x + x² in span(1 − x + 2x², −1 + x², −2 − x + 5x²)?

35.
[R] Is S = {1 + x, 1 − x², x + 2x²} a spanning set for P2?

36. [X] Prove Proposition 1 of Section 6.4.

37. [X] Use the vector space axioms to prove that we do not need to use brackets when writing down the linear combination Σ_{k=1}^{n} λk vk = λ1v1 + · · · + λnvn. That is, prove that the result of the operations is independent of the order in which the additions are performed.

Problems 6.5 : Linear independence

38. [R] Is the set of vectors v1 = (1, 2, 3)ᵀ, v2 = (1, 1, −1)ᵀ, v3 = (−1, 0, 5)ᵀ linearly independent? Are these three vectors coplanar?

39. [R] Is the set S = {v1, v2, v3}, where v1 = (1, 1, −1)ᵀ, v2 = (2, −1, 0)ᵀ, v3 = (5, −4, 3)ᵀ, a linearly independent set? Are these three vectors coplanar?

40. [R] Can a set of linearly independent vectors contain a zero vector? Explain your answer.

41. [R] Given the set S = {v1, v2, v3}, where v1 = (1, −3, −2)ᵀ, v2 = (3, 2, 1)ᵀ, v3 = (4, −1, −1)ᵀ, do the following.
 a) Show that S is a linearly dependent set.
 b) Show that at least one of the vectors in S can be written as a linear combination of the others, and find the corresponding linear combination.
 c) Find all possible ways of writing the vector (8, 9, 5)ᵀ as a linear combination of the vectors in the set.
 d) Find a linearly independent subset of S with the same span as S, and then show that (8, 9, 5)ᵀ can be written as a unique linear combination of this subset.
 e) Give a geometric interpretation of span(S).

42. [R] Repeat the previous question for the set of four vectors S = {v1, v2, v3, v4}, where v1 = (1, 3, 0)ᵀ, v2 = (2, 2, 1)ᵀ, v3 = (−1, 0, −1)ᵀ, v4 = (1, −2, 1)ᵀ.

43. [R] Is {1 − x + 2x², −1 + x², −2 − x + 5x²} a linearly independent subset of P2? If the set is not linearly independent, express one of the polynomials as a linear combination of the others.

44. [H] (For discussion.) Let the set S = {v1, . . . , vn} be a linearly independent subset of a vector space V. You are standing at the origin of V and set off in the direction of v1.
After a certain length of time, you turn and head in direction v2 — then in direction v3, and so on. Is it possible for you to return to the origin? (Note: you may walk any distance that you like along any of the directions, but you are not allowed to retrace your steps.)

45. [H] What would happen in the previous question if the set S were a linearly dependent set?

46. [H] Assume that m ≤ n and that S = {v1, . . . , vm} is a set of mutually orthogonal, non-zero vectors in Rn, that is, the dot products satisfy (see Section 5.3.1)

  vi · vj = 0 for i ≠ j, 1 ≤ i, j ≤ m;  vi · vi ≠ 0 for 1 ≤ i ≤ m.

Show that S is a linearly independent set.

Problems 6.6 : Basis and dimension

47. [R] Is the set S = {v1, v2, v3}, where v1 = (1, 1, −1)ᵀ, v2 = (2, −1, 0)ᵀ, v3 = (5, −4, 3)ᵀ, a basis for R3?

48. [R] Find a basis for, and the dimension of, W = span(v1, v2, v3), where v1 = (1, 2, 3)ᵀ, v2 = (1, 1, −1)ᵀ, v3 = (−1, 0, 5)ᵀ.

49. [R] Without doing any calculation, explain why {(1, 3, 0, 5)ᵀ, (2, 2, 1, 3)ᵀ, (−1, 0, 4, 0)ᵀ} is not a spanning set for R4. Similarly, without doing any calculation, explain why {(1, 3, 0)ᵀ, (2, 2, 1)ᵀ, (−1, 0, 1)ᵀ, (1, −3, 1)ᵀ} is a linearly dependent set.

50. [R] Which of the following statements are true and which are false? Explain your answer.
 a) Any set of 6 vectors in R5 is linearly dependent.
 b) Some sets of 6 vectors in R5 are linearly independent.
 c) Any set of 6 vectors in R5 is a spanning set for R5.
 d) Some sets of 6 vectors in R5 span R5.
 e) Same as in (a)–(d), with 6 replaced by 4.
 f) Any set of 5 vectors in R5 is a basis for R5.
 g) Some sets of 5 vectors in R5 are bases for R5.
 h) Any set of vectors which spans R5 is linearly independent.
 i) Any set of 5 vectors which spans R5 is linearly independent.
 j) Any 5 linearly independent vectors in R5 form a basis for R5.

51. [R] Let V be a finite-dimensional real vector space, and let S = {v1, . . . , vn} be a finite set of vectors in V. Suppose also that the dimension of V is ℓ.
State, with brief reasons, the relationship, if any, between n and ℓ if
 a) S is linearly independent.
 b) S is linearly dependent.
 c) S spans V.
 d) S is a basis for V.

52. [H] Explain why it is impossible to have a set of m mutually orthogonal, non-zero vectors in Rn with m > n.

53. [R] Consider the plane P in R3 whose equation is x + y + z = 0.
 a) Prove that P is a subspace of R3.
 b) Find a basis for P. Give reasons for your answer.

54. [R] Find a basis for, and the dimension of, the column space of the matrix

  A = ( 1 1 −1 −2 1 ; 0 0 1 4 −1 ; 0 0 0 2 2 ; 0 0 0 0 0 ).

55. [R] Find a basis for, and the dimension of, col(A), where

  A = ( 1 1 0 2 1 ; 0 0 −1 −2 2 ; −1 −1 1 4 −1 ; 1 1 0 4 2 ).

56. [R] Show that the columns of the matrix A given below are not a spanning set for R4. Then find a basis for R4 which contains as many of the columns of A as possible.

  A = ( 1 3 3 −7 5 ; 2 6 5 −8 1 ; 3 9 5 −3 −2 ; 4 12 5 2 −5 ).

57. [R] Consider the set T = {v1, v2, v3, v4, x}, where v1 = (1, 2, −1, 0)ᵀ, v2 = (3, 6, −3, 0)ᵀ, v3 = (2, 1, −1, 4)ᵀ, v4 = (−1, −5, 2, 4)ᵀ, x = (6, −3, −1, 20)ᵀ.
 a) Find a basis B for span(v1, v2, v3, v4, x).
 b) Explain why x belongs to span(v1, v2, v3, v4). Write x as a linear combination of B.
 c) Suppose the matrix A has columns v1, v2, v3, v4. What is the dimension of the column space of A?

58. [R] Show that the set {1 − x² + x³, x + 2x², 2 + x − x² + 2x³, 2x − x² + x³} forms a basis for P3.

59. [H] Consider the set of polynomials S = {p1, p2, p3, p4} in P2, where

  p1(z) = 1 + z − z²,  p2(z) = 2 − z,  p3(z) = 5 − 4z + z²,  p4(z) = z².

Show that S is a linearly dependent spanning set for P2, and then find a subset of S which is a basis for P2.

60. [H] You are given that V is a vector space and that S = {v1, v2, v3} is a subset of V. Suppose that w ∈ span(S). Prove that the set {v1, v2, v3, w} is a linearly dependent set.

61.
[X] Prove that the only subspaces of $\mathbb{R}^4$ are (i) $\{\mathbf{0}\}$, (ii) lines through the origin, (iii) planes through the origin, (iv) subspaces of the form $\mathrm{span}(S)$, where $S$ is any set of three linearly independent vectors in $\mathbb{R}^4$, and (v) $\mathbb{R}^4$ itself.

Problems 6.7 : [X] Coordinate vectors

62. [X] Show that the columns of the matrix
$$A = \begin{pmatrix} 1 & 2 & -1 & 1\\ 3 & 2 & 0 & -2\\ 0 & 1 & -1 & 1\\ 5 & 3 & 0 & -1 \end{pmatrix}$$
are a basis for $\mathbb{R}^4$. Then find the coordinate vector of $\mathbf{v} = \begin{pmatrix}-2\\-6\\-4\\-2\end{pmatrix}$ with respect to the ordered basis given by the columns of $A$.

63. [X] A vector $\mathbf{v} \in \mathbb{R}^4$ has the coordinate vector $\begin{pmatrix}1\\6\\-1\\4\end{pmatrix}$ with respect to the ordered basis formed by the columns of the matrix $A$ of the previous question. Find $\mathbf{v}$.

64. [X] Find the vector $\mathbf{v}$ that has coordinate vector $\begin{pmatrix}2\\-1\\1\end{pmatrix}$ with respect to the ordered basis $\begin{pmatrix}1\\2\\-2\end{pmatrix}, \begin{pmatrix}3\\7\\-5\end{pmatrix}, \begin{pmatrix}2\\4\\9\end{pmatrix}$ of $\mathbb{R}^3$.

65. [X] Find the coordinates of the following vectors with respect to the given ordered bases.
a) $\begin{pmatrix}1\\2\\3\end{pmatrix}$ with respect to $\begin{pmatrix}1\\0\\1\end{pmatrix}, \begin{pmatrix}1\\1\\1\end{pmatrix}, \begin{pmatrix}-2\\0\\-1\end{pmatrix}$.
b) $\begin{pmatrix}a_1\\a_2\\a_3\end{pmatrix}$ with respect to $\begin{pmatrix}1\\0\\1\end{pmatrix}, \begin{pmatrix}1\\1\\1\end{pmatrix}, \begin{pmatrix}-2\\0\\-1\end{pmatrix}$.

66. [X] With respect to the basis $B = \left\{\begin{pmatrix}1\\-1\\2\end{pmatrix}, \begin{pmatrix}3\\4\\6\end{pmatrix}, \begin{pmatrix}-2\\3\\-3\end{pmatrix}\right\}$ of $\mathbb{R}^3$,
a) find the vector $\mathbf{v}$ with coordinate vector $[\mathbf{v}]_B = \begin{pmatrix}3\\1\\-3\end{pmatrix}$;
b) find the coordinate vector of $\mathbf{w} = \begin{pmatrix}7\\-3\\11\end{pmatrix}$.

67. [X] Consider the set $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$, where
$$\mathbf{v}_1 = \begin{pmatrix}\tfrac{1}{\sqrt{2}}\\[2pt]-\tfrac{1}{\sqrt{2}}\\[2pt]0\end{pmatrix},\quad \mathbf{v}_2 = \begin{pmatrix}\tfrac{1}{\sqrt{3}}\\[2pt]\tfrac{1}{\sqrt{3}}\\[2pt]\tfrac{1}{\sqrt{3}}\end{pmatrix},\quad \mathbf{v}_3 = \begin{pmatrix}-\tfrac{1}{\sqrt{6}}\\[2pt]-\tfrac{1}{\sqrt{6}}\\[2pt]\tfrac{2}{\sqrt{6}}\end{pmatrix}.$$
Without solving systems of linear equations, do the following.
a) Show that $S$ is an orthonormal set of vectors in $\mathbb{R}^3$.
b) Show that $S$ is a basis for $\mathbb{R}^3$.
c) Find the coordinate vector of $\begin{pmatrix}-1\\3\\4\end{pmatrix}$ with respect to the ordered basis $S$.
HINT. See Example 6 of Section 6.6.

68. [X] Let $S = \{\mathbf{u}_1, \dots, \mathbf{u}_n\}$ be an orthonormal set of $n$ vectors in $\mathbb{R}^n$. Prove that $S$ is a basis for $\mathbb{R}^n$, and then show that the coordinate vector for any $\mathbf{v} \in \mathbb{R}^n$ is given by
$$[\mathbf{v}]_S = \begin{pmatrix}x_1\\\vdots\\x_n\end{pmatrix}, \quad\text{where } x_j = \mathbf{u}_j \cdot \mathbf{v}.$$

Problems 6.8 : [X] Further important examples of vector spaces

69. [X] Let $M_{22}$ be the vector space of all $2 \times 2$ matrices with real entries (see Example 3 of Section 6.1).
Let $S$ be the set $S = \{A \in M_{22} : a_{11} + a_{22} = 5\}$.
a) Find three matrices in $S$.
b) Is $S$ a subspace of $M_{22}$? Give a reason.

70. [X] Let $T$ be the set $T = \{A \in M_{22} : a_{11} + a_{22} = 0\}$.
a) Find three matrices in $T$.
b) Is $T$ a subspace of $M_{22}$? Give a reason.

71. [X] Show that the four matrices
$$\begin{pmatrix}1&0\\0&0\end{pmatrix},\ \begin{pmatrix}0&1\\0&0\end{pmatrix},\ \begin{pmatrix}0&0\\1&0\end{pmatrix},\ \begin{pmatrix}0&0\\0&1\end{pmatrix}$$
form a basis for $M_{22}(\mathbb{R})$. This set is called the standard basis for $M_{22}(\mathbb{R})$.

72. [X] Show that the four matrices of the previous question also form a basis for the vector space $M_{22}(\mathbb{C})$ of all $2 \times 2$ matrices with complex entries.
HINT: Can you see why your proof for the previous question is also valid for complex numbers?

73. [X] Show that the set of four matrices
$$\begin{pmatrix}1&0\\0&1\end{pmatrix},\ \begin{pmatrix}0&1\\1&0\end{pmatrix},\ \begin{pmatrix}0&-i\\i&0\end{pmatrix},\ \begin{pmatrix}1&0\\0&-1\end{pmatrix}$$
forms a basis for $M_{22}(\mathbb{C})$. These matrices are called the Pauli spin matrices, and they are important in quantum physics and chemistry.

74. [X] Find the coordinates of the following vectors with respect to the given ordered bases.
a) The matrix $A = \begin{pmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{pmatrix}$ with respect to the standard basis for $M_{22}$ given in question 71. Note that the results are the same for both real and complex numbers.
b) Repeat part (a) for the basis of Pauli spin matrices given in question 73. In this case the entries of $A$ should be regarded as complex numbers.

75. [X] Let $R = \left\{\begin{pmatrix}1&0\\-2&0\end{pmatrix}, \begin{pmatrix}0&1\\3&0\end{pmatrix}, \begin{pmatrix}0&0\\5&1\end{pmatrix}\right\}$.
a) Express $\begin{pmatrix}-4&2\\-1&-3\end{pmatrix}$ as a linear combination of elements of $R$.
b) Does $R$ span $M_{22}(\mathbb{R})$, the space of all $2 \times 2$ matrices? Give a brief reason for your answer.

76. [X] Complete the proof of Proposition 2 of Section 6.8 that the system $(\mathbb{R}[X], +, *, \mathbb{R})$ is a vector space.

77. [X] Let $\mathbb{C}[X]$ be the set of all complex-valued functions with domain $X$.
Show that the system $(\mathbb{C}[X], +, *, \mathbb{C})$, where $+$ and $*$ are the usual rules for addition and multiplication by a scalar of functions, satisfies vector space axioms 2, 4, 7 and 9. This system is a vector space over the complex field $\mathbb{C}$.

78. [X] Show that the set
$$S = \left\{ y \in \mathbb{R}[\mathbb{R}] : \frac{d^2y}{dx^2} + 3\frac{dy}{dx} + 4y = 0 \right\}$$
is a subspace of the vector space $\mathbb{R}[\mathbb{R}]$ of all real-valued functions with domain $\mathbb{R}$.

79. [X] Is the set
$$S = \left\{ y \in \mathbb{R}[\mathbb{R}] : \frac{d^2y}{dx^2} + 3\frac{dy}{dx} + 4y = 5 \right\}$$
a subspace of $\mathbb{R}[\mathbb{R}]$? Prove your answer.

80. [X] Let $C^{(k)}[\mathbb{R}]$ be the set of all real-valued functions with domain $\mathbb{R}$ for which the first $k$ derivatives exist and are continuous. Prove that $C^{(k)}[\mathbb{R}]$ is a subspace of the vector space $\mathbb{R}[\mathbb{R}]$ of all real-valued functions with domain $\mathbb{R}$.

81. [X] Show that the vector spaces $C^{(k)}[\mathbb{R}]$ defined in the previous question have the property that, if $m > n$, then $C^{(m)}[\mathbb{R}]$ is a subspace of $C^{(n)}[\mathbb{R}]$.

82. [X] Let $S$ be the subset of $\mathbb{R}[[-\pi,\pi]]$ defined by
$$S = \left\{ f \in \mathbb{R}[[-\pi,\pi]] : \int_{-\pi}^{\pi} \cos(x+t)f(t)\,dt = 0 \text{ for all } x \in [-\pi,\pi] \right\}.$$
Prove that $S$ is a subspace of the vector space $\mathbb{R}[[-\pi,\pi]]$.

83. [X] This question generalises the results of question 46 to real-valued functions. Let $S = \{f_1, \dots, f_n\}$ be a set of real-valued functions defined on an interval $[a,b]$ with the properties that
$$\int_a^b f_i(x)f_j(x)\,dx = 0 \quad \text{for } i \ne j,\ 1 \le i, j \le n,$$
$$\int_a^b f_i^2(x)\,dx \ne 0 \quad \text{for } 1 \le i \le n.$$
Prove that $S$ is a linearly independent set.
Note. A set of functions with these properties is said to be mutually orthogonal on the interval $[a,b]$.

84. [X] Show that the set $S = \{ p \in P_n(\mathbb{R}) : 5p'(6) + 3p(6) = 0 \}$, where $p'(x) = \frac{dp}{dx}$, is a subspace of the vector space $P_n(\mathbb{R})$ of all real polynomials of degree less than or equal to $n$.

85. [X] Is the set $S = \{ p \in P_n(\mathbb{R}) : 5p'(6) + 3p(6) = 8 \}$, where $p'(x) = \frac{dp}{dx}$, a subspace of $P_n(\mathbb{R})$? Prove your answer.

86. [X] Let $P$ be the set of all polynomials over the complex-number field $\mathbb{C}$.
Show that $P$ is a subspace of the vector space $\mathbb{C}[\mathbb{C}]$ of all complex-valued functions with domain $\mathbb{C}$.

87. [X] Is the polynomial $p \in \mathrm{span}(p_1, p_2, p_3)$, where the polynomials are defined by
$$p(z) = -6 + 2z + 30z^2,\quad p_1(z) = 1 + 2z + 3z^2,\quad p_2(z) = -4 - z + 9z^2,\quad p_3(z) = -5 - z + 12z^2?$$

88. [X] Find conditions on the coefficients of the polynomial $p \in P_2$ for $p$ to be a linear combination of the three polynomials $p_1, p_2, p_3$, where the polynomials are given by
$$p_1(z) = 2z + 3z^2,\quad p_2(z) = 5 - 2z - 3z^2,\quad p_3(z) = 15 - 4z - 6z^2.$$

89. [X] Are the sets of polynomials $\{p_1, p_2, p_3\}$ in the previous two questions spanning sets for $P_2$?

90. [X] Is the set of polynomials $S = \{p_1, p_2, p_3\}$ in $P_2$, where
$$p_1(z) = 1 + z - z^2,\quad p_2(z) = 2 - z,\quad p_3(z) = 5 - 4z + z^2,$$
a linearly independent set? If not, express one of the polynomials as a linear combination of the others.

91. [X] Show that
$$p_1(z) = -2 + 5z - 4z^2 + 15z^3 - 5z^4 + z^5,\qquad p_2(z) = 3z + 4z^2 - 3z^3 + 6z^5,$$
$$p_3(z) = 2 + 3z^2 - 4z^3 + 10z^4 - 5z^5,\qquad p_4(z) = 3 + 14z^2 - 5z^3 + 6z^4 - 3z^5,$$
$$p_5(z) = 3 + 8z + 17z^2 + 3z^3 + 11z^4 - z^5,\qquad p_6(z) = -3 + 11z - 7z^2 + 10z^3 - z^4 + 11z^5$$
are not a spanning set for $P_5$, and then construct a basis for $P_5$ containing as many of the given polynomials as possible.
HINT. Check using Maple.

92. [X] Find the coordinate vector for $p(x) = 1 + 2x + x^2$ with respect to the ordered basis $\{1 + x,\ 1 - x^2,\ x + 2x^2\}$ of $P_2$.

93. [X] Find the coordinate vector of $1 + 2z + 3z^2$ with respect to the ordered basis of $P_2$ given by $\left\{\tfrac{1}{8}z(z-2),\ 1 - \tfrac{1}{4}z^2,\ \tfrac{1}{8}z(z+2)\right\}$.
Note. This question and the one that follows do not require Gaussian elimination.

94. [X] Find the coordinate vector of $a_0 + a_1 z + a_2 z^2$ with respect to the ordered basis of $P_2$ given by $\left\{\tfrac{1}{2}z(z-1),\ 1 - z^2,\ \tfrac{1}{2}z(z+1)\right\}$.

95. [X] Let $S = \{p_1, \dots, p_n\}$ be a set of $n$ polynomials in $P_{n-1}(\mathbb{R})$ with the property that
$$\int_a^b p_i(x)p_j(x)\,dx = \begin{cases} 0 & \text{for } i \ne j\\ 1 & \text{for } i = j \end{cases} \qquad \text{for } 1 \le i, j \le n.$$
A set of polynomials with this property is called an orthonormal set of polynomials on the interval $[a,b]$. Prove that $S$ is a basis for $P_{n-1}(\mathbb{R})$, and then show that the coordinate vector for any $p \in P_{n-1}(\mathbb{R})$ is given by
$$[p]_S = \begin{pmatrix}x_1\\\vdots\\x_n\end{pmatrix}, \quad\text{where } x_i = \int_a^b p_i(x)p(x)\,dx.$$

Chapter 7

LINEAR TRANSFORMATIONS

“But I don’t need a Sillygism, you know, to prove that mathematical axiom you mentioned.”
“Nor to prove that ‘all angles are equal’, I suppose?”
“Why, of course not! One takes such a simple truth as that for granted!”
Lewis Carroll, Sylvie and Bruno.

The purpose of this chapter is to give an introduction to an extremely important class of functions called “linear transformations” or “linear maps”. Mathematical examples of linear transformations include geometric transformations such as stretching, reflection and rotation, algebraic operations such as matrix multiplication, and calculus operations such as differentiation and integration. Linear transformations are also widely used in many applications of mathematics; objects that are often modelled (either exactly or approximately) by linear transformations arise in radio and TV sets, amplifiers and hi-fi equipment, atomic spectra, molecular vibrations, sound waves, ocean waves, oil refineries, chemical plants, the profit of a company, the inventory of a factory or shop, and the state of an economy.

7.1 Introduction to linear maps

Before reading this chapter you should quickly read the brief review of function notation given in Appendix 6.9. As stated in Appendix 6.9, a function $f$ with domain $X$ and codomain $Y$ (notation $f : X \to Y$) is a rule which associates exactly one element $y = f(x)$ of $Y$ with each element $x \in X$. Note that an element $x$ in the domain $X$ is called an “argument” of the function, and the corresponding element $y = f(x)$ in the codomain $Y$ is usually called either “the function value of $x$” or the “image of $x$ under $f$”.
Linear maps are an important special class of functions in which both the domain and the codomain are vector spaces (that is, all arguments and values of the functions are vectors), and in which the two vector-space operations of addition and scalar multiplication are “preserved” by the function in the following sense.

Addition Condition. The function value of a sum of two vectors is equal to the sum of the function values of the vectors.

Scalar Multiplication Condition. The function value of a scalar multiple of a vector is equal to the scalar multiple of the function value of the vector.

A more formal mathematical definition of a linear map is as follows.

Definition 1. Let $V$ and $W$ be two vector spaces over the same field $\mathbb{F}$. A function $T : V \to W$ is called a linear map or linear transformation if the following two conditions are satisfied.
Addition Condition. $T(\mathbf{v} + \mathbf{v}') = T(\mathbf{v}) + T(\mathbf{v}')$ for all $\mathbf{v}, \mathbf{v}' \in V$, and
Scalar Multiplication Condition. $T(\lambda\mathbf{v}) = \lambda T(\mathbf{v})$ for all $\lambda \in \mathbb{F}$ and $\mathbf{v} \in V$.

The domain and codomain of a linear map can be any vector spaces. In Section 7.5, we shall concentrate specifically on linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$. Unless otherwise stated, the following propositions and theorems are true for all linear maps.

The adjective “linear” in “linear map” suggests that the idea of a linear map arose from the geometric idea of a line. The connection between the equation of a line and a linear map is shown in Figures 1(a) and 1(b) and in Example 1 below.

Example 1. Show that the function $T : \mathbb{R} \to \mathbb{R}$ defined by $T(x) = a_0 + a_1 x$ for $x \in \mathbb{R}$, where $a_0, a_1 \in \mathbb{R}$ are constants, is a linear map if and only if $a_0 = 0$.

Solution. We check the conditions of the definition of a linear map. Firstly, the domain and the codomain are both vector spaces, since $\mathbb{R}$ is a vector space.
Further, we have, for $x, x' \in \mathbb{R}$,
$$T(x + x') = a_0 + a_1(x + x'),$$
whereas
$$T(x) + T(x') = (a_0 + a_1 x) + (a_0 + a_1 x') = 2a_0 + a_1(x + x').$$
Thus the addition condition is satisfied if and only if $a_0 = 2a_0$, that is, if and only if $a_0 = 0$. For $a_0 = 0$, we check the scalar multiplication condition, and obtain
$$T(\lambda x) = a_1(\lambda x) = \lambda(a_1 x) = \lambda T(x),$$
as required. Thus, the conditions for $T$ to be a linear map are satisfied if and only if $a_0 = 0$. ♦

Note.
1. The equation $y = T(x) = a_0 + a_1 x$ is the equation of a line in $\mathbb{R}^2$. Example 1 shows that the equation of a line defines a linear map if and only if the line goes through the origin.
2. The function $T(x) = a_0 + a_1 x$ is a polynomial of degree 1 and is often called a linear polynomial. Example 1 shows that a “linear polynomial” is a “linear map” if and only if the constant term in the polynomial is zero.

Figure 1(a). A linear map. Figure 1(b). A linear polynomial which is NOT a linear map.

Example 2. Show that the function $T : \mathbb{R}^3 \to \mathbb{R}^2$ defined by
$$T(\mathbf{x}) = \begin{pmatrix} -5x_2 + 4x_3\\ x_1 + 2x_3 \end{pmatrix} \quad\text{for } \mathbf{x} = \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} \in \mathbb{R}^3,$$
is a linear map.

Solution. The domain $\mathbb{R}^3$ and codomain $\mathbb{R}^2$ are both vector spaces. We next check the addition and scalar multiplication conditions.

Addition condition. For $\mathbf{x}, \mathbf{x}' \in \mathbb{R}^3$, we have
$$\mathbf{x} + \mathbf{x}' = \begin{pmatrix} x_1 + x_1'\\ x_2 + x_2'\\ x_3 + x_3' \end{pmatrix} \in \mathbb{R}^3,$$
and hence
$$T(\mathbf{x} + \mathbf{x}') = \begin{pmatrix} -5(x_2 + x_2') + 4(x_3 + x_3')\\ (x_1 + x_1') + 2(x_3 + x_3') \end{pmatrix} = \begin{pmatrix} -5x_2 + 4x_3\\ x_1 + 2x_3 \end{pmatrix} + \begin{pmatrix} -5x_2' + 4x_3'\\ x_1' + 2x_3' \end{pmatrix} = T(\mathbf{x}) + T(\mathbf{x}').$$
Thus the addition condition is satisfied.

Scalar multiplication condition. For $\mathbf{x} \in \mathbb{R}^3$ and $\lambda \in \mathbb{R}$, we have
$$\lambda\mathbf{x} = \begin{pmatrix}\lambda x_1\\\lambda x_2\\\lambda x_3\end{pmatrix} \in \mathbb{R}^3,$$
and hence
$$T(\lambda\mathbf{x}) = \begin{pmatrix} -5(\lambda x_2) + 4(\lambda x_3)\\ \lambda x_1 + 2(\lambda x_3) \end{pmatrix} = \lambda \begin{pmatrix} -5x_2 + 4x_3\\ x_1 + 2x_3 \end{pmatrix} = \lambda T(\mathbf{x}).$$
Thus, the scalar multiplication condition is also satisfied, and therefore $T$ is a linear map. ♦

We shall now summarise some useful properties that are true for all linear maps.
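As an aside, linearity checks like the one in Example 2 are easy to experiment with on a computer. The sketch below uses Python with NumPy (the Notes themselves use Maple, so this is purely illustrative) to test the two conditions of Definition 1 for the map of Example 2 on sample vectors; agreement on finitely many samples illustrates, but of course does not prove, linearity.

```python
import numpy as np

def T(x):
    # The map of Example 2: T(x1, x2, x3) = (-5*x2 + 4*x3, x1 + 2*x3).
    return np.array([-5.0 * x[1] + 4.0 * x[2], x[0] + 2.0 * x[2]])

# Arbitrary sample vectors and scalar for a spot check.
x = np.array([1.0, 2.0, 3.0])
y = np.array([-2.0, 0.5, 4.0])
lam = 7.0

additive = np.allclose(T(x + y), T(x) + T(y))      # addition condition
homogeneous = np.allclose(T(lam * x), lam * T(x))  # scalar multiplication condition
print(additive, homogeneous)  # True True
```

For a nonlinear map such as $T(x) = x^2$ (Example 4 below), the same check fails immediately, which is exactly the style of counterexample used in the text.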
In all of the following propositions and theorems, the domain $V$ and the codomain $W$ are assumed to be vector spaces over the same field $\mathbb{F}$. Here is a useful proposition regarding linear maps.

Proposition 1. If $T : V \to W$ is a linear map, then
1. $T(\mathbf{0}) = \mathbf{0}$, and
2. $T(-\mathbf{v}) = -T(\mathbf{v})$ for all $\mathbf{v} \in V$.

An informal way of stating these results is that a linear map always:
1. transforms the zero vector in the domain into the zero vector in the codomain, and
2. transforms the negative of a vector $\mathbf{v}$ in the domain into the negative of the corresponding function value $T(\mathbf{v})$ in the codomain.

Proof. (1). Since $V$ is a vector space, we have from Proposition 2 of Section 6.2 that $0\mathbf{v} = \mathbf{0}$ for all $\mathbf{v} \in V$. Thus,
$$T(\mathbf{0}) = T(0\mathbf{v}) = 0\,T(\mathbf{v}) = \mathbf{0},$$
where we have first used the scalar multiplication condition of a linear map and then applied Proposition 2 of Section 6.2 to the vector $T(\mathbf{v}) \in W$.
(2). Since $V$ is a vector space, we have from Proposition 2 of Section 6.2 that $-\mathbf{v} = (-1)\mathbf{v}$ for all $\mathbf{v} \in V$. Hence,
$$T(-\mathbf{v}) = T((-1)\mathbf{v}) = (-1)T(\mathbf{v}) = -T(\mathbf{v}),$$
where we have again used the scalar multiplication condition of a linear map, and then Proposition 2 of Section 6.2 applied to the vector $T(\mathbf{v}) \in W$.

Proposition 1 may often be used to provide a quick proof that some given function is not linear.

Example 3. Show that the function $T : \mathbb{R}^2 \to \mathbb{R}$ defined by $T\begin{pmatrix}x_1\\x_2\end{pmatrix} = 4x_1 + 3(x_2 - 6)$ is not linear.

Solution. $T\begin{pmatrix}0\\0\end{pmatrix} = -18 \ne 0$, and hence $T$ is not linear. ♦

Example 4. Show that the function $T : \mathbb{R} \to \mathbb{R}$ defined by $T(x) = x^2$ is not linear.

Solution. $T(3) = 9$, but $T(6) = 36 \ne 2 \times 9$. Hence $T$ is not linear. ♦

To prove that a given map is not linear, it is easiest to provide a specific example that contravenes one of the conditions. One should also first check that the given map takes the zero vector to the zero vector.

WARNING.
The converses of the two results in Proposition 1 are not true in general, as the following example shows.

Example 5. The function $T(x) = x^3$ satisfies both $T(-x) = -T(x)$ and $T(0) = 0$. However, it is not a linear map, by the following counterexample. For $x = 1$,
$$T(2 \times 1) = 8 \ne 2(1)^3 = 2T(1). \quad♦$$

Figure 2: The graph of $T(x) = x^3$.

The two conditions in the definition of a linear map are closely related to the two fundamental vector operations of addition and scalar multiplication. We therefore expect there to be a close relationship between linear combinations and linear maps. This relationship is given in Theorems 2 and 3 below.

Theorem 2. A function $T : V \to W$ is a linear map if and only if, for all $\lambda_1, \lambda_2 \in \mathbb{F}$ and $\mathbf{v}_1, \mathbf{v}_2 \in V$,
$$T(\lambda_1\mathbf{v}_1 + \lambda_2\mathbf{v}_2) = \lambda_1 T(\mathbf{v}_1) + \lambda_2 T(\mathbf{v}_2). \tag{\#}$$

Proof. Let $T$ be a linear function. Then
$$T(\lambda_1\mathbf{v}_1 + \lambda_2\mathbf{v}_2) = T(\lambda_1\mathbf{v}_1) + T(\lambda_2\mathbf{v}_2) \quad\text{(from the addition condition)}$$
$$= \lambda_1 T(\mathbf{v}_1) + \lambda_2 T(\mathbf{v}_2) \quad\text{(using the scalar multiplication condition twice)},$$
and hence (\#) is satisfied. Conversely, suppose (\#) is satisfied. Then, for $\lambda_1 = \lambda_2 = 1$, condition (\#) becomes the addition condition, while for $\lambda_2 = 0$ the condition reduces to the scalar multiplication condition. The proof is complete.

Theorem 2 can be used to simplify the test for linearity, since it means that only one condition must be checked instead of the two separate conditions of the original definition.

Example 6. Show that the function $T : \mathbb{R}^2 \to \mathbb{R}^3$, defined by
$$T(\mathbf{x}) = \begin{pmatrix} 3x_1 - x_2\\ 4x_2\\ 5x_1 + 6x_2 \end{pmatrix} \quad\text{for } \mathbf{x} = \begin{pmatrix}x_1\\x_2\end{pmatrix} \in \mathbb{R}^2,$$
is a linear map.

Solution. For $\mathbf{x}, \mathbf{x}' \in \mathbb{R}^2$ and $\lambda, \lambda' \in \mathbb{R}$, we have
$$\lambda\mathbf{x} + \lambda'\mathbf{x}' = \begin{pmatrix} \lambda x_1 + \lambda' x_1'\\ \lambda x_2 + \lambda' x_2' \end{pmatrix} \in \mathbb{R}^2,$$
and hence
$$T(\lambda\mathbf{x} + \lambda'\mathbf{x}') = \begin{pmatrix} 3(\lambda x_1 + \lambda' x_1') - (\lambda x_2 + \lambda' x_2')\\ 4(\lambda x_2 + \lambda' x_2')\\ 5(\lambda x_1 + \lambda' x_1') + 6(\lambda x_2 + \lambda' x_2') \end{pmatrix} = \lambda \begin{pmatrix} 3x_1 - x_2\\ 4x_2\\ 5x_1 + 6x_2 \end{pmatrix} + \lambda' \begin{pmatrix} 3x_1' - x_2'\\ 4x_2'\\ 5x_1' + 6x_2' \end{pmatrix} = \lambda T(\mathbf{x}) + \lambda' T(\mathbf{x}').$$
Thus, from Theorem 2, $T$ is a linear map. ♦

A generalisation of Theorem 2 is also of considerable importance in the theory and applications of linear maps.

Theorem 3. If $T$ is a linear map with domain $V$ and $S$ is a set of vectors in $V$, then the function value of a linear combination of $S$ is equal to the corresponding linear combination of the function values of $S$; that is, if $S = \{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ and $\lambda_1, \dots, \lambda_n$ are scalars, then
$$T(\lambda_1\mathbf{v}_1 + \cdots + \lambda_n\mathbf{v}_n) = \lambda_1 T(\mathbf{v}_1) + \cdots + \lambda_n T(\mathbf{v}_n).$$

Proof. The proof, left as an exercise (see question 5), is based on an easy inductive argument.

Theorem 3 has many uses. Some examples of its use are as follows.

Example 7. Let $T : \mathbb{R}^3 \to \mathbb{R}^2$ be a linear map with values
$$T\begin{pmatrix}1\\0\\0\end{pmatrix} = \begin{pmatrix}3\\7\end{pmatrix},\quad T\begin{pmatrix}0\\1\\0\end{pmatrix} = \begin{pmatrix}-5\\6\end{pmatrix},\quad T\begin{pmatrix}0\\0\\1\end{pmatrix} = \begin{pmatrix}-2\\8\end{pmatrix}.$$
Find the function value at $\mathbf{x} = \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}$.

Solution. We have
$$\mathbf{x} = \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = x_1\begin{pmatrix}1\\0\\0\end{pmatrix} + x_2\begin{pmatrix}0\\1\\0\end{pmatrix} + x_3\begin{pmatrix}0\\0\\1\end{pmatrix}.$$
From Theorem 3, the function value at $\mathbf{x}$ is
$$T\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = x_1 T\begin{pmatrix}1\\0\\0\end{pmatrix} + x_2 T\begin{pmatrix}0\\1\\0\end{pmatrix} + x_3 T\begin{pmatrix}0\\0\\1\end{pmatrix} = x_1\begin{pmatrix}3\\7\end{pmatrix} + x_2\begin{pmatrix}-5\\6\end{pmatrix} + x_3\begin{pmatrix}-2\\8\end{pmatrix} = \begin{pmatrix} 3x_1 - 5x_2 - 2x_3\\ 7x_1 + 6x_2 + 8x_3 \end{pmatrix}. \quad♦$$

Example 8. Show that the function $T : \mathbb{R}^3 \to \mathbb{R}^2$ with function values
$$T\begin{pmatrix}1\\0\\0\end{pmatrix} = \begin{pmatrix}3\\7\end{pmatrix},\quad T\begin{pmatrix}0\\1\\0\end{pmatrix} = \begin{pmatrix}-5\\6\end{pmatrix},\quad T\begin{pmatrix}0\\0\\1\end{pmatrix} = \begin{pmatrix}-2\\8\end{pmatrix},\quad\text{and}\quad T\begin{pmatrix}1\\1\\1\end{pmatrix} = \begin{pmatrix}-4\\20\end{pmatrix},$$
is not a linear map.

Solution. We have
$$\begin{pmatrix}1\\1\\1\end{pmatrix} = \begin{pmatrix}1\\0\\0\end{pmatrix} + \begin{pmatrix}0\\1\\0\end{pmatrix} + \begin{pmatrix}0\\0\\1\end{pmatrix}.$$
Hence, if $T$ is a linear map, we have from Theorem 3 that
$$T\begin{pmatrix}1\\1\\1\end{pmatrix} = T\begin{pmatrix}1\\0\\0\end{pmatrix} + T\begin{pmatrix}0\\1\\0\end{pmatrix} + T\begin{pmatrix}0\\0\\1\end{pmatrix} = \begin{pmatrix}3\\7\end{pmatrix} + \begin{pmatrix}-5\\6\end{pmatrix} + \begin{pmatrix}-2\\8\end{pmatrix} = \begin{pmatrix}-4\\21\end{pmatrix}.$$
But
$$T\begin{pmatrix}1\\1\\1\end{pmatrix} = \begin{pmatrix}-4\\20\end{pmatrix} \ne \begin{pmatrix}-4\\21\end{pmatrix},$$
and hence $T$ is not a linear map. ♦

Examples 7 and 8 are actually special cases of the following extremely important result.

Theorem 4. For a linear map $T : V \to W$, the function values for every vector in the domain are known if and only if the function values for a basis of the domain are known. Further, if $B = \{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ is a basis for the domain $V$, then for all $\mathbf{v} \in V$ we have
$$T(\mathbf{v}) = x_1 T(\mathbf{v}_1) + \cdots + x_n T(\mathbf{v}_n),$$
where $x_1, \ldots$
, $x_n$ are the scalars in the unique linear combination $\mathbf{v} = x_1\mathbf{v}_1 + \cdots + x_n\mathbf{v}_n$ of the basis $B$.

Proof. It follows from Theorem 3 that $T(\mathbf{v}) = x_1 T(\mathbf{v}_1) + \cdots + x_n T(\mathbf{v}_n)$. The theorem follows immediately.

7.2 Linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$ and $m \times n$ matrices

If you look at the examples of functions with domain $\mathbb{R}^n$ and codomain $\mathbb{R}^m$ given in the previous section, you will see that if $T : \mathbb{R}^n \to \mathbb{R}^m$ and $\mathbf{x} = \begin{pmatrix}x_1\\\vdots\\x_n\end{pmatrix} \in \mathbb{R}^n$, then we can write $T(\mathbf{x})$ as $A\mathbf{x}$, where $A$ is an $m \times n$ matrix. In this section we are going to show that every matrix $A$ represents a linear map, and conversely that every linear map with domain $\mathbb{R}^n$ and codomain $\mathbb{R}^m$ can be represented by a matrix. Because of the close relation between $T$ and the corresponding $A$, and because we prefer to write $A\mathbf{x}$ for $T(\mathbf{x})$ rather than $\mathbf{x}A$, the vector $\mathbf{x}$ must be a column vector. We begin with the following theorem.

Theorem 1. For each $m \times n$ matrix $A$, the function $T_A : \mathbb{R}^n \to \mathbb{R}^m$, defined by $T_A(\mathbf{x}) = A\mathbf{x}$ for $\mathbf{x} \in \mathbb{R}^n$, is a linear map.

Proof. We check the addition and scalar multiplication conditions, using the properties $A(\mathbf{x} + \mathbf{x}') = A\mathbf{x} + A\mathbf{x}'$ and $A(\lambda\mathbf{x}) = \lambda A\mathbf{x}$ from Chapter 5.
Addition Condition. For all $\mathbf{x}, \mathbf{x}' \in \mathbb{R}^n$, we have
$$T_A(\mathbf{x} + \mathbf{x}') = A(\mathbf{x} + \mathbf{x}') = A\mathbf{x} + A\mathbf{x}' = T_A(\mathbf{x}) + T_A(\mathbf{x}').$$
Scalar Multiplication Condition. For all $\lambda \in \mathbb{R}$ and $\mathbf{x} \in \mathbb{R}^n$, we have
$$T_A(\lambda\mathbf{x}) = A(\lambda\mathbf{x}) = \lambda(A\mathbf{x}) = \lambda T_A(\mathbf{x}).$$
Thus, since both the addition and scalar multiplication conditions are satisfied, $T_A$ is a linear map.

The matrix equation $A\mathbf{x} = \mathbf{y}$ therefore has the interpretation that $\mathbf{y} = T_A(\mathbf{x}) = A\mathbf{x}$ is the function value of $T_A$ at the point $\mathbf{x}$; that is, for linear equations, the vector $\mathbf{y}$ may be regarded as the function value of the vector $\mathbf{x}$.

Example 1. Find the linear map $T_A$ such that $T_A(\mathbf{x}) = A\mathbf{x}$ for the matrix
$$A = \begin{pmatrix} 3 & 4\\ -1 & 0\\ -5 & 6 \end{pmatrix}.$$

Solution. Since $A$ has 3 rows and 2 columns, the domain is $\mathbb{R}^2$, the codomain is $\mathbb{R}^3$, and the map $T_A : \mathbb{R}^2 \to \mathbb{R}^3$ is given by
$$T_A\begin{pmatrix}x_1\\x_2\end{pmatrix} = A\mathbf{x} = \begin{pmatrix} 3x_1 + 4x_2\\ -x_1\\ -5x_1 + 6x_2 \end{pmatrix}.$$
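Continuing Example 1, the action of $T_A$ is just a matrix-vector product, which is easy to evaluate numerically. A minimal sketch in Python/NumPy (illustrative only; the choice of test vector is arbitrary):

```python
import numpy as np

# The matrix of Example 1; T_A(x) = A x has domain R^2 and codomain R^3.
A = np.array([[3.0, 4.0],
              [-1.0, 0.0],
              [-5.0, 6.0]])

def T_A(x):
    return A @ x

x = np.array([2.0, 1.0])
# Componentwise formula from Example 1: (3*x1 + 4*x2, -x1, -5*x1 + 6*x2).
formula = np.array([3 * x[0] + 4 * x[1], -x[0], -5 * x[0] + 6 * x[1]])
print(T_A(x))                        # [10. -2. -4.]
print(np.allclose(T_A(x), formula))  # True
```

Note also that `T_A` applied to the standard basis vectors returns the columns of $A$, which anticipates the representation theorem discussed next.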
♦

Theorem 1 and Example 1 show that a matrix can be used to define a linear map. We shall now show that every linear map with domain $\mathbb{R}^n$ and codomain $\mathbb{R}^m$ can be represented by an $m \times n$ matrix with real entries. The basic theorem which establishes this result is the following.

Theorem 2 (Matrix Representation Theorem). Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map and let the vectors $\mathbf{e}_j$ for $1 \le j \le n$ be the standard basis vectors for $\mathbb{R}^n$. Then the $m \times n$ matrix $A$ whose columns are given by $\mathbf{a}_j = T(\mathbf{e}_j)$ for $1 \le j \le n$ has the property that
$$T(\mathbf{x}) = A\mathbf{x} \quad\text{for all } \mathbf{x} \in \mathbb{R}^n.$$

Proof. Every vector $\mathbf{x} \in \mathbb{R}^n$ can be written as a unique linear combination of the standard basis vectors, that is,
$$\mathbf{x} = \begin{pmatrix}x_1\\\vdots\\x_n\end{pmatrix} = x_1\mathbf{e}_1 + \cdots + x_n\mathbf{e}_n.$$
Then, from Theorem 3 of Section 7.1, we have
$$T(\mathbf{x}) = T(x_1\mathbf{e}_1 + \cdots + x_n\mathbf{e}_n) = x_1 T(\mathbf{e}_1) + \cdots + x_n T(\mathbf{e}_n) = x_1\mathbf{a}_1 + \cdots + x_n\mathbf{a}_n,$$
where $\mathbf{a}_j = T(\mathbf{e}_j)$. Now $\mathbf{a}_j \in \mathbb{R}^m$ for $1 \le j \le n$, and hence from Proposition 3 of Section 6.4 the linear combination can be rewritten in the matrix form $A\mathbf{x}$, where $A$ is the matrix with the $\mathbf{a}_j$ as its columns. Thus $T(\mathbf{x}) = A\mathbf{x}$, and the proof is complete.

The Matrix Representation Theorem can be used to construct a matrix for any given linear map with domain $\mathbb{R}^n$ and codomain $\mathbb{R}^m$, or more generally from $\mathbb{F}^n$ to $\mathbb{F}^m$ for any field $\mathbb{F}$.

Example 2. Find a matrix $A$ such that $T(\mathbf{x}) = A\mathbf{x}$ for the linear map $T : \mathbb{R}^3 \to \mathbb{R}^2$ defined by
$$T\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = \begin{pmatrix} 3x_1 - 5x_2 + 6x_3\\ 5x_2 + 31x_3 \end{pmatrix}.$$
[Notice that we are using columns.]

Solution. The first column of the matrix $A$ is the vector given by
$$T(\mathbf{e}_1) = T\begin{pmatrix}1\\0\\0\end{pmatrix} = \begin{pmatrix}3\\0\end{pmatrix},$$
the second column is given by
$$T(\mathbf{e}_2) = T\begin{pmatrix}0\\1\\0\end{pmatrix} = \begin{pmatrix}-5\\5\end{pmatrix},$$
and the third column is given by
$$T(\mathbf{e}_3) = T\begin{pmatrix}0\\0\\1\end{pmatrix} = \begin{pmatrix}6\\31\end{pmatrix}.$$
Thus, the matrix $A$ is
$$A = \begin{pmatrix} 3 & -5 & 6\\ 0 & 5 & 31 \end{pmatrix}. \quad♦$$

An alternative method, which is often simpler, of writing down the matrix for a given linear function is shown in the following example.
Example 3. Find a matrix $A$ such that $T(\mathbf{x}) = A\mathbf{x}$ for the linear map $T : \mathbb{R}^4 \to \mathbb{R}^3$ defined by
$$T\begin{pmatrix}x_1\\x_2\\x_3\\x_4\end{pmatrix} = \begin{pmatrix} 2x_1 - 3x_2 + 4x_3 - 5x_4\\ -2x_1 + 3x_4\\ x_1 - 5x_2 + 6x_3 - 8x_4 \end{pmatrix}.$$

Solution. As usual for a linear map with domain $\mathbb{R}^n$ and codomain $\mathbb{R}^m$ (here $n = 4$ and $m = 3$), the components of the function value look like the left-hand side of a system of linear equations. This system of equations is given by
$$T\begin{pmatrix}x_1\\x_2\\x_3\\x_4\end{pmatrix} = \begin{pmatrix} 2x_1 - 3x_2 + 4x_3 - 5x_4\\ -2x_1 + 3x_4\\ x_1 - 5x_2 + 6x_3 - 8x_4 \end{pmatrix} = \begin{pmatrix} 2 & -3 & 4 & -5\\ -2 & 0 & 0 & 3\\ 1 & -5 & 6 & -8 \end{pmatrix} \begin{pmatrix}x_1\\x_2\\x_3\\x_4\end{pmatrix}.$$
Then the coefficient matrix, namely
$$A = \begin{pmatrix} 2 & -3 & 4 & -5\\ -2 & 0 & 0 & 3\\ 1 & -5 & 6 & -8 \end{pmatrix},$$
has the required property that $T(\mathbf{x}) = A\mathbf{x}$ for all $\mathbf{x} \in \mathbb{R}^4$. ♦

In this section, we have shown that a matrix always defines a linear map and that a linear map between the vector spaces $\mathbb{R}^n$ and $\mathbb{R}^m$ can always be represented by a matrix. This result can easily be generalised to linear maps between any two finite-dimensional vector spaces. This theorem is of fundamental importance in both the mathematical theory of linear maps and in applying the ideas of linear maps to practical problems. See Section 7.6.

7.3 Geometric examples of linear transformations

In this section we shall examine some of the geometric mappings which can be represented by linear maps and matrices. These mappings include stretching and compression, reflections, rotations, projections, and the dot and cross products with a fixed vector. We shall begin by looking at geometric interpretations which can be given to simple types of matrices.

Example 1 (Reflection in $\mathbb{R}^2$). The simplest examples of reflections in $\mathbb{R}^2$ are reflections in one of the coordinate axes. An example of a reflection in the $x_1$-axis is shown in Figure 3. Note that the reflection of the point with position vector $\mathbf{x} = \begin{pmatrix}x_1\\x_2\end{pmatrix}$ is the point with position vector $\mathbf{x}' = \begin{pmatrix}x_1\\-x_2\end{pmatrix}$.
This reflection can be represented by the $2 \times 2$ diagonal matrix with a negative diagonal entry given by
$$A = \begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix},$$
since
$$A\mathbf{x} = \begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}x_1\\-x_2\end{pmatrix} = \mathbf{x}'.$$
Note that we know that this reflection is a linear map, since we have found a matrix that describes the effect of the reflection. ♦

Figure 3: A reflection in the $x_1$-axis.

Note that a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ will map the position vector of a point in an $n$-dimensional space to the position vector of a point in an $m$-dimensional space. The following proposition tells us that a linear transformation will map a line to a line or a point.

Proposition 1. Suppose that $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear map. It maps a line in $\mathbb{R}^n$ to either a line or a point in $\mathbb{R}^m$.

Proof. In Chapter 1, a line in $\mathbb{R}^n$ through a point with position vector $\mathbf{a}$ and parallel to $\mathbf{v} \ne \mathbf{0}$ is defined to be the set
$$\{\mathbf{x} \in \mathbb{R}^n : \mathbf{x} = \mathbf{a} + \lambda\mathbf{v} \text{ for some } \lambda \in \mathbb{R}\}.$$
By Theorem 2 in Section 7.1, $T(\mathbf{a} + \lambda\mathbf{v}) = T(\mathbf{a}) + \lambda T(\mathbf{v})$. Hence $T$ maps the line to the following subset of $\mathbb{R}^m$:
$$\{\mathbf{y} \in \mathbb{R}^m : \mathbf{y} = T(\mathbf{a}) + \lambda T(\mathbf{v}) \text{ for some } \lambda \in \mathbb{R}\}.$$
This set is a line when $T(\mathbf{v}) \ne \mathbf{0}$, and otherwise it contains only the single vector $T(\mathbf{a})$.

REMARK. Using a similar argument, we can show that $T$ maps a line segment with end points having position vectors $\mathbf{a}$ and $\mathbf{b}$ to a line segment with end points having position vectors $T(\mathbf{a})$ and $T(\mathbf{b})$.

Example 2 (Stretching and compression in $\mathbb{R}^2$). Let $A$ be a $2 \times 2$ diagonal matrix with positive diagonal entries, that is, a matrix of the form
$$A = \begin{pmatrix} \lambda_1 & 0\\ 0 & \lambda_2 \end{pmatrix} \quad\text{with } \lambda_1 > 0,\ \lambda_2 > 0.$$
Then the function value of $\mathbf{x} = \begin{pmatrix}x_1\\x_2\end{pmatrix}$ is $\mathbf{y} = T_A(\mathbf{x}) = A\mathbf{x}$, where
$$\mathbf{y} = \begin{pmatrix}y_1\\y_2\end{pmatrix} = \begin{pmatrix} \lambda_1 & 0\\ 0 & \lambda_2 \end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}\lambda_1 x_1\\\lambda_2 x_2\end{pmatrix}.$$
Thus $y_1 = \lambda_1 x_1$ and $y_2 = \lambda_2 x_2$, and hence the effect of the matrix is simply to multiply the first component $x_1$ by the scalar $\lambda_1$ and the second component $x_2$ by the scalar $\lambda_2$.
Note that the first standard basis vector $\mathbf{e}_1 = \begin{pmatrix}1\\0\end{pmatrix}$ is transformed into $\begin{pmatrix}\lambda_1\\0\end{pmatrix} = \lambda_1\mathbf{e}_1$; that is, its direction remains the same, but it is either stretched (if $\lambda_1 > 1$) or compressed (if $\lambda_1 < 1$). Similarly, the second standard basis vector $\mathbf{e}_2 = \begin{pmatrix}0\\1\end{pmatrix}$ is transformed into $\begin{pmatrix}0\\\lambda_2\end{pmatrix} = \lambda_2\mathbf{e}_2$, with a resulting stretching if $\lambda_2 > 1$ or compression if $\lambda_2 < 1$.

Figure 4(a) shows a picture of a 5-point star with vertices $A(1,5)$, $B(4,3)$, $C(3,-1)$, $D(-1,-1)$ and $E(-2,3)$. Suppose $X$ is the point $(1,3)$ on the line segment $BE$. Hence the position vectors of these points are respectively
$$\mathbf{a} = \begin{pmatrix}1\\5\end{pmatrix},\ \mathbf{b} = \begin{pmatrix}4\\3\end{pmatrix},\ \mathbf{c} = \begin{pmatrix}3\\-1\end{pmatrix},\ \mathbf{d} = \begin{pmatrix}-1\\-1\end{pmatrix},\ \mathbf{e} = \begin{pmatrix}-2\\3\end{pmatrix}\ \text{and}\ \mathbf{x} = \begin{pmatrix}1\\3\end{pmatrix}.$$

Figure 4(a): A 5-point star.

When $\lambda_1 = \lambda_2 = 2$, the matrix $A$ is $\begin{pmatrix}2&0\\0&2\end{pmatrix}$. The points in Figure 4(a) will be “transformed” to $A'$, $B'$, $C'$, $D'$, $E'$ and $X'$ according to
$$A\mathbf{a} = \begin{pmatrix}2&0\\0&2\end{pmatrix}\begin{pmatrix}1\\5\end{pmatrix} = \begin{pmatrix}2\\10\end{pmatrix},$$
and similarly, $A\mathbf{b} = \begin{pmatrix}8\\6\end{pmatrix}$, $A\mathbf{c} = \begin{pmatrix}6\\-2\end{pmatrix}$, $A\mathbf{d} = \begin{pmatrix}-2\\-2\end{pmatrix}$, $A\mathbf{e} = \begin{pmatrix}-4\\6\end{pmatrix}$ and $A\mathbf{x} = \begin{pmatrix}2\\6\end{pmatrix}$. By Proposition 1 and the remark after it, the line segment $AB$ will be transformed to $A'B'$, and so on. The star will be transformed to the one shown in Figure 4(b).

When $\lambda_1 = \lambda_2 = 0.5$, the matrix $A$ is $\begin{pmatrix}0.5&0\\0&0.5\end{pmatrix}$. The points in Figure 4(a) will be “transformed” according to
$$A\mathbf{a} = \begin{pmatrix}0.5\\2.5\end{pmatrix},\ A\mathbf{b} = \begin{pmatrix}2\\1.5\end{pmatrix},\ A\mathbf{c} = \begin{pmatrix}1.5\\-0.5\end{pmatrix},\ A\mathbf{d} = \begin{pmatrix}-0.5\\-0.5\end{pmatrix},\ A\mathbf{e} = \begin{pmatrix}-1\\1.5\end{pmatrix}\ \text{and}\ A\mathbf{x} = \begin{pmatrix}0.5\\1.5\end{pmatrix}.$$
The star will be transformed to the one shown in Figure 4(c).

Figure 4(b): Image under $A = \begin{pmatrix}2&0\\0&2\end{pmatrix}$. Figure 4(c): Image under $A = \begin{pmatrix}0.5&0\\0&0.5\end{pmatrix}$.

Figure 4(d) shows the image of the 5-point star when $A = \begin{pmatrix}2&0\\0&1\end{pmatrix}$. The star is stretched to twice the width horizontally.

Figure 4(d): Image under $A = \begin{pmatrix}2&0\\0&1\end{pmatrix}$. ♦

Example 3 (Rotation in a plane).
Suppose that $X$ is an arbitrary point in a plane and that $X$ is rotated about the origin $O$ anticlockwise by an angle $\alpha$ to a new position $X'$. Let $\mathbf{x}$ and $\mathbf{x}'$ be the position vectors of the points $X$ and $X'$ respectively. Show that the function $R_\alpha : \mathbb{R}^2 \to \mathbb{R}^2$ such that $R_\alpha(\mathbf{x}) = \mathbf{x}'$ is a linear transformation. Find the matrix $A_\alpha$ such that $A_\alpha\mathbf{x} = \mathbf{x}'$.

Solution. We know that $R_\alpha$ will be linear if, for all vectors $\mathbf{a}, \mathbf{b} \in \mathbb{R}^2$ and scalars $\lambda \in \mathbb{R}$, the following two conditions are satisfied.
Addition condition: $R_\alpha(\mathbf{a} + \mathbf{b}) = R_\alpha(\mathbf{a}) + R_\alpha(\mathbf{b})$.
Scalar multiplication condition: $R_\alpha(\lambda\mathbf{a}) = \lambda R_\alpha(\mathbf{a})$.
We can see the addition condition from Figure 5: the vector formed by adding $\mathbf{a}$ and $\mathbf{b}$ first and then rotating the sum $\mathbf{a} + \mathbf{b}$ is the same as the one formed by first rotating $\mathbf{a}$ and $\mathbf{b}$ and then adding these rotated vectors.

Figure 5: The geometry of the addition condition for rotations.

You should attempt to draw a picture to illustrate the scalar multiplication condition. In any case, since both the addition condition and the scalar multiplication condition hold, $R_\alpha$ is a linear transformation.

By Theorem 2 in Section 7.2, the columns of the matrix $A_\alpha$ are $R_\alpha(\mathbf{e}_1)$ and $R_\alpha(\mathbf{e}_2)$. From Figure 6 and the fact that the lengths of both $R_\alpha(\mathbf{e}_1)$ and $R_\alpha(\mathbf{e}_2)$ are 1, we have
$$R_\alpha(\mathbf{e}_1) = \begin{pmatrix}\cos\alpha\\\sin\alpha\end{pmatrix} \quad\text{and}\quad R_\alpha(\mathbf{e}_2) = \begin{pmatrix}-\sin\alpha\\\cos\alpha\end{pmatrix}.$$

Figure 6: Rotation of the standard basis vectors, with $R_\alpha(\mathbf{e}_1) = \overrightarrow{OP}$, $R_\alpha(\mathbf{e}_2) = \overrightarrow{OQ}$, $OM = ON = \cos\alpha$ and $PM = QN = \sin\alpha$.

The matrix
$$A_\alpha = \begin{pmatrix} \cos\alpha & -\sin\alpha\\ \sin\alpha & \cos\alpha \end{pmatrix}$$
is called the rotation matrix for angle $\alpha$. ♦

Example 4 (Projections). The projection of a vector $\mathbf{x} \in \mathbb{R}^n$ on a fixed, non-zero vector $\mathbf{b} \in \mathbb{R}^n$ is given by
$$\mathrm{proj}_{\mathbf{b}}\mathbf{x} = \frac{\mathbf{x} \cdot \mathbf{b}}{|\mathbf{b}|^2}\,\mathbf{b}.$$
Show that the function $T : \mathbb{R}^n \to \mathbb{R}^n$ defined by $T(\mathbf{x}) = \mathrm{proj}_{\mathbf{b}}\mathbf{x}$ for $\mathbf{x} \in \mathbb{R}^n$
is a linear map.

Solution. Clearly, the domain and codomain are vector spaces. Instead of proving that $T$ is linear from geometric properties, we use the algebraic properties of the dot product. For all $\mathbf{x}, \mathbf{x}' \in \mathbb{R}^n$,
$$T(\mathbf{x} + \mathbf{x}') = \mathrm{proj}_{\mathbf{b}}(\mathbf{x} + \mathbf{x}') = \frac{(\mathbf{x} + \mathbf{x}') \cdot \mathbf{b}}{|\mathbf{b}|^2}\,\mathbf{b} = \frac{\mathbf{x} \cdot \mathbf{b} + \mathbf{x}' \cdot \mathbf{b}}{|\mathbf{b}|^2}\,\mathbf{b} = T(\mathbf{x}) + T(\mathbf{x}'),$$
and hence the addition condition is satisfied. Finally, for all $\mathbf{x} \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$,
$$T(\lambda\mathbf{x}) = \frac{(\lambda\mathbf{x}) \cdot \mathbf{b}}{|\mathbf{b}|^2}\,\mathbf{b} = \lambda\left(\frac{\mathbf{x} \cdot \mathbf{b}}{|\mathbf{b}|^2}\,\mathbf{b}\right) = \lambda T(\mathbf{x}).$$
Thus, the scalar multiplication condition is also satisfied, and therefore $T$ is a linear map. ♦

Example 5 (Dot Product). Let $\mathbf{b}$ be a fixed vector in $\mathbb{R}^n$. Show that the function $T : \mathbb{R}^n \to \mathbb{R}$, defined by $T(\mathbf{x}) = \mathbf{b} \cdot \mathbf{x}$ for $\mathbf{x} \in \mathbb{R}^n$, is a linear map.
The proof that $T$ is a linear map is similar to the previous example and is left as an exercise. ♦

The following examples show the importance of the linear maps defined by the dot product.

Example 6. From Example 5, for each $1 \le i \le n$ the function $P_i : \mathbb{R}^n \to \mathbb{R}$ defined by $P_i(\mathbf{x}) = \mathbf{e}_i \cdot \mathbf{x}$ for $\mathbf{x} \in \mathbb{R}^n$, where $\mathbf{e}_i$ is the $i$th standard basis element, is a linear map. It is not difficult to see that if $\mathbf{x} = \begin{pmatrix}x_1\\\vdots\\x_n\end{pmatrix}$, the value $P_i(\mathbf{x})$ is simply $x_i$, the $i$th component of $\mathbf{x}$. ♦

This example can be generalised to any basis of $\mathbb{R}^n$.

Example 7. Suppose that $\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ is an orthonormal basis for $\mathbb{R}^n$ and $1 \le i \le n$. The function $P_i : \mathbb{R}^n \to \mathbb{R}$ is defined by $P_i(\mathbf{x}) = \mathbf{v}_i \cdot \mathbf{x}$ for $\mathbf{x} \in \mathbb{R}^n$. By the argument used in Example 6 in Section 6.6, we can prove that if $\mathbf{x} = \lambda_1\mathbf{v}_1 + \cdots + \lambda_n\mathbf{v}_n$, the value $P_i(\mathbf{x})$ is simply $\lambda_i$, the coefficient of $\mathbf{v}_i$ in the unique way of writing $\mathbf{x}$ as a linear combination of the basis vectors. ♦

7.4 Subspaces associated with linear maps

There are two important subspaces associated with a linear map. These subspaces are called the kernel (or null space, which is the name Maple uses) of the linear map, and the image (or range) of the map.
Informally, the kernel is the set of zeroes of the function, and the image is the set of all function values. This is shown diagrammatically in Figure 7. You should of course not take this picture too literally: all the sets drawn as discs are vector spaces!

[Figure 7: Kernel and image. A map T from a space V, containing ker(T), to a space W, containing im(T).]

7.4.1 The kernel of a map

You will be familiar with the fact that one of the important properties of a function (for example, a quadratic or other polynomial) is the location of its zeroes. The set of zeroes of a linear map is also of importance.

Definition 1. Let T : V → W be a linear map. Then the kernel of T (written ker(T)) is the set of all zeroes of T, that is, it is the subset of the domain V defined by

ker(T) = {v ∈ V : T(v) = 0}.

Example 1. Showing that a vector v is in the kernel of a linear map T is simply a verification that T(v) = 0. In particular, 0 ∈ ker(T) for any linear map T, since T(0) = 0. ♦

Example 2 (Dot Product). In Example 5 of Section 7.3, we showed that the function T : Rn → R defined by T(x) = b · x for x ∈ Rn is a linear map. The kernel of T is

ker(T) = {x ∈ Rn : b · x = 0},

that is, ker(T) is the set of vectors which are orthogonal to the given fixed vector b. For the special case that x ∈ R3, the equation b · x = 0 is the point-normal form of the equation of a plane in R3, and hence ker(T) corresponds to the points on a plane with normal b which passes through the origin. ♦

For the important special case of a linear map TA : Rn → Rm associated with an m × n matrix A, the kernel has a simple interpretation. For matrices, the definition of kernel becomes:

Definition 2.
For an m×n matrix A, the kernel of A is the subset of Rn defined by ker(A) = {x ∈ Rn : Ax = 0} , that is, it is the set of all solutions of the homogeneous equation Ax = 0. Example 3. Suppose that A = ( 1 2 3 6 ) and x = ( 2 −1 ) . Since Ax = ( 1 2 3 6 )( 2 −1 ) = ( 1× 2 + 2× (−1) 3× 2 + 6× (−1) ) = ( 0 0 ) , which means that x ∈ ker(A) by definition. To find the kernel of a matrix A, we need to find the solution set of the equation Ax = 0. Example 4. Find the kernel of the matrix A, where A = 1 4 2 73 6 0 15 2 −4 −8 2 . Solution. The kernel is the set of all solutions of the homogeneous system of equations Ax = 0. An equivalent row-echelon form U for A is U = 1 4 2 70 −6 −6 −6 0 0 0 0 . We then set parameters to the variables of the non-leading columns — x3 = λ1 and x4 = λ2. By back substitution, we obtain the solution of Ax = 0 as x = x1 x2 x3 x4 = λ1 2 −1 1 0 + λ2 −3 −1 0 1 , and hence, ker(A) = x ∈ R 4 : x = λ1 2 −1 1 0 + λ2 −3 −1 0 1 for λ1, λ2 ∈ R . In this example, the kernel can be interpreted geometrically as a plane in R4 through the origin parallel to 2 −1 1 0 and −3 −1 0 1 . ♦ c©2020 School of Mathematics and Statistics, UNSW Sydney 7.4. SUBSPACES ASSOCIATED WITH LINEAR MAPS 97 A very important property of the kernel of a linear map is given in the following theorem. Theorem 1. If T : V → W is a linear map, then ker(T ) is a subspace of the domain V . Proof. We use the Subspace Theorem (Theorem 1) of Section 6.3 and prove that ker(T ) is a non-empty subset of V which is closed under addition and scalar multiplication. It is not the empty set, since T (0) = 0 and so 0 ∈ ker(T ). Suppose that v,v′ ∈ ker(T ) and λ ∈ F. Since T is linear, it satisfies the addition and scalar multiplication conditions, so we have T (v + v′) = T (v) + T (v′) = 0 and T (λv) = λT (v) = 0, and hence both v + v′ and λv are in ker(T ). Thus, ker(T ) is closed under addition and scalar multiplication, and the proof is complete. 
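For the matrix A of Example 4, the membership claims above are easy to confirm by direct arithmetic. The short Python sketch below (an informal check, not part of the Notes, which use Maple) verifies that both vectors from the parametric solution lie in ker(A), and that a linear combination of them does too, as Theorem 1 guarantees:

```python
def mat_vec(A, x):
    """Matrix-vector product Ax, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1,  4,  2,  7],
     [3,  6,  0, 15],
     [2, -4, -8,  2]]

# The two vectors from the parametric vector form of the solution of Ax = 0.
k1 = [2, -1, 1, 0]
k2 = [-3, -1, 0, 1]

# Both are mapped to the zero vector, so both lie in ker(A).
images = [mat_vec(A, k1), mat_vec(A, k2)]

# Since ker(A) is a subspace (Theorem 1), any linear combination,
# e.g. 5*k1 - 2*k2, must also be mapped to zero.
combo = mat_vec(A, [5 * a - 2 * b for a, b in zip(k1, k2)])
```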
The dimension of the kernel is important and is given a special name. Definition 3. The nullity of a linear map T is the dimension of ker(T ). The nullity of a matrix A is the dimension of ker(A). Proposition 2. Let A be an m × n matrix with real entries and TA : Rn → Rm the associated linear transformation. Then ker(TA) = ker(A) Proof. TA(x) = 0⇔ Ax = 0. The nullity of a matrix A can be easily obtained from the properties of row-echelon forms by using the following result. Proposition 3. For a matrix A: nullity(A) = maximum number of independent vectors in the solution space of Ax = 0 = number of parameters in the solution of Ax = 0 obtained by Gaussian elimination and back substitution = number of non-leading columns in an equivalent row-echelon form U for A. Although a general proof of this proposition is not difficult to construct, we shall restrict ourselves to looking at an example. Example 4 (continued). Find nullity(A), and a basis for ker(A), for A = 1 4 2 73 6 0 15 2 −4 −8 2 . c©2020 School of Mathematics and Statistics, UNSW Sydney 98 CHAPTER 7. LINEAR TRANSFORMATIONS Solution. We have found that any vector x ∈ ker(A) can be written as x = λ1 2 −1 1 0 + λ2 −3 −1 0 1 . This is a linear combination of the two vectors in the parametric vector form for the solution of Ax = 0. These two vectors are linearly independent, since if x = 0 then the parameters λ1 and λ2 are both zero (look at the third and fourth rows of the linear combination). Thus, we obtain a basis for ker(A), 2 −1 1 0 , −3 −1 0 1 and hence nullity(A) = dim(ker(A)) = 2. This illustrates Proposition 3. ♦ For matrices, there is a close relationship between linear independence of the columns and the nullity of the matrix. Proposition 4. The columns of a matrix A are linearly independent if and only if nullity(A) = 0. Proof. From Proposition 1 of Section 6.5, the columns of A are linearly independent if and only if x = 0 is the only solution of Ax = 0. 
That is, if and only if 0 is the only element of ker(A), in which case, nullity(A) = dim(ker(A)) = 0. 7.4.2 Image The range or image of a function is the set of all function values (see, for example, Appendix 7.10). In this course we will usually use the term image rather than range. A formal definition of the image of a linear map is as follows. Definition 4. Let T : V → W be a linear map. Then the image of T is the set of all function values of T , that is, it is the subset of the codomain W defined by im(T ) = {w ∈W : w = T (v) for some v ∈ V }. For the special case of a linear map associated with a real m×n matrix, the definition becomes: Definition 5. The image of an m× n matrix A is the subset of Rm defined by im(A) = {b ∈ Rm : b = Ax for some x ∈ Rn} . We have met this set several times before. In the language of linear equations, this set im(A) is just the set of all right-hand-side vectors b for which the equation Ax = b has a solution, and in c©2020 School of Mathematics and Statistics, UNSW Sydney 7.4. SUBSPACES ASSOCIATED WITH LINEAR MAPS 99 vector-space language it is just the span of the columns of the matrix A, that is, the column space of A. Thus, we have range(A) = im(A) = col(A) = span (columns of A) = {b ∈ Rm : Ax = b has a solution} . These connections mean that any questions about the image of a matrix can be solved by the methods previously given for linear equations and spans in Chapter 6. It is useful to give an example, as it will serve to review some of the previous results on vector spaces and linear equations. Example 4 (continued). Find conditions on a vector b for b to be in im(A) where A = 1 4 2 73 6 0 15 2 −4 −8 2 . Solution. We look for conditions on b = b1b2 b3 ∈ R3 for Ax = b to have a solution. For hand calculations on a small system of equations, the simplest method of solution is as follows. 
Instead of putting b1, b2, b3 on the right-hand side, we let the three right-hand columns of the following augmented matrix carry the coefficients of b1, b2 and b3:

(A|b) = \left( \begin{array}{cccc|ccc} 1 & 4 & 2 & 7 & 1 & 0 & 0 \\ 3 & 6 & 0 & 15 & 0 & 1 & 0 \\ 2 & -4 & -8 & 2 & 0 & 0 & 1 \end{array} \right).

On reduction to row-echelon form using Gaussian elimination, we find

(U|y) = \left( \begin{array}{cccc|ccc} 1 & 4 & 2 & 7 & 1 & 0 & 0 \\ 0 & -6 & -6 & -6 & -3 & 1 & 0 \\ 0 & 0 & 0 & 0 & 4 & -2 & 1 \end{array} \right).

This system of equations has a solution if and only if the components of b satisfy 4b1 − 2b2 + b3 = 0. In this case, im(A) has a geometric interpretation as a plane through the origin in R3 with normal \begin{pmatrix} 4 \\ -2 \\ 1 \end{pmatrix}. In vector-space language, im(A) is a two-dimensional subspace of R3. ♦

Note that for larger matrices it is preferable to use computer packages such as Maple to solve the equations.

An extremely important property of the image of a linear map is as follows.

Theorem 5. Let T : V → W be a linear map between vector spaces V and W. Then im(T) is a subspace of the codomain W of T.

Proof. We use the Subspace Theorem (Theorem 1 of Section 6.3) and show that im(T) is a non-empty subset of W which is closed under vector addition and scalar multiplication.

We note first that im(T) is a subset of W. Since, from Proposition 1 of Section 7.1, T(0) = 0, we see that 0 ∈ im(T).

Closure under addition. If w, w′ ∈ im(T), then

w = T(v) for some v ∈ V and w′ = T(v′) for some v′ ∈ V,

and hence

w + w′ = T(v) + T(v′) = T(v + v′).

But, since V is a vector space, v + v′ ∈ V, and therefore w + w′ ∈ im(T), and im(T) is closed under addition.

Closure under scalar multiplication. If w ∈ im(T) and λ ∈ F, then w = T(v) for some v ∈ V, and hence

λw = λT(v) = T(λv).

But, since V is a vector space, λv ∈ V, and therefore λw ∈ im(T), and im(T) is closed under scalar multiplication. The proof is complete.

The next result is obvious.

Proposition 6. Let A be an m × n matrix with real entries and TA : Rn → Rm the associated linear transformation.
Then im(A) = im(TA) We have shown in Theorem 5 that im(T ) is always a subspace of the codomain of T . Thus, the fundamental vector-space properties of basis and dimension must apply to im(T ). The dimension of the image is very important and it has therefore been given a special name. Definition 6. The rank of a linear map T is the dimension of im(T ). The rank of a matrix A is the dimension of im(A). The rank is usually regarded as one of the most important properties of a matrix, since it is the maximum number of linearly independent right-hand-side vectors for which a solution to Ax = b can be found. Some important properties of the rank of a matrix are summarised in the following proposition. Proposition 7. For a matrix A: rank(A) = maximal number of linearly independent columns of A = number of leading columns in a row-echelon form U for A c©2020 School of Mathematics and Statistics, UNSW Sydney 7.4. SUBSPACES ASSOCIATED WITH LINEAR MAPS 101 Proof. From before, im(A) = col(A) = span (columns of A). A basis for span(columns of A) is a maximal set of linearly independent columns of A. One maximal set of linearly independent columns of A are the columns which reduce to leading columns in a row-echelon form U . Hence, number of leading columns = number of linearly independent columns of A = number of vectors in basis for col(A) = dim(col(A)) = dim(im(A)) = rank(A). Example 4 (continued). Find rank(A), and a basis for im(A), for A = 1 4 2 73 6 0 15 2 −4 −8 2 . Solution. Since there are two leading columns (1 and 2) in the row-echelon form U = 1 4 2 70 −6 −6 −6 0 0 0 0 , hence rank(A) = 2. A basis for im(A) therefore contains two vectors. One maximal set of linearly independent columns of A is columns 1 and 2 of A, and hence a basis for im(A) is 13 2 , 46 −4 . ♦ 7.4.3 Rank, nullity and solutions of Ax = b Example 4 illustrates the following important fact about the rank and nullity of a matrix. Theorem 8 (Rank-Nullity Theorem for Matrices). 
For any matrix A, rank(A) + nullity(A) = number of columns of A. Proof. Let U be an equivalent row-echelon form for A obtained by the Gaussian elimination algorithm. Then, from Proposition 7, rank(A) = number of leading columns in U . Also, from Proposition 3, nullity(A) = number of non-leading columns in U . But, of course, number of leading columns + number of non-leading columns = total number of columns in U = total number of columns in A, and the result is proved. The above theorem is equivalent to the following result for linear maps between finite dimen- sional vector spaces. Theorem 9 (Rank-Nullity Theorem). Suppose V and W are finite dimensional vector spaces and T : V →W is linear. Then rank(T ) + nullity(T ) = dim(V ). c©2020 School of Mathematics and Statistics, UNSW Sydney 102 CHAPTER 7. LINEAR TRANSFORMATIONS A proof of Theorem 9, using a suitably constructed basis of V , is given in Section 7.9. For matrices, a very common use of rank and nullity is to classify the types of solution of a system of linear equations Ax = b. The basic results are summarised in the following proposition. Theorem 10. The equation Ax = b has: 1. no solution if rank(A) 6= rank([A|b]), and 2. at least one solution if rank(A) = rank([A|b]). Further, i) if nullity(A) = 0 the solution is unique, whereas, ii) if nullity(A) = ν > 0, then the general solution is of the form x = xp + λ1k1 + · · ·+ λνkν for λ1, . . . , λν ∈ R, where xp is any solution of Ax = b, and where {k1, . . . ,kν} is a basis for ker(A). Proof. Let U and (U |y) be equivalent row-echelon forms for A and (A|b) obtained by the Gaus- sian elimination algorithm. Now, from Chapter 4, we know that Ax = b has a solution if and only if the right-hand-side column y is a non-leading column, and hence if and only if the numbers of leading columns in U and [U |y] are equal. 
But, from Proposition 7, rank(A) = number of leading columns in U , and rank([A|b]) = number of leading columns in (U |y), and thus Ax = b has a solution if and only if the ranks of A and (A|b) are equal. The proof of parts 2(i) and 2(ii) follows immediately from the relation between solutions of a non-homogeneous system and the corresponding homogeneous system (see Chapter 4) and the fact that nullity(A) is equal to the maximum number of linearly independent solutions of the homogeneous equation Av = 0. Note. A similar type of solution to the general solution in 2(ii) above is also obtained as the solution of a linear differential equation. In this differential equation case, xp is called a “particular solution” and the parametric terms are called the “complementary function”. The similarity between the two types of solution is due to the fact that both the matrix and differential equation problems involve linear functions. Example 5. Illustrate the above rules with the system of equations given by the augmented matrix (A|b) = 0 0 2 −1 3 11 −2 12 3 4 −1 3 2 1 4 6 0 . Solution. Gaussian elimination gives (U |y) = 1 −2 1 2 3 4 −1 0 8 −12 −5 −6 3 0 0 2 −1 3 1 . The system has a solution as y is a non-leading column. The number of leading columns in U is 3, and hence rank(A) = 3. Similarly, rank(A|b) = 3. c©2020 School of Mathematics and Statistics, UNSW Sydney 7.5. FURTHER APPLICATIONS AND EXAMPLES OF LINEAR MAPS 103 On back substitution, the parametric vector form of the solution is found to be x = − 716 13 32 1 2 0 0 + λ1 −3116 21 32 1 2 1 0 + λ2 −3116 21 32 3 2 0 1 = xp + λ1k1 + λ2k2. Note that A − 716 13 32 1 2 0 0 = 1−1 0 ; A −3116 21 32 1 2 1 0 = 00 0 ; A −3116 21 32 3 2 0 1 = 00 0 , and that the number of non-leading columns of U = 2 = nullity(A) = the number of parameters in solution. 
This parametric vector form is the general solution of Ax = b, and as expected it is the sum of a “particular solution” of Ax = b and a “complementary function” which is a linear combination of two linearly independent solutions of Ax = 0 (a basis for ker(A)). ♦ 7.5 Further applications and examples of linear maps Although the theory that we have developed so far in this chapter applies to all linear maps, most examples have been restricted to maps for which the domain is Rn and the codomain is Rm. In this section we will give some examples of linear maps in which the domain and codomain are other kinds of vector spaces. A simple, but useful, map in any vector space is the map which takes a vector to itself. Example 1. The identity map idV : V → V on a vector space V is defined by idV (v) = v for all v ∈ V. This map is linear, since for all v,v′ ∈ V and all scalar λ, idV (v + v ′) = v + v′ = idV (v) + idV (v′) and idV (λv) = λv = λ idV (v). ♦ In Theorems 2 and 3 of Section 7.1 we showed that linear maps had the important property that they preserved linear combinations. As the next example shows, every linear combination can also be regarded as the image of a linear map. c©2020 School of Mathematics and Statistics, UNSW Sydney 104 CHAPTER 7. LINEAR TRANSFORMATIONS [H] Example 2. Let S = {v1, . . . ,vn} be a subset of a vector space V and let x1, . . . , xn ∈ R. Show that the map T given by T (x) = x1v1 + · · ·+ xnvn where x = x1... xn ∈ Rn is a linear map. Solution. The rule obviously defines a function, since T (x) is uniquely determined for each x ∈ Rn. To prove that T is linear we use Theorem 2 of Section 7.1. Suppose x,x′ ∈ Rn and λ, λ′ ∈ R. Then λx+ λ′x′ = (λx1 + λ′x′1, . . . , λxn + λ ′x′n), and hence T (λx+ λ′x′) = (λx1 + λ′x′1)v1 + · · ·+ (λxn + λ′x′n)vn = λ(x1v1 + · · ·+ xnvn) + λ′(x′1v1 + · · · + x′nvn) = λT (x) + λ′T (x′). Thus T is a linear map. 
♦ This example shows that all properties of linear combinations discussed in Chapter 6 can in fact be restated in the language of linear maps. Example 3. Let V be a vector space over the real numbers, and let B = {v1, . . . , vn} be an ordered basis for V . For any v ∈ V , we can write the vector uniquely as a linear combination of B, v = x1v1 + · · ·+ xnvn. Show that the rule T : V → Rn defined by T (v) = x1... xn for v ∈ V is a linear map. Solution. Obviously the function T has domain V and codomain Rn which are vector spaces. To prove that this function is a linear map, we check the addition and scalar multiplication conditions. For all λ ∈ R and v,v′ ∈ V , we can write in a unique way that v = x1v1 + · · ·+ xnvn and v′ = x′1v1 + · · ·+ x′nvn. Since v + v′ = (x1 + x′1)v1 + · · ·+ (xn + x′n)vn and λv = (λx1)v1 + · · ·+ (λxn)vn c©2020 School of Mathematics and Statistics, UNSW Sydney 7.5. FURTHER APPLICATIONS AND EXAMPLES OF LINEAR MAPS 105 we have T (v + v′) = x1 + x ′ 1 ... xn + x ′ n = x1... xn + x ′ 1 ... x′n = T (v) + T (v′) and T (λv) = λx1... λxn = λ x1... xn = λT (v), and hence T is a linear map. ♦ [X] REMARK: The above example simply says that the function which maps a vector to its coor- dinate vector with respect to a basis is linear. We shall now give some examples of linear maps associated with the vector spaces of polynomials and real-valued functions. Example 4. Show that the function T : C3 → P2(C) defined by T a0a1 a2 = p, where a0, a1, a2 ∈ C and p(z) = a0 + a1 + (a2 + 3a0)z + a1z 2 for z ∈ C, is a linear map. Before we solve this problem, note that an argument of T is a complex vector a = a0a1 a2 ∈ C3, while the corresponding function value p = T (a) is a complex polynomial of degree less than or equal to 2. Some function values are the polynomials given by T 12 3 (z) = 1 + 2 + (3 + 3)z + 2z2 = 3 + 6z + 2z2, T −20 i (z) = −2 + (i− 6)z, T 00 0 (z) = 0. Solution. We use 7.1.2. 
Let λ, λ′ ∈ C, a = a0a1 a2 ∈ C3, a′ = a′0a′1 a′2 ∈ C3 and let s = T (λa+ λ′a′), p = T (a), and q = T (a′). Then s = T λa0 + λ′a′0λa1 + λ′a′1 λa2 + λ ′a′2 , and hence c©2020 School of Mathematics and Statistics, UNSW Sydney 106 CHAPTER 7. LINEAR TRANSFORMATIONS s(z) = λa0 + λ ′a′0 + λa1 + λ ′a′1 + ( λa2 + λ ′a′2 + 3(λa0 + λ ′a′0) ) z + (λa1 + λ ′a′1)z 2 = λ ( a0 + a1 + (a2 + 3a0)z + a1z 2 ) + λ′ ( a′0 + a ′ 1 + (a ′ 2 + 3a ′ 0)z + a ′ 1z 2 ) = λp(z) + λ′q(z). Thus, T (λa+ λ′a′) = s = λp+ λ′q = λT (a) + λ′T (a′), and hence ♦ As the next examples show, calculus provides many important applications of linear maps as differentiation and integration are both associated with linear maps. Example 5 (Differentiation of polynomials). Let Pn(R) be the vector space of real polynomials of degree less than or equal to n. Show that the function D : Pn(R)→ Pn−1(R), defined by D(p) = p′, where p′(x) = dp dx for p ∈ Pn(R) and x ∈ R, is a linear map. Solution. Firstly, we note that if p is a polynomial of degree k then the derivative p′ exists and is a polynomial of degree k−1. Hence, if p ∈ Pn(R) then p′ ∈ Pn−1(R), and thus D : Pn(R)→ Pn−1(R) is a function. We now prove D is linear by checking the addition and scalar multiplication conditions of the definition of a linear map. For all p, q ∈ Pn(R), we have from the properties of derivatives that (p+ q)′(x) = d dx ( p(x) + q(x) ) = d dx p(x) + d dx q(x) = p′(x) + q′(x). Hence, D(p + q) = (p+ q)′ = p′ + q′ = D(p) +D(q), and the addition condition is satisfied. Further, for all p ∈ Pn(R) and λ ∈ R, we have that d dx (λp(x)) = λ d dx p(x), and hence D(λp) = λD(p) and the scalar multiplication condition is also satisfied. Thus, D is a linear map. (Note that nullity (D) = 1 and rank(D) = n.) ♦ Example 6 (Integration of Polynomials). Show that the function I : Pn(R) → Pn+1(R), defined by I(p) = q, where q(x) = ∫ x 0 p(t)dt for p ∈ Pn(R) and x ∈ R, is a linear map. c©2020 School of Mathematics and Statistics, UNSW Sydney 7.5. 
FURTHER APPLICATIONS AND EXAMPLES OF LINEAR MAPS 107 Before solving this problem, we give some examples of function values of I. If p is the zero polynomial we have I(p) = ∫ x 0 0 dx = 0, whereas if p is the polynomial of degree 2 defined by p(x) = 1− 3x+ 4x2 then I(p) = ∫ x 0 (1− 3t+ 4t2)dt = x− 3 2 x2 + 4 3 x3 is a polynomial of degree 3. Solution. We will prove that I is a linear map by using Theorem 2 of Section 7.1. For ease of writing, we let q1 = I(p1), q2 = I(p2) and q = I(λ1p1 + λ2p2), where p1, p2 ∈ Pn(R) and λ1, λ2 ∈ R. Then, from the properties of integration, q(x) = ∫ x 0 ( λ1p1(t) + λ2p2(t) ) dt = λ1 ∫ x 0 p1(t)dt+ λ2 ∫ x 0 p2(t)dt = λ1q1(x) + λ2q2(x). Thus, I(λ1p1 + λ2p2) = q = λ1q1 + λ2q2 = λ1I(p1) + λ2I(p2), and hence I is a linear map. (Note that nullity (I) = 0 and rank(I) = n+ 1.) ♦ The next example is one of a class of so-called integral transforms which have many uses in mathematics, science, engineering and economics. [X] Example 7. The Laplace transform. Let s and a be real numbers, and let Va be the set of real-valued functions on the interval (0,∞) defined by Va = { f ∈ R[(0,∞)] : ∫ ∞ 0 e−stf(t)dt exists for a < s <∞ } . Now, from the theory of integration, if f, g ∈ Va and λ, µ ∈ R then∫ ∞ 0 e−st(λf(t) + µg(t))dt exists for a < s <∞, and thus λf + µg ∈ Va. Hence, from the Alternative Subspace Theorem of Section 6.8, Va is a subspace of the vector space R[(0,∞)] of all real-valued functions with domain (0,∞). We now define a function L : Va → R[(a,∞)] with function values L(f) = fL, where fL is the function from the domain (a,∞) to the codomain R defined by fL(s) = ∫ ∞ 0 e−stf(t)dt for a < s <∞. fL is called the Laplace transform of the function f . ♦ We shall now prove that L is a linear map. c©2020 School of Mathematics and Statistics, UNSW Sydney 108 CHAPTER 7. LINEAR TRANSFORMATIONS Proof. Let f, g ∈ Va and λ, µ ∈ R. 
Then, as noted above, the function h = λf +µg is an element of Va, and its Laplace transform hL = L(λf + µg) satisfies hL(s) = ∫ ∞ 0 e−sth(t)dt = ∫ ∞ 0 e−st ( λf(t) + µg(t) ) dt = λ ∫ ∞ 0 e−stf(t)dt+ µ ∫ ∞ 0 e−stg(t)dt = λfL(s) + µgL(s). Thus, L(λf + µg) = hL = λfL + µgL = λL(f) + µL(g), and hence L is a linear map. The Laplace transform is widely used in, for example, the solution of linear differential equations and the theory of dynamical systems. To understand the technique, work through question 54. It has extensive applications in electrical engineering, computer science, physics, applied and pure mathematics, and so on. To finish this section, we shall describe some applications of linear maps in the areas of optics, chemical engineering, electrical engineering, and population dynamics. [H] Example 8 (Optics). White light is made up of the seven colours: red, orange, yellow, green, blue, indigo, and violet. Assume that a green filter transmits 0% of red and violet, 5% of orange and indigo, 20% of yellow and blue, and 90% of the green light that falls on it. This green filter can be represented by a linear map, and the kernel and image of the map have a simple interpretation. We let a1, a2, a3, a4, a5, a6 and a7 be the intensities of the red, orange, yellow, green, blue, indigo and violet light respectively in the incoming light. Then the filter can be represented by the very simple linear map T : R7 → R7 given by T (a) = 0 0 0 0 0 0 0 0 0.05 0 0 0 0 0 0 0 0.2 0 0 0 0 0 0 0 0.9 0 0 0 0 0 0 0 0.2 0 0 0 0 0 0 0 0.05 0 0 0 0 0 0 0 0 a1 a2 a3 a4 a5 a6 a7 . The kernel of the filter is the set of all possible incident light such that there is no transmitted light. The filter will transmit no light if the incoming light contains only red and violet light. Thus, a basis for the kernel of the map T which models the filter is [1 0 0 0 0 0 0]T (red light only) and [0 0 0 0 0 0 1]T (violet light only), and the nullity is 2. 
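Because the filter matrix is diagonal, its kernel, nullity and rank can be read off from the diagonal entries, and this is easy to confirm with a few lines of Python (an informal sketch, not part of the Notes, which use Maple):

```python
# Transmission coefficients of the green filter for
# (red, orange, yellow, green, blue, indigo, violet).
t = [0.0, 0.05, 0.2, 0.9, 0.2, 0.05, 0.0]

def filter_map(a):
    """Apply the diagonal filter matrix T to an intensity vector a."""
    return [ti * ai for ti, ai in zip(t, a)]

# Incoming light containing only red and violet is completely blocked,
# i.e. it lies in the kernel of the map.
red_violet = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
blocked = filter_map(red_violet)

# For a diagonal matrix: nullity = number of zero diagonal entries,
# rank = number of non-zero diagonal entries.
nullity = sum(1 for ti in t if ti == 0)
rank = sum(1 for ti in t if ti != 0)
```

As stated in the text, the nullity is 2 (red and violet), the rank is 5, and of course rank + nullity = 7, in agreement with the Rank-Nullity Theorem.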
The image of the filter is the transmitted light. The transmitted light can only contain orange, yellow, green, blue, and indigo, and these colours may be taken as a basis for the image. As five basic colours are transmitted, the rank is 5. Mathematically, a basis for the image of the map T is the set of 5 vectors [0 1 0 0 0 0 0]T , [0 0 1 0 0 0 0]T , [0 0 0 1 0 0 0]T , [0 0 0 0 1 0 0]T and [0 0 0 0 0 1 0]T . ♦ c©2020 School of Mathematics and Statistics, UNSW Sydney 7.6. [X] REPRESENTATION OF LINEAR MAPS BY MATRICES 109 [X] Example 9 (Population Dynamics). As a simple model of the growth of a human population in a given country, we neglect males, and divide females into the six age groups of 0–14, 15–29, 30–44, 45–59, 60–74 and 75–89. It is found that, on average, 5% of the 0–14 group, 3% of the 15–29 group, 5% of the 30–44 group, 10% of the 45–59 group, 40% of the 60–74 group and 100% of the 75–89 group die in a fifteen-year period, whereas, on average, 0% of the 0–14 group, 50% of the 15–29 group, 45% of the 30–44 group, 6% of the 45–59 group and 0% of the 60–74 and 75–89 groups give birth to a female baby in a fifteen-year period. The population at any time can be represented by a linear-transformation model as follows. Starting at some convenient time, say January 1 1970, at which the population of the country is known, we divide time into intervals of length 15 years. Let k represent the kth of these 15-year periods, starting with k = 0 in the period 1970–1984. Thus, k = 1 represents 1985–1999, k=2 represents 2000–2014, etc. 
We then let x1(k) number of females of age 0–14 in interval k x2(k) number of females of age 15–29 in interval k x3(k) number of females of age 30–44 in interval k x4(k) number of females of age 45–59 in interval k x5(k) number of females of age 60–74 in interval k x6(k) number of females of age 75–89 in interval k Now, if we know the values of xj(k), 1 6 j 6 6 in a given interval k, we can calculate the values of xj(k + 1), 1 6 j 6 6 in the k + 1th interval from the given data. For example, the females in the age group 15–29 in interval k+1 are the survivors of those in the age group 0–14 in interval k. Thus, x2(k+1) = 0.95x1(k). The numbers in all groups other than the 0–15 group can be obtained in a similar fashion. In our model, the only way that females can enter the 0–15 group is to be born from mothers in the 15–29, 30–44 and 45–59 groups. Thus, x1(k + 1) = 0.50x2(k) + 0.45x3(k) + 0.06x4(k). We therefore obtain the model x(k + 1) = x1(k + 1) x2(k + 1) x3(k + 1) x4(k + 1) x5(k + 1) x6(k + 1) = 0 0.5 0.45 0.06 0 0 0.95 0 0 0 0 0 0 0.97 0 0 0 0 0 0 0.95 0 0 0 0 0 0 0.90 0 0 0 0 0 0 0.60 0 x1(k) x2(k) x3(k) x4(k) x5(k) x6(k) = Ax(k). Thus, the population vector in time interval k+1 is the image under the linear map whose matrix is given above of the population vector in interval k. ♦ 7.6 [X] Representation of linear maps by matrices We have seen in Section 7.2 that every linear map between the vector spaces Rn and Rm can be represented by a matrix. The next theorem shows that this result can be generalised to any linear map between any finite-dimensional vector spaces. c©2020 School of Mathematics and Statistics, UNSW Sydney 110 CHAPTER 7. LINEAR TRANSFORMATIONS Theorem 1 (General Matrix Representation Theorem). Let T : V →W be a linear map from an n-dimensional vector space V to an m-dimensional vector space W , and let BV = {v1, . . . ,vn} be an ordered basis for V and BW = {w1, . . . ,wm} be an ordered basis for W . 
Then, there is a unique m× n matrix A such that [T (v)]BW = A[v]BV . Further, A is the matrix whose columns are aj = [T (vj)]BW for 1 6 j 6 n. Proof. Let w = T (v) ∈W and [v]BV = x1... xn , i.e. v = x1v1 + · · · + xnvn. By Theorem 3 of Section 7.1, we have w = T (v) = T (x1v1 + · · ·+ xnvn) = x1T (v1) + · · · + xnT (vn). Now, we have shown (Example 3 of Section 7.5) that taking coordinate vectors is a linear map, and hence, on using Theorem 3 of Section 7.1, we have [w]BW = [T (v)]BW = x1[T (v1)]BW + · · · + xn[T (vn)]BW = x1a1 + · · ·+ xnan, where aj = [T (vj)]BW . Finally, we note that aj ∈ Rm for 1 6 j 6 n, and hence, from Proposition 3 of Section 6.4, the linear combination can be rewritten in the matrix form Ax, and the theorem is proved. Note. 1. This theorem says that if T maps v to T (v) then the matrix A transforms the coordinate vector [v]BV for v with respect to an ordered basis BV of the domain into the coordinate vector [T (v)]BW for T (v) with respect to an ordered basis BW of the codomain. 2. It is important to note that the matrix A depends only on the map T , the ordered basis BV and the ordered basis BW . It does not depend on the particular vector v of V whose image is being found. Theorem 1 provides a straightforward algorithm for finding a matrix representation. Algorithm 1. Constructing a matrix representation for a linear map. 1. Find a basis BV = {v1, . . . ,vn} for the domain V and a basis BW = {w1, . . . ,wm} for the codomain W . 2. Find the function values T (vj), 1 6 j 6 n, of the domain basis vectors. 3. Find the coordinate vectors [T (vj)]BW of the function values T (vj) with respect to the codomain basis BW . c©2020 School of Mathematics and Statistics, UNSW Sydney 7.6. [X] REPRESENTATION OF LINEAR MAPS BY MATRICES 111 4. Construct the m × n matrix A with the coordinate vectors [T (vj)]BW , 1 6 j 6 n, as its columns. Example 1. 
Construct the matrix representation of the derivative map D : P3(R) → P2(R), defined by D(p) = p′, where p′(x) = dp dx for x ∈ R, with respect to the standard bases of P3(R) and P2(R). Solution. As shown in Example 5 of Section 7.5, D is a linear map, and hence it can be represented by a matrix. We again follow algorithm 1. For the domain P3(R), the standard basis is { 1, x, x2, x3 } . The function values of the basis vectors are D(1) = 0, D(x) = 1, D(x2) = 2x, D(x3) = 3x2. The coordinate vectors of these function values with respect to the standard basis { 1, x, x2 } of the codomain are 00 0 , 10 0 , 02 0 , 00 3 respectively. Hence the matrix is A = 0 1 0 00 0 2 0 0 0 0 3 . As an example of the use of this matrix, we find D(p) for p(x) = 1− 3x+ 4x2 + 7x3. The coordinate vector for p with respect to the standard basis { 1, x, x2, x3 } of the domain P3(R) is 1 −3 4 7 . Then the coordinate vector for D(p) with respect to the standard basis {1, x, x2} of the codomain P2(R) is A 1 −3 4 7 = −38 21 , and hence D(p) is given by ( D(p) ) (x) = −3 + 8x+ 21x2. This is clearly the derivative of the polynomial p. ♦ A similar procedure to that given in Example 1 can be used to find a matrix representation for definite integration of polynomials. In the simple case described in Example 1, it is obviously a waste of time to go through the matrix formalism. However, in more complicated examples, it is often useful to be able to use the powerful and efficient algorithms of matrix algebra to solve problems involving differentiation of polynomials. Another example involving polynomials is as follows. c©2020 School of Mathematics and Statistics, UNSW Sydney 112 CHAPTER 7. LINEAR TRANSFORMATIONS Example 2. Find the matrix with respect to standard bases in domain and codomain for the linear transformation T : P2 → C3 defined by T (a0 + a1z + a2z 2) = 2a0 + 3a2−a2 4a1 + 6a2 . Solution. 
For the domain, the standard basis is {1, z, z²}, and hence the images of the domain basis vectors are

T(1) = (2, 0, 0)^T,  T(z) = (0, 0, 4)^T,  T(z²) = (3, −1, 6)^T.

For the codomain C³, the standard basis is (1, 0, 0)^T, (0, 1, 0)^T, (0, 0, 1)^T. The coordinate vectors of the images with respect to this basis are just the three image vectors given above, and hence the matrix is

A = [ 2 0  3 ]
    [ 0 0 −1 ]
    [ 0 4  6 ].  ♦

In the above examples, we have restricted the choice of bases to standard bases in domain and codomain. However, it is frequently possible to achieve great simplifications in calculations involving linear maps by using special choices of bases. Two examples of the simplification which can be achieved in this way are given below.

Example 3. A linear map TA : R³ → R³ is defined by TA(x) = Ax, where

A = [ 3.12  0.16 −0.32 ]
    [ 4.76 −1.32 −7.36 ]
    [ 2.8  −1.6  −1.8  ].

Find the matrix which represents TA with respect to the bases in both domain and codomain given by the columns of the matrix

B = [  1 0 4 ]
    [ −3 2 1 ]
    [  2 1 2 ].

Solution. The images of the basis vectors for the domain basis are

TA((1, −3, 2)^T) = A(1, −3, 2)^T = (2, −6, 4)^T,
TA((0, 2, 1)^T) = (0, −10, −5)^T,
TA((4, 1, 2)^T) = (12, 3, 6)^T.

From Algorithm 1, the columns of the matrix representing TA with respect to the given codomain basis are just the coordinate vectors for the above images with respect to the codomain basis. These are

(2, 0, 0)^T,  (0, −5, 0)^T,  and  (0, 0, 3)^T

respectively. The required matrix has these three coordinate vectors as its columns, and it is therefore

X = [ 2  0 0 ]
    [ 0 −5 0 ]
    [ 0  0 3 ].  ♦

Note that, in this example, we have found a diagonal matrix to represent the transformation TA. A general theory which shows the conditions under which a diagonal matrix can be found to represent a linear map will be developed in Chapter 8.

Example 4. A linear map TA : R³ → R⁴ is defined by TA(x) = Ax, where

A = [  1  4  2 ]
    [  3  4 −1 ]
    [ −2  0  5 ]
    [  3 −4  4 ].
Find the matrix U which represents TA for the standard basis in the domain R³ and for the basis in the codomain R⁴ which consists of the column vectors of the matrix

L = [  1  0 0 0 ]
    [  3  1 0 0 ]
    [ −2 −1 1 0 ]
    [  3  2 6 1 ].

Solution. The standard basis for R³ is (1, 0, 0)^T, (0, 1, 0)^T, (0, 0, 1)^T, and hence the images of the domain basis vectors are

T((1, 0, 0)^T) = A(1, 0, 0)^T = (1, 3, −2, 3)^T,
T((0, 1, 0)^T) = A(0, 1, 0)^T = (4, 4, 0, −4)^T,
T((0, 0, 1)^T) = (2, −1, 5, 4)^T.

These images are just the columns of the matrix A. Now, following Algorithm 1, we must find the coordinate vectors of these three images with respect to the codomain basis given by the columns of L, using the standard Gaussian elimination algorithm. They turn out to be

(1, 0, 0, 0)^T,  (4, −8, 0, 0)^T  and  (2, −7, 2, 0)^T.

Thus

U = [ 1  4  2 ]
    [ 0 −8 −7 ]
    [ 0  0  2 ]
    [ 0  0  0 ].

Note that L is “lower triangular”, U is “upper triangular”, and that LU = A. ♦

Both of the above examples illustrate the importance of suitable choices of bases in solving problems about linear maps. If the bases are suitably chosen, a matrix representation for a map will usually take on a very simple form from which the properties of the map can be immediately read off. We have, of course, used a special case of this approach repeatedly throughout these notes, where we have solved most problems about linear equations, vector spaces, and linear maps by constructing row-echelon form matrices from which the solutions can be immediately read off.

7.7 [X] Matrix arithmetic and linear maps

We have seen in Sections 7.2 and 7.6 that every matrix defines a linear map and that every linear map between finite-dimensional vector spaces can be represented by a matrix. We would therefore expect that properties of matrices (such as matrix addition, multiplication of a matrix by a scalar, matrix multiplication, and the inverse of a matrix) should also have an interpretation in terms of properties of linear maps.
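This correspondence can be checked on a small numerical example. The following is a minimal Python sketch (the helper names `mat_vec` and `mat_mul` are ours, not from the notes): composing the maps T_A(x) = Ax and T_B(x) = Bx gives the same result as applying the single matrix AB, as Proposition 3 below asserts.

```python
# Composition of maps versus matrix multiplication: for T_A(x) = Ax and
# T_B(x) = Bx, the composite T_A o T_B is the map x -> (AB)x.

A = [[1, 2], [0, 1], [3, -1]]   # 3 x 2, so T_A : R^2 -> R^3
B = [[2, 0, 1], [1, -1, 0]]     # 2 x 3, so T_B : R^3 -> R^2

def mat_vec(M, x):
    """Matrix-vector product with plain lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def mat_mul(M, N):
    """Matrix-matrix product with plain lists."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

x = [1, 2, 3]
composed = mat_vec(A, mat_vec(B, x))   # T_A(T_B(x)): B acts first, then A
direct = mat_vec(mat_mul(A, B), x)     # (AB)x
print(composed, direct)                # [3, -1, 16] [3, -1, 16]
```

The same two helpers check Propositions 1 and 2 equally well, by comparing `mat_vec` of a sum or scalar multiple of matrices with the sum or scalar multiple of the individual images.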
In this section, we shall show that matrix addition corresponds to addition of linear maps, that multiplication of a matrix by a scalar corresponds to multiplication of a linear map by a scalar, and that matrix multiplication corresponds to composition of linear maps. The relationship between inverses of linear maps and matrix inverses is discussed in Section 7.8.

Proposition 1 (Addition). Let A and B be real m × n matrices and let TA : Rn → Rm and TB : Rn → Rm be the linear maps given by TA(x) = Ax and TB(x) = Bx for all x ∈ Rn. Then the sum T = TA + TB is the linear map T : Rn → Rm given by T(x) = (A + B)x for all x ∈ Rn.

Proof. By definition of function addition, the sum T = TA + TB is the linear map T : Rn → Rm given by T(x) = (TA + TB)(x) = TA(x) + TB(x) for all x ∈ Rn. Hence, on using the definitions of TA and TB and the distributive law for matrix addition and multiplication, we have T(x) = Ax + Bx = (A + B)x for all x ∈ Rn.

Proposition 2 (Multiplication by a Scalar). Let A be a real m × n matrix and TA : Rn → Rm be the linear map defined by TA(x) = Ax for all x ∈ Rn. Then the scalar multiple T = λTA is the linear map T : Rn → Rm given by T(x) = (λA)x for all x ∈ Rn.

Proof. By definition of multiplication of a function by a scalar, the function T = λTA is given by T(x) = (λTA)(x) = λTA(x) for all x ∈ Rn. Hence, on using the definition of TA, we have T(x) = λ(Ax) = (λA)x for all x ∈ Rn.

Proposition 3 (Multiplication and Composition). Let A be a real m × n matrix and B be a real n × p matrix, let TA : Rn → Rm be the linear map defined by TA(x) = Ax for all x ∈ Rn, and let TB : Rp → Rn be the linear map defined by TB(x) = Bx for all x ∈ Rp. Then the composite TA ◦ TB is the linear map T : Rp → Rm defined by T(x) = (AB)x for all x ∈ Rp.

Proof.
The composite TA ◦ TB is the linear map T : Rp → Rm defined by T(x) = (TA ◦ TB)(x) = TA(TB(x)) for all x ∈ Rp. Hence, on substituting the function values of TA and TB and using the associative law of matrix multiplication, we have T(x) = TA(Bx) = A(Bx) = (AB)x for all x ∈ Rp.

Note. B acts first, then A.

Similar relationships hold between linear transformations on general finite-dimensional vector spaces and their corresponding matrices. We shall give just one example of these more general theorems, whose proof we leave to the exercises.

Proposition 4. Let U, V and W be finite-dimensional vector spaces with bases BU, BV and BW respectively, and let T : U → V and S : V → W be linear maps. Then:
a) S ◦ T is a linear transformation from U to W.
b) If the matrix of T with respect to BU and BV is AT, the matrix of S with respect to BV and BW is AS, and the matrix of S ◦ T with respect to BU and BW is AST, then AST = AS AT.

7.8 [X] One-to-one, onto and invertible linear maps and matrices

In this section we shall discuss the main results involving the ideas of one-to-one, onto and inverses for linear maps and matrices, and we shall show how these results for linear maps compare with the results for general functions summarised in Appendix 7.10.

7.8.1 Linear maps

The definitions of one-to-one, onto and inverse for linear maps are virtually identical to the definitions for general functions. However, for ease of reading it is convenient to restate them here. On applying the definitions of one-to-one, onto and inverse given in Appendix 7.10 to linear maps, we obtain the following definitions.

Definition 1. A linear map T : V → W is said to be:
a) one-to-one if for all v1, v2 ∈ V, T(v1) = T(v2) only if v1 = v2.
b) onto if for all w ∈ W there exists v ∈ V such that w = T(v), that is, if im(T) = W.

Definition 2. Let T : V → W be a linear map.
Then a function S : W → V is called an inverse of T if it satisfies the two conditions: a) S ◦T = idV , where idV is the identity map on V defined by idV (v) = v for all v ∈ V . b) T ◦S = idW , where idW is the identity map on W defined by idW (w) = w for all w ∈W . For linear maps, there is a simple connection between a map being one-to-one and the kernel of the map. This connection, which is not in general true for all functions, is as follows. Proposition 1. A linear map T : V →W is one-to-one if and only if ker(T ) = {0}, that is, if and only if nullity(T ) = 0. Proof. We first prove that if T is one-to-one then ker(T ) = {0}. Now, as T is a linear map T (0) = 0, and as T is one-to-one T (v) = T (0) only if v = 0. Thus, T (v) = 0 only if v = 0, and hence ker(T ) = {0}. We next prove that if ker(T ) = {0} then T is one-to-one. We let v1,v2 satisfy T (v1) = T (v2), that is, T (v1)−T (v2) = 0. But, as T is a linear map, T (v1)−T (v2) = T (v1−v2). Then as ker(T ) = {0}, we have T (v1 − v2) = 0 only if v1 − v2 = 0. Thus, T (v1) = T (v2) only if v1 = v2, and hence T is one-to-one. The result stated in Proposition 1 is not true in general for non-linear functions — for example, it is not true for the function of Example 1 of Appendix 7.10 which is neither one-to-one nor onto even though f(x) = x2 = 0 only for x = 0. Proposition 2. If the codomain W of a linear function T : V → W is finite dimensional then T is onto if and only if rank(T ) = dim(W ). Proof. We know that im(T ) is a subspace of W and rank(T ) = dim(im(T )) so, by Theorem 8 of Section 6.6, rank(T ) = dim(W ) if and only if im(T ) =W , that is, if and only if T is onto. The following result is also of importance for linear maps. c©2020 School of Mathematics and Statistics, UNSW Sydney 7.8. [X] ONE-TO-ONE, ONTO AND INVERTIBLE LINEAR MAPS AND MATRICES 117 Proposition 3. If the domain and codomain of a linear map T : V → W are finite-dimensional then: a) If T is one-to-one and onto then dim(V ) = dim(W ). 
b) If dim(V ) = dim(W ) and T is one-to-one then T is onto. c) If dim(V ) = dim(W ) and T is onto then T is one-to-one. Proof. The proofs of the three parts are based on the Rank-Nullity Theorem and Propositions 1 and 2 above. a) If T is one-to-one then nullity(T ) = 0, and if T is onto then rank(T ) = dim(W ). Therefore, if T is one-to-one and onto, we have from the Rank-Nullity Theorem that dim(V ) = rank(T ) + nullity(T ) = dim(W ) + 0 = dim(W ). b) If dim(V ) = dim(W ) and T is one-to-one then nullity(T ) = 0, and so rank(T ) = dim(V )− nullity(T ) = dim(W ), and therefore T is onto. c) If dim(V ) = dim(W ) and T is onto then rank(T ) = dim(W ), and so nullity(T ) = dim(V )− rank(T ) = dim(V )− dim(W ) = 0, and therefore T is one-to-one. Proposition 4. If a linear map has an inverse then the inverse is also a linear map. Proof. Let T : V →W be a linear map with inverse S : W → V . Then, for all v ∈ V and for all w ∈W , we have v = S(w) if and only if w = T (v). To prove S is linear, we must show that for all w1,w2 ∈W and all scalars λ1, λ2 ∈ F S(λ1w1 + λ2w2) = λ1S(w1) + λ2S(w2). Now if S(w1) = v1 and S(w2) = v2, we have w1 = T (v1) and w2 = T (v2), and hence on using the fact that T is linear we obtain λ1w1 + λ2w2 = λ1T (v1) + λ2T (v2) = T (λ1v1 + λ2v2). Thus, by definition of S, S(λ1w1 + λ2w2) = λ1v1 + λ2v2 = λ1S(w1) + λ2S(w2). The proof is complete. c©2020 School of Mathematics and Statistics, UNSW Sydney 118 CHAPTER 7. LINEAR TRANSFORMATIONS A fundamental result which summarises the main properties of inverses of linear maps is as follows. Theorem 5. If V and W are finite-dimensional vector spaces and if T : V → W is a linear map then the following statements are equivalent: 1. T is invertible, that is, there exists a linear map S : W → V such that (S ◦ T )(v) = v for all v ∈ V and such that (T ◦ S)(w) = w for all w ∈W . 2. T is one-to-one and onto. 3. dim(V ) = dim(W ) and there exists S : W → V such that (T ◦ S)(w) = w for all w ∈W . 4. 
dim(V ) = dim(W ) and there exists S : W → V such that (S ◦ T )(v) = v for all v ∈ V . Further, the map S in statements 1, 3 and 4 is the inverse of T . Note. The phrase “The statements are equivalent” means that if one statement is true then all statements are true, and also that if one statement is false then all statements are false. Proof. The equivalence of statements 1 and 2 follows immediately from Theorem 1 of Section 7.10. We shall now show that statements 1 and 3 are equivalent by first showing that statement 1 implies statement 3 and then by showing that statement 3 implies statement 1. 1 implies 3. If T is invertible, then there exists S : W → V such that (T ◦ S)(w) = w for all w ∈ W . Further, as T is invertible it is also one-to-one and onto, and hence, from part (a) of Proposition 3, dim(V ) = dim(W ). 3 implies 1. We first prove that T is onto. Let w ∈ W . Then, as S is a function from W to V , there exists v ∈ V such that S(w) = v. Then, on first taking the function value of v under T and next using the property of T ◦ S given in statement 3, we have T (v) = T (S(w)) = (T ◦ S)(w) = w. Thus, we have shown that for all w ∈ W , there exists v ∈ V such that T (v) = w, and hence T is onto. Then, from part (c) of Proposition 3, T is also one-to-one, and thus T is one-to-one and onto and therefore invertible. To complete the proof we prove that 1 and 4 are equivalent by proving first that 1 implies 4 and then that 4 implies 1. 1 implies 4. The proof of this is virtually identical to the proof given above that 1 implies 3 and hence we omit it. 4 implies 1. We first prove T is one-to-one. Let T (v1) = T (v2) for v1,v2 ∈ V . On taking the composite of both sides with S, we have (S ◦ T )(v1) = (S ◦ T )(v2). Then, using the property of S ◦ T given in 4, we have v1 = (S ◦ T )(v1) = (S ◦ T )(v2) = v2. Thus, T (v1) = T (v2) only if v1 = v2, and hence T is one-to-one. 
Then, from part (b) of Proposi- tion 3, T is also onto, and thus T is one-to-one and onto and therefore invertible. The proof is complete. c©2020 School of Mathematics and Statistics, UNSW Sydney 7.9. [X] PROOF OF THE RANK-NULLITY THEOREM 119 7.9 [X] Proof of the Rank-Nullity Theorem In this section we give a proof of the general Rank-Nullity Theorem stated in Section 7.4.3. Theorem 1 (Rank-Nullity Theorem). If V is a finite dimensional vector space and T : V → W is linear then rank(T ) + nullity(T ) = dim(V ). Proof. Recall that rank(T ) = dim(im(T )) = number of vectors in a basis for im(T ), and that nullity(T ) = dim(ker(T )) = number of vectors in a basis for ker(T ). Let {w1, . . . ,wr}, where r = rank(T ), be a basis for im(T ). Then, since wj ∈ im(T ), there exists an element vj ∈ V such that T(vj) = wj. Let {vr+1, . . . ,vr+ν}, where ν = nullity(T ), be a basis for ker(T ). We shall now prove that the combined set S = {v1, . . . ,vr+ν} is a basis for the domain V , and hence that r + ν = dim(V ). We first prove that S is linearly independent. Suppose, λ1v1 + · · ·+ λr+νvr+ν = 0. (#) Taking the image of this linear combination and using the fact that linear maps preserve linear combinations, we have λ1T (v1) + · · · + λr+νT (vr+ν) = T (0) = 0. Now, for j > r, vj ∈ ker(T ), and hence T (vj) = 0 for j > r. Further, for j 6 r, T (vj) = wj. On substituting these results in the previous equation, we have λ1w1 + · · ·+ λrwr = 0. But the set {w1, . . . ,wr} is linearly independent (since it is a basis for im(T )), and thus λj = 0 for j 6 r. On substituting these values of the scalars in (#) we have λr+1vr+1 + · · ·+ λr+νvr+ν = 0. But the set {vr+1, . . . ,vr+ν} is linearly independent (it is a basis for ker(T )), and thus λj = 0 for j > r. We have therefore proved that (#) is satisfied only if λj = 0 for 1 6 j 6 r+ν, and hence S is linearly independent. Now we show that S is a spanning set for V . Since S ⊆ V , span(S) is a subset of V . 
To complete the proof that span(S) = V we must prove that V ⊆ span (S), that is, if v ∈ V then v ∈ span (S). Suppose v ∈ V . Then w = T (v) exists and is an element of im(T ). Now the set {w1, . . . ,wr} is a basis for im(T ), and hence there exist scalars such that w = λ1w1 + · · ·+ λrwr. Using these scalars we now form the linear combination vI = λ1v1 + · · ·+ λrvr. c©2020 School of Mathematics and Statistics, UNSW Sydney 120 CHAPTER 7. LINEAR TRANSFORMATIONS Now, by definition of vI and the vj for j 6 r, and on using the fact that linear maps preserve linear combinations, we have T (vI) = w. We now define vR = v − vI . Now vR satisfies T (vR) = T (v − vI) = T (v)− T (vI) = w −w = 0, and hence vR ∈ ker(T ). Then, since {vr+1, . . . ,vr+ν} is a basis for ker(T ), there exist scalars such that vR = λr+1vr+1 + · · · + λr+νvr+ν . Then, on adding the linear combinations for vI and vR, we have v = vI + vR = λ1v1 + · · ·+ λrvr + λr+1vr+1 + · · ·+ λr+νvr+ν , and hence v ∈ span (S). We have therefore established that S is a linearly independent spanning set for V , and hence dim(V ) = r + ν = rank(T ) + nullity(T ) as asserted in the theorem. 7.10 One-to-one, onto and inverses for functions A fundamental problem about functions is the relationship between the points in the codomain and the points in the domain. Now, we know that by the definition of function each point in the domain of a function has exactly one point in the codomain as its function value. However, for a given point in the codomain it is possible in general that either (i) it is not the function value of any point in the domain, (ii) it is the function value of exactly one point in the domain, or (iii) it is the function value of more than one point in the domain. To cover these possibilities the following definitions are introduced. Definition 1. The range or image of a function is the set of all function values, that is, for a function f : X → Y , im(f) = {y ∈ Y : y = f(x) for some x ∈ X}. Definition 2. 
A function is said to be onto (or surjective) if the codomain is equal to the image of the function, that is, a function f : X → Y is onto if for all y ∈ Y there exists an x ∈ X such that y = f(x). Definition 3. A function is said to be one-to-one (or injective) if no point in the codomain is the function value of more than one point in the domain, that is, a function f : X → Y is one-to-one if f(x1) = f(x2) if and only if x1 = x2. c©2020 School of Mathematics and Statistics, UNSW Sydney 7.10. ONE-TO-ONE, ONTO AND INVERSES FOR FUNCTIONS 121 Note that a function is onto when every point in the codomain is a function value and that it is one-to-one when each function value corresponds to exactly one point in the domain. Further, a function is one-to-one and onto if and only if every point in the codomain is the function value of exactly one point in the domain. Example 1. The function f : R→ R, defined by f(x) = x2 for x ∈ R, is neither one-to-one nor onto. It is not one-to-one, since, for example, x21 = x 2 2 is true for x1 = 3 and x2 = −3 6= x1. It is not onto, since there are some y ∈ R (y < 0) for which no x ∈ R exists such that y = x2. ♦ Example 2. The function f : [0,∞)→ R defined by f(x) = x2 for x > 0 is one-to-one but not onto. It is one-to-one, since if x21 = x 2 2 (and x1 > 0 and x2 > 0) then x1 = x2. However, it is not onto for the reason given in Example 1. ♦ Example 3. The function f : [0,∞)→ [0,∞) defined by f(x) = x2 for x > 0 is both one-to-one and onto. It is one-to-one for the reason given in Example 2. The function is also onto, since for all y > 0 there is an x > 0 such that y = f(x) = x2. ♦ The definition of inverse of a function is as follows. Definition 4. Let f : X → Y be a function. Then a function g : Y → X is called an inverse of f if it satisfies the two conditions: a) g ◦ f = idX , where idX is the identity function in X with the property that idX(x) = x for all x ∈ X. 
b) f ◦ g = idY , where idY is the identity function in Y with the property that idY (y) = y for all y ∈ Y . An alternative way of stating (a) is that (g ◦ f)(x) = g(f(x)) = x for all x ∈ X, and an alternative way of stating (b) is that (f ◦ g)(y) = f(g(y)) = y for all y ∈ Y . An important connection between the existence of an inverse and the properties of one-to-one and onto is given in the following proposition. Theorem 1. A function has an inverse if and only if the function is both one-to-one and onto. c©2020 School of Mathematics and Statistics, UNSW Sydney 122 CHAPTER 7. LINEAR TRANSFORMATIONS [X] Proof. Let f : X → Y be a function with domain X and codomain Y . We first prove that if f has an inverse g : Y → X then f is one-to-one and onto. To prove one-to-one, we note that if f(x1) = f(x2) then on taking the composition with the inverse g we obtain g(f(x1)) = g(f(x2)), and hence from condition (a) of the definition of inverse x1 = g(f(x1)) = g(f(x2)) = x2. Thus, for all x1, x2 ∈ X, f(x1) = f(x2) only if x1 = x2 and f is one-to-one. We now prove that f is onto. From condition (b) of the definition of inverse, y = f(g(y)) for all y ∈ Y . But g is a function with codomain X, and hence g(y) = x for x ∈ X. Thus, for all y ∈ Y there exists an x = g(y) ∈ X such that y = f(x), and hence f is onto. To complete the proof of the theorem, we must prove that if f is one-to-one and onto then f has an inverse function g : Y → X. Now, as f is an onto function, for each y ∈ Y there exists x ∈ X such that y = f(x). Further, as f is one-to-one, y1 = f(x1) = f(x2) = y2 only if x1 = x2, and hence for each y ∈ Y there exists a unique x ∈ X such that y = f(x). We can therefore define a function g : Y → X by the rule: For each y ∈ Y define g(y) = x where x is the unique x ∈ X such that y = f(x). This function g then has the property that y = f(x) = f(g(y)) = (f ◦g)(y) for all y ∈ Y , and hence g satisfies condition (b) of the definition of inverse. 
To complete the proof we must prove that the function g also satisfies condition (a). Now, as f is a function with domain X and codomain Y, for each x ∈ X there exists a unique y ∈ Y such that y = f(x). But, as y ∈ Y, we have from the definition of g that x = g(y). Therefore x = g(y) = g(f(x)) = (g ◦ f)(x) for all x ∈ X. The proof is complete.

7.11 Linear transformations and MAPLE

The main result of this chapter is the Matrix Representation Theorem. Accordingly, we can do calculations with linear transformations by getting Maple to manipulate the corresponding matrices. Usually you will have to calculate the appropriate matrix by hand, but Maple can handle some linear transformations directly. [Be aware that Maple displays vectors as rows but treats them as columns.]

Consider for example the linear transformation of projecting onto a fixed vector b ∈ Rn (see Section 7.3). In the following, the LinearAlgebra command Norm(b,2) calculates ||b||.

with(LinearAlgebra):
b := <1,2,3>;
T := a -> (b . a / Norm(b,2)^2) * b;
a := <1,0,-1>;
T(a);

We could then get Maple to find the matrix for T (with respect to the standard basis) by using

A := < T(e1) | T(e2) | T(e3) >;

where e1, e2, e3 have been previously defined as the standard basis vectors in R3. Indeed, you can even get Maple to check that the operator T defined above is linear.

x := Vector(3, i -> X[i]);
y := Vector(3, i -> Y[i]);
T(x+y) - T(x) - T(y);
simplify(%);

This last calculation will give zero, showing that T(x + y) = T(x) + T(y). You should have a go at showing that T preserves scalar multiplication as well (this turns out to be just a little more complicated for Maple!). You can also check that the matrix defined above actually does give the linear transformation T by using:

simplify(A . x - T(x));

Many of the standard calculations concerning linear transformations have special commands in the LinearAlgebra package.
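For readers without access to Maple, the projection calculation above can be reproduced in a few lines of Python as a cross-check (a sketch only; the helper name `proj` is ours, not a Maple or Python built-in):

```python
# The projection map from the Maple session, in Python:
# proj_b(a) = (a . b / |b|^2) b, for b = (1, 2, 3) and a = (1, 0, -1).

def proj(b, a):
    """Projection of a onto the fixed vector b."""
    scale = sum(ai * bi for ai, bi in zip(a, b)) / sum(bi * bi for bi in b)
    return [scale * bi for bi in b]

b = [1, 2, 3]
a = [1, 0, -1]
print(proj(b, a))   # (-1/7, -2/7, -3/7), printed as floats

# Linearity check mirroring T(x+y) - T(x) - T(y) = 0 in the Maple session:
x, y = [2.0, 1.0, 0.0], [0.0, -1.0, 5.0]
lhs = proj(b, [xi + yi for xi, yi in zip(x, y)])
rhs = [p + q for p, q in zip(proj(b, x), proj(b, y))]
assert all(abs(l - r) < 1e-9 for l, r in zip(lhs, rhs))
```

As in the Maple session, a symbolic check of scalar multiplication is also possible, but with plain floating-point vectors a numerical check with a small tolerance is the natural substitute.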
In particular you should look at the procedures NullSpace, ColumnSpace, and Rank. As usual, the details of these commands are available using the on-line help facil- ity. For example to find out about NullSpace (which calculates the kernel), enter the command ?NullSpace. c©2020 School of Mathematics and Statistics, UNSW Sydney 124 CHAPTER 7. LINEAR TRANSFORMATIONS Problems for Chapter 7 Problems 7.1 : Introduction to linear maps 1. [R] Explain why the function S : [−1, 1] → R defined by S(x) = 5x for x ∈ [−1, 1] is not a linear map. Then show that the function T : R → R defined by T (x) = 5x for x ∈ R is a linear map. 2. [R] For the following examples, determine whether T is a linear map by using Definition 1. a) T : R2 → R4 defined by T (x) = 3x1 − x2 2x1 + 4x2 −3x1 − 3x2 x2 for x = ( x1 x2 ) ∈ R2. b) T : R4 → R3 defined by T (x) = −2x1 + 5x36x1 − 8x2 + 2x4 −2x1 + 4x2 − 3x3 for x = x1 x2 x3 x4 ∈ R4. c) T : R3 → R2 defined by T (x) = ( 3x1 + 4 −2x1 + 3x2 − x3 ) for x = x1x2 x3 ∈ R3. d) T : R4 → R3 defined for x = x1 x2 x3 x4 ∈ R4 by T (x) = x1 13 −2 + x2 0−4 2 + x3 −2−3 4 + x4 −41 0 . e) T : R3 → R2 defined by T (x) = ( 3x22 − x3 x1 − 4x2 ) for x = x1x2 x3 ∈ R3. 3. [X] Consider the complex numbers as a real vector space. Specify the “natural” domain and codomain for each of the following functions of a complex number and determine if the function is a linear function with F = R. c©2020 School of Mathematics and Statistics, UNSW Sydney PROBLEMS FOR CHAPTER 7 125 a) T (z) = Re(z) , b) T (z) = Im(z), c) T (z) = |z|, d) T (z) = Arg (z) , e) T (z) = z¯ . 4. [R] Show that the sine function T : R→ R, defined by T (x) = sin(x) for x ∈ R, satisfies parts 1 and 2 of Proposition 1 of Section 7.1 but that it is not a linear map. 5. [H] Use proof by induction to prove Theorem 3 of Section 7.1. 6. 
[R] If {v1,v2} are linearly independent in a real vector space V and v3 = 2v1 + v2, is there a linear map T : W → R2 where W = span(v1,v2) such that T (v1) = ( 1 2 ) , T (v2) = (−3 2 ) , T (v3) = (−1 3 ) ? 7. [R] A linear map T : R3 → R4 has function values given by T 10 0 = 1 2 3 4 , T 01 0 = −3 0 1 4 , T 00 1 = 4 0 −5 6 . Find T 2−1 4 and T x1x2 x3 . 8. [R] Show that any function T : R3 → R4 with function values given by T 10 0 = 1 2 3 4 , T 01 0 = −3 0 1 4 , T 00 1 = 4 0 −5 6 , and T 1−3 1 = 2 4 3 1 is not a linear map. 9. [R] A linear function T : R3 → R2 has function values given by T 12 3 = (4 1 ) , T −21 −4 = (−1 2 ) , T 1−1 2 = (−4−2 ) . Write 02 1 as a linear combination of the vectors in the basis 12 3 , −21 −4 , 1−1 2 of R3. Hence find T 02 1 . HINT. Use Theorem 3 of Section 7.1. 10. [H] Given that T 12 3 = (4 1 ) , T −21 4 = (−1 2 ) and T 17 13 = ( 4−2 ) , show that T is not a linear map. c©2020 School of Mathematics and Statistics, UNSW Sydney 126 CHAPTER 7. LINEAR TRANSFORMATIONS Problems 7.2 : Linear maps from Rn to Rm and m× n matrices 11. [R] For any function in question 2 which is a linear map, find a matrix A such that T (x) = Ax for x in the domain by using the results of the Matrix Representation Theorem of Section 7.2. 12. [R] For any function in question 2 which is a linear map, write a system of linear equations for T (x) for x in the domain, and hence find a matrix A such that T (x) = Ax. Check that the matrices you obtain in this question are the same as the matrices that you obtained in the previous question. Problems 7.3 : Geometric examples of linear transformations 13. [R] For each of the following 2 × 2 matrices, draw a picture to show Ae1, Ae2, Ab, where e1 and e2 are the standard basis vectors in R 2 and where b = 2e1 + 3e2. a) ( 2 0 0 0.7 ) , b) ( −2 0 0 2 ) , c) ( −2 0 0 −3 ) , d) ( 6 −2 6 −1 ) , e) ( 4 −4 3 −4 ) . 14. 
[R] Draw the image of the star in Figure 4(a) on page 90 under each of the transformations defined by the matrices in Quesion 13. 15. [R] Let T be the rotation in the plane R2 through angle π 3 in the anti-clockwise direction. Find the matrix which represents the linear transformation T . 16. [H] Let x be the position vector of a point X in R2, and let x′ be the position vector of the point X ′ which is the reflection of X in the x2-axis. (That is, assume that a mirror is placed along the x2-axis and that X ′ is the reflection of X in the mirror.) Show that the function T : R2 → R2 defined by T (x) = x′ is a linear map. Find a matrix A which transforms x = ( x1 x2 ) into x′ = ( x′1 x′2 ) . 17. [H] Let x be the position vector of a point X in R3 and let x′ be the position vector of the point X ′ which is the reflection of X in the (x1, x2)-plane. Show that the function T : R3 → R3 defined by T (x) = x′ is a linear map. Find a matrix A which transforms the position vector x1x2 x3 of X into the position vector x′1x′2 x′3 of X ′. 18. [X] Let p be the position vector of a point P in Rn and let q be the position vector of the point Q which is the reflection of P in the line x = λd; λ ∈ R. c©2020 School of Mathematics and Statistics, UNSW Sydney PROBLEMS FOR CHAPTER 7 127 Show that the function T : Rn → Rn defined by T (p) = q is a linear map. Find a matrix A which transforms p = p1... pn into q = q1... qn . 19. [R] Let b be a fixed vector in R3. Is the function T : R3 → R3 defined by T (x) = b× x for x ∈ R3, where b × x is the cross product, a linear map? Prove your answer. Find a matrix A which transforms the vector x = x1x2 x3 into its function value T (x) = x′1x′2 x′3 . 20. [H] Suppose that b = 10 2 . Prove that the projection function T : R3 → R3 of vectors onto b, which is defined by T (a) = projb(a) = a · b |b|2 b for a ∈ R 3, is a linear map. Find a matrix A which transforms the vector a into its projection T (a). 21. 
[H] If a is regarded as a fixed non-zero vector, is the function S : Rn → Rn defined by S(b) = { projba for b ∈ Rn\{0} 0 b = 0 a linear map? Prove your answer. 22. [X] Let Aφ, Aθ and Aφ+θ be the matrices for rotations in the plane by angles φ, θ and φ+ θ respectively (see Example 3 of Section 7.3). Prove that AθAφ = Aφ+θ. What is this saying geometrically? 23. [X] Let B = {i, j,k} be an ordered orthonormal basis (Cartesian coordinate system) for a three-dimensional geometric vector space. Let a be a three-dimensional geometric vector and let a′ = Rα(a) be the vector obtained by rotating a anticlockwise by an angle α about an axis parallel to j. If the coordinate vectors of a and a′ are [a]B = a1a2 a3 and [a′]B = a′1a′2 a′3 , find the rule Rα a1a2 a3 = a′1a′2 a′3 and show that it defines a linear map from R3 to R3. Also find the matrix A such that a′ = Aa. c©2020 School of Mathematics and Statistics, UNSW Sydney 128 CHAPTER 7. LINEAR TRANSFORMATIONS Problems 7.4 : Subspaces associated with linear maps 24. [R] Show that the set λ−2λ λ : λ ∈ R is the kernel of the matrix ( 3 1 −1 8 3 −2 ) . 25. [R] Find the kernel and the nullity of each of the following matrices. a) A = 2 −1 31 −2 3 4 1 −1 , b) B = 0 5 152 −2 −4 3 −3 −6 , c) C = 1 2 −1 13 2 0 −2 0 1 −1 1 . Where possible, give a geometric interpretation of the kernels. 26. [R] Find a basis for the kernel, and the nullity, of each of the following matrices. a) D = 1 2 5 01 −1 −4 0 −1 0 1 1 , b) E = 1 1 −1 2 −1 0 5 −4 1 0 0 1 . 27. [H] Let W = {( x1 x2 x3 x4 )T : x1 + x2 + x3 + x4 = x1 + 2x2 + 3x3 + 4x4 = 0 } . Find a matrix A such that W = kerA. 28. [R] Find ker(T ) and nullity(T ) for the linear functions of question 2. 29. [R] Find ker(T ) and nullity(T ) for the linear functions of questions 16 through 20. Give a geometric interpretation of the kernels. 30. [H] Suppose that b = 12 3 . a) Prove that the mapping T : R3 → R3, given by T (x) = b × x for all x ∈ R3, is a linear mapping. 
b) Find the dimension of the kernel of this mapping. 31. [R] For each given vector b and matrix A, determine if b ∈ im(A). a) b = 1110 4 , A = 1 −2 32 −1 3 4 1 −1 . b) b = 9−2 −4 , A = 0 5 152 −2 −4 3 −3 −6 . c©2020 School of Mathematics and Statistics, UNSW Sydney PROBLEMS FOR CHAPTER 7 129 c) b = −2−6 −4 , A = 1 2 −1 13 2 0 −2 0 1 −1 1 . 32. [R] Find conditions on b1, b2, b3 for the vector b = b1b2 b3 ∈ R3 for b to belong to im(A) for the matrices in the preceding question. 33. [R] Find a basis for the image, and the rank, of each of the matrices in questions 25 and 26. 34. [R] By comparing the answers to questions 25, 26 and 33, verify the conclusion of the Rank- Nullity Theorem. 35. [R] Find a basis for the image, and the rank, of each of the matrices A = 1 1 −1 −2 1 0 0 1 4 −1 0 0 0 2 2 0 0 0 0 0 , B = 1 1 0 2 1 0 0 −1 −2 2 −1 −1 1 4 −1 1 1 0 4 2 . 36. [R] Find a basis for R3 which contains a basis of im(C), where C = 1 2 3 42 −4 6 −2 −1 2 −3 1 . 37. [R] Find a basis for R4 which contains a basis of im(D), where D = 1 3 3 −1 7 2 6 3 1 8 3 9 3 4 7 4 12 0 8 4 . 38. [R] A linear map T : R4 → R4 has the property that T 1 0 0 0 = 3 0 0 0 , T 0 1 0 0 = 4 0 0 0 , T 0 0 1 0 = −1 3 0 0 , T 0 0 0 1 = 0 −3 0 1 . a) Write down the matrix representation of T with respect to the standard basis (in both domain and co-domain). b) Find a basis for the image of T and find the rank of T . c) State the dimension of the kernel of T . c©2020 School of Mathematics and Statistics, UNSW Sydney 130 CHAPTER 7. LINEAR TRANSFORMATIONS d) Does the vector 8 −3 1 2 belong to the image of T ? Give reasons. 39. [H] Let A ∈Mnn(R). Show that the following statements are equivalent, that is, show that if any statement is true then all are true, whereas if any statement is false then all are false. a) For all x and y in Rn, Ax = Ay if and only if x = y. b) ker(A) = {0}. c) nullity(A) = 0. d) rank(A) = n. e) im(A) = Rn. f) The columns of A form a basis for Rn. 40. 
[H] Let A ∈Mmn(R), and let {ej : 1 6 j 6 m} be the set of m standard basis vectors of Rm. If A is of rank r, explain why at most r of the m equations Axj = ej can have solutions. 41. [H] Let A and ej be as in the previous question. If nullity(A) = ν, explain why at least m− n+ ν of the m equations Axj = ej do not have solutions. 42. [H] Let A ∈ Mnn(R), rank(A) = n, and ej be the standard basis vectors for Rn. Prove that each of the n equations Axj = ej , 1 6 j 6 n, has a unique solution. 43. [X] Let T : V → V be a linear map and assume that dim(V ) = n. Show that the following statements are equivalent. a) T (v) = T (w) if and only if v = w for all v,w ∈ V . b) ker(T ) = {0}. c) nullity(T ) = 0. d) rank(T ) = n. e) im(T ) = V . Problems 7.5 : Further applications and examples of linear maps 44. [X] Show that the function T : R4 →M22(R) defined by T (a) = ( a1 a2 a3 a4 ) for a = (a1, a2, a3, a4) ∈ R4 is a linear map. 45. [X] Show that the function T : R4 →M22(R) defined by T (a1, a2, a3, a4) = ( 3a1 − 2a4 a4 + 2a3 −5a2 + 3a3 a1 ) is a linear map. c©2020 School of Mathematics and Statistics, UNSW Sydney PROBLEMS FOR CHAPTER 7 131 46. [X] Is the function T :M23(R)→ R6 defined by T (A) = ( a11 a12 a13 a21 a22 a23 )T for A = ( a11 a12 a13 a21 a22 a23 ) ∈M23(R) a linear map? 47. [R] Show that the function T : P2 → C3 defined by T (a0 + a1z + a2z 2) = a0a1 a2 is a linear map. [X] Note that T maps a polynomial in P2 into its coordinate vector with respect to the standard basis {1, z, z2}. 48. [H] Show that the function T : C4 → P4 defined by T (a) = p for a = a1 a2 a3 a4 ∈ C4, where p(z) = (a1 − 3a2) + (2a3 − 3a4)z + a2z3 + (3a1 − a2 + 2a3 + 4a4)z4 for all z ∈ C, is a linear map. 49. [R] Show that the function T : P3(R)→ P3(R) defined by T (p) = 4p′ + 3p, where p′(x) = dp dx , is a linear map. 50. [R] Is the function T : P3(R)→ P3(R) defined by T (p) = q, where q(x) = 4xp′(x)− 8p(x) for x ∈ R, a linear map? Prove your answer. 51. 
[H] Show that the function T : P3(R)→ P4(R) defined by T (p) = q, where q(x) = ∫ x 0 p(t)dt for x ∈ R, is a linear map. 52. [X] Let V be the subset of the vector space R[R] of all real-valued functions on R defined by V = { f ∈ R[R] : ∫ x 0 f(t)dt exists for all x ∈ R } . c©2020 School of Mathematics and Statistics, UNSW Sydney 132 CHAPTER 7. LINEAR TRANSFORMATIONS Show that V is a subspace of R[R], and then show that the rule T : V →R[R] defined by T (f) = g, where g(x) = ∫ x 0 f(t)dt for f ∈ V and x ∈ R, is a linear map. 53. [X] A function S : R→ Z is defined by S(x) = y, where y is the integer obtained on rounding x to the nearest integer. Is S a linear map? A function T : R→ R is defined by T (x) = y, where y is the integer obtained on rounding x to the nearest integer. Is T a linear map? 54. [X] Let y be a real-valued function with domain R such that y and its first two derivatives y′ and y′′ exist, and such that the Laplace transforms (see Example 7 of Section 7.5) of y, y′ and y′′ also exist on the interval (0,∞). Given that y(0) = 1 and y′(0) = 2 and that y satisfies the differential equation y′′(x) + 4y′(x) + 3y = e−3x, find an explicit formula for the Laplace transform yL(s) of y in terms of s. HINT. Take the Laplace transform of the differential equation and use integration by parts to find formulae for the Laplace transforms of y′ and y′′ in terms of yL(s). 55. [H] Consider the mapping T : P3(R)→ R2 defined by T (p(x)) = ( a− b c− d ) where p(x) = a+ bx+ cx2 + dx3. a) Prove that T is linear. b) Show that p(x) = 3x3 + 3x2 − 2x− 2 is in the kernel of T . 56. [H] Consider the function T : R4 → P1 defined by T a b c d = (a− 2b) + (c+ d)x. a) Find T 1 −3 2 −4 . b) Show T is a linear transformation. c) Write down a non-zero vector in R4 which lies in ker(T ). 57. [H] A linear map T : C3 → P3 has function values given by T 10 0 = 1 + (2 + i)z − 3z3, T 01 0 = (4− 3i)z + z2, T 00 1 = −2 for z ∈ C. 
c©2020 School of Mathematics and Statistics, UNSW Sydney PROBLEMS FOR CHAPTER 7 133 Find T i2 −1 and T x1x2 x3 . 58. [X] Let Pn be the real vector space of polynomials of degree less than or equal to n, and take its standard basis to be {1, x, x2, . . . , xn}. For p(x) ∈ P3, let (T (p)) (x) ∈ P4 be defined by (T (p)) (x) = ∫ x 0 p(t)dt. a) Show that T is a linear transformation from P3 to P4. b) Calculate the matrix A of this linear transformation with respect to the standard bases of P3 and P4. c) Find a basis for the image of T , im(T ). d) Find a basis for the kernel of T , ker(T ). 59. [R] A car manufacturer produces a station wagon, a four-wheel drive, a hatchback and a sedan model. Each model is made from steel, plastics, rubber and glass, and it also requires a number of hours of labour to produce. The requirements per car of these inputs for each model are as shown in the following table. steel plastics rubber glass labour (tonnes) (tonnes) (tonnes) (tonnes) (hours) station wagon 1 0.5 0.1 0.2 1 4-wheel drive 1.5 0.6 0.2 0.15 1.5 hatchback 0.8 0.7 0.2 0.2 1.1 sedan 0.9 0.6 0.25 0.3 0.9 Construct a matrix which can be used to express the factory input as a linear function of the factory output. Problems 7.6 : [X] Representation of linear maps by matrices 60. [R] Let idR2 be the identity map for R 2. Find a matrix representation of this map with respect to standard bases in domain and codomain. 61. [X] Let idR2 be the identity map for R 2. Find a matrix representation of this map with respect to the domain basis {( 1 1 ) , ( 1 −1 )} and the codomain basis {( 1 3 ) , ( 1 2 )} . 62. [X] A linear mapping G : P2 → P2 has matrix representation A = 1 2 −1−1 1 0 3 4 1 c©2020 School of Mathematics and Statistics, UNSW Sydney 134 CHAPTER 7. LINEAR TRANSFORMATIONS with respect to the standard basis {1, x, x2} in both domain and co-domain. Find G(p), where p(x) = −3 + x+ 5x2. 63. 
[X] For each of the linear maps in questions 48 to 51, find a matrix which represents the linear map for standard bases in the domain and codomain. 64. [X] Using your results of the previous question or otherwise, find the kernel, nullity, image and rank of the linear maps in questions 48 to 51. 65. [X] For the linear map T : Pn(R)→ Pn(R) defined by T (p) = q, where q(x) = x2 d2p dx2 − 3xdp dx + 3p(x) for x ∈ R, find a matrix which represents T with respect to standard bases in domain and codomain. Hence, or otherwise, find the kernel and nullity of T . 66. [X] Let V be a vector space and let B = {u1,u2,u3} be an orthonormal basis for V . Let a ∈ V be a vector whose coordinate vector with respect to B is a1a2 a3 . Let a′1a′2 a′3 be the coordinate vector of a with respect to the basis B′ = {u′1,u′2,u′3} given by u′1 = 1√ 2 u1 + 1√ 2 u3, u′2 = − 1√ 2 u1 + 1√ 2 u3, u′3 = −u2 Show that B′ is an orthonormal basis, then show that the rule T a1a2 a3 = a′1a′2 a′3 is a linear map from R3 to R3, and find a matrix representation for this function with respect to standard bases for R3 in domain and codomain. Finally, show that the matrix you have constructed is a matrix representation of the identity map idV : V → V with respect to the basis B in domain and B′ in codomain. Problems 7.7 : [X] Matrix arithmetic and linear maps 67. [X] Let T : V → W and S : V → W be linear maps. Let BV be a basis for V and BW be a basis for W and let A and B be the matrices representing T and S with respect to the bases BV and BW . Prove that A+ B is the matrix representing the sum function T + S with respect to the bases BV and BW . 68. [X] Let T : U → V and S : V → W be linear maps. Let BU , BV and BW be bases for U , V and W respectively. Let A be the matrix representing T with respect to bases BU and BV and let B be the matrix representing S with respect to bases BV and BW . 
Prove that the matrix product BA is the matrix which represents the composition function S◦T : U →W with respect to the bases BU and BW . c©2020 School of Mathematics and Statistics, UNSW Sydney PROBLEMS FOR CHAPTER 7 135 Problems 7.8 : [X] One-to-one, onto and invertible linear maps and matrices 69. [X] Let V = C[R], the vector space of all continuous real-valued functions on R. Let B = {ex, (x− 1)ex, (x− 1)(x− 2)ex} ⊆ V a) Prove that B is linearly independent. b) Let W = span (B), and let D : W → W denote the linear transformation D(f) = f ′, where f ′ is the derivative of f. i) Find the matrix for D with respect to the ordered basis B of W. ii) Find the matrix for the linear transformation T = D ◦D. iii) Hence or otherwise, prove that for every g ∈W, there exists f ∈W such that f ′′ = g. 70. [X] For a field F, define T :M22(F)→ F3 by T ([ a11 a12 a21 a22 ]) = a22a12 − a21 3a11 + a12 . a) Show that T is a linear transformation. b) Find the kernel of T and the nullity of T . c) Find the rank of T . d) Is T one-to-one (injective)? Give a brief reason for your answer. e) Find the matrix of T with respect to the standard bases of M22(F) and F 3. Problems 7.11 : Linear transformations and MAPLE 71. [M] Consider the following MAPLE output > with(LinearAlgebra): > A:=<<2,4,-2,4>|<-3,-6,3,-6>|<1,2,-2,1>|<-1,-3,2,-2>|<1,-1,-1,-1>>; A := 2 −3 1 −1 1 4 −6 2 −3 −1 −2 3 −2 2 −1 4 −6 1 −2 −1 > b:=; b := a1 a2 a3 a4 c©2020 School of Mathematics and Statistics, UNSW Sydney 136 CHAPTER 7. LINEAR TRANSFORMATIONS > GaussianElimination(); 2 −3 1 −1 1 a1 0 0 −1 1 0 a3 + a1 0 0 0 −1 −3 a2 − 2 a1 0 0 0 0 0 a4 − a1 − a3 − a2 a) Find a basis for col(A), the column space of A. b) What is the dimension of col(A)? c) Under what conditions does (a1, a2, a3, a4) belong to col(A). d) Find a basis for the kernel, or null space, of A. e) What are the values of the rank and nullity of A? c©2020 School of Mathematics and Statistics, UNSW Sydney 137 Chapter 8 EIGENVALUES AND EIGENVECTORS . . . 
she set to work very carefully, nibbling first at one and then at the other, and growing sometimes taller, and sometimes shorter,. . . Lewis Carroll, Alice in Wonderland. Eigenvalues and eigenvectors are of great theoretical and practical importance. Some practical applications of eigenvalues and eigenvectors include the following. 1. Oscillations. For example, vibrating strings, organ pipes, wing flutter on an aircraft, vibra- tions of buildings and bridges, etc. 2. Quantum Physics and Chemistry. Structure of atoms, molecules, nuclei, solids etc. 3. Electronics and Electrical Engineering. Microwave oscillators, amplifiers, signal transmission, communications networks, etc. 4. Economics. Stability of economic systems, dynamic econometric models, Leontief input- output models, inventory models, stock market models, etc. 5. Biological and Ecological Systems. Solution of population models, stability of ecological systems etc. In this chapter we shall only be able to give a brief introduction to this extremely important topic. A general theory of eigenvalues and eigenvectors and some applications of them is given in the second year mathematics courses. 8.1 Definitions and examples We are concerned with linear maps in which the domain and the codomain are the same vector space, that is, with linear maps of the form T : V → V . The fundamental questions asked are: 1. Given a map T , are there vectors v ∈ V which are related in a very simple way to their images T (v) ∈ V ? c©2020 School of Mathematics and Statistics, UNSW Sydney 138 CHAPTER 8. EIGENVALUES AND EIGENVECTORS 2. [X] Is there a choice of basis for V such that the matrix representing T for this basis takes on a very simple form? The answer to both of these questions is yes. For question 1, we look for vectors for which T (v) is a multiple of v. Formally, we have Definition 1. Let T : V → V be a linear map. 
Then if a scalar λ and non-zero vector v ∈ V satisfy T (v) = λv, then λ is called an eigenvalue of T and v is called an eigenvector of T for the eigenvalue λ. Note. An eigenvector is non-zero, but zero can be an eigenvalue. Example 1. For infinitely differentiable real-valued functions f , the derivative D(f) = f ′, where f ′(x) = df dx for x ∈ R defines a linear map D. The exponential function satisfies D(eλx) = λeλx, and hence eλx is an eigenvector ofD with eigenvalue λ. It should be noted that the great importance of exponential functions in calculus is due to the fact that they are the only functions f where f ′ is a multiple of f . ♦ Calculus and its applications provides a very rich source of eigenvalue and eigenvector problems. However, we are mainly concerned in this course with algebraic problems involving linear maps between finite-dimensional vector spaces. These linear maps can always be represented by matrices, and hence we will be concerned in the remainder of this chapter with eigenvalues and eigenvectors of matrices. When dealing with eigenvalues and eigenvectors of matrices we will be forced to use complex numbers for our scalar field. The fundamental reason for this is that the eigenvalues of a matrix are actually zeroes of some polynomial and, as we have seen in Chapter 3, we can only be certain of finding zeroes when the polynomials are complex polynomials. Thus, the “natural” field of scalars for eigenvalues and eigenvectors is the set of complex numbers C, and the “natural” vector spaces are the complex vector spaces Cn (see Example 2 of Section 6.1). For the special case of a matrix, Definition 1 becomes: Definition 2. Let A ∈ Mnn(C) be a square matrix. Then if a scalar λ ∈ C and non-zero vector v ∈ Cn satisfy Av = λv, then λ is called an eigenvalue of A and v is called an eigenvector of A for the eigenvalue λ. c©2020 School of Mathematics and Statistics, UNSW Sydney 8.1. DEFINITIONS AND EXAMPLES 139 Example 2. 
For the diagonal 2× 2 matrix, A = ( λ1 0 0 λ2 ) , the standard basis vectors e1 = ( 1 0 ) and e2 = ( 0 1 ) satisfy A ( 1 0 ) = ( λ1 0 ) = λ1 ( 1 0 ) and A ( 0 1 ) = ( 0 λ2 ) = λ2 ( 0 1 ) . Thus, e1 is an eigenvector of A with eigenvalue λ1 and e2 is an eigenvector of A with eigenvalue λ2. A picture of this result is shown in Figure 1 for the special case of λ1 = 3 and λ2 = −2. ♦ e1 Ae1 = 3e1 e2 Ae2 = −2e2 0 Figure 1: The eigenvectors of the diagonal matrix ( 3 0 0 −2 ) . Example 3. For the matrix A = 0 1 00 0 1 20 −24 9 and the vector v = 15 25 by matrix multiplication, we have Av = 5v, and hence v is an eigenvector of A for the eigenvalue λ = 5. ♦ 8.1.1 Some fundamental results The fundamental theoretical results for eigenvalues and eigenvectors draw on results given in pre- vious chapters on linear equations, polynomials, vector spaces, linear maps, and determinants. The following theorem is extremely important. Theorem 1. A scalar λ is an eigenvalue of a square matrix A if and only if det(A− λI) = 0, and then v is an eigenvector of A for the eigenvalue λ if and only if v is a non-zero solution of the homogeneous equation (A− λI)v = 0, i.e., if and only if v ∈ ker(A− λI) and v 6= 0. Proof. From Definition 2, A is a square matrix, and an eigenvalue λ and corresponding eigenvector v of A satisfy the equation Av = λv, where v 6= 0. This equation can be rearranged in the form 0 = Av − λv = Av − λIv = (A− λI)v, where I is an identity matrix of the same size as A. c©2020 School of Mathematics and Statistics, UNSW Sydney 140 CHAPTER 8. EIGENVALUES AND EIGENVECTORS Now, A− λI is a square matrix, and hence (by a proposition in Chapter 5) the equation (A− λI)v = 0 can have a non-zero solution if and only if det(A− λI) = 0. Thus, λ is an eigenvalue if and only if det(A− λI) = 0 and the first part of the theorem is proved. 
Then, if λ is an eigenvalue, v is an eigenvector if and only if it is a non-zero solution of the above homogeneous equation, that is, if and only if v ∈ ker(A − λI) and v 6= 0. The proof is complete. Note. The set of all eigenvectors of A for eigenvalue λ is therefore equal to ker(A − λI) with 0 removed. Also, there are infinitely many eigenvectors corresponding to a single eigenvalue. A second fundamental result for the theory of eigenvalues is the following. Theorem 2. If A is an n × n matrix and λ ∈ C, then det(A − λI) is a complex polynomial of degree n in λ. This theorem can be proved in a straightforward, but tedious, fashion by direct expansion of the determinant det(A− λI). For example, for n = 3, we have A− λI = a11 a12 a13a21 a22 a23 a31 a32 a33 − λ 1 0 00 1 0 0 0 1 = a11 − λ a12 a13a21 a22 − λ a23 a31 a32 a33 − λ . Then, by direct evaluation of det(A− λI) by expansion along the first column, we obtain det(A− λI) = (a11 − λ) ( (a22 − λ)(a33 − λ)− a32a23 ) − a21 ( a12(a33 − λ)− a32a13 ) + a31 ( a12a23 − (a22 − λ)a13 ) = −λ3 + terms containing λ2, λ and constants. Hence, for n = 3, det(A− λI) is a complex polynomial of degree 3 as stated in Proposition 2. Definition 3. For a square matrix A, the polynomial p(λ) = det(A− λI) is called the characteristic polynomial for the matrix A. Example 4. For the 3× 3 matrix, A = 1 −1 23 −4 −1 5 1 2 , the characteristic polynomial is the cubic p(λ) = ∣∣∣∣∣∣ 1− λ −1 2 3 −4− λ −1 5 1 2− λ ∣∣∣∣∣∣ = −λ3 − λ2 + 16λ+ 50. ♦ c©2020 School of Mathematics and Statistics, UNSW Sydney 8.1. DEFINITIONS AND EXAMPLES 141 We can now apply the theory of roots of complex polynomials developed in Chapter 3 to obtain the following fundamental result. Theorem 3. An n × n matrix A has exactly n eigenvalues in C (counted according to their multiplicities). These eigenvalues are the zeroes of the characteristic polynomial p(λ) = det(A−λI). Proof. 
From Proposition 2, the characteristic polynomial p(λ) = det(A − λI) is a polynomial of degree n over the complex field. Thus, from the Factorisation Theorem of Chapter 3, p has exactly n zeroes (counted according to their multiplicities) which from Theorem 1 are the eigenvalues of A. Example 5. For the matrix A of Example 4, the roots of the cubic characteristic polynomial are (to 4-figure accuracy) 4.688, −2.844 + 1.605i, −2.844− 1.605i, and these are the three eigenvalues of A. ♦ Note. 1. The equation p(λ) = 0 is called the characteristic equation for A. 2. Theorem 3 is of fundamental theoretical importance, as it proves the existence of eigenvalues of a matrix. However, with the exception of 2× 2 and specially constructed larger matrices, modern methods of finding eigenvalues of matrices do not make use of this theorem. These efficient modern methods are currently available in standard matrix software packages such as MAPLE, MATLAB. 8.1.2 Calculation of eigenvalues and eigenvectors As stated above, Theorem 3 provides a practical method for finding eigenvalues of 2×2 or specially constructed larger matrices. The corresponding eigenvectors can then be obtained from Theorem 1. Some examples of the calculation of eigenvalues and eigenvectors for simple matrices are as follows: Example 6. For a diagonal matrix the diagonal entries are eigenvalues and the standard basis vectors are eigenvectors. For example, if A = ( λ1 0 0 λ2 ) then det(A− λI) = det (( λ1 0 0 λ2 ) − ( λ 0 0 λ )) = ∣∣∣∣λ1 − λ 00 λ2 − λ ∣∣∣∣ = (λ1 − λ)(λ2 − λ) = 0 has the solutions λ = λ1 and λ = λ2. Then for the eigenvector corresponding to λ = λ1, we solve (A− λ1I)v = ( 0 0 0 λ2 − λ1 )( v1 v2 ) = 0. It is obvious that one of the solutions is v = ( 1 0 ) = e1. Similarly, e2 is a solution of the homogeneous equation (A− λ2I)v = 0, and hence e2 is one of the eigenvectors corresponding to the eigenvalue λ2. ♦ c©2020 School of Mathematics and Statistics, UNSW Sydney 142 CHAPTER 8. 
EIGENVALUES AND EIGENVECTORS Example 7. Find eigenvalues and eigenvectors of A = ( 2 2 2 2 ) . Solution. The first step is to find the eigenvalues from the characteristic polynomial. We have p(λ) = det(A− λI) = ∣∣∣∣2− λ 22 2− λ ∣∣∣∣ = λ2 − 4λ. Note that A− λI is obtained from A by subtracting λ from each diagonal element of A, and that the characteristic polynomial is a quadratic. The roots of the characteristic equation are λ1 = 0 and λ2 = 4. Note that, as asserted in Theorem 3, there are two eigenvalues for the 2× 2 matrix A. The next step is to find an eigenvector for each eigenvalue by finding ker(A−λI), first for λ = 0, and then for λ = 4. For eigenvalue λ1 = 0, the eigenvectors are the non-zero vectors in ker(A). By row reduction,( 2 2 0 2 2 0 ) R2 = R2 −R1−−−−−−−−−−−−−−→ ( 2 2 0 0 0 0 ) and then, back substitution gives ker(A) = span (v1) where v1 = (−1 1 ) . The set of eigenvectors corresponding to the eigenvalue 0 is then{ t (−1 1 ) : t 6= 0 } . For λ2 = 4, the required eigenvectors are ker(A− 4I) (with 0 deleted) where A− 4I = ( −2 2 2 −2 ) . Solving (A− 4I)v = 0 in the same way, we find that v2 = ( 1 1 ) is a basis for ker(A− 4I) and the set of eigenvectors corresponding to the eigenvalue 4 is then{ t ( 1 1 ) : t 6= 0 } . [Note that the scalar field is assumed to be C, so t ∈ C.] ♦ Example 8. Find eigenvalues and eigenvectors of A = ( 2 1 −1 4 ) . c©2020 School of Mathematics and Statistics, UNSW Sydney 8.1. DEFINITIONS AND EXAMPLES 143 Solution. The eigenvalues are solutions of the characteristic equation∣∣∣∣ 2− λ 1−1 4− λ ∣∣∣∣ = (2− λ)(4 − λ) + 1 = (λ− 3)2 = 0. Hence, there is one eigenvalue λ = 3 with multiplicity 2. Eigenvectors. The eigenvectors are vectors v 6= 0 ∈ ker(A− 3I), where A− 3I = ( −1 1 −1 1 ) . On solving (A−3I)v = 0, we find that the only solution is v = t ( 1 1 ) for t ∈ C. A matrix with fewer linearly independent eigenvectors than columns, as in this example, is called a defective matrix (poor thing). 
♦ As the next example shows, it is also possible to have a 2 × 2 matrix A with one eigenvalue (with multiplicity 2) and two linearly independent eigenvectors. Example 9. The matrix A = ( 3 0 0 3 ) has eigenvalue λ = 3 (with multiplicity 2) and ker(A − 3I) is span {( 1 0 ) , ( 0 1 )} . ♦ Example 10. Find all eigenvalues and eigenvectors of the matrix A = ( 1 2 −2 1 ) . Solution. As usual the eigenvalues are solutions of the characteristic equation det(A − λI) = 0, that is, of ∣∣∣∣1− λ 2−2 1− λ ∣∣∣∣ = λ2 − 2λ+ 5 = 0. In this case the roots of the quadratic are the complex numbers λ1 = 1 + 2i and λ2 = 1− 2i. The eigenvectors for λ1 = 1 + 2i, A− (1 + 2i)I = ( −2i 2 −2 −2i ) . An equivalent row-echelon form is U = ( −2i 2 0 0 ) , and the eigenvectors are v = t (−i 1 ) with t ∈ C, t 6= 0. c©2020 School of Mathematics and Statistics, UNSW Sydney 144 CHAPTER 8. EIGENVALUES AND EIGENVECTORS For λ2 = 1− 2i the eigenvectors are v = t ( i 1 ) with t ∈ C, t 6= 0. ♦ The characteristic polynomial of a real 2× 2 matrix has real coefficients, so has two real roots, one real root with multiplicity 2, or a pair of distinct conjugate complex roots, so the matrix has two real eigenvalues, one real eigenvalue with multiplicity 2, or two distinct conjugate complex eigenvalues. Examples 7 – 10 above show all these possibilities. If A is a complex 2 × 2 matrix, its characteristic polynomial has complex coefficients, and either two distinct complex roots or one complex root with multiplicity 2, and the matrix has two eigenvalues or one eigenvalue with multiplicity 2. For each eigenvalue the space spanned by its corresponding eigenvector(s) is called the eigenspace for that eigenvalue. Thus, when we write down the eigenvectors for a given eigenvalue, we are really recording the basis vectors for the corresponding eigenspace. 
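The hand calculations of Examples 7–10 can be checked numerically. The following sketch assumes Python with NumPy rather than the Maple package used in this course; `np.linalg.eig` returns the eigenvalues together with a matrix whose columns are corresponding eigenvectors.

```python
import numpy as np

# Example 10: eigenvalues are the complex conjugate pair 1 + 2i, 1 - 2i.
A = np.array([[1.0, 2.0],
              [-2.0, 1.0]])
vals, vecs = np.linalg.eig(A)
print(sorted(vals, key=lambda z: z.imag))  # approximately 1 - 2i and 1 + 2i

# Example 8: a defective matrix, with one eigenvalue (3) of multiplicity 2
# but only a one-dimensional eigenspace.
B = np.array([[2.0, 1.0],
              [-1.0, 4.0]])
vals_B, _ = np.linalg.eig(B)
print(vals_B)  # both eigenvalues numerically close to 3

# The eigenspace ker(B - 3I) is one-dimensional, since rank(B - 3I) = 1.
print(np.linalg.matrix_rank(B - 3 * np.eye(2)))  # 1
```

Note that floating-point eigenvalue routines return the repeated eigenvalue of a defective matrix only approximately; exact computations of this kind are what a computer algebra system such as Maple provides.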
8.2 Eigenvectors, bases, and diagonalisation

In the examples of the preceding section, we have seen that, with one exception (Example 8), we obtain two linearly independent eigenvectors for a 2 × 2 matrix. Since a 2 × 2 matrix A represents a linear map whose domain is C2, these two eigenvectors form a basis for the domain. The matrix of Example 8 has only one independent eigenvector, which does not give a basis for the domain. These results can be generalised to matrices of arbitrary size.

Theorem 1. If an n × n matrix has n distinct eigenvalues then it has n linearly independent eigenvectors.

[X] Proof. Let the set of n distinct eigenvalues of the n × n matrix A be {λ1, . . . , λn} and let vi be a corresponding eigenvector for λi, 1 ≤ i ≤ n. We shall prove that S = {v1, . . . , vn} is linearly independent. Suppose

µ1v1 + · · · + µnvn = 0. (#)

We show that µ1 = 0; in similar fashion µ2 = · · · = µn = 0, and so v1, . . . , vn are linearly independent. Apply the matrix (A − λ2I)(A − λ3I) · · · (A − λnI) to both sides of (#). For j ≠ 1,

(A − λ2I) · · · (A − λnI)vj = (λj − λ2)(λj − λ3) · · · (λj − λn)vj = 0,

since the factor λj − λj occurs in the product, whereas

(A − λ2I) · · · (A − λnI)v1 = (λ1 − λ2)(λ1 − λ3) · · · (λ1 − λn)v1 ≠ 0,

since the eigenvalues are distinct. Hence applying this matrix to (#) gives µ1(λ1 − λ2) · · · (λ1 − λn)v1 = 0, which forces µ1 = 0.

Note. Even if an n × n matrix does not have n distinct eigenvalues, it may still have n linearly independent eigenvectors.

In Examples 2 and 6 of Section 8.1, we have seen that it is very easy to write down eigenvalues and eigenvectors of diagonal matrices. The next theorem shows that it is sometimes possible to find an equivalent diagonal matrix for a given matrix.

Theorem 2. If an n × n matrix A has n linearly independent eigenvectors, then there exists an invertible matrix M and a diagonal matrix D such that M−1AM = D.
Further, the diagonal elements of D are the eigenvalues of A and the columns of M are the eigen- vectors of A, with the jth column of M being the eigenvector corresponding to the jth element of the diagonal of D. Conversely if M−1AM = D with D diagonal then the columns of M are n linearly independent eigenvectors of A. [X] Proof. Let the n linearly independent eigenvectors of A be {v1, . . . ,vn}. We now form the matrix M with these vectors as its columns, i.e., M = ( v1 v2 · · · vn ) . Then, from the usual rules of matrix multiplication, we have AM = ( Av1 Av2 · · · Avn ) , and from Avi = λivi we have AM = ( λ1v1 λ2v2 · · · λnvn ) . Following the usual rules of matrix multiplication, we can rewrite this equation in the matrix form AM = ( v1 v2 · · · vn ) λ1 0 · · · 0 0 λ2 · · · 0 ... ... . . . ... 0 0 · · · λn =MD, where D = λ1 0 · · · 0 0 λ2 · · · 0 ... ... . . . ... 0 0 · · · λn is the diagonal matrix of eigenvalues. Thus, AM = MD. Further, since the columns of M are a basis for Cn, the equation Mx = b has a unique solution for all b ∈ Cn, and hence M is invertible. Then, on multiplying the equation AM =MD on the left by M−1, we have M−1AM = D. Conversely if M−1AM = D then AM =MD and M is invertible. Let M = ( v1 v2 · · · vn ) and D = λ1 . . . λn then from the first columns of the matrix products on the two sides of AM = MD, we have Av1 = λ1v1. Similarly Avi = λivi, 1 6 i 6 n. Finally the columns of an invertible matrix are linearly independent. c©2020 School of Mathematics and Statistics, UNSW Sydney 146 CHAPTER 8. EIGENVALUES AND EIGENVECTORS Definition 1. A square matrix A is said to be a diagonalisable matrix if there exists an invertible matrix M and diagonal matrix D such that M−1AM = D. Example 1. Show that the matrix A = ( 3 2 2 3 ) is diagonalisable and find an invertible matrix M and diagonal matrix D such that M−1AM = D. Solution. We first find the eigenvalues and eigenvectors of A in the usual way. 
The eigenvalues are λ1 = 5, λ2 = 1 and corresponding eigenvectors are v1 = ( 1 1 ) and v2 = ( 1 −1 ) . Clearly, v1 and v2 are linearly independent. (Theorem 1 guarantees this, since λ1 6= λ2.) Thus we may apply Theorem 2, letting D be a diagonal matrix with the eigenvalues as its diagonal elements, and M be the matrix with corresponding eigenvectors as its columns. For example, D = ( 5 0 0 1 ) and M = ( 1 1 1 −1 ) . are the required diagonal matrix D and a suitable invertible matrix M . ♦ Note. 1. The results can be checked by direct multiplication of M−1AM . In the above example, we readily obtain M−1 = ( 1 2 1 2 1 2 −12 ) , and then M−1AM =M−1 ( 3 2 2 3 )( 1 1 1 −1 ) = ( 1 2 1 2 1 2 −12 )( 5 1 5 −1 ) = ( 5 0 0 1 ) = D. 2. The choice of D and M is not unique. For example, we could reverse the order of the eigenvalues and set D = ( 1 0 0 5 ) , M = ( 1 1 −1 1 ) . Also non-zero multiples of eigenvectors are eigenvectors, so multiplying any column of M by a non-zero scalar would produce another valid diagonalising matrix. 8.3 Applications of eigenvalues and eigenvectors Some important practical applications have already been noted at the beginning of this chapter. Many of these applications arise from the study of dynamical systems. A dynamical system is essentially any system which changes in time. Some examples of such systems include an electrical power network, a bridge oscillating in a wind, the population of a city or country, an ant colony, a forest, the Australian economy, an atom, an atomic nucleus. c©2020 School of Mathematics and Statistics, UNSW Sydney 8.3. APPLICATIONS OF EIGENVALUES AND EIGENVECTORS 147 8.3.1 Powers of A A typical problem in, for example, the study of dynamical systems is to find Ak for positive integers k. There are two results which enable us to easily solve this problem. Proposition 1. Let D be the diagonal matrix D = λ1 0 . . . 0 0 λ2 0 ... . . . ... 0 0 . . . λn . Then, for k > 1, Dk = λk1 0 . . . 0 0 λk2 0 ... . . . ... 0 0 . . . 
λkn . Proof. The proof is by induction. The result is obviously true for k = 1. Now assume that the result is true for k = m. Then, on multiplying out, Dm+1 = DDm = λ1 0 . . . 0 0 λ2 0 ... . . . ... 0 0 . . . λn λm1 0 . . . 0 0 λm2 0 ... . . . ... 0 0 . . . λmn = λm+11 0 . . . 0 0 λm+12 0 ... . . . ... 0 0 . . . λm+1n . Hence, if the result is true for m it is also true for m+ 1. But, we have already seen that it is true for m = 1, and hence it is true for all positive integers k. The second result that we need is as follows: Proposition 2. If A is diagonalisable, that is, if there exists an invertible matrix M and diagonal matrix D such that M−1AM = D, then Ak =MDkM−1 for integer k > 1. Proof. The proof is by induction. On multiplying M−1AM = D on the left by M and on the right by M−1, we obtain A =MDM−1, c©2020 School of Mathematics and Statistics, UNSW Sydney 148 CHAPTER 8. EIGENVALUES AND EIGENVECTORS and hence the statement of the proposition is true for k = 1. Now suppose the statement of the proposition is true for k = m. Then Am+1 = AAm =MDM−1MDmM−1 =MDDmM−1 =MDm+1M−1, and hence the statement of the proposition is also true for m + 1. Thus, the statement of the proposition is true for all positive integers k. Example 1. Find Ak for A = ( 3 2 2 3 ) . Solution. The first step is to check that A is diagonalisable, and, if it is, to find the matrix M of eigenvectors and diagonal matrix D of eigenvalues such that A =MDM−1. From Example 1 of Section 8.2, suitable matrices are: D = ( 5 0 0 1 ) ; M = ( 1 1 1 −1 ) ; M−1 = ( 1 2 1 2 1 2 −12 ) . Then, Ak =MDkM−1 = ( 1 1 1 −1 )( 5k 0 0 1k ) M−1 = ( 5k 1 5k −1 )( 1 2 1 2 1 2 −12 ) = ( 1 2(5 k + 1) 12(5 k − 1) 1 2(5 k − 1) 12(5k + 1) ) . As a check on this solution, note that we obtain I if we substitute k = 0, and A if we substitute k = 1. ♦ Note. [X] Given a diagonalisable matrix A, we can give meaning to its exponential, using the power series expansion of ex. We substitute A into the expansion ex = 1 + x+ 1 2! 
x2 + 1 3! x3 + ... replacing 1 by I. Since A is diagonalisable, we can write A = PDP−1, with D diagonal, and a simple calculation shows that I + PDP−1 + 1 2! (PDP−1)2 + 1 3! (PDP−1)3 + ... = P (I +D + 1 2! D2 + 1 3! D3 + ...)P−1. We define this to be the exponential of the matrix. In the case of a 2×2 matrix with distinct eigenvalues λ1, λ2, by adding the entries in the matrix, we have eA = P ( 1 + λ1 + 1 2!λ 2 1 + 1 3!λ 3 1 + ... 0 0 1 + λ2 + 1 2!λ 2 2 + 1 3!λ 3 2 + ... ) P−1 c©2020 School of Mathematics and Statistics, UNSW Sydney 8.3. APPLICATIONS OF EIGENVALUES AND EIGENVECTORS 149 = P ( eλ1 0 0 eλ2 ) P−1. In a similar way, one can define the sine and cosine (etc) of a matrix. 8.3.2 Solution of first-order linear differential equations A typical problem in many applications is to find the solution of a pair of first-order linear differential equations with constant coefficients of the form dy1 dt = a11y1 + a12y2 dy2 dt = a21y1 + a22y2, with initial conditions y1(0) and y2(0) given. If t represents time, this pair of equations represents a simple “continuous-time dynamical system.” For example, in a model of a population, y1(t) might be the number of females at time t and y2(t) the number of males at time t. The system of equations then describes how the numbers of females and males change with time. One method of solution of this system is as follows. We first write the equations in matrix form, with y = ( y1 y2 ) , A = ( a11 a12 a21 a22 ) , and obtain dy dt = Ay, with y(0) = ( y1(0) y2(0) ) . In this matrix form it is clear that there is no real restriction on the number of components of the vector y. Equations of this type are important in the study of dynamical systems, where they are given the special name of state-space equations. The vector y in the equation is then called the state vector and t represents time. 
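Before turning to the solution of these systems, the diagonalisation formulas of this section — A^k = M D^k M^{-1} from Proposition 2 and the matrix exponential e^A = M e^D M^{-1} from the [X] note above — can be verified numerically. A minimal sketch, assuming Python with NumPy (not the course's Maple toolkit), using the matrix A = (3 2; 2 3) of Example 1:

```python
import numpy as np

# A = M D M^{-1} with eigenvalues 5, 1 and eigenvectors (1, 1), (1, -1).
A = np.array([[3.0, 2.0],
              [2.0, 3.0]])
M = np.array([[1.0, 1.0],
              [1.0, -1.0]])
D = np.diag([5.0, 1.0])

# A^k = M D^k M^{-1}; check for k = 3 against repeated multiplication.
k = 3
Ak = M @ np.diag(np.diag(D) ** k) @ np.linalg.inv(M)
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))  # True

# e^A via diagonalisation: M e^D M^{-1}, where e^D is diagonal
# with entries e^{lambda_i}.
expA_diag = M @ np.diag(np.exp(np.diag(D))) @ np.linalg.inv(M)

# e^A via the truncated power series I + A + A^2/2! + ... + A^29/29!
expA_series = np.zeros_like(A)
term = np.eye(2)
for n in range(30):
    expA_series += term
    term = term @ A / (n + 1)  # next term A^{n+1}/(n+1)!

print(np.allclose(expA_diag, expA_series))  # True
```

The agreement between the two computations of e^A illustrates why the diagonalised form is taken as the definition: it sums the entire power series in closed form.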
This type of equation is a generalisation of the one-dimensional, first-order, linear differential equation with constant coefficients that you have met in calculus, which is of the form
\[
\frac{dy}{dt} = ay; \qquad y(0) = y_0 = \text{constant}.
\]
This equation has the solution y(t) = y_0 e^{at}. It is therefore plausible to guess that the n-dimensional equation will have a similar exponential type of solution. We therefore guess an exponential solution of the form
\[
y = u(t) = v e^{\lambda t},
\]
where λ is a constant scalar and v is a constant vector. On substituting this guess or "trial solution" into the matrix equation we obtain
\[
\frac{dy}{dt} = \lambda v e^{\lambda t} = Ay = A v e^{\lambda t},
\]
which can be rearranged to give
\[
e^{\lambda t}(Av - \lambda v) = 0.
\]
Now, e^{λt} ≠ 0 for all t, and hence our guess is actually a solution only if (A − λI)v = 0. We therefore arrive at the result:

Proposition 3. y(t) = v e^{λt} is a solution of \frac{dy}{dt} = Ay if and only if λ is an eigenvalue of A and v is an eigenvector for the eigenvalue λ.

Example 2. Find solutions of \frac{dy}{dt} = Ay where A = \begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix}.

Solution. We first find the eigenvalues and eigenvectors of A. We have obtained these previously; they are
\[
\lambda_1 = 5,\ v_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \qquad\text{and}\qquad \lambda_2 = 1,\ v_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}.
\]
Hence, two solutions of the equation are
\[
u_1(t) = e^{5t} \begin{pmatrix} 1 \\ 1 \end{pmatrix} \qquad\text{and}\qquad u_2(t) = e^{t} \begin{pmatrix} 1 \\ -1 \end{pmatrix}. \qquad ♦
\]

The next point to notice is that the linearity of the differential equation leads to the following proposition.

Proposition 4. If u_1(t) and u_2(t) are two solutions of the equation \frac{dy}{dt} = Ay, then any linear combination of u_1 and u_2 is also a solution.

Proof. Let y(t) = α_1 u_1(t) + α_2 u_2(t), where α_1 and α_2 are scalars. Then
\[
\frac{d}{dt}\big(\alpha_1 u_1(t) + \alpha_2 u_2(t)\big)
= \alpha_1 \frac{du_1}{dt} + \alpha_2 \frac{du_2}{dt}
= \alpha_1 A u_1 + \alpha_2 A u_2
= A(\alpha_1 u_1 + \alpha_2 u_2),
\]
and the result is proved.

Example 2 (continued).
In our example, we therefore have that
\[
y(t) = \alpha_1 e^{5t} \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \alpha_2 e^{t} \begin{pmatrix} 1 \\ -1 \end{pmatrix}
\]
is a solution of the linear differential equation. Although we have not proved it, the above solution is the general solution of the original differential equation; that is, every solution of the differential equation is of the above form. Now, since there are two unknown scalars in the general solution, two extra conditions must be specified in order to completely determine the solution. Typical conditions are that the value of y(t) = \begin{pmatrix} y_1(t) \\ y_2(t) \end{pmatrix} is given at some t, for example at t = 0. ♦

Example 2 (continued). Find the solution of \frac{dy}{dt} = Ay for A = \begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix}, given that y(0) = \begin{pmatrix} 1 \\ -2 \end{pmatrix}.

Solution. On substituting t = 0 into our general solution of the differential equation and equating y(0) to the given vector, we obtain
\[
y(0) = \alpha_1 \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \alpha_2 \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 \\ -2 \end{pmatrix}.
\]
We can now obtain α_1 and α_2 by solving this pair of linear equations in the usual way. We find α_1 = −1/2 and α_2 = 3/2, and hence the solution of the differential equation is
\[
y(t) = -\frac{1}{2} e^{5t} \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \frac{3}{2} e^{t} \begin{pmatrix} 1 \\ -1 \end{pmatrix}. \qquad ♦
\]

One reason for considering these linear first-order systems of differential equations is that every linear differential equation can be written as a system of linear first-order differential equations. We will illustrate the method with an example.

Example 3. Convert the second-order differential equation
\[
\frac{d^2y}{dt^2} + 4\frac{dy}{dt} - 5y = 0
\]
to a system of first-order differential equations.

Solution. First define new variables by y_1 = y and y_2 = \frac{dy_1}{dt} = \frac{dy}{dt}. Then, on differentiating y_2 and using the differential equation, we find
\[
\frac{dy_2}{dt} = \frac{d^2y}{dt^2} = 5y - 4\frac{dy}{dt} = 5y_1 - 4y_2.
\]
The original second-order equation is therefore equivalent to the pair of first-order equations
\[
\frac{dy_1}{dt} = y_2, \qquad \frac{dy_2}{dt} = 5y_1 - 4y_2.
\]
This pair of equations can then be rewritten in matrix form as \frac{dy}{dt} = Ay, where A = \begin{pmatrix} 0 & 1 \\ 5 & -4 \end{pmatrix}.
♦

It is useful to compare the matrix method of solution of a second-order linear differential equation with the method of solution usually taught in calculus courses. The final results obtained by the two methods are, of course, the same.

Example 4 (Matrix method). Guess a solution of the form y(t) = u(t) = e^{λt}v and substitute it into the differential equation. Then u is a solution if λ and v satisfy the eigenvector equation Av = λv. The eigenvalues are solutions of the characteristic equation det(A − λI) = 0, that is, of
\[
\det(A - \lambda I) = \begin{vmatrix} -\lambda & 1 \\ 5 & -4-\lambda \end{vmatrix} = \lambda^2 + 4\lambda - 5 = 0.
\]
The roots of the quadratic give the eigenvalues λ_1 = −5 and λ_2 = 1. A solution of (A + 5I)v_1 = 0 is the eigenvector v_1 = \begin{pmatrix} -1 \\ 5 \end{pmatrix}, and a solution of (A − I)v_2 = 0 is the eigenvector v_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. The general solution is therefore
\[
y(t) = \alpha_1 e^{-5t} \begin{pmatrix} -1 \\ 5 \end{pmatrix} + \alpha_2 e^{t} \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
\]
Since y(t) = y_1(t), the solution for y(t) in the original second-order equation is
\[
y(t) = y_1(t) = -\alpha_1 e^{-5t} + \alpha_2 e^{t}. \qquad ♦
\]

Example 5 (Calculus method). We first guess a solution y(t) = e^{λt} and substitute it into the original second-order differential equation to obtain the so-called characteristic equation
\[
\lambda^2 + 4\lambda - 5 = 0.
\]
Note that this characteristic equation is identical to the characteristic equation det(A − λI) = 0 of the matrix method. See Question 24 of the problems for a generalisation of this result. The roots of the quadratic are λ_1 = −5 and λ_2 = 1, and hence the general solution is
\[
y(t) = \alpha_1 e^{-5t} + \alpha_2 e^{t},
\]
which is identical to the solution from the matrix method. ♦

In the above example, it is clear that the calculus method gives a much quicker solution than the matrix method. However, the matrix method has the great advantage that it works for a much larger class of differential equations than does the calculus method.
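The agreement between the two characteristic equations in Examples 4 and 5 can be confirmed term by term. The short plain-Python sketch below (an illustration only; Python is not part of the course's Maple material) evaluates det(A − λI) for the companion matrix of Example 4 and compares it with λ² + 4λ − 5.

```python
# Companion matrix of y'' + 4y' - 5y = 0, using y1 = y, y2 = y'
A = [[0, 1], [5, -4]]

def char_poly(lam):
    """det(A - lam*I) for the 2x2 matrix A above."""
    return (A[0][0] - lam) * (A[1][1] - lam) - A[0][1] * A[1][0]

# Agreement with the calculus method's polynomial lam^2 + 4*lam - 5
# at enough points to pin down a quadratic (and then some)
for lam in range(-10, 11):
    assert char_poly(lam) == lam**2 + 4 * lam - 5

# ... so both methods find the same roots, -5 and 1
assert char_poly(-5) == 0 and char_poly(1) == 0
print("det(A - lam*I) matches lam^2 + 4*lam - 5")
```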
One reason that the matrix method works for a larger class of differential equations is that any single higher-order differential equation can easily be converted into a system of first-order equations, whereas it is extremely difficult to convert a given system of first-order equations into a single higher-order differential equation. It should also be pointed out that the matrix method described above will not work if the matrix A is not diagonalisable. However, an extension of the matrix method which uses "Jordan forms" can be developed to handle this case.

Example 6. The atoms in a laser can exist in two states, an "excited state" and a "ground state". The laser is initially pumped so that it has 80% of its atoms in the excited state and the remaining 20% in the ground state. When the laser is operating, 70% of the excited atoms decay to the ground state per second, whereas 40% of the ground-state atoms are pumped up to the excited state per second. Find the percentage of atoms in each state at a time t seconds after the laser starts to operate.

Solution. Let
\[
x_1(t) = \text{\% of atoms in the excited state at time } t, \qquad
x_2(t) = \text{\% of atoms in the ground state at time } t.
\]
During operation the laser is described by the pair of differential equations (the rates of 70% and 40% per second enter as the decimal rates 0.7 and 0.4)
\[
\frac{dx_1}{dt} = -0.7\,x_1(t) + 0.4\,x_2(t), \qquad
\frac{dx_2}{dt} = 0.7\,x_1(t) - 0.4\,x_2(t).
\]
That is, in matrix form, \frac{dx}{dt} = Ax(t), where A = \begin{pmatrix} -0.7 & 0.4 \\ 0.7 & -0.4 \end{pmatrix}.

The eigenvalues of A are λ_1 = 0 and λ_2 = −1.1, and corresponding eigenvectors are
\[
v_1 = \begin{pmatrix} 4 \\ 7 \end{pmatrix} \qquad\text{and}\qquad v_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.
\]
The general solution is therefore
\[
x(t) = \alpha_1 \begin{pmatrix} 4 \\ 7 \end{pmatrix} + \alpha_2 e^{-1.1t} \begin{pmatrix} -1 \\ 1 \end{pmatrix}.
\]
The initial condition of the laser is given as x(0) = \begin{pmatrix} 80 \\ 20 \end{pmatrix}, and hence the values of α_1 and α_2 can be determined from the equations
\[
x(0) = \begin{pmatrix} 80 \\ 20 \end{pmatrix} = \alpha_1 \begin{pmatrix} 4 \\ 7 \end{pmatrix} + \alpha_2 \begin{pmatrix} -1 \\ 1 \end{pmatrix},
\]
for which the solution is α_1 = 100/11 = 9\tfrac{1}{11} and α_2 = −480/11 = −43\tfrac{7}{11}.
Thus, the complete solution is
\[
x_1(t) = \frac{1}{11}\big(400 + 480\,e^{-1.1t}\big), \qquad
x_2(t) = \frac{1}{11}\big(700 - 480\,e^{-1.1t}\big).
\]
Note that, as t → ∞, e^{−1.1t} → 0, and hence the laser settles into a "steady-state" operation in which 400/11 = 36\tfrac{4}{11}% of the atoms are in the excited state and 700/11 = 63\tfrac{7}{11}% of the atoms are in the ground state. The "steady-state" solution for large t is a scalar multiple of the eigenvector v_1 = \begin{pmatrix} 4 \\ 7 \end{pmatrix} corresponding to the eigenvalue λ_1 = 0. ♦

8.3.3 [X] Markov chains

Matrices are very useful in studying many discrete-time dynamical systems, that is, systems in which the state at stage k + 1 depends solely on the state at stage k.

Example 7. In a certain experiment, a psychologist was testing the learning abilities of rats by getting them to run a maze. The experimenter started with 100 rats, none of which had previously run the maze. She then set each of the 100 rats in turn at the maze and noted whether it successfully ran the maze. She then repeated the process several more times. She found that, on average, 10% of the rats which failed at one attempt were successful on their next attempt, whereas 95% of the rats which were successful at one attempt were also successful at their next attempt. (These numbers are meant for illustration only; they are not taken from actual experimental data.) For this experiment, calculate the approximate number of rats which successfully run the maze on the 3rd run, the 20th run and the 50th run.

Solution. Let
\[
x_1(k) = \text{number of rats successfully completing the maze at the $k$th run}, \qquad
x_2(k) = \text{number of rats failing the maze at the $k$th run}.
\]
Then, for the (k + 1)th run, we have
\[
x_1(k+1) = 0.95\,x_1(k) + 0.10\,x_2(k), \qquad
x_2(k+1) = 0.05\,x_1(k) + 0.90\,x_2(k),
\]
which can be written in matrix form as x(k+1) = Ax(k), where A = \begin{pmatrix} 0.95 & 0.10 \\ 0.05 & 0.90 \end{pmatrix}.

We note that the unique solution of the equation is x(k) = A^k x(0), as can easily be checked by direct substitution in x(k+1) = Ax(k).
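This check can also be run numerically. The plain-Python sketch below (an aside, since the course's computations use Maple) iterates x(k+1) = Ax(k) directly and compares each step with the closed form x_1(k) = (200/3)(1 − (0.85)^k) that the diagonalisation below produces, reproducing the figures quoted in the worked solution.

```python
# Rat-maze model: x(k+1) = A x(k), starting from 0 successes and 100 failures
A = [[0.95, 0.10], [0.05, 0.90]]
x = [0.0, 100.0]
successes = {0: x[0]}          # successes[k] = x1(k), the successful rats at run k

for k in range(1, 51):
    x = [A[0][0] * x[0] + A[0][1] * x[1],
         A[1][0] * x[0] + A[1][1] * x[1]]
    successes[k] = x[0]
    # closed form from the diagonalisation: x1(k) = (200/3)(1 - 0.85^k)
    assert abs(x[0] - (200 / 3) * (1 - 0.85 ** k)) < 1e-9

# figures quoted in the worked solution (to 2 decimal places)
assert abs(successes[3] - 25.72) < 0.01
assert abs(successes[20] - 64.08) < 0.01
assert abs(successes[50] - 66.65) < 0.01
# the limit 200/3 = 66.66... is a multiple of the eigenvector (2, 1) for lambda = 1
assert abs(successes[50] - 200 / 3) < 0.05
print("iteration matches the closed form and the quoted answers")
```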
In our problem, x(0) = \begin{pmatrix} 0 \\ 100 \end{pmatrix}, since at the beginning there were 100 rats, none of which had successfully run the maze. We now calculate A^k to complete the solution. The eigenvalues of A are λ_1 = 1 and λ_2 = 0.85, and corresponding eigenvectors are
\[
v_1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix} \qquad\text{and}\qquad v_2 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.
\]
Thus, A is diagonalisable, and suitable choices for M, D and M^{-1} are
\[
M = \begin{pmatrix} 2 & -1 \\ 1 & 1 \end{pmatrix}, \qquad
D = \begin{pmatrix} 1 & 0 \\ 0 & 0.85 \end{pmatrix}, \qquad
M^{-1} = \begin{pmatrix} \tfrac13 & \tfrac13 \\ -\tfrac13 & \tfrac23 \end{pmatrix}.
\]
Thus,
\[
x(k) = M D^k M^{-1} x(0)
= \begin{pmatrix} 2 & -1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} 1^k & 0 \\ 0 & (0.85)^k \end{pmatrix}
\begin{pmatrix} \tfrac13 & \tfrac13 \\ -\tfrac13 & \tfrac23 \end{pmatrix}
\begin{pmatrix} 0 \\ 100 \end{pmatrix}
= \frac{100}{3} \begin{pmatrix} 2\big(1 - (0.85)^k\big) \\ 1 + 2(0.85)^k \end{pmatrix}.
\]
Note: As a check, if we substitute k = 0 in this expression, we obtain x(0) = \begin{pmatrix} 0 \\ 100 \end{pmatrix}. Further, for k = 1, we have
\[
x(1) = \frac{100}{3} \begin{pmatrix} 2(1-0.85) \\ 1 + 2(0.85) \end{pmatrix} = \begin{pmatrix} 10 \\ 90 \end{pmatrix},
\]
which equals
\[
Ax(0) = \begin{pmatrix} 0.95 & 0.10 \\ 0.05 & 0.90 \end{pmatrix} \begin{pmatrix} 0 \\ 100 \end{pmatrix} = \begin{pmatrix} 10 \\ 90 \end{pmatrix}.
\]
Then, for k = 3 the solution is x(3) = \begin{pmatrix} 25.72 \\ 74.28 \end{pmatrix}, and hence approximately 26 rats will successfully complete the maze on the third run. For k = 20, the solution is x(20) = \begin{pmatrix} 64.08 \\ 35.92 \end{pmatrix}, and hence approximately 64 rats will successfully complete the maze on the 20th run. For k = 50, the solution is x(50) = \begin{pmatrix} 66.65 \\ 33.35 \end{pmatrix}, and hence approximately 67 rats will successfully complete the maze on the 50th run.

Note that, for large values of k, x(k) is approximately equal to \begin{pmatrix} 66\tfrac23 \\ 33\tfrac13 \end{pmatrix}, which corresponds to approximately 67 rats successfully completing the maze on a given run. Note that this solution for large values of k is a scalar multiple of the eigenvector \begin{pmatrix} 2 \\ 1 \end{pmatrix} corresponding to the eigenvalue λ_1 = 1, which is the eigenvalue of A with largest magnitude. ♦

Systems such as the one in Example 7 are called Markov chains. In these systems the objects or individuals can be in one of a certain number of states, and the system is modelled by the matrix equation
\[
x(k+1) = Ax(k),
\]
where x(k) = \begin{pmatrix} x_1(k) \\ \vdots \\ x_n(k) \end{pmatrix} gives the number of individuals in each of the n states at time k. The n × n matrix A has the property that all its entries are non-negative and, for j = 1, ..., n, \sum_{i=1}^n a_{ij} = 1. In other words, each column sums to 1. The number a_{ij} is the probability that an individual changes from state j to state i.

Usually what we are interested in is the long-term behaviour of such a system, that is, how A^k behaves as k → ∞. It turns out that the behaviour exhibited in Example 7 is typical of these systems. For any such matrix, λ = 1 is an eigenvalue, and indeed is the eigenvalue of largest magnitude. In almost all cases, A^k x(0) converges to a multiple of the eigenvector corresponding to λ = 1. The limit vector in these cases depends only on the number of individuals involved, and not on the initial distribution of the individuals among the particular states.

We conclude this section by proving that λ = 1 is always an eigenvalue of any such matrix, that is, of any matrix whose column sums are all one.

Lemma 5. If λ is an eigenvalue of A, then λ is also an eigenvalue of A^T.

Proof. Question 13 in the problems for this chapter.

Theorem 6. Suppose that A is an n × n matrix and that the sum of each of the columns of A is 1. Then A has 1 as an eigenvalue.

Proof. The hypothesis on A = (a_{ij}) is that
\[
\begin{aligned}
a_{11} + a_{21} + \cdots + a_{n1} &= 1 \\
a_{12} + a_{22} + \cdots + a_{n2} &= 1 \\
&\ \,\vdots \\
a_{1n} + a_{2n} + \cdots + a_{nn} &= 1,
\end{aligned}
\]
or equivalently,
\[
A^T \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}.
\]
Thus, 1 is an eigenvalue of A^T and, by the preceding lemma, is thus an eigenvalue of A. (Note that, in general, (1 1 · · · 1)^T will not be a corresponding eigenvector of A.)

8.4 Eigenvalues and MAPLE

The LinearAlgebra package in Maple has procedures for doing all the calculations described in this section. The command

with(LinearAlgebra):

loads the LinearAlgebra commands. If A is an n × n square matrix then

Determinant(A - t*IdentityMatrix(n));
produces the characteristic polynomial for A. Actually, Maple has a command which will do this directly. A slight complication here is that

CharacteristicPolynomial(A,t);

gives the polynomial det(tI − A). This of course has the same roots as the polynomial we use. You can (and should, at least once!) use solve or fsolve to find the roots of this equation, or you can use the Eigenvalues command directly. You can then use NullSpace to find the eigenvectors:

evals:=Eigenvalues(A);
NullSpace(A - evals[1]*IdentityMatrix(n));

This will give you a set containing a basis for the eigenspace. Use op to strip off the braces if necessary.

You can also get Maple to do all this for you at once, and more. The command

EV:=Eigenvectors(A);

returns a sequence with two elements. The first is a Vector with the eigenvalues as entries, and the second is a Matrix whose columns are the eigenvectors in the same order. This matrix is thus a diagonalising matrix for A, if one exists. Thus if you then do

EV[2]^(-1).A.EV[2];

you will get a diagonal matrix — the same matrix that

DiagonalMatrix(EV[1]);

would give.

Problems for Chapter 8

Problems 8.1 : Definitions and examples

1. [R] Let A = \begin{pmatrix} 3 & 0 \\ 0 & -4 \end{pmatrix}, B = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}, C = \begin{pmatrix} -3 & 0 \\ 0 & 0 \end{pmatrix}, and let e_1 and e_2 be the standard basis vectors for R^2.
a) Write down the eigenvalues and eigenvectors of A, B and C.
b) Draw a sketch of e_1, Ae_1, e_2, Ae_2. Then, for some vector x which is not parallel to either e_1 or e_2, draw a sketch of x and Ax.
c) Repeat part (b) for the matrix B. Comment on any differences you observe between the results for A and B.
d) Repeat part (b) for the matrix C. Again comment on any differences you observe between the results for A and C.
e) For x ≠ 0, prove algebraically that Ax is parallel to x if and only if x is parallel to either e_1 or e_2, that Bx is parallel to x for all x, and that Cx is parallel to e_1 for all x.

2.
[R] Show that the vector \begin{pmatrix} 1 \\ 1 \end{pmatrix} is an eigenvector of the matrix \begin{pmatrix} 5 & -3 \\ 2 & 0 \end{pmatrix} and find the corresponding eigenvalue.

3. [X] Let A be a fixed 3 × 3 matrix and define a linear map T : M_{33} → M_{33} by T(X) = AX. If λ is a real eigenvalue of T corresponding to an invertible eigenvector X, find λ in terms of det(A).

4. [H] Let T be the linear map which reflects vectors in R^2 about the line y = x.
a) Explain why \begin{pmatrix} 1 \\ 1 \end{pmatrix} and \begin{pmatrix} 1 \\ -1 \end{pmatrix} are eigenvectors of T and give their corresponding eigenvalues.
b) Find the matrix A such that Tx = Ax for all x ∈ R^2.

5. [R] Find the eigenvalues and eigenvectors of
a) A = \begin{pmatrix} 6 & -2 \\ 6 & -1 \end{pmatrix}, b) A = \begin{pmatrix} -5 & 2 \\ -6 & 3 \end{pmatrix}.

6. [X] For each of the matrices in the preceding question, find two independent eigenvectors v_1 and v_2. On one diagram sketch the lines
\[
\ell_1 = \{x : x = \mu v_1,\ \mu \in \mathbb{R}\}, \qquad
\ell_2 = \{x : x = \mu v_2,\ \mu \in \mathbb{R}\}
\]
and the parallelogram
\[
P = \{x : x = \mu_1 v_1 + \mu_2 v_2, \text{ for } 0 \le \mu_1 \le 1,\ 0 \le \mu_2 \le 1\}.
\]
Then identify and sketch (on a separate diagram)
\[
\{y : y = Ax,\ x \in \ell_1\}, \qquad \{y : y = Ax,\ x \in \ell_2\}, \qquad \{y : y = Ax,\ x \in P\}.
\]
Describe the linear mapping Tx = Ax geometrically.

7. [R] Find the eigenvalues and eigenvectors of the following matrices. In each case, note whether the eigenvalues are real, occur in complex conjugate pairs, or are general complex numbers. Also note whether the eigenvectors form a basis for C^2.
a) \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}, b) \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}, c) \begin{pmatrix} 3 & 5 \\ 0 & -6 \end{pmatrix}, d) \begin{pmatrix} 0 & -2 \\ 1 & 2 \end{pmatrix}, e) \begin{pmatrix} 4 & 2i \\ 2i & 6 \end{pmatrix}, f) \begin{pmatrix} 4 & -2i \\ 2i & 6 \end{pmatrix}.

8. [H] Show that the eigenvalues of a square row-echelon form matrix U are equal to the diagonal elements of the matrix. (A square row-echelon form matrix is also called an upper triangular matrix.)

9. [R] Find the eigenvalues and eigenvectors of the row-echelon matrix
\[
U = \begin{pmatrix} 2 & -4 & 1 & 3 \\ 0 & -2 & 1 & -3 \\ 0 & 0 & 3 & 3 \\ 0 & 0 & 0 & 5 \end{pmatrix}.
\]

10. [R] Find the eigenvalues and eigenvectors of the following matrices.
a) A = \begin{pmatrix} 1 & 3 & 0 \\ 2 & 2 & 0 \\ 0 & 0 & 6 \end{pmatrix}. b) B = \begin{pmatrix} 3 & 0 & 0 \\ 0 & -4 & -1 \\ 0 & 6 & 3 \end{pmatrix}.

Problems 8.2 : Eigenvectors, bases, and diagonalisation

11.
[R] For each of the matrices in Questions 7, 9 and 10, decide whether the matrix is diagonalisable, and if it is, find an invertible matrix M and a diagonal matrix D such that D = M^{-1}AM.

12. [H] Show that if λ is an eigenvalue of A then λ is also an eigenvalue of the matrix A' = B^{-1}AB, where B is any invertible matrix. Also show that if v is an eigenvector of A for eigenvalue λ then B^{-1}v is an eigenvector of A' for eigenvalue λ.

13. [H] Show that if λ is an eigenvalue of A then λ is also an eigenvalue of A^T.
HINT: Use the characteristic equation and the properties of determinants.

14. [X] Let A be an n × n matrix. Let T_A : C^n → C^n be the linear transformation defined by T_A(x) = Ax for x ∈ C^n. Let the columns of an n × n matrix B be an ordered basis for C^n. Show that the matrix representing T_A with respect to the basis formed by the columns of B is B^{-1}AB.
HINT: The method used in Example 3 of Section 7.6 might be helpful.
NOTE: Modern methods of finding eigenvalues search for a change of basis which makes A' = B^{-1}AB into an upper triangular matrix. As shown in Question 8, the eigenvalues are then the diagonal elements of the upper triangular matrix. The actual algorithms for finding the change of basis are complicated.

15. [X] Let T : V → V be linear. Show that if B is any basis for V and A is the matrix representing T with respect to the basis B in both the domain and codomain of T, then the eigenvalues of T and A are the same. What is the relation between the eigenvectors of T and A?

Problems 8.3 : Applications of eigenvalues and eigenvectors

16. [R] Let A = \begin{pmatrix} 0 & 6 \\ 1 & -1 \end{pmatrix}. Diagonalise A and hence find A^5.

17. [R] Let A = \begin{pmatrix} 0 & 3 \\ 8 & 2 \end{pmatrix}.
a) Find the eigenvalues and eigenvectors of A.
b) Find matrices P and D such that A = PDP^{-1}.
c) Write down an expression for A^n in terms of P and D. Hence evaluate A^n.

18.
[R] A first-order linear difference equation (often called a first-order linear recurrence relation) is an equation of the form
\[
x_{k+1} = Ax_k, \qquad k = 0, 1, 2, \ldots,
\]
where A is a fixed matrix. The solution of this equation is x_k = A^k x_0, as you can check by direct substitution. For the diagonalisable matrices of Questions 7 a) and c), find A^k and hence evaluate x_k.

19. [R] For each of the diagonalisable matrices of Questions 5, 9 and 10, find general solutions of the differential equations \frac{dy}{dt} = Ay.

20. [R] a) Find the eigenvalues and eigenvectors of \begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix}.
b) Hence solve the system of differential equations
\[
\frac{dx_1}{dt} = 2x_1 + 3x_2, \qquad \frac{dx_2}{dt} = x_1 + 4x_2.
\]

21. [R] Solve the following systems of differential equations, given that x(0) = y(0) = 100.
a) \frac{dx}{dt} = 5x - 8y, \quad \frac{dy}{dt} = x - y
b) \frac{dx}{dt} = 3x - 15y, \quad \frac{dy}{dt} = x - 5y

22. [R] Solve the following second-order linear differential equations with constant coefficients by the "calculus method" and by the matrix method, and compare your answers.
a) 5\frac{d^2y}{dt^2} - 6\frac{dy}{dt} + y = 0.
b) \frac{d^2y}{dt^2} - 16y = 0.

23. [R] What happens if you try to solve the second-order equation \frac{d^2y}{dt^2} + 4\frac{dy}{dt} + 4y = 0 by the matrix method?

24. [X] Consider the second-order linear differential equation
\[
a\frac{d^2y}{dt^2} + b\frac{dy}{dt} + cy = 0,
\]
where a, b, c ∈ R and a ≠ 0.
a) Assume that the solutions of the characteristic equation aλ^2 + bλ + c = 0 for this second-order differential equation are distinct. By making the substitutions y_1 = y and y_2 = \frac{dy_1}{dt}, convert the differential equation into a system of first-order linear differential equations \frac{dy}{dt} = Ay, where y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}.
b) Using matrix methods, show that the general solution of this system is
\[
y = \alpha_1 e^{\lambda_1 t} \begin{pmatrix} 1 \\ \lambda_1 \end{pmatrix} + \alpha_2 e^{\lambda_2 t} \begin{pmatrix} 1 \\ \lambda_2 \end{pmatrix}.
\]
Compare this solution with that obtained using the usual "calculus method" of solving the original second-order linear differential equation.

25. [X] A radioactive isotope A decays at the rate of 2% per century into a second radioactive isotope B, which in turn decays at a rate of 1% per century into a stable isotope C.
a) Find a system of linear differential equations to describe the decay process. If we start with pure A, what are the proportions of A, B, and C after 500 years, after 1000 years, and after 1,000,000 years? First solve this problem using matrix methods, and then try to solve the problem directly by solving the original two differential equations in the right order.
b) Explain how the problem would be different if the rates of decay of A and B were both 2% per century.

26. [X] There are 3 mathematics lecturers A, B and C who are teaching parallel streams in algebra to a total of 900 students. At the first lecture, equal numbers go to each lecture group. After each lecture, a certain percentage of the students in each group decide to stay with the same lecturer, while the remaining percentage divide evenly among the other two lecturers for the next lecture. If 98% of A's students stay with A each time, 96% of B's students stay with B, and 94% of C's students stay with C, find the numbers of students in each group in the 12th lecture and in the 24th lecture. Make the assumption that no students stop attending lectures.
HINT: Set up a model as a difference equation of the type given in Question 18. You may use MAPLE to find all eigenvalues and eigenvectors. Alternatively, if you wish to solve the problem by hand calculations, you will need to know that one of the eigenvalues is 1.
NOTE: This problem is an example of a Markov chain process.
Markov chain processes are important in many areas of mathematics and its applications, such as statistics, psychology, finance, economics, operations research, queueing theory, inventory theory, diffusion processes, the theory of epidemics, etc.

27. [X] Repeat the previous question on the following assumptions. After each lecture, 1% of each group stop attending lectures altogether, and the remaining percentage either stay with the same lecturer or divide equally among the other two lecturers for the next lecture. If 97% of A's students stay with A each time, 95% of B's students stay with B, and 93% of C's students stay with C, find the numbers of students in each group in the 12th lecture and in the 24th lecture. Also find the total number of students attending lectures in the 12th lecture and in the 24th lecture.

28. [X] Consider a modified version of the population dynamics model of Example 9 of Section 7.5, in which all females are assumed to die at age 74 instead of at age 89, as in the model given. Use eigenvalues and eigenvectors to solve this modified model, given that there are one million females in each age group at January 1, 1970. What happens to the population for large values of k?
NOTE: You will need to use Maple to find the eigenvalues and eigenvectors of the matrix A, and you may also use Maple, if you wish, to carry out all other matrix manipulations required to solve the problem.

29. [X] Let A be a 2 × 2 matrix with the property that all its entries are non-negative and both its columns sum to 1. Show that λ_1 = 1 is always an eigenvalue of A, and that if λ_2 is another eigenvalue of A then −1 ≤ λ_2 ≤ 1.

Problems 8.4 : Eigenvalues and MAPLE

30. [M] Show that the matrix of the original population dynamics model of Example 9 of Section 7.5 is not diagonalisable.
HINT:
Use Maple to find the eigenvalues, and then show that the eigenvalue λ = 0 has multiplicity 2 and that dim(ker(A)) = 1, i.e., a basis for the kernel of A − 0I consists of one vector.
NOTE: The original population dynamics model can be solved by a generalisation of the eigenvalue-eigenvector methods which makes use of "Jordan forms"; this is covered in our second year linear algebra subjects.

31. [M] Using the following Maple session,

> with(LinearAlgebra):
> M:=<<6|2|2>,<-2|8|4>,<0|1|7>>;

                             [ 6  2  2 ]
                        M := [-2  8  4 ]
                             [ 0  1  7 ]

> I3:=IdentityMatrix(3);

                              [ 1  0  0 ]
                        I3 := [ 0  1  0 ]
                              [ 0  0  1 ]

> p:=Determinant(M-t*I3);

                        p := 336 - 146 t + 21 t^2 - t^3

> solve(p,t);

                        6, 7, 8

> NullSpace(M-6*I3);

                        { (1, -1, 1)^T }

> NullSpace(M-7*I3);

                        { (2, 0, 1)^T }

> NullSpace(M-8*I3);

                        { (2, 1, 1)^T }

a) state the eigenvalues and corresponding eigenvectors of the matrix M,
b) find a matrix A such that A^{-1}MA is a diagonal matrix D, and write down D,
c) calculate A^{-1} and hence find an explicit formula for M^k, where k is a positive integer.

Chapter 9

INTRODUCTION TO PROBABILITY AND STATISTICS

"What IS the use of repeating all that stuff?" the Mock Turtle interrupted ...
Lewis Carroll, Alice in Wonderland.

This chapter introduces mathematical probability, random variables, and probability distributions. The concepts, methods, and applications are required in statistics courses that include MATH2801/MATH2901 (Higher) Theory of Statistics, a core subject for the mathematics and statistics majors, and MATH2089/MATH2099, which are compulsory courses for many second year engineering students.

Statistics is the science of turning raw data into reliable information on which decisions can be made, given randomness or variation in the original data. As a science, it aims to uncover patterns in observations that can be described by mathematical or heuristic models.
It is also concerned with formulating and testing various hypotheses about the context from which the data are drawn. For instance, in order to predict voting patterns in an election, opinions are sought from voters by carrying out opinion polls. It might not be possible to obtain the views of all voters, so the preferences of a relatively small sample of voters are obtained instead. As a result, the opinion poll can only provide an estimate of the true proportion of voters who favour a particular political party. It is important to quantify how accurate this estimate can be expected to be. How many voters must be polled in order for this estimate to be reasonably accurate? How big is the measurement error? Statistical science has the answers to these frequently asked and important questions, and these answers have had immediate utility. It traditionally cost time and resources to poll voters, so it mattered whether 1,000 voters formed an adequate sample or whether 10,000 voters were required.

This simple but typical example illustrates nicely the three essential aspects of statistical science: data production, data analysis, and statistical inference.

Data production. How many sample units should be taken, how should they be selected, and what data should be measured on each unit? An important part of data production is controlling the measurement error that invariably arises.

Data analysis. In order to be presented transparently, data must be organised into easily understood forms, often as graphics, tables, and summary "statistics", such as sample means or sample proportions. For opinion polling, a simple summary of the sample proportions provides the information sought.

Statistical inference. This is the process of drawing valid conclusions about a whole population based on information obtained from a part of the population.
An essential ingredient here is a random sample from the population. Given the data obtained from a random sample of voters, what can one infer about the general voting patterns?

The transition from population to random sample is one instance in which the notion of probability becomes important. To create a random sample, we must know the probability that a given member of the population will be selected in the sample. Conversely, probability allows us to model and forecast real-world behaviour in terms of random processes. In these ways, probability theory and statistics play important roles in countless contexts, such as clinical trials, weather forecasting, finance, or traffic control, to name just a few. First, let us recollect some background set theory and notation.

9.1 Some Preliminary Set Theory

A probability model consists of two components:
1. A set of possible outcomes;
2. The probability of each outcome or set of outcomes.

In this section, we present basic set theory as background material for the first of these components. The second component will be addressed in Section 9.2.

Definition 1. A set is a collection of objects. These objects are called elements. We write x ∈ A to express that x is an element of a set A. If x is not an element of A, then we write x ∉ A.

Example 1. The set A = {1, 2, 3} has elements 1, 2, and 3. Thus, 1 ∈ A but 4 ∉ A, say. ♦

The above definition is circular and imprecise. For instance, it is vulnerable to Russell's Paradox (briefly discussed in MATH1081 Discrete Mathematics). One could improve the definition by insisting that each set must have the property that each conceivable element is either completely in the set or completely outside of the set, but not both. However, one must improve the definition further in order to guard it from contradiction, and this is in fact difficult. Fortunately for our purposes, the above naive definition suffices.
Definition 2.
• A set A is a subset of a set B (written A ⊆ B) if and only if each element of A is also an element of B; that is, if x ∈ A, then x ∈ B.
• The power set P(A) of A is the set of all subsets of A.
• The universal set S is the set that denotes all objects of given interest.
• The empty set ∅ (or {}) is the set with no elements.

Example 2. The set A = {1, 2, 3} has eight subsets. For instance, {2, 3} ⊆ A. The power set of A is the set of these eight subsets, namely
\[
P(A) = \{\emptyset, \{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}\}. \qquad ♦
\]

Example 3. For problems in 3-dimensional vector geometry, the universal set is usually S = R^3. Points, lines, and planes are then subsets of S. ♦

Definition 3. A set S is countable if its elements can be listed as a sequence. More formally, S is countable if and only if there is a one-to-one function from S to N.

Example 4.
• Every finite set is countable.
• The integers are countable, since we can list them as follows: 0, 1, −1, 2, −2, ....
• The rationals are countable. (Challenge: can you list them as a sequence?)
• The reals are not countable; this can be shown by a simple and elegant proof known as Cantor's Diagonal Argument.

Sets are often visualised by a Venn diagram as regions in the plane; for instance, a Venn diagram of a universal set S containing a set A shows A as a region inside the rectangle representing S.

Definition 4. For all subsets A, B ⊆ S, define the following set operations:
• complement of A: A^c = {x ∈ S : x ∉ A}
• intersection of A and B: A ∩ B = {x ∈ S : x ∈ A and x ∈ B}
• union of A and B: A ∪ B = {x ∈ S : x ∈ A or x ∈ B}
• difference: A − B = {x ∈ S : x ∈ A but x ∉ B} = A ∩ B^c

Following mathematical convention, "or" in the union definition means "one or the other or both". [The printed notes give Venn diagrams illustrating A^c, A ∩ B, A ∪ B and A − B here.]

Example 5.
Let S be all students enrolled in MATH1231, let A be those who are 20 years or older, and let B be those who are engineering students. Then
Aᶜ are MATH1231 students at most 19 years old;
A ∩ B are MATH1231 students studying engineering and who are 20 years or older;
A ∪ B are MATH1231 students who study engineering or are 20 years or older;
A − B are MATH1231 students who do not study engineering but who are 20 years or older. ♦

Definition 5. Sets A and B are disjoint (or mutually exclusive) if and only if A ∩ B = ∅.

A Venn diagram showing two disjoint sets A and B is given below:

[Venn diagram: disjoint regions A and B in S, with A ∩ B = ∅]

Example 6. Let S be the set of people in Australia, let A be the set of people enrolled in MATH1231, and let B be the set of people aged under 10. Then A and B are disjoint. ♦

Definition 6. Disjoint subsets A₁, . . . , Aₖ partition a set B if and only if A₁ ∪ · · · ∪ Aₖ = B.

Note that A and Aᶜ partition the universal set S for each subset A of S.

Example 7. The sets A₁ = {1, 3}, A₂ = {2}, A₃ = {4, 5} partition the set B = {1, 2, 3, 4, 5}. ♦

The following simple result will often be used in the rest of the chapter, sometimes implicitly.

Lemma 1. If A₁, . . . , Aₙ partition S and B is a subset of S, then A₁ ∩ B, . . . , Aₙ ∩ B partition B.

This result is illustrated below.

[Venn diagram: S partitioned into A₁, . . . , Aₙ, with B overlapping each part]

There are many laws governing set operations. Here are just a few:

Distributive Laws
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

De Morgan's Laws
(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ

These laws can be proved by logical arguments or by sketching the Venn diagrams for the left-hand and right-hand sides of the identities.

[Venn diagrams for De Morgan's Laws]

Definition 7. If A is a set, then |A| is the number of elements in A.

Note that if A and B are disjoint, then |A ∪ B| = |A| + |B|.

The Inclusion-Exclusion Principle.
|A ∪ B| = |A| + |B| − |A ∩ B|

This result is clear once a Venn diagram is drawn. The Inclusion-Exclusion Principle may be extended to any finite number of sets. For instance,
|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|.

Note that for any subset A of S, we have S = A ∪ Aᶜ and so |Aᶜ| = |S| − |A|. Hence, for example, |(A ∪ B)ᶜ| = |S| − |A ∪ B|. The following example makes use of this idea.

Example 8. Of 20 music students, 7 play guitar, 8 play piano, and 3 play both guitar and piano. How many play neither guitar nor piano?

Solution. Let S be the set of all the music students; let G be the set of students who play guitar; and let P be the set of students who play piano. By the information given,
|S| = 20, |G| = 7, |P| = 8, |P ∩ G| = 3.
By the Inclusion-Exclusion Principle, the number of students who play neither piano nor guitar is
|(G ∪ P)ᶜ| = |S| − |G ∪ P| = |S| − (|G| + |P| − |G ∩ P|) = 20 − (7 + 8 − 3) = 8.

Alternatively, we can draw a Venn diagram of the problem and deduce the answer by filling in the number of elements in each region:

[Venn diagram: 4 students play guitar only, 3 play both, 5 play piano only, and 8 play neither] ♦

9.2 Probability

The notion of luck is ancient and has often been seen as an inherent quality that individuals or objects might possess and whose nature is determined by Fate, whims of the Gods, karmic justice, mana-like association with other instances of luck, and many other mechanisms. By associating with lucky individuals or objects, by acting righteously, or by appealing to the Gods, one might improve one's luck during one's present life. Gambling is the competitive realisation of this belief in influencing one's luck, and it too is ancient. Good gamblers have appeared throughout history, and many prominent and talented mathematicians have focused much of their work on gambling problems and strategies, particularly in the 16th–18th centuries.
However, most of this work addressed specific problems and was stunted by incorrect intuitions and by an unfortunate focus on ratios and odds. This focus is still present in gambling today, where odds are given rather than percentages. Apart from a few important exceptions, it was only relatively recently, in the first half of the 20th century, that the notion of luck was treated rigorously and systematically by mathematicians. Of note, A. Kolmogorov put forth a set of axioms in 1933 that provided a solid framework for dealing mathematically with the notion of luck, or in mathematical terms: probability.

9.2.1 Sample Space and Probability Axioms

In order to develop a framework for probability, we will first think of any given situation that leads randomly to a set of outcomes as an experiment. Thus, the roll of a die is seen as an experiment, as is the Melbourne Cup; countless other such experiments abound, including financial markets, the weather, election outcomes, or what grade you might get for this course.

Definition 1. A sample space of an experiment is a set of all possible outcomes.

Outcomes are also called sample points.

Example 1. Tossing a coin may be seen as an experiment. An appropriate sample space is the set S = {H, T} where H ("head") and T ("tail") are the two possible outcomes.

Example 2. Tossing a coin 3 times can be seen as another experiment. If the object of the experiment is to determine the resulting coin-flip sequence, then an appropriate sample space is
S₁ = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}.
On the other hand, if the object of the experiment were to determine the number of resulting heads, then an appropriate sample space is
S₂ = {0, 1, 2, 3}.
Thus, the experiment and its sample space depend on the type of data that we wish to observe.

It is often useful to consider sets of outcomes, particularly if the number of outcomes is large.
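The two sample spaces of Example 2 can be listed by brute-force enumeration. The following short sketch in Python (the course software is Maple; this is only an illustration, and the variable names are ours) builds S₁ as the set of all length-3 sequences of H and T, and S₂ as the set of possible head counts:

```python
from itertools import product

# S1: all ordered sequences of three coin tosses
S1 = {"".join(seq) for seq in product("HT", repeat=3)}

# S2: the possible numbers of heads among those sequences
S2 = {seq.count("H") for seq in S1}

print(sorted(S1))  # 8 sequences, from 'HHH' to 'TTT'
print(sorted(S2))  # [0, 1, 2, 3]
```

Note that |S₁| = 2³ = 8 while |S₂| = 4: the same experiment admits different sample spaces depending on what we choose to observe.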
This leads to the next definition.

Definition 2. An event is a subset of a sample space.

Note that the set of all events in a sample space S is exactly the power set P(S). Note also that the empty set ∅ and the whole space S are events.

Example 3. Toss a coin 3 times and consider the event A that we toss 2 heads. This is the subset of the sample space S₁ of Example 2 given by
A = {HHT, HTH, THH}.
Note that each of the outcomes in A forms an event by itself: {HHT}, {HTH}, {THH}.

In each of the above examples, each possible outcome has equal probability. This is not generally true, so we must define probability in full generality.

Definition 3. A probability P on a sample space S is any real function on P(S) that satisfies the following conditions:
(a) 0 ≤ P(A) ≤ 1 for all A ⊆ S;
(b) P(∅) = 0;
(c) P(S) = 1;
(d) If A and B are disjoint, then P(A ∪ B) = P(A) + P(B).

Example 4. Toss a coin and observe whether H or T is tossed. The appropriate sample space is the set S = {H, T}. Define the probability P on S as follows, for each event A ⊆ S:
P(A) = |A|/2.
Then P({H}) is the probability of tossing H, namely P({H}) = (1/2)|{H}| = 1/2. Similarly, P({T}) = 1/2. Note that the probability of tossing neither H nor T is P(∅) = 0, and that the probability of tossing either H or T is P(S) = 1. This probability is exactly the probability that one would usually think of when tossing a coin.

However, there are many other possible probabilities. For instance, let p be some real number between 0 and 1, and define the function Q on S by
Q(∅) = 0, Q({H}) = p, Q({T}) = 1 − p, Q(S) = 1.
It is easy to verify that Q is a probability on S. To find a physical interpretation of this probability, one could think of a coin that is twisted or bent, so that the probability of tossing H is not necessarily the same as that of tossing T.
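The claim that Q is a probability can be checked mechanically against the four conditions of Definition 3. Here is a small Python sketch (our own names; p = 0.25 is an arbitrary bias) that defines Q on every event of S = {H, T} and verifies conditions (a)–(d):

```python
from itertools import combinations

p = 0.25  # arbitrary bias in [0, 1] for the bent coin of Example 4

# Q(A) sums the outcome weights over the event A
def Q(event):
    weights = {"H": p, "T": 1 - p}
    return sum(weights[outcome] for outcome in event)

S = {"H", "T"}
# All four events of S: the empty set, {H}, {T}, and S itself
events = [set(c) for r in range(3) for c in combinations(sorted(S), r)]

assert Q(set()) == 0 and Q(S) == 1                 # conditions (b), (c)
assert all(0 <= Q(A) <= 1 for A in events)         # condition (a)
assert all(abs(Q(A | B) - Q(A) - Q(B)) < 1e-12     # condition (d)
           for A in events for B in events if not (A & B))
print("Q is a probability on S")
```

The same check works for any p between 0 and 1, confirming that Definition 3 admits many probabilities on the same sample space.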
Bear in mind, however, that a probability is a mathematical object that need not always model a real-world phenomenon.

Example 5. Toss a coin 3 times and observe the resulting (ordered) sequence of H and T, as in Example 2 above. Let S be the natural sample space consisting of all 8 such sequences. The appropriate probability P is then given as follows, for each event A ⊆ S:
P(A) = |A|/8.
For instance, consider the event A that we toss 2 heads. The probability of this happening is
P(A) = |A|/8 = |{HHT, HTH, THH}|/8 = 3/8.
Now consider the event B = {HHH} that we toss 3 heads. Since |B| = 1, we see that P(B) = 1/8. The probability of tossing at least 2 heads is P(A ∪ B) which, since A and B are disjoint, equals
P(A ∪ B) = P(A) + P(B) = 3/8 + 1/8 = 1/2.

Example 6. Roll a die and observe the resulting number. An appropriate sample space is then S = {1, . . . , 6}. The appropriate probability P is given as follows, for each event A ⊆ S:
P(A) = |A|/6.
For example, consider the event A that we roll an even number. The probability of this occurring is
P(A) = |A|/6 = |{2, 4, 6}|/6 = 3/6 = 1/2.

Theorem 1. Let P be a probability on a sample space S, and let A be an event in S.
1. If S is finite (or countable), then P(A) = Σ_{a∈A} P({a}).
2. If S is finite and P({a}) is constant for all outcomes a ∈ S, then P(A) = |A|/|S|.
3. If S is finite (or countable), then Σ_{a∈S} P({a}) = 1.

Note that if S is finite, then P(A) may be seen as the size, or ratio, of A compared to S. In general, P(A) may be seen as a measure of how large A is compared to S. Outcomes whose probabilities are all equal are often referred to as "equally likely".

Proof. The finite case of statement 1 follows by induction using the additive condition (d) in the definition of a probability. We will ignore the general case of statement 1 but note that it is often given as an axiom for probabilities.
Statement 3 follows immediately from statement 1 and by noting that P(S) = 1. Let us now prove statement 2, so suppose that S is finite and that P({a}) is equal to the constant p for all outcomes a ∈ S. By statement 3,
1 = Σ_{a∈S} P({a}) = Σ_{a∈S} p = |S|p,
so p = 1/|S|. By statement 1,
P(A) = Σ_{a∈A} P({a}) = Σ_{a∈A} 1/|S| = |A|/|S|. ♦

Example 7. The natural probabilities P in Examples 4–6 may each be expressed as P(A) = |A|/|S| where A is any event in S. This reflects the fact that each outcome is equally likely.

Example 8. Pick a ball at random from a bag containing 3 red balls and 7 blue balls. If each ball has the same chance of being picked as any other ball, then the chance of picking a red ball is 3/10. Let us express this in mathematical terms. Let S be the sample space consisting of all 10 balls. Next, let A be the event that a red ball is chosen; A is then the set containing the three red balls. Since the probability of each outcome is the same, the probability of picking a red ball is
P(A) = |A|/|S| = 3/10,
as expected. ♦

The definition of probability only states what is required of a probability; it does not help us decide upon an appropriate probability for a given experiment. This sort of decision is called "allocating the probabilities" and is generally based on one of the following three methods:

Method 1. Allocate the probabilities on the basis of any inherent symmetry in the situation. This is what is applied in games of chance, as illustrated by the die-rolling and coin-tossing examples that we have seen. It is how you calculated with probability at school by using counting, permutations, and combinations with equally likely outcomes. It is also used to allocate probabilities in the following sort of experiment. A "wheel of fortune" is spun. The probability that it points to some region which subtends an angle θ is θ/360. This is an example of an experiment with a sample space that is not countable, let alone finite.
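When symmetry makes all outcomes equally likely, allocating probabilities reduces to the counting formula P(A) = |A|/|S| of Theorem 1.2. A sketch of the red-ball calculation from Example 8 (the function and variable names are ours, and exact arithmetic is kept with Fraction):

```python
from fractions import Fraction

# The bag of Example 8: 3 red balls and 7 blue balls, all equally likely
bag = ["red"] * 3 + ["blue"] * 7

def prob(event):
    """P(A) = |A| / |S| for equally likely outcomes."""
    return Fraction(len(event), len(bag))

red = [ball for ball in bag if ball == "red"]
print(prob(red))  # 3/10
```

Using exact fractions rather than floating point mirrors the hand calculation and avoids rounding artefacts in small examples like this one.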
Method 2. Allocate the probabilities on the basis of experience or large amounts of data. This is what actuaries regularly do when creating life tables. The probability of a typical Australian male aged 60 living to age 65 is 95.4%, based on the past history of many males aged 60 years.

Method 3. Guess or use intuition. This is common in society but should be avoided by non-experts when dealing with serious issues. Indeed, the gambling industry, insurance, supermarket prices, and so on, and even much of politics and lawmaking, all rely on the common person's inability to properly understand probabilities, particularly when it comes to odds of winning or risk of injury. Actuaries and mathematicians educate themselves to avoid falling for common misconceptions; however, even these experts should not rely heavily on their intuition of probability.

9.2.2 Rules for Probabilities

Theorem 2. Let A and B be events of a sample space S.
1. P(A ∪ B) = P(A) + P(B) − P(A ∩ B) (Addition Rule)
2. P(Aᶜ) = 1 − P(A)
3. If A ⊆ B, then P(A) ≤ P(B).

Proof.
1. This result is connected to the Inclusion-Exclusion Principle.
2. The sets A and Aᶜ partition S, so P(A) + P(Aᶜ) = P(A ∪ Aᶜ) = P(S) = 1.
3. The sets A and B − A partition B, so P(A) ≤ P(A) + P(B − A) = P(B).

Example 9. What is the probability that at least two of n people share the same birthday?

Solution. Ignoring leap years, let Y be the set of the 365 days of the year. An experiment could here be to discover the n birthdays, and an associated sample space is
Sₙ = {(b₁, . . . , bₙ) : b₁, . . . , bₙ ∈ Y}.
We wish to calculate P(Aₙ) for the event Aₙ ⊆ Sₙ that at least two of the n people share the same birthday. Assume that the probability of each person being born on a given date does not depend on the person or on the date.
The outcomes of the sample space then have constant probability, namely 1/|Sₙ| = 1/365ⁿ. Therefore,
P(Aₙ) = 1 − P(Aₙᶜ) = 1 − |Aₙᶜ|/|Sₙ|.
Now, Aₙᶜ is the event that none of the n birthdays are the same, so |Aₙᶜ| = 365 × 364 × · · · × (365 − n + 1), counting the choices of n distinct birthdays in order. Therefore,
P(Aₙ) = 1 − P(Aₙᶜ) = 1 − |Aₙᶜ|/|Sₙ| = 1 − (365 × 364 × · · · × (365 − n + 1))/365ⁿ.
It is thus slightly more likely than not that at least two of 23 people have the same birthday (P(A₂₃) = 50.7%), and it is highly likely that at least two of 57 people share the same birthday (P(A₅₇) = 99.01%). Of course, there will always be at least two people with the same birthday whenever there are more people than days in the year, and this is expressed by the probability P(Aₙ) = 1 for n > 365. ♦

Example 10. In some town, 80% of the population has comprehensive car policies, 60% has house cover, and 10% has neither. What percentage has both covers?

Solution. Let A be the event "a person has comprehensive car cover" and let B be the event "a person has house cover". For any random person,
P(A) = 0.8, P(B) = 0.6, and P(Aᶜ ∩ Bᶜ) = 0.1.
Hence, P(A ∪ B) = 1 − P((A ∪ B)ᶜ) = 1 − P(Aᶜ ∩ Bᶜ) = 1 − 0.1 = 0.9. Therefore,
P(A ∩ B) = P(A) + P(B) − P(A ∪ B) = 0.8 + 0.6 − 0.9 = 0.5.
In other words, 50% of the population has both covers. ♦

9.2.3 Conditional Probabilities

We now consider what happens if we restrict the sample space from S to some event in S.

Example 11. In Example 10 above, 80% of people have comprehensive car cover. However, of those people who have house cover, the percentage who also have comprehensive car cover is
50/60 = 0.833 or 83.3%.
Thus when we restrict our sample space to those having house cover, the percentage of those having comprehensive cover changes. We say that the conditional probability of a person having comprehensive cover given that they have house cover is 0.833. ♦

Definition 4.
The conditional probability of A given B is denoted and defined by
P(A|B) = P(A ∩ B)/P(B), provided that P(B) ≠ 0.

Lemma 3. For any fixed event B, the function P(A|B) is a probability on S.

Proof. Check that the probability conditions are satisfied for P(A|B). ♦

Since P(S) = 1, we can write, for each event A of S,
P(A) = P(A ∩ S)/P(S) = P(A|S).
Just as P(A) can be seen as a measure of A compared to S, P(A|B) = P(A ∩ B)/P(B) can be seen as a measure of A (or the part of A that is contained in B) compared to B. This is illustrated by the following Venn diagrams:

[Venn diagrams: A inside S, and A ∩ B inside B]

Example 12. We roll a die and let A and B be the events that we roll a six and that we roll an even number, respectively. Then P(A) = 1/6 and P(B) = 3/6 = 1/2. Since in this case A ∩ B = A,
P(A|B) = P(A ∩ B)/P(B) = P(A)/P(B) = (1/6)/(1/2) = 1/3.
In other words, given that we rolled an even number, the probability of having rolled a six is 1/3. This is as one would expect since there are 3 even rolls (2, 4, 6) of which one is a six. In contrast, the probability of rolling an even number, given that we rolled a six, is
P(B|A) = P(B ∩ A)/P(A) = P(A)/P(A) = 1. ♦

Example 13. Consider a bag containing 3 red balls and 3 blue balls. First draw one ball from the bag and then draw another. Let Rᵢ be the event that a red ball is chosen on the ith draw, where i = 1, 2, and define B₁ (= R₁ᶜ) and B₂ (= R₂ᶜ) similarly for the blue balls. The probability of first drawing a red ball or a blue ball is the same, namely
P(R₁) = P(B₁) = 3/6 = 1/2.
Now, suppose that we first draw a red ball. The bag then contains 2 red balls and 3 blue balls, so the probability of drawing a red ball on the second draw is P(R₂|R₁) = 2/5. We can also calculate this probability from the definition of conditional probability.
Of the (6 × 5)/2 = 15 ways of choosing two of the six balls without order, three give us two red balls. Therefore, we see that P(R₁ ∩ R₂) = 3/15 = 1/5, so
P(R₂|R₁) = P(R₁ ∩ R₂)/P(R₁) = (1/5)/(1/2) = 2/5.
In contrast, P(R₂) = P(B₂) = 1/2 since there are equally many red and blue balls to begin with. ♦

Rearranging the terms in the definition of conditional probability yields the following identities.

Multiplication Rule
P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)

Example 14. Consider the bag of red and blue balls in Example 13. We saw that the probability of drawing a red ball on the second draw, given that we first drew a red ball, is P(R₂|R₁) = 2/5. Similarly, it is easy to see that P(R₂|B₁) = 3/5. To calculate P(R₂) without using the symmetry argument in Example 13, first note that R₂ is partitioned by R₂ ∩ R₁ and R₂ ∩ B₁: either a red ball is first drawn, followed by another red ball, or a blue ball and then a red ball are drawn. Hence by the Multiplication Rule,
P(R₂) = P((R₂ ∩ R₁) ∪ (R₂ ∩ B₁))
      = P(R₂ ∩ R₁) + P(R₂ ∩ B₁)
      = P(R₂|R₁)P(R₁) + P(R₂|B₁)P(B₁)
      = 2/5 × 1/2 + 3/5 × 1/2 = 1/2,
as expected. ♦

Conditional probabilities are implicitly used whenever a tree diagram is drawn. Thus in a typical two-stage experiment we have the following tree diagram:

[Tree diagram: root branching to A and Aᶜ with probabilities P(A) and P(Aᶜ); each then branching to B and Bᶜ with conditional probabilities P(B|A), P(Bᶜ|A), P(B|Aᶜ), P(Bᶜ|Aᶜ)]

Example 15. Consider 3 urns containing red and blue balls:
• Urn 1 contains 10 balls, of which 3 are red and 7 are blue;
• Urn 2 contains 20 balls, of which 4 are red and 16 are blue; and
• Urn 3 contains 10 balls, of which 0 are red and 10 are blue.
First, an urn is chosen at random; then a ball is chosen from it at random.
(a) What is the probability that a red ball is chosen from urn 2?
(b) What is the probability of choosing a red ball?
(c) If a red ball were chosen, what is then the probability that it came from Urn 2?

Solution.
Assume that we are equally likely to choose any urn and, given an urn, are equally likely to choose any ball in it. Let U₁, U₂, and U₃ denote the events of choosing Urn 1, 2, and 3, respectively, and let R and B denote the events of then choosing a red or blue ball, respectively. Then
P(U₁) = P(U₂) = P(U₃) = 1/3
and
P(R|U₁) = 3/10, P(R|U₂) = 4/20 = 1/5, P(R|U₃) = 0,
P(B|U₁) = 7/10, P(B|U₂) = 16/20 = 4/5, P(B|U₃) = 1.

(a) Therefore, P(R ∩ U₂) = P(R|U₂)P(U₂) = 1/5 × 1/3 = 1/15.

(b) Similarly, P(R ∩ U₁) = 3/10 × 1/3 = 1/10 and P(R ∩ U₃) = 0/10 × 1/3 = 0. Now,
R = (R ∩ U₁) ∪ (R ∩ U₂) ∪ (R ∩ U₃)
and the three terms are disjoint, so
P(R) = P(R ∩ U₁) + P(R ∩ U₂) + P(R ∩ U₃) = 1/10 + 1/15 + 0 = 1/6.

(c) Finally, P(U₂|R) = P(U₂ ∩ R)/P(R) = (1/15)/(1/6) = 2/5. ♦

The above example shows one instance of a multi-stage experiment. The conditional probabilities and the multiplication rule for these sorts of experiments can be illustrated and applied using tree diagrams. Such diagrams illustrate all possible outcomes of the multi-stage experiment as well as the way in which one is able to arrive at those outcomes via the various stages. The branches carry the conditional probability of the right-hand node given that you are at the left node. The probability of getting from one node to a node to its right is obtained by multiplying the probabilities on the connecting branches. A typical sequence of branches is of the form

[Tree diagram: Start → A → B (really B ∩ A) → C (really C ∩ B ∩ A), with branch probabilities P(A), P(B|A), P(C|B ∩ A)]

Here,
P(C ∩ B ∩ A) = P(C|B ∩ A)P(B ∩ A) = P(C|B ∩ A)P(B|A)P(A).

Example 16.
The tree diagram for the multi-stage experiment given in Example 15 is as follows:

[Tree diagram: root branching to U₁, U₂, U₃ with probability 1/3 each; each urn then branching to R and B with the conditional probabilities above, giving
P(R ∩ U₁) = 1/3 × 3/10 = 1/10, P(B ∩ U₁) = 1/3 × 7/10 = 7/30,
P(R ∩ U₂) = 1/3 × 1/5 = 1/15, P(B ∩ U₂) = 1/3 × 4/5 = 4/15,
P(R ∩ U₃) = 1/3 × 0 = 0, P(B ∩ U₃) = 1/3 × 1 = 1/3]

Since sequences of branches represent disjoint events,
P(R) = P(R ∩ U₁) + P(R ∩ U₂) + P(R ∩ U₃) = 1/10 + 1/15 + 0 = 1/6. ♦

Tree diagrams are very useful for visualising and calculating problems involving small numbers of conditional probabilities. However, tree diagrams are infeasible when these numbers are large or only implicitly given. We now derive a mathematical rule that enables us to deal with such cases. In particular, suppose that the n events A₁, . . . , Aₙ partition the sample space S:

[Venn diagram: S partitioned into A₁, . . . , Aₙ, with B overlapping each part]

Since the sets A₁ ∩ B, . . . , Aₙ ∩ B partition the event B, we see that
P(B) = P(A₁ ∩ B) + · · · + P(Aₙ ∩ B).
Applying the Multiplication Rule to each of these n terms yields the following rule.

Total Probability Rule
If A₁, . . . , Aₙ partition S and B is an event, then
P(B) = Σ_{i=1}^{n} P(B|Aᵢ)P(Aᵢ).

Note that we have already used this rule implicitly, for instance to calculate P(R₂) in Example 14, and to calculate P(R) in part (b) of Example 15. Although the Total Probability Rule is a very simple and almost obvious result, it is very useful. Furthermore, it implies Bayes' Rule, which is non-trivial and which often offers surprising results.

Bayes' Rule
If A₁, . . . , Aₙ partition S and B is an event, then
P(Aⱼ|B) = P(B|Aⱼ)P(Aⱼ) / Σ_{i=1}^{n} P(B|Aᵢ)P(Aᵢ).

Proof. Apply the Total Probability Rule to the identity P(Aⱼ|B) = P(Aⱼ ∩ B)/P(B). ♦

Example 17. Consider the urns and the ball of Example 15 above, and suppose that we drew a blue ball from one of the urns. What is then the probability P(U₁|B) that we drew it from Urn 1?

Solution.
Now, U₁, U₂, U₃ partition S since we must choose exactly one urn. Thus by Bayes' Rule,
P(U₁|B) = P(B|U₁)P(U₁) / [P(B|U₁)P(U₁) + P(B|U₂)P(U₂) + P(B|U₃)P(U₃)]
        = (7/10 × 1/3) / (7/10 × 1/3 + 4/5 × 1/3 + 1 × 1/3)
        = 7/25. ♦

This example illustrates how Bayes' Rule allows reverse inference. To some, this can seem counter-intuitive and has even caused contention and controversy. Nevertheless, Bayes' Rule remains a very useful statistical tool that is widely used in medical trials, court cases, and elsewhere.

Example 18. A certain diagnostic test for a disease X indicates with 99% accuracy that a person has X when that person actually has it. Similarly, the test indicates with 98% accuracy that someone does not have X when they do not in fact have it. In medical terms, the test is "positive" if it indicates that a person has the disease and is "negative" otherwise. Suppose that 2% of the population has the disease. Find the probability of a false positive, namely that a person without X still tests positive.

Solution. One might guess that this probability would be very small since the test seems so accurate. Let us calculate whether or not this is true. Thus, let D be the event that the person has disease X, let Tₚ be the event that the test shows positive, and set Tₙ = Tₚᶜ. The tree diagram is

[Tree diagram: root branching to D (0.02) and Dᶜ (0.98); D branches to Tₚ (0.99) and Tₙ (0.01); Dᶜ branches to Tₚ (0.02) and Tₙ (0.98)]

so by Bayes' Rule,
P(Dᶜ|Tₚ) = P(Tₚ|Dᶜ)P(Dᶜ) / [P(Tₚ|Dᶜ)P(Dᶜ) + P(Tₚ|D)P(D)]
         = (0.02 × 0.98) / (0.02 × 0.98 + 0.99 × 0.02) = 0.497.

Thus, almost 50% of positives are false, which might seem outrageously inaccurate. Fortunately, the probability of a false negative, that is, someone with X testing negative, is almost negligible:
P(D|Tₙ) = P(Tₙ|D)P(D) / [P(Tₙ|D)P(D) + P(Tₙ|Dᶜ)P(Dᶜ)]
        = (0.01 × 0.02) / (0.01 × 0.02 + 0.98 × 0.98) = 0.000208.
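The two Bayes' Rule calculations of Example 18 can be checked numerically; the following Python sketch (variable names are ours) reproduces both conditional probabilities from the test's sensitivity, specificity, and the 2% prevalence:

```python
# Data of Example 18
p_D = 0.02               # prevalence: P(D)
p_pos_given_D = 0.99     # P(Tp | D): test detects the disease
p_neg_given_notD = 0.98  # P(Tn | Dc): test clears a healthy person

p_notD = 1 - p_D
p_pos_given_notD = 1 - p_neg_given_notD
p_neg_given_D = 1 - p_pos_given_D

# Bayes' Rule: probability that a positive test comes from a healthy person
p_notD_given_pos = (p_pos_given_notD * p_notD) / (
    p_pos_given_notD * p_notD + p_pos_given_D * p_D)

# Bayes' Rule: probability that a negative test comes from a sick person
p_D_given_neg = (p_neg_given_D * p_D) / (
    p_neg_given_D * p_D + p_neg_given_notD * p_notD)

print(round(p_notD_given_pos, 3))  # 0.497
print(round(p_D_given_neg, 6))     # 0.000208
```

The surprise comes from the denominator: because the disease is rare, the many healthy people who test positive (0.02 × 0.98) outweigh the few sick people who do (0.99 × 0.02).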
Thus, if someone is tested for disease X and the test shows that they do not have X, then they almost certainly do not have it. On the other hand, if the test is positive, then they might possibly have X, and more accurate (and presumably more expensive or time-consuming) tests can then be made to determine whether they do in fact have X. ♦

Example 19. We modify Example 18 slightly by supposing that a fraction x of the population has disease X and that 5% test positive when a large random sample is tested. What percentage of the population has disease X?

Solution. Here, the tree diagram is

[Tree diagram: root branching to D (x) and Dᶜ (1 − x); D branches to Tₚ (0.99) and Tₙ (0.01); Dᶜ branches to Tₚ (0.02) and Tₙ (0.98)]

By the Total Probability Rule,
0.05 = P(Tₚ) = P(Tₚ|D)P(D) + P(Tₚ|Dᶜ)P(Dᶜ) = 0.99x + 0.02(1 − x) = 0.97x + 0.02.
Therefore, x = 0.03/0.97 ≈ 0.031, so approximately 3.1% of the population has the disease. ♦

9.2.4 Statistical Independence

Intuitively, two events A and B are mutually independent if one does not influence the probability of the other. This can be expressed as P(A|B) = P(A) and P(B|A) = P(B). That is, the probability of A does not depend on whether or not B is given, and the same is true for the probability of B. Since P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A), we can express this independence quite elegantly:

Definition 5. Events A and B are (statistically) independent if and only if
P(A ∩ B) = P(A)P(B).

Note that in contrast to conditional probability, this definition allows all probabilities, including 0. To visualise statistical independence, let A and B be independent events with P(B) ≠ 0. Then, since P(S) = 1,
P(A)/P(S) = P(A ∩ B)/P(B).
Thus, the probability measure of A is just as great in comparison to the whole sample space S as is A ∩ B, the part of A in B, when compared to B. The following Venn diagram illustrates this:

[Venn diagram: A and B overlapping in S, with A ∩ B occupying the same proportion of B as A does of S]

Note that independence and disjointness are not the same concept, as might be supposed.
Indeed, these two concepts are almost opposite in nature: non-empty events A and Aᶜ are disjoint but are strongly dependent, for if A occurs, then Aᶜ cannot occur, and vice versa.

Example 20. We roll a die and let A and B be the events that we roll a six and that we roll an even number, respectively. Then P(A) = 1/6 and P(B) = 3/6 = 1/2. Since in this case A ∩ B = A,
P(A ∩ B) = P(A) = 1/6 and P(A)P(B) = 1/6 × 1/2 = 1/12,
so P(A ∩ B) ≠ P(A)P(B). Therefore, A and B are not independent; that is, they are dependent. This is as one would expect since if A occurs, then B must necessarily occur.

Now, define A′ to be the event that we roll either a five or a six. Then since A′ ∩ B = A,
P(A′) = 2/6 = 1/3 and P(A′ ∩ B) = P(A) = 1/6,
so P(A′)P(B) = 1/3 × 1/2 = 1/6 = P(A′ ∩ B). Therefore, A′ and B are independent. In other words, the probability of rolling an even number is the same, namely 1/2, whether or not one of the numbers five and six is rolled, and the converse is equally true. ♦

Example 21. Roll a die twice and, for each i = 1, . . . , 6, let Xᵢ and Yᵢ denote the events that we get i on the first roll and second roll, respectively. Under usual conditions, we may assume that the first and second throws have no influence on each other. Therefore, Xᵢ and Yⱼ are independent for all i, j, so
P(Xᵢ ∩ Yⱼ) = P(Xᵢ)P(Yⱼ) = 1/6 × 1/6 = 1/36.
This identity allows us to calculate more complicated events, such as the event S₄ that the sum of the two rolls is 4. Since S₄ is partitioned by X₁ ∩ Y₃, X₂ ∩ Y₂, X₃ ∩ Y₁, we see that
P(S₄) = P((X₁ ∩ Y₃) ∪ (X₂ ∩ Y₂) ∪ (X₃ ∩ Y₁))
      = P(X₁ ∩ Y₃) + P(X₂ ∩ Y₂) + P(X₃ ∩ Y₁)
      = 1/36 + 1/36 + 1/36 = 1/12.
This result could also have been obtained by viewing this experiment as having a sample space of the 36 equally likely outcomes
S = {(1, 1), (1, 2), . . . , (6, 6)}
where, for instance, (3, 5) denotes that we first rolled three and then five. Then
P(S₄) = |{(1, 3), (2, 2), (3, 1)}| / |S| = 3/36 = 1/12,
as before.
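The second, counting view of Example 21 is easy to confirm by enumerating all 36 equally likely outcomes; a brief Python sketch (names are ours):

```python
from fractions import Fraction
from itertools import product

# The 36 equally likely outcomes of rolling a die twice
S = list(product(range(1, 7), repeat=2))

# The event S4: the two rolls sum to 4
S4 = [(i, j) for (i, j) in S if i + j == 4]

p = Fraction(len(S4), len(S))
print(S4)  # [(1, 3), (2, 2), (3, 1)]
print(p)   # 1/12
```

The enumeration recovers exactly the partition X₁ ∩ Y₃, X₂ ∩ Y₂, X₃ ∩ Y₁ used in the independence argument.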
♦

We now consider the statistical independence of any finite number of events.

Definition 6. Events A₁, . . . , Aₙ are mutually independent if and only if, for any Aᵢ₁, . . . , Aᵢₖ of these,
P(Aᵢ₁ ∩ · · · ∩ Aᵢₖ) = P(Aᵢ₁) × · · · × P(Aᵢₖ).

Example 22. Events A, B, C are mutually independent if and only if these four identities all hold:
P(A ∩ B) = P(A)P(B)
P(A ∩ C) = P(A)P(C)
P(B ∩ C) = P(B)P(C)
P(A ∩ B ∩ C) = P(A)P(B)P(C) ♦

In general, for n events to be mutually independent, they must satisfy 2ⁿ − n − 1 non-trivial identities such as the ones above, one for each subset of {1, . . . , n} with at least two elements. None of these many identities implies any of the others in general, so we cannot make do with a smaller set of identities. This is illustrated by the following example.

Example 23. Draw a ball from a bag containing four balls marked 0 to 3. For i = 0, . . . , 3, let Aᵢ be the event that ball i is drawn, and let Bᵢ = A₀ ∪ Aᵢ be the event that ball 0 or ball i is drawn. Then for all distinct i, j = 1, 2, 3,
P(Bᵢ) = P(A₀ ∪ Aᵢ) = 2/4 = 1/2 and P(Bᵢ ∩ Bⱼ) = P(A₀) = 1/4.
Hence, P(Bᵢ ∩ Bⱼ) = P(Bᵢ)P(Bⱼ), so Bᵢ and Bⱼ are independent. In contrast,
P(B₁ ∩ B₂ ∩ B₃) = P(A₀) = 1/4 ≠ 1/8 = P(B₁)P(B₂)P(B₃),
so B₁, B₂, B₃ are not mutually independent. ♦

If events A and B are independent, then A and Bᶜ are also independent:
P(A ∩ Bᶜ) = P(A) − P(A ∩ B) = P(A) − P(A)P(B) = P(A)(1 − P(B)) = P(A)P(Bᶜ).
By modifying these calculations slightly and using induction, we can prove the more general result:

Theorem 4. If events A₁, . . . , Aₙ are mutually independent and Bᵢ is either Aᵢ or Aᵢᶜ for each i = 1, . . . , n, then B₁, . . . , Bₙ are also mutually independent.

Suppose that events A, B, and C are mutually independent.
Then by Theorem 4,
P(A ∩ (B ∪ C)) = P(A ∩ ((B − C) ∪ (B ∩ C) ∪ (C − B)))
              = P(A ∩ B ∩ Cᶜ) + P(A ∩ B ∩ C) + P(A ∩ C ∩ Bᶜ)
              = P(A)P(B ∩ Cᶜ) + P(A)P(B ∩ C) + P(A)P(C ∩ Bᶜ)
              = P(A)(P(B − C) + P(B ∩ C) + P(C − B))
              = P(A)P(B ∪ C).
We see that A and B ∪ C are also independent. By generalising the above calculations, one may prove the following result.

Theorem 5. If events Aᵢ,ⱼ (for i = 1, . . . , m and j = 1, . . . , nᵢ) are mutually independent and, for each i = 1, . . . , m, the event Bᵢ is obtained from the events Aᵢ,₁, . . . , Aᵢ,ⱼ with j up to nᵢ by taking unions, intersections, and complements, then B₁, . . . , Bₘ are also mutually independent.

Example 24 (A reliability example). A 3-engine plane has a central engine and two wing engines. The plane will crash if the central engine and at least one of the wing engines fail. On any given flight, the central engine fails with probability 0.005, and each wing engine fails with probability 0.008. Assuming that the three engines fail mutually independently, find the probability that the plane will crash during a flight.

Solution. Let A be the event that the port engine fails, let B be the event that the starboard engine fails, and let C be the event that the central engine fails. Then P(A) = P(B) = 0.008 and P(C) = 0.005. Let D denote the event that the plane crashes and note that D = C ∩ (A ∪ B). Since A, B, and C are mutually independent, C and A ∪ B are independent by Theorem 5. Therefore,
P(D) = P(C ∩ (A ∪ B))
     = P(C)P(A ∪ B)
     = P(C)[P(A) + P(B) − P(A ∩ B)]
     = P(C)[P(A) + P(B) − P(A)P(B)]
     = 0.005 × [0.008 + 0.008 − 0.008 × 0.008] = 0.00007968.
(Note: there are other ways to do this problem.)

Under our assumptions, the plane will crash on a given flight with probability slightly less than eighty in one million. These are dangerous assumptions, however, since it is highly optimistic to hope that the engines will fail independently of each other.
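Under the (optimistic) independence assumption, the crash probability of Example 24 can be reproduced in a few lines; this Python sketch uses our own variable names:

```python
# Engine failure probabilities from Example 24
p_wing = 0.008    # each wing engine, events A and B
p_centre = 0.005  # central engine, event C

# P(A ∪ B) by the Addition Rule, with P(A ∩ B) = P(A)P(B) by independence
p_either_wing = p_wing + p_wing - p_wing * p_wing

# D = C ∩ (A ∪ B), with C independent of A ∪ B by Theorem 5
p_crash = p_centre * p_either_wing
print(p_crash)  # slightly less than 80 in a million
```

Changing the inputs shows how sensitive the answer is: any positive correlation between engine failures (a common cause such as fuel contamination) would push the true risk above this figure.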
For instance, if engine failure is caused by volcanic ash, then all three engines will be at risk of failure, and there are countless other factors, like shared electric wiring, that might similarly introduce dependence. In view of this, the real probabilities might be considerably higher than stated. ♦ 9.3 Random Variables It can often be useful to label the outcomes of an experiment by numbers. This often makes event notation more flexible, and it allows us to perform arithmetic on the outcomes. Definition 1. A random variable is a real function defined on a sample space. Example 1. Toss a coin and let S = {H,T} be the associated sample space. Two random variables X and Y on S are given as follows, for each outcome s ∈ S: X(s) = { 1 if s = H 0 if s = T Y (s) = { 1 if s = H −1 if s = T Example 2. Roll a die and let S = {1, . . . , 6} be the associated sample space. Two random variables X and Y on S are given as follows, for each outcome s ∈ S: X(s) = s Y (s) = { −1 if s is odd 1 if s is even c©2020 School of Mathematics and Statistics, UNSW Sydney 184 CHAPTER 9. INTRODUCTION TO PROBABILITY AND STATISTICS Definition 2. For a random variable X on some sample space S, define for all subsets A ⊆ S and real numbers r ∈ R, • {X ∈ A} = {s ∈ S : X(s) ∈ A} • {X = r} = {s ∈ S : X(s) = r} • {X 6 r} = {s ∈ S : X(s) 6 r} • ... and so on. We suppress the curly brackets when expressing the probability of these events. For instance, P ({X = r}) is written as P (X = r). Example 3. Roll a die and let X and Y be the random variables defined in Example 2. Then P (X > 4) = P ({5, 6}) = 1 3 P (Y = 1) = P ({2, 4, 6}) = 1 2 P (Y = π) = P (∅) = 0 . Example 4. Toss a coin three times and let the random variable X count the number of heads tossed. Then X(S) = {0, 1, 2, 3} and P (X = 0) = P ({TTT}) = 1 8 P (X = 1) = P ({HTT, THT, TTH}) = 3 8 P (X = 2) = P ({HHT,HTH, THH}) = 3 8 P (X = 3) = P ({HHH}) = 1 8 . Definition 3. 
The cumulative distribution function of a random variable X is given by

FX(x) = P(X ≤ x) for x ∈ R .

We often refer to FX(x) as just F(x). Note that F is non-decreasing and that if a ≤ b, then

P(a < X ≤ b) = F(b) − F(a) and 0 = lim_{x→−∞} F(x) ≤ F(a) ≤ F(b) ≤ lim_{x→∞} F(x) = 1 .

Example 5. Toss a coin three times and let random variable X be the number of heads tossed, as in Example 4. Then

F(3/2) = P(X ≤ 3/2) = P(X = 0) + P(X = 1) = 1/8 + 3/8 = 1/2 . ♦

9.3.1 Discrete Random Variables

The image of a function is the set of its function values.

Definition 4. A random variable X is discrete if its image is countable.

The random variables in Examples 1–5 are each discrete since their images are finite and thus countable. We shall for now only consider discrete random variables but will consider certain non-discrete random variables in Section 9.5, namely those that are continuous.

Definition 5. The probability distribution of a discrete random variable X is some description of all the probabilities of all events associated with X. We sometimes write the probabilities as pk = P(X = xk).

Note that for a discrete random variable X, the cumulative distribution function F(x) is

F(x) = ∑_{k : xk ≤ x} pk .

Thus, in practice, to show that {pk} is a probability distribution, we need to show that: (i) pk ≥ 0 and (ii) ∑k pk = 1.

Example 6. Roll a die and define random variables X and Y as in Example 2. The probability distributions of X and Y can for instance be represented as

P(X = x) = 1/6 if x ∈ {1, . . . , 6}, and 0 otherwise,

and

yk : −1 1
pk = P(Y = yk) : 1/2 1/2

Clearly in each case pk ≥ 0, and for the random variable X we have ∑k pk = 6 × 1/6 = 1, while for the random variable Y, ∑k pk = 1/2 + 1/2 = 1. Hence, these are probability distributions for the random variables X and Y respectively. ♦

Example 7. Roll a die twice and let S = {(i, j) : i, j = 1, . . .
, 6} be the associated sample space. Let X be the random variable defined by X(i, j) = i + j, that is, X is the sum of the numbers showing. The probability distribution of X is

xk : 2 3 4 5 6 7 8 9 10 11 12
pk : 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36

Clearly, pk ≥ 0. Also,

∑k pk = (1/36)(1 + 2 + 3 + 4 + 5 + 6 + 5 + 4 + 3 + 2 + 1) = 1 .

Hence pk is a probability distribution. ♦

Note that, in the above example, we did not actually need to specify the sample space; it would suffice to define X to be the sum of the rolls. Indeed, we often specify the probability distribution without defining the sample space or even, at times, a random variable with that distribution.

Example 8. The probability distribution of a discrete random variable X is given as follows:

xk : 0 1 2 4 7
pk = P(X = xk) : 0.2 0.3c 0.2 c² 0.5

(a) Find the value of c.
(b) Find P(X ≥ 4).

Solution. (a) ∑ pk = c² + 0.3c + 0.9 = 1, or c² + 0.3c − 0.1 = 0. Solving this gives c = −0.5 or c = 0.2. Since 0.3c = p1 ≥ 0, we conclude that c = 0.2.

(b) By (a), the probability distribution is

xk : 0 1 2 4 7
pk : 0.2 0.06 0.2 0.04 0.5

Hence, P(X ≥ 4) = P(X = 4) + P(X = 7) = 0.04 + 0.5 = 0.54 . ♦

9.3.2 The Mean and Variance of a Discrete Random Variable

As we have seen in the above examples, the use of random variables can simplify the description of events and their probabilities. We will now see how they also enable us to perform arithmetic on outcomes. In particular, we can calculate weighted averages of the outcome values; this is the expected value, or mean. We can also measure the average of the squares of the distances from the mean to the outcome values; this is called the variance.

Definition 6. The expected value (or mean) of a discrete random variable X with probability distribution pk is given by

E(X) = ∑_{all k} xk pk .

The expected value E(X) is often denoted by µ or µX.

Example 9.
Toss a coin three times and let X count the number of heads tossed as in Example 4. The probability distribution of X is

xk : 0 1 2 3
pk : 1/8 3/8 3/8 1/8

so the expected value of X is

E(X) = ∑k pk xk = 0 × 1/8 + 1 × 3/8 + 2 × 3/8 + 3 × 1/8 = 3/2 .

This agrees with our intuition: on average, half of the tosses will be heads. ♦

Example 10. Roll a die twice and let the random variable X be the sum of the rolls. Since a die roll i is as likely as the roll 7 − i, we see that X and 14 − X have the same probability distribution. Hence E(X) = E(14 − X) = 14 − E(X), so the expected value of X is E(X) = 14/2 = 7. Let us check this using the definition of E(X) and the probabilities P(X = x) given in Example 7:

E(X) = 2 × 1/36 + · · · + 7 × 6/36 + · · · + 12 × 1/36 = 7 . ♦

Theorem 1. Let X be a discrete random variable with probability distribution pk = P(X = xk). Then for any real function g(x), the expected value of Y = g(X) is

E(Y) = E(g(X)) = ∑k g(xk) pk .

[X] Proof. Let {yj} = {g(xk)} be the set of function values of Y, and note that

P(Y = yj) = P({s ∈ S : g(X(s)) = yj}) = P( ⋃_{k : g(xk)=yj} {s ∈ S : X(s) = xk} ) = ∑_{k : g(xk)=yj} P(X = xk) .

By changing the order of summation, we therefore see that

E(Y) = ∑j yj P(Y = yj) = ∑j yj ∑_{k : g(xk)=yj} P(X = xk) = ∑j ∑_{k : g(xk)=yj} g(xk) pk = ∑k ∑_{j : yj=g(xk)} g(xk) pk = ∑k g(xk) pk .

The final equality is valid because the inner sum runs over just the single index j with yj = g(xk).

Example 11. Toss a coin three times and let X be the number of heads tossed, as in Examples 4 and 9. By Theorem 1, the expected value of X² is

E(X²) = ∑k pk xk² = 0² × 1/8 + 1² × 3/8 + 2² × 3/8 + 3² × 1/8 = 3 . ♦

The expected value of a random variable X describes where the values of X are centred. We can also measure how widely the values of X spread, namely by the average distance (squared) between the values and the mean.
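Theorem 1 lends itself to a quick computational check. (The course package is Maple; the sketch below is an equivalent illustration in plain Python with exact fractions, and the helper name `expect` is our own, not from the notes.) It recomputes E(X) and E(X²) for the three-coin-toss distribution of Examples 9 and 11:

```python
from fractions import Fraction as F

# Three coin tosses: distribution of X = number of heads, as in Example 9.
dist = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}

def expect(g, dist):
    """E(g(X)) = sum of g(x_k) * p_k over the distribution (Theorem 1)."""
    return sum(g(x) * p for x, p in dist.items())

mean = expect(lambda x: x, dist)        # E(X)   = 3/2
second = expect(lambda x: x * x, dist)  # E(X^2) = 3
```

Using exact `Fraction` arithmetic avoids any floating-point rounding, so the results match the hand calculations exactly.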
Definition 7. The variance of a discrete random variable X is

Var(X) = E((X − E(X))²) .

The standard deviation of X is SD(X) = √Var(X) .

The standard deviation is often denoted by σ or σX, and the variance is often written as σ² or σX².

Theorem 2. Var(X) = E(X²) − (E(X))².

This formula is often useful in hand calculations.

Proof. Write µ for E(X). By the definition of the variance we have

Var(X) = ∑k (xk − µ)² pk = ∑k (xk² − 2xkµ + µ²) pk = ∑k xk² pk − 2µ ∑k xk pk + µ² ∑k pk = E(X²) − 2µ² + µ² = E(X²) − (E(X))² .

Example 12. Toss a coin three times and let X be the number of heads tossed. We saw in Examples 9 and 11 that E(X) = 3/2 and E(X²) = 3. Thus by Theorem 2, the variance of X is

Var(X) = E(X²) − (E(X))² = 3 − (3/2)² = 3/4 . ♦

Example 13. Consider a random variable X with probability distribution given below:

xk : 0 1 2 4 7
pk : 0.2 0.06 0.2 0.04 0.5

The expected values of X and X² are

E(X) = ∑ xk pk = 0 × 0.2 + 1 × 0.06 + 2 × 0.2 + 4 × 0.04 + 7 × 0.5 = 4.12
E(X²) = ∑ xk² pk = 0² × 0.2 + 1² × 0.06 + 2² × 0.2 + 4² × 0.04 + 7² × 0.5 = 26.0

and the variance of X is

Var(X) = E(X²) − (E(X))² = 26 − (4.12)² = 9.0256 .

Thus, the root mean square distance of the values of X from the mean E(X) is √9.0256 ≈ 3. ♦

There is generally no easily-described relationship between Var(Y) and Var(X) when Y = g(X). However, if Y = aX + b is a linear function of X, then we have the following simple identities:

Theorem 3. If a and b are constants, then

E(aX + b) = aE(X) + b
Var(aX + b) = a²Var(X)
SD(aX + b) = |a| SD(X) .

Proof. Writing µ for E(X), we have

E(aX + b) = ∑ (axk + b) pk = a ∑ xk pk + b ∑ pk = aE(X) + b .

By the definition of the variance and the above identity,

Var(aX + b) = E(aX + b − E(aX + b))² = E(aX − aE(X))² = a²E(X − E(X))² = a²Var(X) .

The third statement follows by definition from the second.
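The identities in Theorem 3 are easy to verify numerically. Below is an illustrative Python sketch using the distribution of Example 13; the constants a and b are arbitrary choices of our own:

```python
# Numerical check of Theorem 3 on the distribution of Example 13.
xs = [0, 1, 2, 4, 7]
ps = [0.2, 0.06, 0.2, 0.04, 0.5]

def mean(xs, ps):
    """E(X) = sum of x_k * p_k."""
    return sum(x * p for x, p in zip(xs, ps))

def var(xs, ps):
    """Var(X) = E((X - E(X))^2), computed directly from the definition."""
    m = mean(xs, ps)
    return sum((x - m) ** 2 * p for x, p in zip(xs, ps))

a, b = 3.0, -5.0
ys = [a * x + b for x in xs]   # Y = aX + b keeps the same probabilities p_k

assert abs(mean(ys, ps) - (a * mean(xs, ps) + b)) < 1e-9  # E(aX+b) = aE(X)+b
assert abs(var(ys, ps) - a * a * var(xs, ps)) < 1e-9      # Var(aX+b) = a^2 Var(X)
```

The same two assertions hold for any choice of a and b, which is exactly what the theorem asserts.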
The variance σ² = Var(X) of a random variable X gives a measure of the average square distance from the expected value E(X) to the values of X.

9.4 Special Distributions

Often, statistical models incorporate specific classes of probability distributions whose expected value and variance are known. In this section, we shall consider two such classes, namely the Binomial distributions and the Geometric distributions. These both involve Bernoulli trials and Bernoulli processes.

A Bernoulli trial is an experiment with two outcomes, often "success" and "failure", or Y(es) and N(o), or {1, 0}, where P(Y) and P(N) are denoted by p and q = 1 − p, respectively. A Bernoulli process is an experiment composed of a sequence of identical and mutually independent Bernoulli trials. More particularly, the events Ai, denoting the success of the ith trial, are mutually independent. We have already seen examples of Bernoulli processes in previous sections, such as tossing a coin repeatedly and considering head-outcomes (p = 1/2); rolling a die multiple times to obtain sixes (p = 1/6); or asking each of several people whether it is their birthday (p = 1/365).

Example 1. Tossing a coin three times is a Bernoulli process with n = 3 identical trials that each result in either H or T, with probabilities p = q = 1/2. The trials are mutually independent since the coin tosses do not influence each other. Let us formally verify this claim. Let A1, A2, and A3 be the events that H is tossed on the 1st, 2nd, and 3rd toss, respectively. Then

P(A1) = P({HTT, HTH, HHT, HHH}) = 4/8 = 1/2 .

Similarly, P(A2) = P(A3) = 1/2. Therefore,

P(A1 ∩ A2) = P({HHT, HHH}) = 2/8 = 1/4 = P(A1)P(A2) ,

and, similarly, P(A2 ∩ A3) = P(A2)P(A3) and P(A1 ∩ A3) = P(A1)P(A3). Finally,

P(A1 ∩ A2 ∩ A3) = P({HHH}) = 1/8 = P(A1)P(A2)P(A3) .

We see that the events A1, A2, A3 are indeed mutually independent. ♦
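The verification in Example 1 can also be carried out by brute-force enumeration of the eight equally likely outcomes. An illustrative Python sketch (the names `omega`, `prob`, and `A` are our own, not from the notes):

```python
from itertools import product, combinations

# All 2^3 equally likely outcomes of three coin tosses, e.g. ('H', 'T', 'H').
omega = list(product("HT", repeat=3))

def prob(event):
    """Probability of an event (a set of outcomes) under equal likelihood."""
    return len(event) / len(omega)

# A_i = "the ith toss is a head", for i = 1, 2, 3.
A = {i: {s for s in omega if s[i - 1] == "H"} for i in (1, 2, 3)}

# Check every non-trivial product identity: all pairs, then the triple.
for r in (2, 3):
    for idx in combinations((1, 2, 3), r):
        inter = set(omega)
        p = 1.0
        for i in idx:
            inter &= A[i]
            p *= prob(A[i])
        assert prob(inter) == p  # e.g. P(A1 ∩ A2) = 1/4 = P(A1) P(A2)
```

Every probability here is an exact binary fraction (1/2, 1/4, 1/8), so the floating-point comparisons are exact.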
INTRODUCTION TO PROBABILITY AND STATISTICS Throughout the remainder of this section, let p be a real number with 0 < p < 1 and let q = 1− p. 9.4.1 The Binomial Distribution Recall that the expression (n k ) denotes the number of ways to select k objects from n distinct objects, with order unimportant, and is given by n! k!(n− k)! . Definition 1. The Binomial distribution B(n, p) for n ∈ N is the function B(n, p, k) = ( n k ) pk(1− p)n−k where k = 0, 1, . . . , n . Note that B(n, p, k) is a probability distribution. To see this, we can use the Binomial Theorem: ∑ k B(n, p, k) = n∑ k=0 B(n, p, k) = n∑ k=0 ( n k ) pk qn−k = (p + q)n = 1n = 1 . Since B(n, p, k) is nonnegative, we conclude that 0 6 B(n, p, k) 6 1 for all k. Theorem 1. If X is the random variable that counts the successes of some Bernoulli process with n trials having success probability p, then X has the binomial distribution B(n, p). We write X ∼ B(n, p) to denote that X is a random variable with this distribution. Proof. The variable X can assume values k = 0, 1, . . . , n so we must calculate pk = P (X = k) for these values. Suppose that the first k trials each results in Y(es) and the rest each results in N(o): Y · · ·Y︸ ︷︷ ︸ k N · · ·N︸ ︷︷ ︸ n−k The trials are independent, so this outcome has probability pk(1− p)n−k. In general, there are( n k ) = n! k!(n − k)! ways for precisely n trials with k Y’s to occur. Therefore, pk = P (X = k) = ( n k ) pk(1− p)n−k = B(n, p, k) . Example 2. Toss a coin n = 3 times and let X be the random variable counting the number of resulting heads (H). The tosses are identical and mutually independent with probability p = 12 of resulting in H. Thus, Theorem 1 implies that X ∼ B(3, 12 ). This tells us everything about the probabilities of X; for instance, P (X = 2) = (3 2 ) ( 1 2 )2 (1 2 )3−2 = 38 . ♦ Probabilities such as P (X > t), P (X > t), and P (|X − E(X)| > t) are each referred to as a tail probability. c©2020 School of Mathematics and Statistics, UNSW Sydney 9.4. 
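The binomial probabilities B(n, p, k) are straightforward to compute directly. A minimal Python sketch (the helper `binom_pmf` is an illustrative name of our own, not a library function):

```python
from math import comb

def binom_pmf(n, p, k):
    """B(n, p, k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example 2: three coin tosses, X ~ B(3, 1/2), so P(X = 2) = 3/8.
assert abs(binom_pmf(3, 0.5, 2) - 3 / 8) < 1e-12

# The probabilities sum to 1, as guaranteed by the Binomial Theorem.
assert abs(sum(binom_pmf(12, 1 / 6, k) for k in range(13)) - 1) < 1e-12
```

Summing `binom_pmf` over a range of k in the same way gives the tail probabilities discussed above.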
SPECIAL DISTRIBUTIONS 191 Example 3. Roll a die n = 12 times and let X be the number of resulting sixes. The rolls are identical and mutually independent with probability p = 16 of resulting in a six, so by Theorem 1, X ∼ B(12, 16). Thus for instance, we can calculate the following tail probability: P (X > 9) = P (X = 10) + P (X = 11) + P (X = 12) = ( 12 10 )( 1 6 )10(5 6 )2 + ( 12 11 )( 1 6 )11(5 6 ) + ( 1 6 )12 ≈ 7.86× 10−7 . We see that the likelihood of rolling more than nine sixes is less than one in a million. ♦ Example 4. Ask n = 40 people whether today is their birthday and let X count the Yes-answers. Assume that these questions form identical and mutually independent trials with probability p = 1365 of resulting in a Yes-answer. By Theorem 1, X ∼ B(40, 1365 ). The likelihood of today being the birthday of at least one of these people is P (X > 1) = 1− P (X = 0) = 1− ( 364 365 )40 ≈ 10.4% . ♦ Theorem 2. If X is a random variable and X ∼ B(n, p), then • E(X) = np ; • Var(X) = npq = np(1− p). Proof. First note that for k ≥ 1, k ( n k ) = k n! k!(n− k)! = n (n − 1)! (k − 1)!(n − 1− (k − 1))! = n ( n− 1 k − 1 ) . Hence, E(X) = n∑ k=0 kpk = n∑ k=0 k ( n k ) pkqn−k = n∑ k=1 k ( n k ) pkqn−k = n n∑ k=1 ( n− 1 k − 1 ) pkqn−k = n n−1∑ j=0 ( n− 1 j ) pj+1 qn−1−j = np n−1∑ j=0 ( n− 1 j ) pj qn−1−j = np n−1∑ j=0 B(n− 1, p, j) = np . See Problem 36 for the second half of the proof. Example 5. Toss a coin n = 3 times and let X count the number of ensuing heads. By Example 1, X ∼ B(3, 12), so by Theorem 2, the expected number of resulting heads is E(X) = np = 32 and the average square distance between the number of heads and E(X) is Var(X) = npq = 34 . If we now toss the coin n = 12 times, then X ∼ B(12, 12 ), so the expected number of resulting heads is E(X) = np = 122 = 6, and Var(X) = npq = 3. ♦ c©2020 School of Mathematics and Statistics, UNSW Sydney 192 CHAPTER 9. INTRODUCTION TO PROBABILITY AND STATISTICS Example 6. 
If we roll a die n = 12 times, then one might intuitively expect to roll 12 × 1/6 = 2 sixes on average. This is also what we find by the following calculations. If X is the random variable counting the number of sixes rolled, then by Example 3, X ∼ B(12, 1/6). Thus by Theorem 2, the expected number of resulting sixes is E(X) = np = 12 × 1/6 = 2, as we expected. ♦

The distributions B(12, 1/2) and B(12, 1/6) appearing in Examples 5 and 6 are illustrated below. Note that the function values B(12, 1/2, k) are centred symmetrically around E(X) = 6 and spread out gradually to the extremities k = 0, 12. In contrast, the function values B(12, 1/6, k) are clustered asymmetrically about the expected value E(X) = 2 and taper off rapidly, so that B(12, 1/6, k) is nearly zero for k > 7. Thus, it is extremely unlikely that we would roll at least seven sixes when rolling a die twelve times.

[Figure: bar plots of pk = B(12, 1/2, k) and pk = B(12, 1/6, k) for k = 0, . . . , 12.]

9.4.2 Geometric Distribution

Definition 2. The Geometric distribution G(p) is the function

G(p, k) = (1 − p)^{k−1} p = q^{k−1} p where k = 1, 2, . . . .

Note that G(p, k) is a probability distribution since 0 ≤ G(p, k) ≤ 1 for all k and since

∑k G(p, k) = ∑_{k=1}^∞ q^{k−1} p = p ∑_{k=0}^∞ q^k = p · 1/(1 − q) = p · 1/p = 1 .

Theorem 3. Consider an infinite Bernoulli process of trials, each of which has success probability p. If the random variable X is the number of trials conducted until success occurs for the first time, then X has the geometric distribution G(p). We write X ∼ G(p) to denote that X has this distribution.

Note that it is theoretically possible for a success never to happen (X = ∞); however, this has zero probability. We therefore omit the all-failure outcome from the sample space so that X is a well-defined finite number.

Proof. The variable X can assume values k = 1, 2, . . . , so we must find pk = P(X = k) for these values.
The event {X = k} consists of the outcome in which the first k − 1 trials each result in N(o) and the kth trial results in Y (es): N · · ·N︸ ︷︷ ︸ k−1 Y c©2020 School of Mathematics and Statistics, UNSW Sydney 9.4. SPECIAL DISTRIBUTIONS 193 The trials are independent, so this outcome has probability pk = P (X = k) = (1− p)k−1p = G(p, k) . Example 7. Toss a coin until H(ead) is tossed and let X count the number of these tosses. The tosses are identical and mutually independent with probability p = 12 of resulting in H. Theorem 3 then implies that X ∼ G(12 ). Thus, the likelihood of having to toss the coin seven times before tossing H is P (X = 7) = ( 1− 1 2 )7−1 1 2 = 1 27 ≈ 0.8% . ♦ Tail probabilities are very easily expressed for geometrically distributed random variables: Theorem 4. If X ∼ G(p) and n is a positive integer, then P (X > n) = (1− p)n = qn. Proof. P (X > n) = ∞∑ k=n+1 P (X = k) = ∞∑ k=n+1 qk−1p = qn ∞∑ k=1 qk−1p = qn ∞∑ k=1 G(p, k) = qn . Theorem 4 gives us a simple expression for the cumulative distribution function F (x) of X: Corollary 5. If X ∼ G(p), then the cumulative distribution function F is given by F (x) = P (X 6 x) = 1− (1− p)⌊x⌋ = 1− q⌊x⌋ for x ∈ R. Note that ⌊x⌋ denotes the largest integer less or equal to x. Example 8. Roll a die until six is rolled, and let X count the number of these rolls. The rolls are identical and mutually independent with probability p = 16 of resulting in six. Theorem 3 implies that X ∼ G(16 ). By Theorem 4, the likelihood of rolling a six within at most four rolls is F (4) = P (X 6 4) = 1− P (X > 4) = 1− ( 1− 1 6 )4 ≈ 52% , or close to half. Similarly, F (6) = P (X 6 6) ≈ 23 and P (X > 7) ≈ 28%. ♦ Theorem 6. If X is a random variable and X ∼ G(p), then • E(X) = 1p ; • Var(X) = 1−p p2 . Proof. First note that for x 6= 0, using Power Series results from Calculus, ∞∑ k=0 kxk−1 = d dx ∞∑ k=0 xk = d dx 1 1− x = 1 (1− x)2 . Hence, E(X) = ∞∑ k=1 kqk−1p = p (1 − q)2 = p p2 = 1 p . 
The second part of the proof is left as an exercise.

Example 9. Toss a coin until H(ead) is tossed and let X count the number of these tosses. As seen in Example 7, X ∼ G(1/2). Thus, we must expect, on average, to have to toss the coin E(X) = 1/p = (1/2)^{−1} = 2 times in order to toss a head; this is presumably what most would expect intuitively. Note that this is a relatively accurate estimate of the average, since the average squared distance from E(X) = 2 to the infinitely many values of X is only Var(X) = (1 − p)/p² = (1 − 1/2)/(1/2)² = 2. Indeed, by Theorem 4, the likelihood that we must toss the coin more than three times to get a head is P(X > 3) = (1/2)³ = 1/8 = 12.5%, which is relatively small. ♦

Example 10. Roll a die until a six is rolled, and let X count the number of these rolls, as in Example 8. In that example, we saw that X ∼ G(1/6), so we must expect, on average, to have to roll the die E(X) = 1/p = (1/6)^{−1} = 6 times in order to roll a six. As in the coin-tossing example above, this expected value is what one might guess intuitively. However, in contrast to that example, the present expected value E(X) = 6 is not a particularly precise estimate of an average rolling count. In particular, the average squared distance from E(X) = 6 to the infinitely many values of X is Var(X) = (1 − p)/p² = (1 − 1/6)/(1/6)² = 30. Indeed, by Example 8, the likelihood of requiring at most four rolls in order to roll a six is (slightly) more than half, and the likelihood of requiring at least eight rolls is more than a quarter. ♦

The distributions G(1/2) and G(1/6) appearing in Examples 9 and 10 are illustrated below. Note that the function values G(1/2, k) are large to begin with but very quickly decrease, which is reflected by the small expected value and variance that both equal E(X) = Var(X) = 2.
In contrast, the function values G(1/6, k) are small to begin with and decrease only gradually. This is indicated by the expected value E(X) = 6 and the large variance Var(X) = 30. Thus, it is very likely that we would toss a head after only a few tosses of a coin, whereas it would not be possible to form such a good estimate of the number of die-rolls required to roll a six.

[Figure: bar plots of pk = G(1/2, k) and pk = G(1/6, k) for k = 1, . . . , 12.]

9.4.3 Sign Tests

Often, we have a sample of data consisting of independent observations of some quantity of interest, and it might be of interest to see whether the observed values differ systematically from some fixed and pre-determined value.

Example 11. Crop research shows that a new variety of corn yields, in bushels per acre, for 15 plots of land:

138.0 139.1 113.0 132.5 140.7
109.7 118.0 134.8 109.6 127.3
115.6 130.4 130.2 117.7 105.5

A variety of corn currently used yields 110 bushels per acre. We want to know whether the new variety improves on the existing one; that is, are the above values centred around a true yield of 110, or are they systematically different from the value 110? To answer this question, one may use a "sign test" approach as follows:

1. Count the number of observations that are strictly greater than the target value ("+").
2. Count the total number of observations that are either strictly greater ("+") or strictly smaller ("−") than the target value.
3. Calculate the tail probability that measures how often one would expect to observe as many increases ("+") as were observed, if "+" and "−" were equally likely.

Using this approach, we can now determine whether the new variety of corn has a higher yield than the current variety.

1. The yield was strictly greater ("+") than 110 bushels/acre in 12 plots.
2.
The yield was either strictly greater (“+”) than or strictly smaller (“-”) than 110 bushels/acre in all 15 plots. 3. Assuming that probabilities of greater-than (“+”) yield probabilities for each plot are identical and mutually independent, we can model these yields binomially. In particular, let X be the random variable that counts the yields exceeding the average yield of 110 bushels per acre; then X ∼ B(15, 12). The probability p = 12 is set under the assumption that smaller-than (“-”) yields are as likely as greater-than (“+”) yields and that no equal-to yields occur; this assumes that the new crop has the same yield as the old one. The tail probability that 12 or more plots have a greater-than (“+”) yield is then P (X > 12) = 15∑ k=12 ( 15 k )( 1 2 )k (1 2 )15−k = 1.76% . Under the assumption that the average yield for the new variety has not improved, it is quite unlikely (less than 2%) that we would have observed 12 of the 15 yields above the old average yield. We therefore conclude that the new variety has improved yield. In this course, we will say that if the tail probability is less than 5% then we will regard this as significant. 9.5 Continuous random variables In the previous sections, we considered discrete random variables. These can assume countably many values assigned to the outcomes of an experiment. Although there may be infinitely many of these values, this is not sufficient to model many real-life experiments in which outcomes may be assigned any real values from some interval, such as the height or weight of individuals, or the half- life of a radioactive isotope. We will therefore now consider a type of non-discrete random variable, c©2020 School of Mathematics and Statistics, UNSW Sydney 196 CHAPTER 9. INTRODUCTION TO PROBABILITY AND STATISTICS called a continuous variable X. In contrast to the discrete random variables, these cannot be defined by probabilities P (X = x) of single values x, since these probabilities each equal 0. 
Instead, we will define continuous random variables in terms of the cumulative distribution function

F(x) = FX(x) = P(X ≤ x) for x ∈ R .

Definition 1. A random variable X is continuous if and only if FX(x) is continuous.

Strictly speaking, FX(x) must actually be piecewise differentiable, which means that FX(x) is differentiable except for at most countably many points. However, the above definition is good enough for our present purposes.

Example 1. At a random point during the day, we take note of the time, ignoring the date and the number of hours. This gives us a real number (of minutes) that we can represent by a random variable X with function values lying in the interval [0, 60). If x is one of these values then, assuming that any time is as likely as another, our intuition tells us that F(x) = P(X ≤ x) = P(0 ≤ X ≤ x) is the size of the interval [0, x] compared to the size of [0, 60); measured in lengths, this is x/60. We therefore find that F(x) is the function

F(x) = P(X ≤ x) = 0 if x ≤ 0; x/60 if 0 < x ≤ 60; 1 if x > 60.

[Figure: the graph of F(x), rising linearly from 0 at x = 0 to 1 at x = 60.]

This is a continuous function, so X is a continuous random variable. ♦

For discrete random variables X, the cumulative distribution function F(x) is a sum over probability distribution values pk = P(X = xk). For continuous random variables, F(x) is an integral over continuous function analogues of the discrete probability distributions. These analogues are given in the following definition.

Definition 2. The probability density function f(x) of a continuous random variable X is defined by

f(x) = fX(x) = d/dx F(x), x ∈ R

if F(x) is differentiable, and

f(a) = lim_{x→a−} d/dx F(x)

if F(x) is not differentiable at x = a.

Since F(x) is non-decreasing and lim_{x→∞} F(x) = 1, the probability density function satisfies

f(x) ≥ 0 for all x and ∫_{−∞}^{∞} f(x) dx = 1 .

Theorem 1. F(x) = ∫_{−∞}^{x} f(t) dt.

Proof. This follows from the Fundamental Theorem of Calculus since lim_{x→−∞} F(x) = 0.
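Theorem 1 can be sanity-checked numerically for the minutes-past-the-hour example: integrating the density f(x) = 1/60 on (0, 60] (obtained by differentiating the F(x) of Example 1) should recover F(x) = x/60. An illustrative Python sketch using a midpoint Riemann sum; the step count is an arbitrary choice:

```python
def f(x):
    """Density of the minutes example: 1/60 on (0, 60], zero elsewhere."""
    return 1 / 60 if 0 < x <= 60 else 0.0

def F_numeric(x, steps=10_000):
    """Midpoint-rule approximation of the integral of f from -infinity to x.
    Since f vanishes below 0, integrating from 0 suffices."""
    if x <= 0:
        return 0.0
    h = x / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

assert abs(F_numeric(30) - 30 / 60) < 1e-9               # F(30) = 1/2
assert abs(F_numeric(30) - F_numeric(15) - 0.25) < 1e-9  # P(15 < X <= 30) = 1/4
```

The second assertion previews the rule P(a < X ≤ b) = F(b) − F(a) applied to this uniform density.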
Note that if a ≤ b, then

P(a ≤ X ≤ b) = P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx .

Example 2. At a random point during the day, we take note of the time as in Example 1, and let X again be the continuous random variable that tells us how far past the hour the time is. The density function f(x) is calculated by differentiating the cumulative distribution function F(x) found in Example 1:

f(x) = d/dx F(x) = 1/60 if 0 < x ≤ 60, and 0 otherwise,

or, more compactly written, f(x) = 1/60 for x ∈ (0, 60].

[Figure: the graph of f(x), constant at 1/60 on (0, 60] and zero elsewhere.]

Here, we differentiated F(x) from the left at the points x = 0, 60. The probability that we noted the time between a quarter past and half past the hour is

P(15 ≤ X ≤ 30) = ∫_15^30 f(x) dx = ∫_15^30 (1/60) dx = 1/4 ,

as we would intuitively expect. ♦

9.5.1 The mean and variance of a continuous random variable

The mean of a continuous random variable is obtained (using the notion of Riemann sums) by replacing sums by integrals, and probability distributions by probability density functions:

Definition 3. The expected value (or mean) of a continuous random variable X with probability density function f(x) is defined to be

µ = E(X) = ∫_{−∞}^{∞} x f(x) dx .

Here, and in the following, we assume that all improper integrals converge.

The following theorem is the continuous analogue of Theorem 1 in Section 9.3.2.

Theorem 2. If X is a continuous random variable with density function f(x), and g(x) is a real function, then the expected value of Y = g(X) is

E(Y) = E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx .

The variance of a continuous random variable is defined exactly as for discrete random variables:

Definition 4. The variance of a continuous random variable X is

Var(X) = E((X − E(X))²) = E(X²) − (E(X))² .

The standard deviation of X is σ = SD(X) = √Var(X).
Note that by Theorem 2,

E(X²) = ∫_{−∞}^{∞} x² f(x) dx .

Example 3. At a random point during the day, we take note of the time as in Examples 1 and 2, and let X again be the continuous random variable giving the number of minutes past the hour. In Example 2, we found the density function f(x) to be f(x) = 1/60 for x ∈ (0, 60], so

E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_{−∞}^0 0 dx + ∫_0^{60} (x/60) dx + ∫_{60}^{∞} 0 dx = 0 + (1/60)[x²/2] from 0 to 60 = 30 ,

as we would expect. Similarly, by Theorem 2,

E(X²) = ∫_{−∞}^{∞} x² f(x) dx = ∫_0^{60} x² (1/60) dx = (1/60)[x³/3] from 0 to 60 = 1200 ,

so Var(X) = 1200 − 30² = 300 and SD(X) = √300 ≈ 17.3. ♦

The mean and variance have the same properties under linear scaling as in the discrete case.

Theorem 3. If a and b are constants, then

E(aX + b) = aE(X) + b
Var(aX + b) = a²Var(X)
SD(aX + b) = |a| SD(X) .

An immediate consequence of these properties is

Theorem 4. If E(X) = µ and Var(X) = σ², and Z = (X − µ)/σ, then E(Z) = 0 and Var(Z) = 1.

The random variable Z = (X − µ)/σ is referred to as the standardised random variable obtained from X. Note that this theorem holds for discrete and continuous random variables alike.

Proof. By Theorem 3,

E(Z) = E((X − µ)/σ) = (1/σ)E(X) − µ/σ = µ/σ − µ/σ = 0 ;
Var(Z) = Var((X − µ)/σ) = (1/σ²)Var(X) = σ²/σ² = 1 .

9.6 Special Continuous Distributions

In this section, we consider two well-known continuous probability distributions, namely the normal and exponential distributions. It turns out that these are limiting cases of discrete probability distributions that we have already seen, namely the binomial and geometric distributions, respectively, but we will not prove this here.

9.6.1 The Normal Distribution

A widely used probability distribution in statistics is the normal or Gaussian distribution.

Definition 1. A continuous random variable X has normal distribution N(µ, σ²) if it has probability density

φ(x) = (1/√(2πσ²)) e^{−(1/2)((x−µ)/σ)²} where −∞ < x < ∞ .
We write X ∼ N(µ, σ2) to denote that X has the normal distribution N(µ, σ2). The normal probability density is bell-shaped, symmetric about the value x = µ, and narrower for smaller σ. The probability densities for N(0, 1), N(3, 1), and N(0, 4) are illustrated below. N(3, 1)N(0, 1) N(0, 4) 0.4 0 3 The distribution N(0, 1) is called the standard normal distribution. The mean and variance of a random variable X ∼ N(µ, σ2) are simply µ and σ2: Theorem 1. If X is a continuous random variable and X ∼ N(µ, σ2), then • E(X) = µ • Var(X) = σ2. Proof. These are left as an exercise. Theorem 2. If X ∼ N(µ, σ2), then X − µ σ ∼ N(0, 1). Proof. This (almost) follows from Theorem 1 above and Theorem 4 from the previous section, but a proof is required that the new random variable is actually normal. c©2020 School of Mathematics and Statistics, UNSW Sydney 200 CHAPTER 9. INTRODUCTION TO PROBABILITY AND STATISTICS Note that when we standardise a normal random variable, the resulting distribution is also normal. To find a probability involves evaluating the integral of the density function which is very hard, since this function does not have an elementary primitive. Thus, if X is normally distributed with mean µ and standard deviation σ, P (X 6 x) = FX(x) = ∫ x −∞ 1√ 2πσ2 e− 1 2 ( t−µσ ) 2 dt . To evaluate this integral, we convert to the standard normal distribution Z ∼ N(0, 1), using the change of variable Z = X − µ σ outlined above. This gives P (Z 6 z) = FZ(z) = ∫ z −∞ 1√ 2π e− 1 2 t2 dt . The value of this integral for various z has been tabulated numerically and is available either via a calculator or the table given on the following page. This table gives the values of this integral for z in the range −3 to 3. For z less than −3, the value is essentially zero, while for z greater than 3, the value is essentially 1. Example 1. Suppose X is normally distributed with mean µ = 20 and standard deviation σ = 3. Find P (X 6 24). Solution. 
We change to the standard normal distribution, so

P(X ≤ 24) = P((X − µ)/σ ≤ (24 − 20)/3) ≈ P(Z ≤ 1.33) ≈ 0.9082,

from the tables. ♦

Example 2. Suppose that the weekly wages of secretaries are normally distributed with mean $800 and standard deviation $50. What is the probability of a secretary having a weekly wage higher than $900, and how many secretaries out of a group of 2000 randomly selected secretaries would you expect to have a weekly wage greater than $900?

Solution. Let X denote the weekly wage of a secretary. Then the mean of X is µ = 800 and the standard deviation of X is σ = 50. Since X ∼ N(µ = 800, σ² = 50²),

Z = (X − 800)/50 ∼ N(0, 1), and X = 900 when Z = (900 − 800)/50 = 2.

Thus,

P(X > 900) = P(Z > 2) = 1 − P(Z ≤ 2) = 1 − 0.9772 = 0.0228.

Therefore, in a group of 2000 secretaries we would expect 0.0228 × 2000 = 45.6 (i.e., about 46) of them to have a weekly wage in excess of $900. ♦

In the above example, we used the fact that P(Z > a) = 1 − P(Z ≤ a); this is true because P(Z = a) = 0 since Z is continuous. Note also that for a ≤ b, P(a ≤ Z ≤ b) = P(Z ≤ b) − P(Z ≤ a).
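Both examples reduce to evaluating the standard normal cumulative distribution function, which in code can be computed via the error function instead of tables, using Φ(z) = (1 + erf(z/√2))/2. A brief illustrative sketch in Python (not part of the notes, which use Maple and the table on the next page; the names below are our own):

```python
import math

def phi_cdf(z):
    """Standard normal CDF, P(Z <= z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Example 1: X ~ N(20, 3^2), so P(X <= 24) = P(Z <= 4/3).
p1 = phi_cdf((24 - 20) / 3)          # ~0.9088; the table, rounding z to 1.33, gives 0.9082

# Example 2: wages X ~ N(800, 50^2), so P(X > 900) = 1 - P(Z <= 2).
p2 = 1 - phi_cdf((900 - 800) / 50)   # ~0.0228
expected = 2000 * p2                 # ~45.5 secretaries out of 2000
```

The small discrepancy in the first value comes only from rounding z = 4/3 to two decimal places when using the table.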
SPECIAL CONTINUOUS DISTRIBUTIONS 201 Standard normal probabilities P (Z 6 z) z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09 −2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014 −2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019 −2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026 −2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036 −2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048 −2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064 −2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084 −2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110 −2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143 −2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183 −1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233 −1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294 −1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367 −1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455 −1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559 −1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681 −1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823 −1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985 −1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170 −1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379 −0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611 −0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867 −0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148 −0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451 −0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776 −0.4 
0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121 −0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483 −0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859 −0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247 −0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641 0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359 0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753 0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141 0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517 0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879 0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224 0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549 0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852 0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133 0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389 1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621 1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830 1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015 1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177 1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319 1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441 1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545 1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633 1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706 1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767 2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817 2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857 
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986

The normal distribution is used, among other things, to approximate the binomial distribution B(n, p) when n grows large. Before the advent of powerful computers, calculations involving many B(n, p) terms were very laborious or even impossible, so it was easier to approximate such calculations by ones involving integration of the probability density of the normal distribution N(µ, σ²) with the same mean (µ = np) and variance (σ² = np(1 − p)) as the binomial distribution. These integrals can be evaluated by transforming to the standard normal distribution as we did above and using tables. These days, computers can calculate most probabilities involving binomial sums; however, the normal distribution is occasionally still used instead.

Example 3. Toss a coin n = 10 times and let X be the random variable counting the number of resulting heads. As we have seen previously, X ∼ B(10, 1/2). The probability of tossing at most four heads is thus

P(X ≤ 4) = FX(4) = ∑_{k=0}^{4} (10 choose k) (1/2)^k (1/2)^{10−k} = 193/512 ≈ 37.7%.

We could also have approximated this probability by calculating FY(4) of a continuous random variable Y with Y ∼ N(µ, σ²) where µ = E(X) = 5 and σ² = Var(X) = 5/2.
To get an even better approximation, we can calculate FY(4.5), since P(X ≤ 4) = P(X < 5) and 4.5 lies in the middle of the interval from 4 to 5:

P(X ≤ 4) = FX(4) ≈ P(Y ≤ 4.5) ≈ P(Z ≤ −0.32) ≈ 37.45%,

which differs from the true value by only 0.25%.

Now toss a coin n = 50 times and let X again be the random variable counting the number of resulting heads. Then X ∼ B(50, 1/2). The probability of tossing at most 23 heads is thus

P(X ≤ 23) = FX(23) = ∑_{k=0}^{23} (50 choose k) (1/2)^k (1/2)^{50−k} ≈ 33.59%.

The above calculation is obtained at once when using Maple but would be very time-consuming to calculate by hand (however, can you find a simple short-cut to perform this particular calculation with far less effort?). We could also have approximated this probability by calculating FY(23.5) of a continuous random variable Y with Y ∼ N(µ, σ²) where µ = E(X) = 25 and σ² = Var(X) = 25/2:

P(X ≤ 23) ≈ P(Y ≤ 23.5) ≈ P(Z ≤ −0.42) ≈ 33.72%,

which differs from the true value by only 0.13%. ♦

The three figures below show how binomial distributions may be approximated by normal distributions. The first and third figures illustrate B(10, 1/2) and N(5, 5/2), and B(50, 1/2) and N(25, 25/2), respectively, which appeared in the above example. Note that the first and second coordinates of the three figures are differently scaled and truncated.

[Figures: B(10, 1/2) with N(5, 5/2); B(20, 1/2) with N(10, 5); B(50, 1/2) with N(25, 25/2).]

More generally, normal distributions are used to model experiments involving large numbers of identical and independent trials that have several possible outcomes. Typical examples include the final-grade distributions of the high-school graduates in a particular country in a given year; or distributions of height, or of weight, or of IQ-test results, of the citizens of a country, and so on.

Example 4.
A six-sided die, which is believed to be biased, is rolled 720 times and shows a ‘6’ 100 times.
a. Write down the formula for the tail probability of getting 100 or fewer sixes in 720 rolls of a fair die.
b. Using the normal approximation to the binomial distribution, calculate the probability in part (a), giving your answer to 3 decimal places.

Solution:
a. Let X be the number of sixes in 720 rolls of a fair die. Then X is binomial with n = 720, p = 1/6. Hence

P(X ≤ 100) = ∑_{k=0}^{100} (720 choose k) (1/6)^k (5/6)^{720−k}.

b. Now X can be approximated by the continuous random variable Y ∼ N(µ, σ²), with
µ = E(X) = np = 720 × (1/6) = 120,
σ² = Var(X) = np(1 − p) = 720 × (1/6) × (5/6) = 100.
Then

P(X ≤ 100) ≈ P(Y ≤ 100.5) = P(Z ≤ (100.5 − 120)/10) = P(Z ≤ −1.95) = 0.026,

where Z = (Y − µ)/σ ∼ N(0, 1). Since 0.026 = 2.6% < 5%, the tail probability is significantly low and so there is good evidence that the die is biased.

9.6.2 [X] The Exponential Distribution

In the previous subsection, we saw that the binomial distribution can be approximated by the normal distribution. Similarly, the geometric distribution has an analogous continuous probability distribution, namely the exponential distribution.

Definition 2. A continuous random variable T has exponential distribution Exp(λ) if it has probability density
f(t) = λe^{−λt} for t > 0, and f(t) = 0 for t < 0.

We write T ∼ Exp(λ) to denote that T has the exponential distribution Exp(λ). The probability densities for Exp(1/2), Exp(1), and Exp(2) are illustrated below.

[Figure: the probability densities of Exp(1/2), Exp(1) and Exp(2).]

The mean and variance associated with the exponential distribution are given as follows:

Theorem 3. If T is a continuous random variable and T ∼ Exp(λ), then
• E(T) = 1/λ
• Var(T) = 1/λ².

Proof.
Using integration by parts,

E(T) = ∫_{−∞}^{∞} t f(t) dt = ∫_{−∞}^{0} 0 dt + ∫_{0}^{∞} t λe^{−λt} dt
= 0 + [−te^{−λt}]_{0}^{∞} + ∫_{0}^{∞} e^{−λt} dt
= 0 − 0 + (1/λ)[−e^{−λt}]_{0}^{∞} = (1/λ)(0 − (−e⁰)) = 1/λ.

The variance Var(T) is calculated similarly.

The cumulative distribution function FT(t) of an exponentially distributed random variable T ∼ Exp(λ) is easily expressed:

FT(t) = P(T ≤ t) = ∫_{−∞}^{t} f(x) dx = 1 − e^{−λt} for t ≥ 0, and 0 for t < 0.

If we set p = 1 − e^{−λ} and let n ∈ Z be an integer, then

FT(n) = 1 − (1 − p)^n for n ≥ 0, and 0 for n < 0.

By Corollary 5 of Theorem 9.4.2, this is the value FX(n) of the cumulative distribution function of a discrete random variable X that is geometrically distributed with parameter p = 1 − e^{−λ}. In other words, the exponential distribution Exp(λ) is approximated by the geometric distribution G(p):

[Figures: Exp(1/2) with G(1 − e^{−1/2}); Exp(1) with G(1 − e^{−1}); Exp(2) with G(1 − e^{−2}).]

Conversely, the geometric distribution G(p) is interpolated by the exponential distribution Exp(λ) where λ = ln(1/(1 − p)).

Example 5. An insurance company has collected data on one of its insurance policies, and it turns out that, on average, a proportion p = 0.0502 of these policies are claimed each year. For one of these policies, find the
(a) probability that the first claim occurs within the first six years;
(b) probability that the first claim occurs within the first 6.5 years;
(c) probability that the first claim occurs during the first half of the sixth year;
(d) expected number of years until the first claim occurs.

Solution. We assume that claims occur independently of each other and with equal probability p = 0.0502. If claims only occurred at the end of each year, then we could model the behaviour of the first occurring claim by a discrete random variable X ∼ G(p) that counted the number of years until that first claim occurred.
However, claims might occur at any positive time from the policy’s inception, so the discrete model does not suffice; instead, let T be a continuous random variable that gives the time until the first claim. Then T ∼ Exp(λ) where λ = ln(1/(1 − 0.0502)) ≈ 0.0515.

(a) The probability that the first claim occurs within the first six years is

P(X ≤ 6) = FX(6) = 1 − (1 − p)⁶ = 1 − (1 − 0.0502)⁶ ≈ 26.58%.

We could also have calculated this probability as follows:

P(T ≤ 6) = FT(6) = 1 − e^{−λ×6} = 1 − e^{−0.0515×6} ≈ 26.58%.

(b) The probability that the first claim occurs within the first 6.5 years is

P(T ≤ 6.5) = FT(6.5) = 1 − e^{−λ×6.5} = 1 − e^{−0.0515×6.5} ≈ 28.45%.

(c) The probability that the first claim occurs during the first half of the sixth year is

P(5 < T ≤ 5.5) = P(T ≤ 5.5) − P(T ≤ 5) = FT(5.5) − FT(5)
= (1 − e^{−λ×5.5}) − (1 − e^{−λ×5}) = e^{−0.0515×5} − e^{−0.0515×5.5} ≈ 1.965%.

A second way to calculate this probability is as follows:

P(5 < T ≤ 5.5) = ∫_{5}^{5.5} λe^{−λt} dt = [−e^{−λt}]_{5}^{5.5} = e^{−0.0515×5} − e^{−0.0515×5.5} ≈ 1.965%.

(d) The expected number of years until the first claim occurs is E(T) = 1/λ = 1/0.0515 ≈ 19.42. This is roughly approximated by E(X) = 1/p = 1/0.0502 ≈ 19.92. These values are what we might intuitively estimate: “just under 1/0.05 = 20”.
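The agreement between the geometric and exponential models in Example 5 is easy to reproduce numerically, since the two CDFs coincide exactly at integer times when λ = ln(1/(1 − p)). A sketch in Python (the names are ours, for illustration only; the notes use Maple):

```python
import math

p = 0.0502                       # yearly claim probability (geometric model)
lam = math.log(1 / (1 - p))      # matching exponential rate, ~0.0515

def F_exp(t):
    """Exponential CDF F_T(t) = 1 - e^{-lam t} for t >= 0."""
    return 1 - math.exp(-lam * t) if t >= 0 else 0.0

def F_geo(n):
    """Geometric CDF F_X(n) = 1 - (1 - p)^n at integers n >= 0."""
    return 1 - (1 - p) ** n if n >= 0 else 0.0

# (a) the two models agree exactly at integer times
a_exp, a_geo = F_exp(6), F_geo(6)       # both ~0.2658
# (b) first claim within 6.5 years (only the continuous model applies)
b = F_exp(6.5)                          # ~0.2845
# (c) first claim in the first half of the sixth year
c = F_exp(5.5) - F_exp(5.0)             # ~0.0197
# (d) expected waiting times under each model
mean_exp, mean_geo = 1 / lam, 1 / p     # ~19.42 and ~19.92
```

The half-year probability in (b) is exactly the kind of question the discrete model cannot answer, which is why the exponential interpolation is useful.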
♦

9.6.3 Useful Web Applets to Illustrate Probability Reasoning

Web applets for long-run frequency illustrations:
http://www.shodor.org/interactivate/activities/AdjustableSpinner/
http://socr.stat.ucla.edu/Applets.dir/DiceApplet.html
An applet to illustrate conditional probability and independence:
http://www.stat.berkeley.edu/users/stark/Java/Html/Venn.htm
An applet for Bayes’ Rule:
http://www.stat.berkeley.edu/users/stark/Java/Html/Venn.htm
An applet for the Birthday Problem:
http://www.mste.uiuc.edu/reese/birthday/

9.7 Probability and MAPLE

Most of the problems in this chapter reduce to sums of various probability expressions. Maple’s ability to do summation symbolically is especially useful here, as is its ability to sum infinite series. Even for finite sums, Maple can simplify the calculations significantly. For example, we can implement the binomial distribution function B(n, p, k) = (n choose k) p^k (1 − p)^{n−k} as follows:

B := proc(n,p,k) binomial(n,k)*p^k*(1-p)^(n-k) end proc;

Maple allows us to check that the function values of B(n, p, k) sum to 1 (even though we have not specified what n and p are!):

simplify(sum(B(n,p,k), k = 0..n));

Indeed, Maple allows us to perform many exact calculations on discrete distributions. For example, it is somewhat time-consuming and awkward to calculate by hand and calculator the probability that 1000 coin tosses result in between 510 and 530 heads. However, this calculation is very quick and easy to conduct with Maple:

n := 1000; p := 1/2; sum(B(n,p,k), k = 510..530); evalf(%);

Problems for Chapter 9

Problems 9.1 : Some Preliminary Set Theory

1. [R] Let A = {a, c, d, e} and B = {d, e, f}. Suppose that the universal set is S = {a, b, c, d, e, f}. Write down the following sets.
a) A − B, b) B − A, c) A ∪ Aᶜ, d) B ∩ Bᶜ, e) A ∪ Bᶜ, f) Aᶜ ∩ B, g) (A ∪ B)ᶜ, h) Aᶜ ∩ Bᶜ.

2. [R] A survey was carried out in a new development area to gain data on home-delivered newspapers. 110 homes were selected at random and the occupants asked whether they had the daily paper or the weekend paper home delivered. 74 received the daily paper, 58 received the weekend paper and 10 received no paper at all. How many homes visited in this survey received both the daily and weekend papers?

3. [R] Of the students taking PHYS1121 and MATH1131 in a hypothetical year, 90% passed MATH1131, 85% passed PHYS1121 and 6% passed neither. What percentage passed both? What percentage of those who passed PHYS1121 also passed MATH1131?

4. [R] A brewery brews one type of beer which is marketed under three different brands. In a survey of 150 first year students, 58 drink at least brand A, 49 drink at least brand B and 57 drink at least brand C. 14 drink brand A and brand C, 13 drink brand A and brand B and 17 drink both brand B and brand C. 4 students drink all three brands. How many students drink none of these three brands?

5. [R] Suppose A, B and C represent three events. Using unions, intersections and complements, find expressions representing the events
a) only A occurs,
b) at least one event occurs,
c) at least two events occur,
d) exactly one event occurs,
e) exactly two events occur.

Problems 9.2 : Probability

6. [R] Two fair dice are thrown.
a) What is the probability that the sum of the two numbers obtained is 6?
b) What is the probability that both dice show the same number?
c) What is the probability that at least one of the dice shows an even number?

7. [R] Suppose that 30% of computer users use a Macintosh, 50% use a Microsoft Windows PC and that 20% use Linux.
Also suppose that 60% of the Macintosh users have succumbed to a computer virus, 80% of the Windows PC users get the virus and 10% of the Linux users get the virus. A computer user is selected at random and it is found that her computer was infected with the virus. What is the probability that she is a Windows PC user? 8. [R] Employment data at a large company reveal that 72% of the workers are married, that 44% are university graduates and that half of the university graduates are married. What is the probability that a randomly chosen worker a) is neither married nor a university graduate? b) is married but not a university graduate? c) is married or is a university graduate? 9. [R] On the basis of the health records of a particular group of people, an insurance company accepted 60% of the group for a 10 year life policy. Ten years later it examined the survival rates for the whole group and found that 80% of those accepted for the policy had survived the 10 years, while 50% of those rejected had survived the 10 years. What percentage of the group did not survive 10 years? If a person did survive 10 years, what is the probability that they had been refused cover? 10. [R] Urn 1 contains 2 red balls and 3 black balls. Urn 2 contains 4 red balls and 5 black balls. a) If an urn is randomly selected and a ball drawn at random from it, what is the probability that the ball is red? b) Suppose a ball is drawn at random from Urn 1 and placed into Urn 2. If a ball is then drawn at random from the 10 balls in Urn 2, what is the probability that it is red? c) In the previous part, given that the ball drawn from Urn 2 is red, what is the proba- bility that the ball transferred from Urn 1 was black? 11. [R] Down’s syndrome is a disorder that affects 1 in 270 babies born to mothers aged 35 or over. A new blood test for the condition has a sensitivity (i.e. the probability of a positive test result given the Down’s syndrome is present) of 89%. The specificity (i.e. 
the probability of a negative test result given that Down’s syndrome is absent) of the new test is 75%. a) What proportion of women over age 35 would test positive on this new blood test? b) A mother over age 35 receives a positive test result. What is the chance that Down’s syndrome is actually present? c) A mother over age 35 receives a negative test result. What is the chance that Down’s syndrome is actually present? 12. [R] The following is a table of the annual promotion probabilities at a particular workplace, broken down by gender. c©2020 School of Mathematics and Statistics, UNSW Sydney 210 CHAPTER 9. INTRODUCTION TO PROBABILITY AND STATISTICS Promoted Not promoted Total Male 0.17 0.68 0.85 Female 0.03 0.12 0.15 Is there gender bias in promotion? 13. [R] A system has n independent components and each fail with probability p. Calculate the probability that the system will fail when a) the components are in parallel, so the system fails only when all of the components fail; b) the components are in series, so the system fails if any one of the components fail; 14. [X] Tom and Bob play a game by each tossing a fair coin. The game consists of tossing the two coins together, until for the first time either two heads appear when Tom wins, or two tails appear when Bob wins. a) Show that the probability that Tom wins at or before the nth toss is 1 2 − 1 2n+1 . b) Show that the probability that the game is decided at or before the nth toss is 1− 1 2n . 15. [X] Extend the Multiplication Rule of section 9.2.3 to 3 events A1, A2, A3 and show that P (A1 ∩A2 ∩A3) = P (A3|A1 ∩A2)P (A2|A1)P (A1). The same pattern applies to higher numbers of events. Write this down. This law is particularly useful when we have a sequence of dependent trials. To gain entry to a selective high school students must pass 3 tests. 20% fail the first test and are excluded. Of the 80% who pass the first, 30% fail the second and are excluded. Of those who pass the second, 60% pass the third. 
What proportion of students pass the first two tests? Use the multiplicative law to answer this question. What proportion of students gain entry to the selective high school? What proportion pass the first two tests, but fail the third? 16. [X] Use the additive law of probability to establish, using mathematical induction, Boole’s Law: P (A1 ∪A2 ∪ · · · ∪An) ≤ P (A1) + P (A2) + · · ·+ P (An) 17. [X] Establish, using mathematical induction, Bonferoni’s inequality: P (A1 ∩A2 ∩ · · · ∩An) ≥ 1− [P (Ac1) + P (Ac2) + · · · + P (Acn)] c©2020 School of Mathematics and Statistics, UNSW Sydney PROBLEMS FOR CHAPTER 9 211 Problems 9.3 : Random Variables 18. [R] Show that each of the following sequences pk satisfies pk ≥ 0 and ∑∞ k=0 pk = 1. Note that in the distributions below pk = P (X = k) where X is the random variable under consideration. a) Uniform Distribution. pk = 1 n for 1 ≤ k ≤ n and 0 otherwise. Here, n is a fixed positive integer. b) Binomial Distribution. pk = B(n, p, k) = ( n k ) pk(1− p)n−k for 0 ≤ k ≤ n, and 0 otherwise. Here, p is a constant with 0 < p < 1. c) Geometric Distribution. pk = G(p, k) = (1− p)k−1p for 1 ≤ k <∞, where p is a constant with 0 < p < 1. d) Poisson Distribution. pk = e −λλ k k! for 0 ≤ k <∞, where λ > 0 is a constant. To solve this question you will need the Maclaurin series for eλ. 19. [R] A box contains four red and two black balls. Two balls are drawn from the box. Let X be the number of red balls obtained. Find the probability distribution for X. 20. [R] A busy switchboard receives 150 calls an hour on average. Assume that the probability, pk, of getting k calls in a given minute is pk = e −λλ k k! where λ the average number of calls per minute. (This is called a Poisson distribution.) a) Find the probability of getting exactly 3 calls in a given minute. b) Find the probability of getting at least 2 calls in a given minute. 21. 
[R] In a biased lottery with tickets numbered 1 to 50, the probability that ticket number n wins is pn = n/1275 for n = 1, 2, 3, . . . , 50. What is the probability that the winning ticket bears a number less than or equal to 25?

22. [H] Let X be a random variable with probability distribution P(X = k) = c/k!, for k = 0, 1, 2, . . ..
a) Determine the value of c.
b) Calculate P(X = 2).
c) Calculate P(X < 2).
d) Calculate P(X > 4).

23. [X] In a biochemical experiment, n organisms are placed in a nutrient medium, and the number of organisms X which survive for a given period is recorded. The probability distribution of X is assumed to be given by
P(X = k) = 2(k + 1)/((n + 1)(n + 2)) for 0 ≤ k ≤ n, and 0 otherwise.
a) Check that ∑_{k=0}^{n} P(X = k) = 1.
b) Calculate the probability that at most a proportion α of the organisms survive, and deduce that for large n this is approximately α².
c) Find the smallest value of n for which the probability of there being at least one survivor among the n organisms is at least 0.95.

24. [H] A genetic experiment on cell division can give rise to at most 2n cells. The probability distribution of the number of cells X recorded is
P(X = k) = θ^k(1 − θ)/(1 − θ^{2n+1}) for 0 ≤ k ≤ 2n,
where θ is a constant with 0 < θ < 1. What are the probabilities that
a) an odd number of cells is recorded,
b) at most n cells are recorded?

25. [R] Let X be a discrete random variable with the following probability distribution:
k : 0 1 2 3 4
P(X = k) = pk : 0.1 2c 0.2 0.1 4c
a) Find the value of c.
b) Calculate E(X) and Var(X).
c) Let Y = 1 − 4X. Calculate E(Y) and Var(Y).

26. [R] Find the mean and variance for the uniform distribution in Question 18(a).

27. [X] Let X be a random variable with the Poisson probability distribution given in Question 18(d).
Find E ( (1 +X)−1 ) . 28. [X] Assuming that one can differentiate a power series term by term, one obtains from the formula ∞∑ k=0 xk = 1 1− x, |x| < 1 the formulas ∞∑ k=1 kxk−1 = 1 (1− x)2 ; ∞∑ k=2 k(k − 1)xk−2 = 2 (1− x)3 , |x| < 1. (You will see that this is justified in your MATH1231/41 Calculus lectures). From these formulas, show that ∞∑ k=0 kxk = x (1− x)2 ; ∞∑ k=0 k2xk = x(x+ 1) (1− x)3 , |x| < 1. and hence calculate the mean and the variance for geometric distribution in Question 18(c). Problems 9.4 : Special Distributions 29. [R] A coin is tossed 50 times. What is the probability of it coming down heads exactly 25 times? 30. [R] A test paper contains 8 multiple choice questions, each with 4 potential answers to choose from. A correct answers gains 1 mark, a wrong answer 0 marks and 4 is the pass mark. If a student simply guesses, what is probability that she will pass? 31. [R] The probability of dying from a particular disease is 0.3. 10 people in a hospital are suffering from the disease. Find the probability that at least 8 survive. 32. [R] How many times must a coin be tossed until the probability of getting 2 or more heads exceeds 0.99? (You need to try different n values after an initial guess.) 33. [X] For the B(n, p) distribution, by considering pk pk−1 , show that pk is largest when k = ⌊(n + 1)p⌋. This k is called the “mode” of the distribution. 34. [R] Consider the game of “rock, scissors, paper” in which two players instantaneously choose one of “rock”, “scissors” or “paper”. If both players pick the same item, they play again; if the two players make different choices one of them wins (rock beats scissors, scissors beats paper and paper beats rock). Let X be the number of times the game is played until someone wins. Find the probability distribution for X when each player chooses randomly from rock, scissors or paper. c©2020 School of Mathematics and Statistics, UNSW Sydney 214 CHAPTER 9. INTRODUCTION TO PROBABILITY AND STATISTICS 35. 
[H] A die is rolled repeatedly. a) Find the probability that the third time a ‘6’ shows is on the 20th roll. b) [X]Generalise to the kth time on the nth roll. This is an example of the “negative binomial” distribution, which generalises the geometric distribution. 36. [X] a) Show that for non-negative integers k,m, n,( k m ) ( n k ) = ( n m ) ( n−m k −m ) b) Show that n∑ k=m ( k m )( n k ) pk (1− p)n−k = ( n m ) pm. c) By considering the cases m = 1 and m = 2 in the preceding formula, prove the variance formula for the Binomial distribution, as stated in Theorem 2 of Section 9.4. 37. [R] A certain type of car is known to sustain damage 25% of the time in 15 km/hr crash tests. A modified bumper design has been proposed in an effort to decrease this percentage. In a trial, cars with the new type of bumper were damaged in one of 15 test crashed. a) What distribution could be used to model the number cars damaged in trials, if the new bumpers perform no better than the old ones? b) Calculate a tail probability that measures how unusual it would be to observe as few as one damaged car, under the assumptions of part (a). c) Does the trial indicate that the new bumpers protect cars from damage better than the old bumpers? 38. [R] An extensive study was conducted in the northern hemisphere of butterfly distribu- tions, comparing where butterfly species are presently found with where they were found a century ago (Parmesan et al 1999, Nature 399, 579-583). It was hypothesised that due to climate change, more species would have shifted northwards in distribution than shifted southwards. It was found that of the 23 butterfly species whose distribution had shifted, 22 had shifted northwards in distribution, and 1 had shifted southwards in distribution. a) What distribution could be used to model the number of butterfly species moving northward, if they are just as likely to move north as south (that is, if there is no influence of climate change on butterfly species)? 
b) Calculate a tail probability that measures how unusual it would be to observe as many as 22 butterfly species moving northwards, under the assumptions of part (a). c) Do these data provide evidence that climate change has affected the distribution of butterfly species? c©2020 School of Mathematics and Statistics, UNSW Sydney PROBLEMS FOR CHAPTER 9 215 39. [R] Sydney’s dam levels recently reached historic lows, which may in part be due to lower than average rainfall over recent years. The following data are total annual rainfall in Sydney for eight recent calendar years. Year 2000 2001 2002 2003 2004 2005 2006 2007 Annual rainfall 812.6 1359.0 860.0 1207.6 995.2 816.0 994.0 1499.2 Historic rainfall levels prior to the year 2000 was 1302.2mm per year. We will use a sign test approach to see how much evidence there is that the recent Sydney rainfall is less than historic levels (that is, less than 1302.2mm). It is reasonable to assume that annual rainfall is independent across years. a) In how many years was the annual rainfall less than 1302.2mm? b) What distribution could be used to model the number of years in which rainfall was less than 1302.2mm rather than being greater than this value, if both outcomes were equally likely? c) Calculate a tail probability that measures how unusual it would be to observe as many years with a total rainfall less than 1302.2mm as was observed, if annual rainfall was just as likely to be greater than 1302.2 as less than 1302.2. d) Do you think there is evidence that Sydney rainfall has decreased in recent years? (That is, that the values are systematically smaller than 1302.2mm?) 40. [R] Do ravens intentionally fly towards gunshot sounds (to scavenge on the carcass they expect to find)? Crow White addressed this question by counting raven numbers at a loca- tion, firing a gun, then counting raven numbers 10 minutes later (Ecology 2005, 86:1057- 1060). He did this in 12 locations. 
Results: location 1 2 3 4 5 6 7 8 9 10 11 12 before 1 0 1 0 0 0 0 5 1 1 2 0 after 2 3 2 2 1 1 2 2 4 2 0 3 We would like to find out if there is evidence that ravens fly towards the location of gunshots. a) In how many locations was there an increase in number of ravens, after the gunshot? b) In how many locations was there a change in number of ravens after the gunshot? c) What distribution could be used to model the number of locations in which there was an increase in the number of ravens rather than a decrease, if both outcomes were equally likely? d) Calculate a tail probability that measures how unusual it would be to observe as many locations with an increase in number of ravens as was observed, if increases and decreases were equally likely. e) Do you think there is evidence that the ravens fly towards gunshot sounds? (That is, was there a systematic increase in the number of ravens present after the gunshot sound?) c©2020 School of Mathematics and Statistics, UNSW Sydney 216 CHAPTER 9. INTRODUCTION TO PROBABILITY AND STATISTICS Problems 9.5 41. [R] Verify that the following functions f are probability densities, that is, that f(x) > 0 and ∫ ∞ −∞ f(x) dx = 1. Also sketch the graph of each function. a) Uniform Distribution (Assume a < b). f(x) = 1 b− a for a 6 x 6 b 0 otherwise. b) Pareto Distribution. For k > 0, f(x) = k xk+1 for x > 1 0 otherwise. c) Gamma Distribution fn. For n > 0, fn(x) = 1 n! xne−x for x > 0 0 otherwise. Note. The formula ∫ ∞ 0 xne−x dx = n! will be useful. You can prove it using integration by parts. d) Laplace Distribution. f(x) = 1 2 e−|x| for −∞ < x <∞. 42. [R] Calculate the means and variances for the distributions in the preceding question. (Note that for the Pareto distribution the mean is only defined if k > 1 and the variance is only defined if k > 2.) 43. [R] The probability density of the Cauchy Distribution is given by f(x) = α 1 + x2 for −∞ < x <∞. a) Find the value of α. 
b) If X has a Cauchy distribution, find a number c such that P(X ≤ c) = 0.25.
c) [H] What can be said about E(X) and Var(X) for the Cauchy distribution?

44. [R] X has the probability density function f(x) = x/8 for 3 ≤ x ≤ 5, and f(x) = 0 otherwise. Calculate
a) P(X ≥ 4), b) P(X ≤ 4), c) P(3.2 ≤ X ≤ 4.1), d) E(X).

45. [R] Y has the probability density function f(y) = c/y for 10 ≤ y ≤ 100, and f(y) = 0 otherwise.
a) Determine the value of the constant c.
b) Obtain the cumulative distribution function of Y.
c) Find b such that P(Y ≤ b) = 0.50.
d) Find the expected value E(Y).

46. [R] Let F be the function defined by
F(x) = 0 for x < 2,
F(x) = (1/4)(x − 2) for 2 ≤ x < 3,
F(x) = 1/4 + (3/8)(x − 3) for 3 ≤ x < 5,
F(x) = 1 for x ≥ 5.
a) Sketch the graph of F.
b) Find a probability density function f which would have F as its cumulative distribution function. Sketch the graph of f.
c) Find E(X) for this probability density function.

47. [H] Suppose X has a probability density given by f(x) = α e^{−αx} for x ≥ 0, and f(x) = 0 otherwise. Find the mean and variance of Y = 2X + 3.

Problems 9.6

48. [R] Suppose Z is the standard normal random variable. Use the standard normal tables to write down
a) P(Z ≤ 1.23)  b) P(Z ≤ −2.3)  c) P(Z ≥ 0.36)  d) P(Z ≥ −1.24)  e) P(1 ≤ Z ≤ 2)  f) P(−0.5 ≤ Z ≤ 0.5)

49. [R] Suppose X is normally distributed with mean µ and standard deviation σ. Use the standard normal tables to find:
a) P(X ≤ 12) with µ = 10, σ = 2
b) P(X ≥ 53) with µ = 50, σ = 4.5
c) P(X ≤ 47) with µ = 50, σ = 4.5
d) P(X ≥ 32) with µ = 36, σ = 8
e) P(21 ≤ X ≤ 24) with µ = 20, σ = 3
f) P(15 ≤ X ≤ 20) with µ = 18, σ = 1.5

50. [R] Suppose X is normally distributed with mean µ and standard deviation σ.
Use the standard normal tables to find the value of c such that:
a) P(X ≤ c) = 0.8238 with µ = 0, σ = 1
b) P(X ≥ c) = 0.0495 with µ = 0, σ = 1
c) P(X ≤ c) = 0.2514 with µ = 50, σ = 4.5
d) P(X ≥ c) = 0.6915 with µ = 36, σ = 8

51. [R] The mean height of men in a certain country is estimated to be 1.69 m with a standard deviation of 0.06 m. We assume that the heights are normally distributed.
a) Find the approximate probability that a man chosen at random from this country has a height between 1.60 m and 1.74 m.
b) If 400 men are chosen from this country, how many would you expect on average to have heights greater than 1.74 m?

52. [R] The length of life of a particular make of T.V. is approximately normally distributed, with a mean of 3.1 years and a standard deviation of 1.2 years. If this type of T.V. is guaranteed for one year, what fraction of the T.V.s sold will require replacement under the guarantee?

53. [R] Experience has shown that the I.Q. scores of university students are normally distributed with a mean of 112 and a standard deviation of 8. Calculate the percentage of students who will have an I.Q. score
a) higher than 130  b) lower than 100  c) between 105 and 125.

54. [R] The lengths of studs turned out by a certain automatic machine are normally distributed with a mean of 3.220 cm and a standard deviation of 0.003 cm. If the acceptable length of a stud is between 3.212 and 3.226 cm, determine to one decimal place the percentage rejected as undersize and oversize respectively.

55. [R] An unbiased die is tossed 600 times. Use the normal approximation to the binomial to find the approximate probability that a 6 appears more than 120 times.

56. [R] Olof Jonsson was a controversial psychic whose psychic abilities were tested in an experiment.
In the experiment a computer showed 4 cards to the subject and (randomly) picked one of them, and the subject was to guess which card they thought the computer had picked. This process was repeated 288 times, and Olof managed to pick the right card 88 times.
a) What distribution could be used to model the number of cards Olof correctly selects, if he is not psychic (and so can do no better than random guessing)?
b) Write down an expression for a tail probability that measures how unusual it is to correctly pick as many cards as Olof did.
c) Use the normal approximation to the binomial to find this probability, under the assumption that Olof is not psychic.
d) Is this evidence that Olof was psychic?

57. [X] If X is a normal random variable with mean µ and variance σ², show that E(|X − µ|) = √(2/π) σ.

58. [X] Find E(X) and Var(X) for the random variable X with probability density function proportional to e^{−x² + x}.

59. [X] Let T be a continuous random variable with T ∼ Exp(λ); that is, T has an exponential distribution with parameter λ. Prove that Var(T) = 1/λ².

60. [X] Suppose a continuous random variable T has an exponential distribution with parameter λ and P(T ≤ t) = p, 0 < p < 1.
a) Find t in terms of p.
b) Hence find the median m. In other words, find m such that P(T ≤ m) = 0.5.

61. [X] (Memoryless property) A continuous random variable T has an exponential distribution, T ∼ Exp(λ). Prove that P(T > t + t₀ | T > t₀) is independent of t₀ for positive t and t₀.

62. [X] Suppose the time, in minutes, required by any particular student to complete a certain two-hour examination has the exponential distribution for which the mean is 90 minutes. The examination starts at 10:00 am.
a) What is the probability that a student completes the examination before 11:00 am?
b) What is the probability that a student completes the examination between 11:00 am and 11:30 am?
c) What is the probability that a student could not complete the examination within 2 hours?
d) What is the median time for completing the examination?

63. [X] During a whale watch season in Sydney, the time, measured in hours from the moment that a cruise enters the whale watch area, to spot the first whale can be modelled by the exponential distribution with parameter λ = 0.4. A person joins a Sydney whale watch tour during the season. After entering the area,
a) what is the probability that no whale can be spotted in the first hour?
b) what is the probability that the time to spot the first whale exceeds the mean by one standard deviation?
c) The organiser claims a 90% success rate of finding whales in a trip. How long should the cruise stay in the whale watching area to achieve that?

64. [X] A system consists of 3 independent components connected in series, so the system fails when at least one of the components fails. Suppose the lengths of life of the components, measured in hours, have the exponential distributions Exp(0.001), Exp(0.003) and Exp(0.004). Find the probability that the system can last at least 100 hours.

65. [X] A system consists of n independent components connected in series. The lengths of life of the components, in hours, have the exponential distributions Exp(λ_i), 1 ≤ i ≤ n. Let T be the random variable that gives the time until the system fails.
a) Find P(T ≤ t) and hence write down the cumulative distribution function of the random variable T.
b) Find the probability density function of T.
c) Name the probability distribution of T, and find the expected value and variance of T.

ANSWERS TO SELECTED PROBLEMS

Chapter 6

3. a) (1, 2, 0), (1, 1, 4).
7. Axioms 1, 4, 5, 6 and 10 are satisfied; the others are not. It is not a vector space.
8. a) 2v = (1 + 1)v = 1v + 1v = v + v.
Identify the axioms that have been used.
b) Use induction.
9. For part 5: If λv = µv then λv − µv = 0, so (λ − µ)v = 0. But v ≠ 0 so, by part 4 of the proposition, λ − µ = 0. So λ = µ.
11. The position vectors of the points on a plane which does not pass through the origin do not form a vector subspace.
12. a) For example, (0, 0, 0), (2, 0, 1), (1, −2, −1) are in S.
c) The position vectors of the points on a plane which passes through the origin form a vector subspace.
16. Column 1 belongs to S as A(1, 0, 0)ᵀ = (2, 4)ᵀ. Similar arguments apply for columns 2 and 3.
17. a) No. b) Yes. c) Yes.
20. W = {(0, 0, α, β, γ)ᵀ : α, β, γ ∈ R}, a copy of R3.
23. No, S is not a subspace because the zero polynomial is not in it.
24. b) x³ + 3x² + 3x + 1.
25. HINT: Suppose that S1 = {(x, 0)ᵀ : x ∈ R} and S2 = {(0, y)ᵀ : y ∈ R}. Show that S1 and S2 are subspaces of R2 but S1 ∪ S2 is not.
28. a) Yes, a = 2v1 − 3v2 + v3. b) Yes.
29. a) No. b) No, 3b2 − 2b3 = 0. span(v1, v2, v3) is a plane in R3.
30. a) Yes, a = v1 − v2 − 2v3. b) No, b ∈ span(v1, v2, v3) if and only if b1 − 2b2 + b4 = 0. The span is a 3-dimensional subspace of R4.
31. Yes, b = −10v1 + 12v2 + 16v3.
32. No, span(v1, v2, v3) is the plane in R3 given by 5b1 − 4b2 + b3 = 0.
33. Yes, v = 3v1 − v2 − 2v3, where v1, v2, v3 are the columns of A.
34. No.
35. Yes.
38. Linearly dependent, coplanar.
39. Linearly independent, not coplanar.
40. A set containing 0 is linearly dependent, as the coefficient of the zero vector can be varied to make the linear combination non-unique.
41. b) v3 = v1 + v2. e) span(S) is the plane through the origin parallel to v1 and v2.
42. b) v4 = −2v1 + 2v2 + v3. e) span(S) = R3.
43. No. −2 − x + 5x² = (1 − x + 2x²) + 3(−1 + x²).
44. No, it is impossible to return to the origin.
45. Yes, it is possible to return to the origin.
47. Yes, it is a basis.
48. A basis for W is {(1, 2, 3), (1, 1, −1)}. Dim(W) = 2.
50. a) True. b) False. c) False. d) True. e) False, True, False, False. f) False. g) True.
h) False. i) True. j) True.
51. a) n ≤ ℓ. b) No relation. c) n ≥ ℓ. d) n = ℓ.
53. b) (−1, 1, 0), (−1, 0, 1).
54. A basis for col(A) is {(1, 0, 0, 0), (−1, 1, 0, 0), (−2, 4, 2, 0)}. Dim(col(A)) = 3.
55. A basis for col(A) is {(1, 0, −1, 1), (0, −1, 1, 0), (2, −2, 4, 4)}. Dim(col(A)) = 3.
56. A basis is {(1, 2, 3, 4), (3, 5, 5, 5), (−7, −8, −3, 2), (0, 1, 0, 0)}.
57. a) B = {v1, v3}. b) x = −4v1 + 5v3. c) dim(col(A)) = 2.
59. Bases are {p1, p2, p4} or {p1, p3, p4} or {p2, p3, p4}. {p1, p2, p3} is not a basis; why not?
62. The coordinate vector is (−2, 4, 12, 4).
63. v = (18, 7, 11, 19).
64. v = (1, 1, 10).
65. a) (3, 2, 2). b) (−a1 − a2 + 2a3, a2, −a1 + a3).
66. a) (12, −8, 21); b) (−2, 1, −3).
67. c) The coordinates are λ1 = (−1, 3, 4) · v1 = −2√2, λ2 = (−1, 3, 4) · v2 = 2√3, λ3 = (−1, 3, 4) · v3 = √6. The coordinate vector is (−2√2, 2√3, √6).
69. a) For example, [1 0; 0 4], [1 2; 3 4], [−1 0; 0 6] are in S.
b) S is not a subspace because the 2 × 2 zero matrix is not in it.
70. a) For example, [1 0; 0 −1], [1 2; 3 −1], [0 0; 0 0] are in S.
b) Use the Subspace Theorem to prove that S is a subspace.
74. a) [a11 a12; a21 a22]. b) [(a11 + a22)/2, (a12 + a21)/2; (−a12 + a21)/(2i), (a11 − a22)/2].
75. a) [−4 2; −1 −3] = −4[1 0; −2 0] + 2[0 1; 3 0] − 3[0 0; 5 1]. b) No.
79. Not a subspace.
85. Not a subspace.
87. No.
88. Let p ∈ P2 be p(z) = a0 + a1z + a2z². Then the condition is −(3/2)a1 + a2 = 0. (An equivalent condition is p′(−1/3) = 0.)
89. No for question 87. No for question 88.
90. No. p3 = −p1 + 3p2.
91. A basis is {p1, p2, p3, p4, 1, z}.
92. (2, −1, 0).
93. (9, 1, 17).
94. (a0 − a1 + a2, a0, a0 + a1 + a2).

Chapter 7

1. S is not a linear map as the domain [−1, 1] is not a vector space.
2. a) Linear. b) Linear. c) Not linear. d) Linear. e) Not linear.
3. a) Domain C, codomain R, linear. b) Domain C, codomain R, linear. c) Domain C, codomain R+ = {x ∈ R : x ≥ 0}, not linear. d) Domain C − {0}, codomain (−π, π], not linear.
e) Domain C, codomain C, linear.
6. No.
7. T(2, −1, 4) = (21, 4, −15, 28) and T(x1, x2, x3) = (x1 − 3x2 + 4x3, 2x1, 3x1 + x2 − 5x3, 4x1 + 4x2 + 6x3).
9. (0, 2, 1) = (1, 2, 3) + (−2, 1, −4) + (1, −1, 2), and so T(0, 2, 1) = (−1, 1).
11. a) [3 −1; 2 4; −3 −3; 0 1]. b) [−2 0 5 0; 6 −8 0 2; −2 4 −3 0]. d) [1 0 −2 −4; 3 −4 −3 1; −2 2 4 0].
12. Same as for 11.
13. a) Ae1 = 2e1, Ae2 = 0.7e2, Ab = 4e1 + 2.1e2. (e1 is stretched to twice its length, e2 is compressed to 0.7 of its length, and b is stretched and rotated.)
d) Notice that Ab = 3b. This means b is stretched to three times its length.
e) Notice that Ab = −2b. This means the direction of b is reversed and it is stretched to twice its length.
15. (1/2)[1 −√3; √3 1].
16. x′ = T(x) = (−x1, x2), A = [−1 0; 0 1].
17. x′ = T(x) = (x1, x2, −x3), A = [1 0 0; 0 1 0; 0 0 −1].
18. q = Ap, where the diagonal entries of A are aii = −1 + 2di²/|d|² and the off-diagonal entries of A are aij = 2didj/|d|².
19. T is linear. For T(x) = Ax, the matrix is A = [0 −b3 b2; b3 0 −b1; −b2 b1 0].
20. T(a) = Aa, where A = (1/5)[1 0 2; 0 0 0; 2 0 4].
21. S is not linear.
23. (a′1, a′2, a′3) = Rα(a1, a2, a3) = (a1 cos α + a3 sin α, a2, −a1 sin α + a3 cos α).
25. a) ker(A) = {0}, nullity(A) = 0, no basis.
b) Kernel: {λ(−1, −3, 1) : λ ∈ R}, nullity(B) = 1.
c) Kernel: {λ(2, −2, −1, 1) : λ ∈ R}, nullity(C) = 1.
26. a) {(1, −3, 1, 0)}, nullity(D) = 1. b) {0}, nullity(E) = 0.
27. For example, A = [1 1 1 1; 1 2 3 4].
28. a) ker(A) = {0}, nullity(A) = 0.
b) ker(A) = {λ(5, 4, 2, 1) : λ ∈ R}, nullity(A) = 1.
d) ker(A) = {λ(6, 4, 1, 1) : λ ∈ R}, nullity(A) = 1.
29. For questions 16, 17 and 18, ker(T) = {0} and nullity(T) = 0. For question 19, ker(T) = {x ∈ R3 : x = λb for λ ∈ R}, nullity(T) = 1; the kernel is the set of all vectors parallel to b. For question 20, ker(T) = {x ∈ Rn : b · x = 0}, nullity(T) = n − 1 (why?); the kernel is the set of all vectors orthogonal to b.
30. b) 1.
31. a) b ∈ im(A) as A(2, −3, 1) = (11, 10, 4).
b) b is not in im(A) as Ax = (9, −2, −4) has no solution.
c) b ∈ im(A) since, for example, x = (−10, 12, 16, 0) is a solution of Ax = b.
32. a) No conditions, im(A) = R3. b) 3b2 − 2b3 = 0. c) No conditions, im(A) = R3.
33. rank(A) = 3; columns 1, 2, 3 of A form a basis for im(A). rank(B) = 2; columns 1, 2 of B form a basis for im(B). rank(C) = 3; columns 1, 2, 3 of C form a basis for im(C). rank(D) = 3; columns 1, 2, 4 of D form a basis for im(D). rank(E) = 3; columns 1, 2, 3 of E form a basis for im(E).
35. rank(A) = 3; columns 1, 3, 4 of A form a basis for im(A). rank(B) = 3; columns 1, 3, 4 of B form a basis for im(B).
36. One possible answer is {(1, 2, −1), (2, −4, 2), (0, 1, 0)}.
37. One possible answer is {(1, 2, 3, 4), (3, 3, 3, 0), (−1, 1, 4, 8), (1, 0, 0, 0)}.
38. a) [3 4 −1 0; 0 0 3 −3; 0 0 0 0; 0 0 0 1]. b) (3, 0, 0, 0), (−1, 3, 0, 0), (0, −3, 0, 1); 3. c) 1. d) No.
46. T is linear.
50. T is a linear function.
53. S is not linear as Z is not a vector space. T is not linear as, for example, T(1.5) + T(1.5) = 2 + 2 = 4 while T(1.5 + 1.5) = T(3) = 3.
54. yL(s) = (s² + 9s + 19) / ((s + 3)²(s + 1)).
56. a) 7 − 2x. c) [2 1; 1 −1].
57. T(i, 2, −1) = (2 + i) + (7 − 4i)z + 2z² − 3iz³, and T(x1, x2, x3) = (x1 − 2x3) + [(2 + i)x1 + (4 − 3i)x2]z + x2z² − 3x1z³.
58. b) [0 0 0 0; 1 0 0 0; 0 1/2 0 0; 0 0 1/3 0; 0 0 0 1/4]. c) {x, x², x³, x⁴}. d) The empty set.
59. Let the input vector be b = (b1, b2, b3, b4, b5)ᵀ, where b1, b2, b3, b4 and b5 are the amounts of steel, plastics, rubber, glass and labour used respectively. Let the output vector be x = (x1, x2, x3, x4)ᵀ, where x1, x2, x3, x4 are the numbers of station wagons, 4-wheel drives, hatchbacks and sedans made. Then the factory is represented by the linear map TA : R4 → R5, where TA(x) = b = Ax with
A = [1 1.5 0.8 0.9; 0.5 0.6 0.7 0.6; 0.1 0.2 0.2 0.25; 0.2 0.15 0.3 0.3; 1 1.5 1.1 0.9].
60. The matrix is [1 0; 0 1].
61. The matrix is [−1 −3; 2 4].
62. −6 + 4x.
63.
For 48: [1 −3 0 0; 0 0 2 −3; 0 0 0 0; 0 1 0 0; 3 −1 2 4].
For 49: [3 4 0 0; 0 3 8 0; 0 0 3 12; 0 0 0 3].
For 50: [−8 0 0 0; 0 −4 0 0; 0 0 0 0; 0 0 0 4].
For 51: [0 0 0 0; 1 0 0 0; 0 1/2 0 0; 0 0 1/3 0; 0 0 0 1/4].
64. For 48: im(T) = {p ∈ P4(R) : p(z) = λ0 + λ1z + λ2z³ + λ3z⁴ for λ0, λ1, λ2, λ3 ∈ R} (note that z² is not in im(T)), rank(T) = 4, ker(T) = {0}, nullity(T) = 0.
For 49: im(T) = P3(R), rank(T) = 4, ker(T) = {0}, nullity(T) = 0.
For 50: im(T) = {p ∈ P3(R) : p(x) = λ0 + λ1x + λ2x³ for λ0, λ1, λ2 ∈ R}, rank(T) = 3, ker(T) = {p ∈ P3(R) : p(x) = λx² for λ ∈ R}, nullity(T) = 1.
For 51: im(T) = {p ∈ P4(R) : p(x) = λ1x + λ2x² + λ3x³ + λ4x⁴ for λ1, λ2, λ3, λ4 ∈ R}, rank(T) = 4, ker(T) = {0}, nullity(T) = 0.
65. The matrix A is diagonal with diagonal elements akk = (k − 1)(k − 2) − 3(k − 1) + 3 for 1 ≤ k ≤ n + 1. The kernel is {α1x + α2x³ : α1, α2 ∈ R} and the nullity is 2. Note that the kernel is the solution set of the homogeneous differential equation.
66. The matrix is [1/√2 0 1/√2; −1/√2 0 1/√2; 0 −1 0].
69. b) i) [1 1 −1; 0 1 2; 0 0 1]. ii) [1 2 0; 0 1 4; 0 0 1].
70. b) λ[1 −3; −3 0], λ ∈ F; 1. c) 3. d) No. e) [0 0 0 1; 0 1 −1 0; 3 1 0 0].
71. a) [2 4; −2 4], [1 2; −2 1], [1 −3; 2 −2]. b) 3. c) a4 − a1 − a3 − a2 = 0. d) (3, 2, 1, 0, 0, 0), (−1, 2, 0, −3, −3, 1). e) 3, 2.

Chapter 8

1. a) In each case, the eigenvalues are the diagonal entries and the respective eigenvectors are te1 and te2 (t ≠ 0). For interpretations of (b), (c) and (d), see part (e) of the question.
2. The eigenvalue is 2.
3. λ = (det A)^{1/3}.
4. b) [0 1; 1 0].
5. a) λ = 2 with eigenvectors {t(1, 2) : t ≠ 0} and λ = 3 with eigenvectors {t(2, 3) : t ≠ 0}.
b) λ = −3 with eigenvectors {t(1, 1) : t ≠ 0} and λ = 1 with eigenvectors {t(1, 3) : t ≠ 0}.
7. a) λ = 3 with eigenvectors {t(1, 1) : t ≠ 0} and λ = −1 with eigenvectors {t(−1, 1) : t ≠ 0}.
b) Only one eigenvalue, λ = 2, with multiplicity 2 and eigenvectors {t(1, 0) : t ≠ 0}.
c) λ = 3 with eigenvectors {t(1, 0) : t ≠ 0} and λ = −6 with eigenvectors {t(−5, 9) : t ≠ 0}.
d) λ = 1 ± i with eigenvectors {t(−1 ± i, 1) : t ≠ 0}.
e) λ = 5 ± i√3 with eigenvectors {t(±√3/2 + i/2, 1) : t ≠ 0}.
f) λ = 5 ± √5 with eigenvectors {t((1/2)(1 ∓ √5)i, 1) : t ≠ 0}.
9. The eigenvalues are the diagonal entries, 2, −2, 3, 5. Corresponding eigenvectors are v1 = (1, 0, 0, 0), v2 = (1, 1, 0, 0), v3 = (1, 1, 5, 0), v4 = (25, −3, 21, 14).
10. a) −1, 4, 6; (−3, 2, 0), (1, 1, 0), (0, 0, 1). b) 2, −3, 3; (0, −1, 6), (0, −1, 1), (1, 0, 0).
11. In each of the following answers, the diagonal entries in D and the columns in M may be rearranged in the same way and the answer is still correct. Also, any column in M may be multiplied by a scalar and the new M is still correct.
For Question 7:
a) D = [3 0; 0 −1], M = [1 −1; 1 1].
b) The matrix is not diagonalisable.
c) D = [3 0; 0 −6], M = [1 −5; 0 9].
d) D = [1+i 0; 0 1−i], M = [1+i 1−i; 1 1].
e) D = [5+i√3 0; 0 5−i√3], M = [√3/2 + i/2, −√3/2 + i/2; 1 1].
f) D = [5+√5 0; 0 5−√5], M = [(1/2)(1−√5)i, (1/2)(1+√5)i; 1 1].
For Question 9: D = diag(2, −2, 3, 5), M = [1 1 1 25; 0 1 1 −3; 0 0 5 21; 0 0 0 14].
For Question 10:
a) D = diag(−1, 4, 6), M = [−3 1 0; 2 1 0; 0 0 1].
b) D = diag(2, −3, 3), M = [0 0 1; −1 −1 0; 6 1 0].
15. If v is an eigenvector of T, then the coordinate vector [v]_B of v with respect to the basis B is the corresponding eigenvector of the matrix A.
16. A^5 = [−78 330; 55 −133].
17. a) 6, t(1, 2), t ≠ 0, t ∈ R; −4, t(−3, 4), t ≠ 0, t ∈ R. b) P = [1 −3; 2 4], D = [6 0; 0 −4]. c) [6^n, −3(−4)^n; 2 × 6^n, 4(−4)^n].
18. When A is diagonalisable, A^k = M D^k M^{−1} and xk = A^k x0. As a check, if you put k = 0 in the answers below you should get A^0 = I, whereas if you put k = 1 you should get A^1 = A.
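The identity A^k = M D^k M^{−1} is easy to check numerically. The Python sketch below uses the diagonalisation given above for Question 7(a), D = diag(3, −1) with eigenvector columns (1, 1) and (−1, 1); the matrix A = [1 2; 2 1] is inferred from that answer (as M D M^{−1}), not quoted from the question itself:

```python
def matmul(A, B):
    """Product of two 2x2 matrices, kept explicit for clarity."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Diagonalisation consistent with the Question 7(a) answer: eigenvalues 3, -1.
M     = [[1, -1], [1, 1]]
M_inv = [[0.5, 0.5], [-0.5, 0.5]]

def A_power(k):
    """A^k = M D^k M^(-1), with D = diag(3, -1)."""
    Dk = [[3**k, 0], [0, (-1)**k]]
    return matmul(matmul(M, Dk), M_inv)

print(A_power(0))   # the identity matrix, as the k = 0 check suggests
print(A_power(1))   # recovers A itself
print(A_power(3))   # entries (3^k +/- (-1)^k)/2, matching the closed form
```

The k = 0 and k = 1 checks mentioned above come out as the identity and A respectively.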
For Question 7:
a) A^k = (1/2)[3^k + (−1)^k, 3^k + (−1)^{k+1}; 3^k + (−1)^{k+1}, 3^k + (−1)^k].
c) A^k = (1/9)[3^{k+2}, 5(3^k − (−6)^k); 0, 9(−6)^k].
19. α1, α2, α3 and α4 are arbitrary real numbers.
For Question 5:
a) y(t) = α1 e^{3t} (2, 3) + α2 e^{2t} (1, 2).
b) y(t) = α1 e^{t} (1, 3) + α2 e^{−3t} (1, 1).
For Question 9: y(t) = α1 e^{2t} (1, 0, 0, 0) + α2 e^{−2t} (1, 1, 0, 0) + α3 e^{3t} (1, 1, 5, 0) + α4 e^{5t} (25, −3, 21, 14).
For Question 10:
a) y(t) = α1 e^{−t} (−3/2, 1, 0) + α2 e^{4t} (1, 1, 0) + α3 e^{6t} (0, 0, 1).
b) y(t) = α1 e^{2t} (0, −1/6, 1) + α2 e^{−3t} (0, −1, 1) + α3 e^{3t} (1, 0, 0).
20. a) 1, λ(3, −1), λ ≠ 0; 5, λ(1, 1), λ ≠ 0. b) x1 = 3α e^t + β e^{5t}, x2 = −α e^t + β e^{5t}.
21. a) x = 300e^t − 200e^{3t}, y = 150e^t − 50e^{3t}. b) x = −500 + 600e^{−2t}, y = −100 + 200e^{−2t}.
22. The solutions by the two methods are:
a) y(t) = (y(t), y′(t)) = α1 e^t (1, 1) + α2 e^{t/5} (1, 1/5) (matrix method); y(t) = α1 e^t + α2 e^{t/5} (calculus method).
b) y(t) = (y(t), y′(t)) = α1 e^{4t} (1, 4) + α2 e^{−4t} (1, −4) (matrix method); y(t) = α1 e^{4t} + α2 e^{−4t} (calculus method).
23. The matrix method given in the notes is not applicable as the matrix is not diagonalisable.
24. a) A = [0, 1; −c/a, −b/a].
25. a) After 500 years: 0.9048 : 0.0928 : 0.0024. After 1000 years: 0.8187 : 0.1722 : 0.0091. After 1000000 years: 1.4 × 10^{−87} : 7 × 10^{−44} : 1.
b) The associated matrix is not diagonalisable.
26. In the 12th: [0.98 0.02 0.03; 0.01 0.96 0.03; 0.01 0.02 0.94]^{11} (300, 300, 300) ≈ (378, 293, 229). In the 24th: [0.98 0.02 0.03; 0.01 0.96 0.03; 0.01 0.02 0.94]^{23} (300, 300, 300) ≈ (426, 280, 194).
27. In the 12th: (339, 262, 205), total = 806. In the 24th: (339, 222, 154), total = 715.
28. The population settles to the proportions 1.156 : 1.124 : 1.116 : 1.086 : 1 but eventually dies out.
31. a) 6, t(1, −1, 1), t ∈ R; 7, t(2, 0, 1), t ∈ R; 8, t(2, 1, 1), t ∈ R.
b) A = [1 2 2; −1 0 1; 1 1 1], D = diag(6, 7, 8).
c) A^{−1} = [−1 0 2; 2 −1 −3; −1 1 2], and
M^k = [−6^k + 4 × 7^k − 2 × 8^k, −2 × 7^k + 2 × 8^k, 2 × 6^k − 6 × 7^k + 4 × 8^k; 6^k − 8^k, 8^k, −2 × 6^k + 2 × 8^k; −6^k + 2 × 7^k − 8^k, −7^k + 8^k, 2 × 6^k − 3 × 7^k + 2 × 8^k].

Chapter 9

1. a) {a, c}, b) {f}, c) S, d) ∅, e) {a, b, c, d, e}, f) {f}, g) {b}, h) {b}.
2. 32.
3. 81%, 95.3%.
4. 26.
5. a) A ∩ Bᶜ ∩ Cᶜ, b) A ∪ B ∪ C, c) (A ∩ B) ∪ (A ∩ C) ∪ (B ∩ C), d) (A ∩ Bᶜ ∩ Cᶜ) ∪ (Aᶜ ∩ B ∩ Cᶜ) ∪ (Aᶜ ∩ Bᶜ ∩ C), e) (Aᶜ ∩ B ∩ C) ∪ (A ∩ Bᶜ ∩ C) ∪ (A ∩ B ∩ Cᶜ).
6. a) 5/36. b) 1/6. c) 3/4.
7. 2/3.
8. a) 3/50, b) 1/2, c) 47/50.
9. 32%, 5/17.
10. a) 19/45, b) 11/25, c) 6/11.
11. a) 25.24%, b) 0.0131, c) 0.000545.
12. No.
13. a) p^n; b) 1 − (1 − p)^n.
15. P(A1 ∩ A2 ∩ · · · ∩ An) = P(An | A1 ∩ · · · ∩ An−1) P(An−1 | A1 ∩ · · · ∩ An−2) · · · P(A2 | A1) P(A1); 56%, 33.6%, 22.4%.
19. P(X = 0) = 1/15, P(X = 1) = 8/15, P(X = 2) = 2/5.
20. a) 0.214, b) 0.713.
21. 13/51.
22. a) c = 1/e. b) P(X = 2) = 1/(2e). c) P(X < 2) = 2/e. d) P(X ≥ 4) = 1 − 8/(3e).
23. b) ((⌊αn⌋)² + 3⌊αn⌋ + 2) / (n² + 3n + 2), c) n = 5.
24. a) θ(1 − θ^{2n}) / ((1 − θ^{2n+1})(1 + θ)), b) (1 − θ^{n+1}) / (1 − θ^{2n+1}).
25. a) c = 0.1, b) E(X) = 2.5, Var(X) = 2.05, c) E(Y) = −9, Var(Y) = 32.8.
26. µ = (n + 1)/2; σ² = (n² − 1)/12.
28. µ = α/(1 − α); σ² = α/(1 − α)².
29. 0.1123.
30. 70p⁴q⁴ + 56p⁵q³ + 28p⁶q² + 8p⁷q + p⁸ where p = 1/4, q = 3/4. This evaluates to 7459/65536 ≈ 0.1138.
31. 0.383.
32. 11.
34. P(X = k) = 2/3^k for k = 0, 1, 2, 3, . . .
35. a) C(19, 2) (1/6)³ (5/6)¹⁷. b) C(n − 1, k − 1) (1/6)^k (5/6)^{n−k}.
37. a) B(15, 0.25). b) 0.08018. c) No.
38. a) B(23, 0.5). b) 2.86 × 10⁻⁶. c) Yes.
39. a) 6. b) B(8, 0.5). c) 0.1445. d) No.
40. a) 10. b) 12. c) B(12, 0.5). d) 0.01929. e) Yes.
42. a) µ = (a + b)/2; σ² = (a − b)²/12. b) µ = k/(k − 1) for k > 1; σ² = k/((k − 1)²(k − 2)) for k > 2. c) µ = n + 1, σ² = n + 1. d) µ = 0, σ² = 2.
43. a) α = 1/π. b) c = −1. c) Neither exists.
44. a) 9/16, b) 7/16, c) 0.4106, d) 49/12.
45. a) 1/log 10.
b) F(y) = 0 for y < 10; F(y) = log(y/10)/log 10 for 10 ≤ y ≤ 100; F(y) = 1 for y > 100.
c) 10^{3/2} ≈ 31.62. d) 90/log 10 ≈ 39.09.
46. a) The graph of F is piecewise linear: it is 0 up to x = 2, rises to the point (3, 1/4), then to (5, 1), and is constant thereafter.
b) The function f with f(x) = 0 for x ≤ 2, f(x) = 1/4 for 2 < x ≤ 3, f(x) = 3/8 for 3 < x ≤ 5 and f(x) = 0 for x > 5 will do. Its graph is a step function of height 1/4 on (2, 3] and 3/8 on (3, 5], and 0 elsewhere.
c) E(X) = 29/8.
47. a) E(Y) = 2/α + 3; Var(Y) = 4/α².
48. a) 0.8907. b) 0.0107. c) 0.3594. d) 0.8925. e) 0.1359. f) 0.3830.
49. a) 0.8413. b) 0.2514. c) 0.2514. d) 0.6915. e) 0.2789. f) 0.8854.
50. a) 0.93. b) −1.65. c) 47. d) 32.
51. a) 0.7299. b) 81.
52. 0.0401.
53. a) 1.2%, b) 6.7%, c) 75.9%.
54. 2.3% over, 0.4% under.
55. 0.0122.
56. a) Binomial(288, 0.25). b) Σ_{k=88}^{288} C(288, k) (1/4)^k (3/4)^{288−k}. c) 0.017. d) Yes.
58. E(X) = Var(X) = 1/2.
60. a) −(1/λ) log(1 − p). b) (1/λ) log 2.
62. a) 0.487. b) 0.146. c) 0.264. d) 62.4 min.
63. a) 0.6703. b) 0.1353. c) More than 5.76 hours.
64. 0.4493.
65. Let λ = λ1 + · · · + λn.
a) P(T ≤ t) = 1 − e^{−λt} for t ≥ 0; the cumulative distribution function is F(t) = 1 − e^{−λt} for t > 0 and F(t) = 0 for t ≤ 0.
b) The probability density function is f(t) = λ e^{−λt} for t > 0 and f(t) = 0 for t ≤ 0.
c) The exponential distribution Exp(λ); mean = 1/λ, variance = 1/λ².
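Several of the exponential-distribution answers above (62 to 64) follow directly from P(T > t) = e^{−λt}, and can be reproduced numerically; a short Python sketch, with rounding chosen to match the stated answers:

```python
from math import exp, log

# Problem 62: completion time ~ Exp(1/90), i.e. mean 90 minutes.
lam = 1 / 90
print(round(1 - exp(-lam * 60), 3))   # a) finished within the first hour
print(round(exp(-lam * 120), 3))      # c) not finished within 2 hours
print(round(90 * log(2), 1))          # d) median completion time, in minutes

# Problem 64: series system of Exp(0.001), Exp(0.003), Exp(0.004) components.
# The minimum of independent exponentials is exponential with the summed rate,
# which is exactly the content of answer 65.
lam_sys = 0.001 + 0.003 + 0.004
print(round(exp(-lam_sys * 100), 4))  # P(system lasts at least 100 hours)
```

The printed values agree with answers 62 a), c), d) and 64 above (0.487, 0.264, 62.4 and 0.4493).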
Index

axiom, 3
basis, 35
  ordered, 46
  orthonormal, 38
basis by extension, 44
basis by reduction, 42
characteristic polynomial, 140
column space, 21
coordinate vector, 46
counterexample, 8
cumulative distribution function, 184
defective matrix, 143
diagonalisable matrix, 146
dimension, 40
Distribution
  Exponential, 203
eigenvalue
  linear map, 138
  matrix, 138
eigenvector
  linear map, 138
  matrix, 138
exponential distribution
  standard, 204
function, 63
  addition, 63
  codomain, 63
  composition, 63
  domain, 63
  equality, 63
  injective, 120
  inverse, 121
  multiplication, 63
  multiplication by a scalar, 63
  one-to-one, 120
  onto, 120
  range, 120
  surjective, 120
image, 94
  function, 120
  linear map, 98
  matrix, 98
kernel, 94
  linear map, 95
  matrix, 96
Laplace transform, 107
linear combination, 17
linear independence, 24
linear map, 80
  addition condition, 80
  and linear combination, 83, 84
  inverse, 116
  matrix representation, 86, 110
  one-to-one, 116
  onto, 116
  scalar multiplication condition, 80
linear transformation, 80
linearly dependent set, 25
linearly independent set, 25
Markov chains, 154
normal distribution
  standard, 199
null space, 94
nullity, 97
Phantasmagoria, 1
polynomial function, 56
polynomial, characteristic, 140
powers of matrix, 147
probability density function, 196
projection, 93
random variable
  continuous, 195
rank, 100
Rank-Nullity Theorem, 119
  linear map, 101
  matrix, 101
rotation, 91
sets, 62
  equality, 62
  intersection, 63
  union, 63
span, 17
spanning set, 18
standard deviation, 198
subset, 62
  proper, 62
subspace, 11, 13
  proper, 13
subspace theorem, 13
  alternative, 49
variance, 198
vector
  coordinate, 46
vector space
  Mmn, 7, 50
  Cn, 6
  P, Pn, 7, 57, 58
  Rn, 3
  R[X], 8, 52
  associative law, 3
  cancellation property, 9
  commutative law, 3
  definition, 3
  distributive law, 3
  finite dimensional, 40