MATH1231 Mathematics 1B
MATH1241 Higher Mathematics 1B
ALGEBRA NOTES
Copyright 2020 School of Mathematics and Statistics, UNSW Sydney
Preface
Please read carefully.
These Notes form the basis for the algebra strand of MATH1231 and MATH1241. However, not
all of the material in these Notes is included in the MATH1231 or MATH1241 algebra syllabuses.
A detailed syllabus will be uploaded to Moodle.
In using these Notes, you should remember the following points:
1. It is essential that you start working right from the beginning of the session and continue to
work steadily throughout the session. Make every effort to keep up with the lectures and to
do problems relevant to the current lectures.
2. These Notes are not intended to be a substitute for attending lectures or tutorials. The
lectures will expand on the material in the notes and help you to understand it.
3. These Notes may seem to contain a lot of material but not all of this material is equally
important. One aim of the lectures will be to give you a clearer idea of the relative importance
of the topics covered in the Notes.
4. Use the tutorials for the purpose for which they are intended, that is, to ask questions about
both the theory and the problems being covered in the current lectures.
5. The theory (i.e. the theorems and proofs) is regarded as an essential part of the Algebra
course. A list of the theory that you should know is given on page ix.
6. Some of the material in these Notes is more difficult than the rest. This harder material
is marked with the symbol [H]. Material marked with an [X] is intended for students in
MATH1241.
7. It is essential for you to do problems which are given at the end of each chapter in addition
to the online tutorials that can be found on Moodle. If you find that you do not have time to
attempt all of the problems, you should at least attempt a representative selection of them.
The problems set in tests and exams will be similar to the problems given in these notes.
8. You will probably find some of the ideas in Chapters 6 and 7 quite difficult at first because
they are expressed in a general and abstract manner. However, as you work through the
examples in the chapters and the problems at the ends of the chapters you should find that
the ideas become much clearer to you.
9. You will be expected to use the computer algebra package Maple in tests, and to understand
Maple syntax and output for the end of term examination.
10. You should keep these Notes for use in 2nd year subjects on Linear Algebra.
Note.
These notes have been prepared by a number of members of the University of New South
Wales. The main contributors include Peter Blennerhassett, Peter Brown, Shaun Disney, Ian Doust,
William Dunsmuir, Peter Donovan, David Hunt, Elvin Moore and Colin Sutherland. Chapter 9
was written by Dr. Thomas Britz based on the notes of Prof. William Dunsmuir. The original
problems for this chapter came from MATH1151. They were reorganised and expanded by Dr. Chi
Mak. Copyright is vested in The University of New South Wales, ©2020.
Contents
Preface iii
Algebra Syllabus viii
Syllabus and lecture timetable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Problem schedule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Theory in the algebra component . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
Revision xi
Important facts from MATH1131/1141 . . . . . . . . . . . . . . . . . . . . . . . . . . xi
Revision problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
6 VECTOR SPACES 1
6.1 Definitions and examples of vector spaces . . . . . . . . . . . . . . . . . . . . . . . . 2
6.2 Vector arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
6.3 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
6.4 Linear combinations and spans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
6.4.1 Matrices and spans in Rm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6.4.2 Solving problems about spans . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
6.5 Linear independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
6.5.1 Solving problems about linear independence . . . . . . . . . . . . . . . . . . . 27
6.5.2 Uniqueness and linear independence . . . . . . . . . . . . . . . . . . . . . . . 31
6.5.3 Spans and linear independence . . . . . . . . . . . . . . . . . . . . . . . . . . 32
6.6 Basis and dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.6.1 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.6.2 Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6.6.3 Existence and construction of bases . . . . . . . . . . . . . . . . . . . . . . . 41
6.7 [X] Coordinate vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
6.8 [X] Further important examples of vector spaces . . . . . . . . . . . . . . . . . . . . 49
6.8.1 Vector spaces of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
6.8.2 Vector spaces associated with real-valued functions . . . . . . . . . . . . . . . 52
6.8.3 Vector spaces associated with polynomials . . . . . . . . . . . . . . . . . . . . 55
6.9 A brief review of set and function notation . . . . . . . . . . . . . . . . . . . . . . . 62
6.9.1 Set notation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6.9.2 Function notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.10 Vector spaces and MAPLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Problems for Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7 LINEAR TRANSFORMATIONS 79
7.1 Introduction to linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
7.2 Linear maps from Rn to Rm and m× n matrices . . . . . . . . . . . . . . . . . . . . 85
7.3 Geometric examples of linear transformations . . . . . . . . . . . . . . . . . . . . . . 88
7.4 Subspaces associated with linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7.4.1 The kernel of a map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.4.2 Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.4.3 Rank, nullity and solutions of Ax = b . . . . . . . . . . . . . . . . . . . . . . 101
7.5 Further applications and examples of linear maps . . . . . . . . . . . . . . . . . . . . 103
7.6 [X] Representation of linear maps by matrices . . . . . . . . . . . . . . . . . . . . . 109
7.7 [X] Matrix arithmetic and linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . 114
7.8 [X] One-to-one, onto and invertible linear maps and matrices . . . . . . . . . . . . . 115
7.8.1 Linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.9 [X] Proof of the Rank-Nullity Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 119
7.10 One-to-one, onto and inverses for functions . . . . . . . . . . . . . . . . . . . . . . . 120
7.11 Linear transformations and MAPLE . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
Problems for Chapter 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
8 EIGENVALUES AND EIGENVECTORS 137
8.1 Definitions and examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
8.1.1 Some fundamental results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
8.1.2 Calculation of eigenvalues and eigenvectors . . . . . . . . . . . . . . . . . . . 141
8.2 Eigenvectors, bases, and diagonalisation . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.3 Applications of eigenvalues and eigenvectors . . . . . . . . . . . . . . . . . . . . . . . 146
8.3.1 Powers of A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
8.3.2 Solution of first-order linear differential equations . . . . . . . . . . . . . . . . 149
8.3.3 [X] Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
8.4 Eigenvalues and MAPLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Problems for Chapter 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
9 INTRODUCTION TO PROBABILITY AND STATISTICS 165
9.1 Some Preliminary Set Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
9.2 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
9.2.1 Sample Space and Probability Axioms . . . . . . . . . . . . . . . . . . . . . . 170
9.2.2 Rules for Probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
9.2.3 Conditional Probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
9.2.4 Statistical Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
9.3 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
9.3.1 Discrete Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
9.3.2 The Mean and Variance of a Discrete Random Variable . . . . . . . . . . . . 186
9.4 Special Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
9.4.1 The Binomial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
9.4.2 Geometric Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
9.4.3 Sign Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
9.5 Continuous random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
9.5.1 The mean and variance of a continuous random variable . . . . . . . . . . . . 197
9.6 Special Continuous Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
9.6.1 The Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
9.6.2 [X] The Exponential Distribution . . . . . . . . . . . . . . . . . . . . . . . . . 203
9.6.3 Useful Web Applets to Illustrate Probability Reasoning . . . . . . . . . . . . 206
9.7 Probability and MAPLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
Problems for Chapter 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
ANSWERS TO SELECTED PROBLEMS 221
Chapter 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
Chapter 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Chapter 8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
Chapter 9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
INDEX 239
ALGEBRA SYLLABUS AND LECTURE TIMETABLE
The algebra course for both MATH1231 and MATH1241 is based on chapters 6 to 9 of the
Algebra Notes. Lecturers will not cover all of the material in these notes in their lectures as some
sections of the notes are intended for reference and for background reading. A detailed syllabus
and lecture schedule will be uploaded to Moodle.
PROBLEM SETS
At the end of each chapter there is a set of problems. Some of the problems are very easy, some
are less easy but still routine and some are quite hard. To help you decide which problems to try
first, each problem is marked with an [R], an [H] or an [X]. The problems marked [R] form a basic
set of problems which you should try first. Problems marked [H] are harder and can be left until
you have done the problems marked [R]. You do need to make an attempt at the [H] problems
because problems of this type will occur on tests and in the exam. If you have difficulty with the
[H] problems, ask for help in your tutorial.
The problems marked [X] are intended for students in MATH1241 – they relate to topics which
are only covered in MATH1241.
Extra problem sheets for MATH1241 may be issued in lectures.
There are a number of questions marked [M], indicating that Maple is required in the solution
of the problem. Questions marked with a [V] have a video solution available from the Moodle
course page.
ALGEBRA PROBLEM SCHEDULE
Solving problems and writing mathematics clearly are two separate skills that need to be devel-
oped through practice. We recommend that you keep a workbook to practice writing solutions to
mathematical problems. The range of questions suitable for each week will be provided on Moodle
along with a suggestion of specific recommended problems to do before your classroom tutorials.
The Online Tutorials will develop your problem solving skills, and give you examples of math-
ematical writing. Online Tutorials help build your understanding from lectures towards solving
problems on your own.
THEORY IN THE ALGEBRA COURSE
The theory is regarded as an essential part of this course and it will be examined both in class tests
and in the end of year examination.
You should make sure that you can give DEFINITIONS of the following ideas:
Chapter 6. Subspace of a vector space, linear combination of a set of vectors, span of a set of
vectors, linear independence of a set of vectors, spanning set for a vector space, basis for a vector
space, dimension of a vector space.
Chapter 7. Linear function, kernel and nullity of a linear function, image and rank of a linear
function.
Chapter 8. Eigenvalue and eigenvector, diagonalizable matrix.
Chapter 9. Probability, statistical independence, conditional probability, discrete random vari-
able, expected value (mean) of a random variable, variance of a random variable, binomial distri-
bution, geometric distribution.
You should be able to give STATEMENTS of the following theorems and propositions.
Chapter 6. Theorem 1 of §6.3, Propositions 1 and 3 and Theorem 2 of §6.4, Proposition 1 and
Theorems 2, 3, 4, 5 and 6 of §6.5, Theorems 1, 2, 3, 4, 5, 6 and 7 of §6.6.
Chapter 7. Theorems 2, 3 and 4 of §7.1, Theorems 1 and 2 of §7.2, Proposition 7 and Theorems
1, 5, 8, 9 and 10 of §7.4.
Chapter 8. Theorems 1, 2 and 3 of §8.1, Theorems 1 and 2 of §8.2.
You should be able to give PROOFS of the following theorems and propositions.
Chapter 6. Theorem 2 of §6.4, Theorems 2 and 3 of §6.5, Theorem 2 of §6.6.
Chapter 7. Theorem 2 of §7.1, Theorem 1 of §7.2, Theorems 1, 5 and 8 of §7.4.
Chapter 8. Theorem 1 of §8.1.
Revision
Some important facts from MATH1131/1141
In the next couple of chapters, we shall frequently refer to some subsets of Rn, such as lines and
planes. As well as the two operations of addition and multiplication by a scalar, we shall also
refer to some other operations, such as the dot and cross products. For ease of reading, some
definitions are restated below. When necessary, you should refer to the 1131/41 Algebra Notes.
• Suppose that n ≥ 1. A parametric vector equation of a line in Rn through a point A and
parallel to a non-zero vector v is given by
x = a+ λv, λ ∈ R,
where a is the position vector of A (with respect to the origin O) and x is the position vector
of a variable point on the line.
• Suppose that n ≥ 2. A parametric vector equation of a plane in Rn through a point A and
parallel to two non-zero non-parallel vectors u,v is given by
x = a+ λu+ µv, λ, µ ∈ R,
where a is the position vector of A and x is the position vector of a variable point on the
plane.
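For example, the plane in R3 through A(1, 0, 0) and parallel to u = (0, 1, 0) and v = (0, 0, 1)
has parametric vector equation x = (1, 0, 0) + λ(0, 1, 0) + µ(0, 0, 1), λ, µ ∈ R; its points are
exactly those whose first coordinate equals 1.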
• Suppose that a and b are two vectors in Rn, n ≥ 1. The dot product is defined by
a · b = a1b1 + · · · + anbn.
The length of a vector a is defined to be |a| = √(a · a). (This definition is equivalent to the one
given in the 1131/41 Algebra Notes.)
The vectors are said to be orthogonal if a · b = 0. A set of vectors is said to be an orthonormal
set if the length of each vector is 1 and the vectors are mutually orthogonal.
The projection of a vector a on the vector b is
proj_b a = ((a · b)/|b|^2) b.
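For example, if a = (2, 1, 0) and b = (1, 2, 2) in R3, then a · b = 2 + 2 + 0 = 4, the length of b
is √(1 + 4 + 4) = 3, and proj_b a = (4/9)(1, 2, 2).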
• Suppose that a = (a1, a2, a3) and b = (b1, b2, b3) are two vectors in R3. An equivalent
definition of the cross product in determinant form is given by
a × b = | e1 e2 e3 |
        | a1 a2 a3 |
        | b1 b2 b3 | .
• The following is a point-normal form of a plane in R3 which passes through a point A and
has a normal vector n.
n · (x− a) = 0,
where a is the position vector of A and x is the position vector of a variable point on the
plane.
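For example, the plane through A(1, 0, 2) with normal vector n = (3, −1, 2) has point-normal
form (3, −1, 2) · (x − (1, 0, 2)) = 0, which expands to the Cartesian equation 3x − y + 2z = 7.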
Revision problems
1. [R] Let A, B, P be points in R3 with position vectors
a = (7, −2, 3), b = (1, −5, 0) and p = (1, −1, 2).
Let Q be the point on the segment between A and B such that AQ = (2/3) AB.
i) Find q, the position vector of Q.
ii) Find the parametric vector equation of the line that passes through P and Q.
2. [R] Consider the three points A(1, 1, 1), B(2, 0, 3) and C(3,−1, 1).
i) Find −→AB and −→AC.
ii) Find a parametric vector form of the line through A and B.
iii) Find a parametric vector form of the plane through A, B and C.
iv) Find −→AB × −→AC.
v) Find a point-normal form of the plane through A, B and C.
vi) Find a Cartesian equation of the plane through A, B and C.
3. [R] Given the vectors p = (1, 1, −1) and q = (2, 1, −1), find |p|, |q|, p · q, and then the cosine
of the angle between p and q.
4. [R] Consider the equation
det | x − 1   y − 2   z + 1 |
    |   1       0       2   |
    |   2      −1       0   | = 0.
i) Show that the equation represents the Cartesian equation of a plane.
ii) Write the equation in point-normal form.
5. [R] For the points P(1, 2, 0), Q(1, 3, −1) and R(2, 1, 1), find −→PQ × −→PR and the area of the
triangle with vertices P, Q and R.
6. [R] Suppose that A is the point (2,−1, 3) and Π is the plane
x = λ(1, 0, 1) + µ(1, −2, 2) for λ, µ ∈ R.
i) Find a vector n which is normal to Π.
ii) Find the projection of −→OA on the direction n.
iii) Hence find the shortest distance of A from Π.
7. [R] Find the intersection (if any) of the line x = (0, 18, 1) + µ(2, −3, 1) for µ ∈ R and the
plane x = (1, 0, 4) + λ1(1, 4, 1) + λ2(3, 1, −2) for λ1, λ2 ∈ R.
8. [R] Are the planes
x = (1, −4, 2, 3) + λ1(2, 1, −2, 7) + λ2(−3, 1, 5, 2) for λ1, λ2 ∈ R
and
x = (2, −4, 1, 3) + µ1(3, −1, 2, 4) + µ2(−1, 4, 2, 6) for µ1, µ2 ∈ R
parallel?
9. [R] Consider the following system of linear equations where k is a real number.
x1 + x2 + 2k x3 = 3
x1 + 3k x2 + x3 = −2
2x1 + 6k x2 + k x3 = 1
Find for which values of k the system has (I) no solutions, (II) a unique solution or (III)
infinitely many solutions.
Answers to the revision problems
1i. (3, −4, 1). ii. x = (1, −1, 2) + λ(2, −3, −1) for λ ∈ R.
2i. (1, −1, 2), (2, −2, 0). ii. x = (1, 1, 1) + λ(1, −1, 2) for λ ∈ R.
iii. x = (1, 1, 1) + λ(1, −1, 2) + µ(2, −2, 0) for λ, µ ∈ R. iv. (4, 4, 0).
v. (4, 4, 0) · (x − (1, 1, 1)) = 0.
3. √3, √6, 4, 2√2/3.
4i. 2x + 4y − z = 11. ii. (2, 4, −1) · ((x, y, z) − (0, 0, −11)) = 0.
5. (0, −1, −1), √2/2.
6i. (2, −1, −2), ii. −(1/9)(2, −1, −2), iii. 1/3.
7. Meet at (6, 9, 4).
8. The planes are not parallel, as λ1(2, 1, −2, 7) + λ2(−3, 1, 5, 2) = µ1(3, −1, 2, 4) + µ2(−1, 4, 2, 6)
only when λ1 = λ2 = µ1 = µ2 = 0.
9. (I) If either k = 2 or k = 1/3, then the system will have no solutions.
(II) If k ≠ 2 and k ≠ 1/3, then the system will have a unique solution.
(III) There are no values of k which give a system with infinitely many solutions.
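(One way to see where these values of k come from: the coefficient matrix of the system has
determinant 3k^2 − 7k + 2 = (3k − 1)(k − 2), which is zero exactly when k = 1/3 or k = 2, so
these are the only values at which a unique solution can fail.)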
Chapter 6
VECTOR SPACES
But, keeping still the end in view
To which I hope to come,
I strove to prove the matter true
By putting everything I knew
Into an Axiom
Lewis Carroll, Phantasmagoria.
We have studied geometric vectors and column vectors in Chapter 1 and matrices in Chapter 5.
What do the following sets have in common?
• The set of geometric vectors in a three-dimensional space.
• The set of column vectors of n real components, i.e. Rn.
• The set of m× n matrices with real entries, i.e. Mmn(R).
In each of these sets, we can add two elements and we can multiply an element in the set with
a scalar (in this case, a scalar is a real number) and remain inside the set we started in. We say
that each set is closed under the two operations — addition and multiplication by a scalar. Such
a set, together with the scalars and the two operations, satisfies some fundamental properties. As
we have seen in Chapters 1 and 5, addition of vectors and matrices satisfies the associative and
commutative laws. There is a special element 0 in each set such that 0 + v = v + 0 = v for all
v in the set. For each v in the set, there is a negative −v such that v + (−v) = (−v) + v = 0.
The associative law of scalar multiplication and distributive laws also hold. These are examples
of vector spaces which we are going to study in this chapter. In a vector space, each element in
the set is called a vector. The set of scalars must be a field, generally the set of real or complex
numbers.
Besides the above-mentioned vector spaces, there are many other examples, such as the set of
polynomials with real or complex coefficients, the real-valued functions on a given interval and the
differentiable functions defined on an interval, each of which forms a vector space. It is perhaps a
remarkable fact that each of these quite different kinds of objects obeys similar rules for addition
and multiplication by a scalar.
In the present chapter our main objectives are to develop a general theory of vector spaces and
to show how this general theory can be applied to the study of particular vector spaces. Although
all of the theorems and propositions stated in this chapter are true for all vector spaces, we will
concentrate on giving examples and applications of the theory for the vector space Rn. The main
reason for this is that in Rn the theoretical results can be more easily understood, as they can often
be given a geometric interpretation. Also, Rn is the most commonly used vector space in practical
applications.
As the mathematics developed in this chapter applies to all vector spaces, it is more abstract
and theoretical than that of Chapter 1, where there was always an immediate geometric picture
available for all results.
Another reason for the difficulty is that an appreciable part of the language of vector spaces
will be new to you. Therefore, as with any new language, it is absolutely essential that you
make a special effort to learn the definitions of any new words. You will find that many
of the fundamental vector space ideas discussed in this chapter, such as linear combination, span,
linear independence, basis, dimension, and coordinate vector, are generalisations of ideas that we
have already discussed in an informal geometric manner in Chapter 1 for vectors in Rn.
In addition, you will also find that the solution of most of the problems in this chapter can be
obtained by solving systems of linear equations using Gaussian Elimination developed in Chapter 4.
In most cases the details are suppressed, but the reader should check them before attempting the
exercises.
It is important to keep in mind in this chapter that correct setting out, and not just computation,
is essential. When you write down a solution to a question, you must make sure it reads
correctly, both logically and mathematically.
6.1 Definitions and examples of vector spaces
We start from a mathematical system which consists of the following four things.
1. A non-empty set V of elements called “vectors”.
2. A “vector-addition” rule (usually represented by +) for combining pairs of vectors from V .
For vectors v, w ∈ V , the vector formed by adding w to v is denoted by v +w.
3. A field F of “scalars”. For example, F could be the rational numbers Q or the real numbers
R or the complex numbers C. There are also other important, but less common, examples of
fields which can be used.
4. A “multiplication by a scalar rule” for combining a vector from V and a scalar from F to form
a vector. If λ is a scalar and v is an element of V , then λ ∗ v means the result of multiplying
v by the scalar λ.
The system is then denoted by (V,+, ∗,F). However, if we have no problem in distinguishing
the product of two scalars λµ and the multiplication of a vector by a scalar λ ∗ v, we shall omit
the symbol ∗. Just as we write 2x instead of 2× x, we shall write λv instead of λ ∗ v.
We can now give a formal definition of a vector space.
Definition 1. A vector space V over the field F is a non-empty set V of vectors
on which addition of vectors is defined and multiplication by a scalar is defined in
such a way that the following ten fundamental properties are satisfied:
1. Closure under Addition. If u,v ∈ V , then u+ v ∈ V .
2. Associative Law of Addition. If u,v,w ∈ V , then (u+v)+w = u+(v+w).
3. Commutative Law of Addition. If u,v ∈ V , then u+ v = v + u.
4. Existence of Zero. There exists an element 0 ∈ V such that, for all v ∈ V ,
v+ 0 = v.
5. Existence of Negative. For each v ∈ V there exists an element w ∈ V
(usually written as −v), such that v +w = 0.
6. Closure under Multiplication by a Scalar. If v ∈ V and λ ∈ F, then
λv ∈ V .
7. Associative Law of Multiplication by a Scalar. If λ, µ ∈ F and v ∈ V ,
then λ(µv) = (λµ)v.
8. If v ∈ V and 1 ∈ F is the scalar one, then 1v = v.
9. Scalar Distributive Law. If λ, µ ∈ F and v ∈ V , then (λ+ µ)v = λv+ µv.
10. Vector Distributive Law. If λ ∈ F and u,v ∈ V , then λ(u+v) = λu+λv.
Note.
1. Each of the basic rules is called an axiom.
2. In axiom 5, −v is a symbol for the negative of v. The vector −v and the vector formed by
multiplying v with the scalar −1 are not the same by definition. We shall prove that they
are the same later.
3. Formally, axiom 7 says λ ∗ (µ ∗ v) = (λµ) ∗ v.
4. In axiom 9, the addition on the left is the addition of two scalars while the addition on the
right is the addition of two vectors. They are different additions.
5. Two systems are the same only when all four things are the same. However, we seldom
discuss different vector spaces with the same set of vectors, but we often discuss different sets
of vectors with the same set of scalars and the same operations. When there is no confusion,
we shall simply call (V, +, ∗, F) the vector space V .
Example 1 (The Vector Space Rn). The set of vectors is the set of all n-vectors of real numbers,
Rn = { x : x = (x1, . . . , xn) for x1, . . . , xn ∈ R },
where we write the column vector with entries x1, . . . , xn as (x1, . . . , xn).
The set of scalars is R.
Vector addition is defined by
(x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn).
The multiplication of a vector by a scalar λ ∈ R is defined by
λ(x1, . . . , xn) = (λx1, . . . , λxn).
To prove that this system is a vector space it is necessary to show that all ten axioms listed in the
definition are satisfied by the system.
All the axioms are general statements about arbitrary vectors and scalars. We have to prove
the axioms are satisfied by any
u = (u1, . . . , un), v = (v1, . . . , vn), w = (w1, . . . , wn), and λ, µ ∈ R.
1. Closure under addition. If u,v ∈ Rn then u1 + v1, . . . , un + vn ∈ R because R is closed
under addition. Hence
u + v = (u1 + v1, . . . , un + vn) ∈ Rn.
2. Associative law of addition. If u,v,w ∈ Rn then
(u1 + v1) + w1 = u1 + (v1 + w1), . . . , (un + vn) + wn = un + (vn + wn)
because addition in R is associative. Hence
(u + v) + w = (u1 + v1, . . . , un + vn) + (w1, . . . , wn) = ((u1 + v1) + w1, . . . , (un + vn) + wn)
            = (u1 + (v1 + w1), . . . , un + (vn + wn)) = (u1, . . . , un) + (v1 + w1, . . . , vn + wn)
            = u + (v + w).
3. Commutative law of addition. If u,v ∈ Rn then
u1 + v1 = v1 + u1, . . . , un + vn = vn + un
because addition in R is commutative. Hence,
u + v = (u1 + v1, . . . , un + vn) = (v1 + u1, . . . , vn + un) = v + u.
4. Existence of zero. There is a special element
0 = (0, . . . , 0) ∈ Rn,
called the zero vector, which has the property that
v + 0 = (v1 + 0, . . . , vn + 0) = (v1, . . . , vn) = v, for all v ∈ Rn.
5. Existence of Negative. For each v ∈ Rn there exists an element
−v = (−v1, . . . , −vn) ∈ Rn,
the negative of v, such that
(v1, . . . , vn) + (−v1, . . . , −vn) = (v1 − v1, . . . , vn − vn) = 0.
6. Closure under scalar multiplication. If v ∈ Rn and λ ∈ R then λv1, . . . , λvn ∈ R because
R is closed under multiplication. Hence λv ∈ Rn.
7. Associative law of multiplication by a scalar. If λ, µ ∈ R and v ∈ Rn then
λ(µv1) = (λµ)v1, . . . , λ(µvn) = (λµ)vn
because multiplication in R is associative. Hence,
λ(µv) = λ(µv1, . . . , µvn) = (λ(µv1), . . . , λ(µvn)) = ((λµ)v1, . . . , (λµ)vn) = (λµ)(v1, . . . , vn) = (λµ)v.
8. If v ∈ Rn then 1v = (1v1, . . . , 1vn) = (v1, . . . , vn) = v.
9. Scalar distributive law. If λ, µ ∈ R and v ∈ Rn then
(λ+ µ)v1 = λv1 + µv1, . . . , (λ+ µ)vn = λvn + µvn,
because of the distributive law in R. We then have
(λ + µ)(v1, . . . , vn) = ((λ + µ)v1, . . . , (λ + µ)vn) = (λv1 + µv1, . . . , λvn + µvn)
                       = (λv1, . . . , λvn) + (µv1, . . . , µvn) = λ(v1, . . . , vn) + µ(v1, . . . , vn).
Hence (λ + µ)v = λv + µv.
10. Vector distributive law. If λ ∈ R and u,v ∈ Rn then
λ(u1 + v1) = λu1 + λv1, . . . , λ(un + vn) = λun + λvn,
because of the distributive law in R. We then have
λ(u1 + v1, . . . , un + vn) = (λ(u1 + v1), . . . , λ(un + vn)) = (λu1 + λv1, . . . , λun + λvn)
                           = λ(u1, . . . , un) + λ(v1, . . . , vn).
Hence λ(u + v) = λu + λv.
After we have checked that all ten axioms are satisfied, we can conclude that the system is a vector
space, or simply Rn is a vector space over R. ♦
Note. As special cases of Rn, the real number line R, the plane R2 and three-dimensional space
R3 are all vector spaces over the real numbers.
Example 2 (The Vector Space Cn). The set of vectors is the set of all column vectors with n
complex components,
Cn = { x : x = (x1, . . . , xn) for x1, . . . , xn ∈ C },
and the set of scalars is C. Addition is defined by
(x1, . . . , xn) + (y1, . . . , yn) = (x1 + y1, . . . , xn + yn),
and multiplication of a vector by a scalar λ ∈ C is defined by
λ(x1, . . . , xn) = (λx1, . . . , λxn).
To prove that this is a vector space it is necessary to show that the ten vector space axioms
are satisfied. This proof is formally identical to that for Rn over R since only basic operations are
involved. ♦
Example 3 (The Vector Space Mmn = Mmn(R) of Real Matrices). Mmn is a vector space over
R. There is a natural and straightforward generalisation to Mmn(F), where the entries come from
the field F. In the most important case of F = R we often suppress the R. Note that here we are
thinking of matrices as vectors!
Mmn = Mmn(R) = { A : A = (aij) is an m × n matrix, where aij ∈ R for 1 ≤ i ≤ m, 1 ≤ j ≤ n }.
Using the notation introduced in Chapter 5, the ijth entry (ith row, jth column entry) of A is
denoted by [A]ij . We define “addition of vectors” to be matrix addition where
[A+B]ij = [A]ij + [B]ij , for all i, j.
Similarly we define the “multiplication (of the vector A) by a scalar λ ∈ R” in terms of the
multiplication of a matrix by a scalar. That is,
[λA]ij = λ[A]ij for all i, j
as in Chapter 5.
To check that Mmn is a vector space is routine. All of the properties are included amongst the
properties developed for matrices. For example
A+B = B +A for matrices of the same size,
hence the commutative law holds for the set Mmn regarded as a vector space. The details are left
for the reader to check. ♦
Note. Mmn(R) and Mmn(C) are widely used in both quantum physics and chemistry.
Example 4 (The Vector Space of Polynomials). One of the most important aspects of vector space
theory is that it applies in many quite different situations. The set of all real-valued functions on
R forms a vector space, as does the set of all continuous functions. A simpler example, perhaps, is
the set P(R) of all real polynomials.
Suppose that p is the polynomial given by p(x) = a0 + a1x + · · · + an x^n = ∑_{k=0}^n ak x^k and q is
the polynomial given by q(x) = ∑_{k=0}^m bk x^k, where the ak and bk are real. Note that p is a real-valued
function while p(x) is the value of the function at x. (You might like to quickly read the brief
review of function notation given in Appendix 6.9.)
We all know how to add and subtract these polynomials, and how to multiply a polynomial by
a real number. Their sum is just the polynomial p+ q, where the value of the function at x is
(p + q)(x) = p(x) + q(x) = ∑_{k=0}^{max(n,m)} (ak + bk) x^k,  x ∈ R.
(Of course we just set any missing coefficient equal to zero to do this sum.) The scalar multiple is
the polynomial λp where
(λp)(x) = λ(p(x)) = ∑_{k=0}^n (λak) x^k,  x ∈ R.
The proof that P(R) is a vector space over R is straightforward. For example, if p and q are
polynomials, then p + q is also a polynomial (Closure under Addition). The zero element of P(R)
is just the polynomial p such that p(x) = 0, for all x ∈ R. It is important when working in P(R)
to remember that saying that two polynomials p and q are equal means that p(x) = q(x) for all
x ∈ R, or equivalently, that the corresponding coefficients for p and q are equal. Please check the
details yourself. ♦
Note. The set P(F) of all polynomials over a field F, with addition and multiplication by a scalar
similarly defined, is also a vector space.
For any non-negative integer n, the subset of all polynomials of degree n or less, including the
zero polynomial, is again a vector space. That is,
Pn(F) = {p : p is a polynomial over F, degree of p ≤ n, or p = 0},
with the same addition and multiplication by a scalar as in P(F), is a vector space over F.
In summary, we know that the following are vector spaces.
• Rn over R, where n is a positive integer.
• Cn over C, where n is a positive integer.
• P(F), Pn(F) over F, where F is a field. Usually F is either Q, R or C.
• Mmn(F) over F, where m, n are positive integers and F is a field.
Furthermore, the following set, its subset of all continuous functions and its subset of all differentiable
functions are vector spaces over R.
• R[X], the set of all possible real-valued functions with domain X.
When we refer to these vector spaces, we assume the additions and multiplications by scalars are
as defined above. However, there are systems with the same set of vectors but different operations.
If we want to emphasise that the operations are the ones defined above, we shall use the terms
usual addition and usual multiplication by a scalar.
We shall concentrate on the vector space Rn. Those who want to see other examples of vector
spaces can look at Section 6.8, where other vector spaces are covered in rather more depth.
We are going to end this section with an example of a system which is not a vector space. To
prove a system is not a vector space, we need to prove that one of the axioms is not satisfied.
To disprove a general statement, we only need to give a counterexample which shows that the
statement is false.
Example 5. The system (R2, +, ∗, R) with the usual multiplication by a scalar, but with addition
defined, for any u = (u1, u2) and v = (v1, v2) in R2, by
u + v = (u1 + v1, 2u2 + 2v2),
is not a vector space.
Solution. We are going to give a counterexample to axiom 9. Let λ = µ = 1 and v = (0, 1).
We have
(λ + µ)v = 2(0, 1) = (0, 2) ≠ (0, 4) = (0 + 0, 2×1 + 2×1) = (0, 1) + (0, 1) = λv + µv.
Hence, axiom 9 is not satisfied by this system and this system is not a vector space. ♦
6.2 Vector arithmetic
The axioms give a minimal set of rules needed to define a vector space. There are several other
useful rules which can be proved directly from the axioms and which are therefore true in all vector
spaces. In this section we shall discuss some of these properties and give examples of them for Rn.
The first five vector space axioms apply to vector addition, and they are in fact identical to the
five basic axioms of addition for integers, real numbers and complex numbers. This means that all
the arithmetic properties of vector addition are identical to corresponding properties of addition of
numbers. In particular, we have:
Proposition 1. In any vector space V , the following properties hold for addition.
1. Uniqueness of Zero. There is one and only one zero vector.
2. Cancellation Property. If u,v,w ∈ V satisfy u+ v = u+w, then v = w.
3. Uniqueness of Negatives. For all v ∈ V , there exists only one w ∈ V such that v+w = 0.
Proof. For property 1, Axiom 4 ensures the existence of a zero vector in V .
Now assume that two vectors 0 and 0′ are both zero vectors in V . Then, for the reasons given in
brackets, we have
0 = 0+ 0′ (axiom 4 applied to the zero vector 0′)
= 0′ + 0 (axiom 3)
= 0′ (axiom 4 applied to the zero vector 0)
Hence, 0 = 0′, and there is only one zero vector in V .
For property 2, Axiom 5 ensures the existence of the negative −u. Hence, we have
(−u) + (u+ v) = (−u) + (u+w) (axiom 5)
[(−u) + u] + v = [(−u) + u] +w (axiom 2)
0+ v = 0+w (axiom 5)
v = w (axiom 4)
For property 3, assume u and w are both negatives of v. By axiom 5, we have v+u = 0 = v+w.
By property 2, we can conclude that u = w. Hence the negative is unique.
Example 1. i) The unique zero vector in Rn is 0 = (0, . . . , 0).
ii) The negative of a vector is used in solving an equation such as
(1, 2, 3, 4) + v = (−2, 5, 1, 7)
to obtain
v = −(1, 2, 3, 4) + (−2, 5, 1, 7) = (−3, 3, −2, 3). ♦
A comparison of the axioms for multiplication of a vector by a scalar with the multiplication
properties for fields of numbers (see Chapter 3) also shows strong similarities — the main difference
being that in a field, two numbers of the same kind are being multiplied, whereas for vectors the
objects being multiplied are of different kinds. As a result, some of the fundamental properties of
multiplication of numbers also hold for multiplication of a vector by a scalar. In particular, we
have:
Proposition 2. Suppose that V is a vector space over a field F, λ ∈ F, v ∈ V , 0 is the zero scalar
in F and 0 is the zero vector in V . Then the following properties hold for multiplication by a scalar:
1. Multiplication by the zero scalar. 0v = 0,
2. Multiplication of the zero vector. λ0 = 0.
3. Multiplication by −1. (−1)v = −v (the additive inverse of v).
4. Zero products. If λv = 0, then either λ = 0 or v = 0.
5. Cancellation Property. If λv = µv and v 6= 0 then λ = µ.
Proof. We shall prove properties 1 and 3. The reader should write out the proofs for the others
as exercises. For property 1,
v + 0 = v = 1v = (1 + 0)v = 1v + 0v = v + 0v.
You should check carefully which axioms are required. Finally, by the Cancellation Property of
vector addition, we have 0v = 0.
For property 3,
v + (−1)v = 1v + (−1)v = (1 + (−1))v = 0v = 0.
Hence by Uniqueness of Negatives (−1)v = −v.
Example 2. The properties in the previous proposition are true for all vector spaces. In particular,
for vectors in Rn, the results can easily be proved from the definitions of the operations and the
properties of scalars. For example:
a) 0 (x1, . . . , xn) = (0, . . . , 0) = 0,
b) (−1) (x1, . . . , xn) = (−x1, . . . , −xn) = − (x1, . . . , xn),
c) If λ (x1, . . . , xn) = (0, . . . , 0), then either λ = 0 or (x1, . . . , xn) = (0, . . . , 0). ♦
6.3 Subspaces
Before reading this section, you should quickly read the brief review of sets given in Appendix 6.9.
Although all examples in this section are subsets of Rn, the definitions, theorems and corollaries
apply to all vector spaces as stated.
In practice, many problems about vectors involve subsets of some vector space. For example,
the points on a line in Rn form a subset of Rn, the points on a plane in R3 form a subset of R3, the
solutions of a system of m linear equations in n unknowns form a subset of Rn. It is an important
problem to determine the conditions under which some subset of a vector space is itself a vector
space. It is convenient to begin by looking at some examples.
Example 1. The real-number line is a vector space. The question arises whether some subset of
the real-number line is a vector space. For example, we might ask if some interval, for example the
interval
S = [−5, 5] = {x ∈ R : −5 6 x 6 5}
is a vector space. Geometrically, the set S represents the line segment shown in Figure 1(b).
Solution. The given system is not a vector space, since it is not closed under addition. A
counterexample: 5 ∈ S, but 5 + 5 = 10 is not an element of S. A picture is given in Figure
1(b). ♦
Figure 1. (a) The real number line is a vector space. (b) The interval S = [−5, 5] is not a vector
space, as 5 ∈ S but 5 + 5 = 10 ∉ S.
Example 2. The plane R2 is a vector space. Show that the subset S of R2 given by
S = { x = (x1, x2) ∈ R2 : x1 ≥ 0 }
is not a vector space.
Solution. There are several ways to solve this problem since there are several axioms which are
not satisfied. One method is to note that (1, 0) ∈ S, whereas −1(1, 0) = (−1, 0) ∉ S. Hence the
set S is not closed under scalar multiplication and so S is not a vector space. ♦
Note. Geometrically, the subset S contains the position vectors of all the points in the right
half-plane, as shown in Figure 2.
Figure 2.
Example 3. Show that the subset of R2 given by
S1 = { x = (x1, x2) ∈ R2 : x1 + x2 = 4 }
is not a vector space, whereas the subset given by
S2 = { x = (x1, x2) ∈ R2 : x1 + x2 = 0 }
is a vector space. (Geometrically, S1 represents a line in R2 which does not pass through the origin,
whereas S2 represents a line which does pass through the origin (see Figure 3).)
Solution. The vector v = (0, 0) is not in S1, since 0 + 0 ≠ 4, so S1 is not a vector space.
It is possible to show that S2 is a vector space by the usual time-consuming and tedious process
of checking that it satisfies all ten of the vector space axioms. ♦
Figure 3(a): The line x1 + x2 = 4 is not a vector space. Figure 3(b): The line x1 + x2 = 0 is a
vector space.
Note. Points in an n-dimensional space are represented by vectors in Rn. When we say a set of
vectors S in Rn represents a line, or simply that S is a line in Rn, we mean that S is the set of
position vectors of all the points on the line in the n-dimensional space.
In Examples 1, 2 and 3 we have asked whether certain subsets of the vector spaces R and R2
are themselves vector spaces. Of course the operations of the systems are the usual addition and
the usual multiplication by a scalar. We have seen that it is usually fairly simple to show that
a subset is not a vector space, but it will be time-consuming and tedious to show that a given
subset is a vector space by checking all ten axioms. We shall now develop a simple general test for
determining whether a given subset of a vector space is itself a vector space.
We first make the following definitions.
Definition 1. A subset S of a vector space V is called a subspace of V if S is
itself a vector space over the same field of scalars as V and under the same rules for
addition and multiplication by scalars.
In addition if there is at least one vector in V which is not contained in S, the
subspace S is called a proper subspace of V .
A simple test for a subspace is given by the following theorem.
Theorem 1 (Subspace Theorem). A subset S of a vector space V over a field F, under the same
rules for addition and multiplication by scalars, is a subspace of V if and only if
i) The vector 0 in V also belongs to S.
ii) S is closed under vector addition, and
iii) S is closed under multiplication by scalars from F.
Proof. We first note that if S is a subspace then it is a vector space.
Conversely, suppose S contains the zero vector and the two closure axioms 1 and 6 are satisfied
by the elements of S. Every element of S is an element of V because S is a subset of V . Furthermore,
since S and V are under the same operations, the vector space axioms 2, 3, 7, 8, 9 and 10 are
automatically satisfied by all elements of S.
Since S contains the zero vector, if v ∈ S then 0 + v = v (since this is true in V and hence in
S), so axiom 4 follows.
Finally, if v ∈ S then v ∈ V . Hence, from part 3 of Proposition 2 of §6.2, we have −v = (−1)v.
But, as S is closed under multiplication by a scalar, we have (−1)v ∈ S, and hence −v ∈ S. Thus,
axiom 5 is satisfied for all vectors in S. The proof is complete.
If we want to check whether S is a subspace of V , we should first check whether the zero vector
of V is in S. If the zero vector is in S, we can proceed to verify the two closure axioms. Otherwise,
we can conclude that S is not a subspace of V .
Example 4. Prove that the set
S = { (x1, x2, x3) ∈ R3 : 2x1 − x2 + 4x3 = 0 }
is a vector subspace of R3.
Solution. As 2(0) − 0 + 4(0) = 0, the zero vector of R3 is in S. We now proceed to show the two
closure axioms.
For any vectors u = (u1, u2, u3), v = (v1, v2, v3) ∈ S and λ ∈ R, we have
2u1 − u2 + 4u3 = 0   (1)
2v1 − v2 + 4v3 = 0   (2)
If we add (1) and (2), we have
(2u1 − u2 + 4u3) + (2v1 − v2 + 4v3) = 0 + 0.
Hence, we obtain
2(u1 + v1) − (u2 + v2) + 4(u3 + v3) = 0,
and so u + v = (u1 + v1, u2 + v2, u3 + v3) ∈ S. Thus S is closed under addition.
Now, multiplying both sides of (2) by λ, we have
λ(2v1 − v2 + 4v3) = λ0, i.e. 2(λv1) − (λv2) + 4(λv3) = 0.
Hence λv = (λv1, λv2, λv3) ∈ S, and so S is closed under multiplication by a scalar.
By the Subspace Theorem, the set S is a vector subspace of R3. ♦
Example 5. Prove that a line in Rn is a subspace of Rn if and only if it passes through the origin.
Solution. Suppose that S represents a line in Rn. If 0 /∈ S, then S is not a subspace. Hence, a
line which does not pass through the origin is not a vector subspace.
If S is a line through the origin, we can write
S = {x ∈ Rn : x = tv, t ∈ R},
where v is a fixed non-zero vector in Rn.
To check if S is a subspace we check the two closure axioms.
Closure under addition. If x1,x2 ∈ S then
x1 = t1v and x2 = t2v for some t1, t2 ∈ R.
Hence,
x1 + x2 = (t1 + t2)v = t′v,
where t′ = t1 + t2 ∈ R. Thus, x1 + x2 ∈ S, and hence S is closed under addition.
Closure under multiplication by a scalar. If x ∈ S, then
x = tv for some t ∈ R,
and hence, if λ ∈ R,
λx = λ(tv) = (λt)v = t′′v,
where t′′ = λt ∈ R. Hence λx ∈ S, and thus S is closed under multiplication by a scalar.
Therefore, by the Subspace Theorem, the line S is a subspace of Rn if it passes through the
origin. The proof is complete. ♦
A similar result to that given for lines in Example 5 also holds for planes.
Example 6. Prove that a plane in Rn is a subspace of Rn if and only if it passes through the
origin.
Solution. If the plane does not pass through the origin, then it does not contain the zero vector,
and hence is not a subspace.
If the plane passes through the origin, then it is represented by
S = {x ∈ Rn : x = s1v1 + s2v2, s1, s2 ∈ R},
where v1 and v2 are fixed, non-parallel vectors in Rn.
We now check closure under addition and under multiplication by a scalar.
Closure under addition. If x1,x2 ∈ S, then
x1 = s1v1 + s2v2 and x2 = t1v1 + t2v2 for some s1, s2, t1, t2 ∈ R.
Therefore
x1 + x2 = (s1 + t1)v1 + (s2 + t2)v2 = sv1 + tv2,
where s = s1 + t1 and t = s2 + t2 are both real numbers. Thus x1 + x2 ∈ S, and hence S is closed
under addition.
The proof that S is closed under multiplication by a scalar is similar and is left as an exer-
cise. ♦
In practice, some of the most important subspaces of Rn are connected with systems of linear
equations, that is, with the matrix equation Ax = b. An important example of this follows.
Example 7. Let A be an m × n matrix with real entries. Show that the subset S of Rn which
consists of all solutions of the matrix equation Ax = b for given b ∈ Rm is a subspace of Rn if and
only if b = 0.
Formally, the set of all solutions of Ax = b is given by
S = {x ∈ Rn : Ax = b}.
Solution. We first consider the case b ≠ 0. Then 0 ∈ Rn is not a solution of Ax = b as
A0 = 0 ≠ b, and hence S does not contain the zero vector. Thus S is not a subspace.
We next examine the case b = 0. Then S is the set of solutions of Ax = 0. We use the Subspace
Theorem to show that S is a subspace.
Closure under addition. If x ∈ S and y ∈ S, then Ax = 0 and Ay = 0, and hence
A(x+ y) = Ax+Ay = 0+ 0 = 0.
Thus x+ y ∈ S and S is closed under addition.
Closure under multiplication by a scalar. If x ∈ S, we have Ax = 0, and hence for all λ ∈ R,
A(λx) = λ(Ax) = λ0 = 0.
Thus λx ∈ S and S is closed under scalar multiplication. The result is proved. ♦
Note. Example 7 shows that the set of solutions of the matrix equation Ax = 0 is a subspace
of Rn. This subspace, which is of considerable practical importance, is called the kernel of the
matrix A (see Section 7.4.1).
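As a sketch of how such a kernel computation might look in Maple (using the LinearAlgebra
package; the basis vectors Maple returns may be scaled differently from a hand calculation, but
they span the same subspace):
> with(LinearAlgebra):
> A := Matrix([[2, -1, 4]]);   # coefficient matrix of 2*x1 - x2 + 4*x3 = 0, as in Example 4
> NullSpace(A);                # a basis for the kernel, i.e. the solution set of A.x = 0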
An important theoretical and practical problem concerning vector spaces is that of finding all
their subspaces. For example, it can be shown that the only subspaces of R2 are (1) the origin, (2)
lines through the origin, and (3) R2 itself. Similarly, for R3 the only subspaces (see Example 10 of
Section 6.6) are (1) the origin, (2) lines through the origin, (3) planes through the origin, and (4) R3
itself. A listing of subspaces can be given for any vector space. However, before we can investigate
this problem satisfactorily, we require further machinery. This machinery will be developed in
Sections 6.4 and 6.5. In vector spaces other than Rn it may be difficult to get a good geometric
feel for which subsets are subspaces. Nonetheless, the Subspace Theorem gives a simple way
to check whether a certain set is a subspace or not.
Example 8. Let V = P(R), the set of all real polynomials. Let P2(R) denote the set of all real
polynomials of degree less than or equal to 2. That is,
P2(R) = {p ∈ P(R) : p(x) = a0 + a1x + a2x^2 for some a0, a1, a2 ∈ R}.
Show that P2(R) is a subspace of P(R).
Solution. Clearly P2(R) contains the zero polynomial. Suppose then that p, q ∈ P2(R). Then
there exist coefficients a0, a1, a2, b0, b1, b2 ∈ R such that
p(x) = a0 + a1x + a2x^2,  q(x) = b0 + b1x + b2x^2.
Now (p + q)(x) = (a0 + b0) + (a1 + b1)x + (a2 + b2)x^2, which is another polynomial of degree less
than or equal to 2, i.e. p + q ∈ P2(R).
Suppose that p ∈ P2(R) as above and that λ ∈ R. Then (λp)(x) = (λa0) + (λa1)x + (λa2)x^2,
and so λp ∈ P2(R).
Thus, by the Subspace Theorem, P2(R) is a subspace of P(R). ♦
Suppose that P(F) is the set of all polynomials over F. We shall show in Section 6.8 that for
any n, the set Pn(F) consisting of all polynomials over F of degree less than or equal to n is also a
subspace of P(F).
Example 9. Let S denote the set of all real polynomials of degree exactly 3. Show that S is not
a subspace of P(R).
Solution. This set is not closed under either addition or scalar multiplication! For example, the
polynomial p given by p(x) = x^3 is in S, but 0p = 0, which does not lie in S. Also, for example,
(x^3 + x^2 + x) + (−x^3 + x + 3) ∉ S. ♦
6.4 Linear combinations and spans
The two fundamental vector space operations are addition and multiplication by a scalar. By
combining these two operations we arrive at the important idea of a sum of scalar multiples of
vectors. This leads to the ideas of “linear combination” and “span”: a linear combination of a
given set of vectors is a sum of scalar multiples of the vectors and the span of a given set of vectors
is the set of all linear combinations of the vectors. We defined linear combinations and span of two
vectors in Chapter 1. These ideas are used to develop the parametric vector forms for planes. In
particular, the span of two non-parallel vectors is a plane through the origin. In this section, we
generalise the ideas to a finite set of vectors.
The formal definition of ‘linear combination’ is as follows.
Definition 1. Let S = {v1, . . . ,vn} be a finite set of vectors in a vector space V
over a field F. Then a linear combination of S is a sum of scalar multiples of the
form
λ1v1 + · · ·+ λnvn with λ1, . . . , λn ∈ F.
Example 1. The vector (3, −4) is a linear combination of the vectors in the set
{ (1, 1), (2, 3), (1, −1) }
in R2 because
(3, −4) = 2(1, 1) + (−1)(2, 3) + 3(1, −1). ♦
We know that a vector space (and therefore any subspace of a vector space) is closed under
addition and multiplication by scalars, so we would expect that it would also be closed under the
operation of forming linear combinations. This is confirmed by the following proposition. The proof
of the proposition (which uses induction) is left as an exercise (Problem 36).
Proposition 1 (Closure under Linear Combinations). If S is a finite set of vectors in a vector
space V , then every linear combination of S is also a vector in V .
The formal definition of ‘span’ is as follows.
Definition 2. Let S = {v1, . . . ,vn} be a finite set of vectors in a vector space V
over a field F. Then the span of the set S is the set of all linear combinations of S,
that is,
span (S) = span (v1, . . . ,vn)
= {v ∈ V : v = λ1v1 + · · · + λnvn for some λ1, . . . , λn ∈ F}.
Example 2. The span of a single non-zero vector v in Rn is a line through the origin. In Chapter 1
we defined “the line in Rn spanned by v” to mean the set
S = {x ∈ Rn : x = λv, for some λ ∈ R}.
This set is just span (v) . ♦
Example 3. If {v,w} is a pair of non-zero, non-parallel vectors in Rn then span (v,w) is a plane
containing the origin. ♦
The following important theorem tells us that the span of a finite non-empty set of vectors in
a vector space V is not only a subset of V , it is always a subspace of V .
Theorem 2 (A span is a subspace). If S is a finite, non-empty set of vectors in a vector space V ,
then span(S) is a subspace of V . Further, span(S) is the smallest subspace containing S (in the
sense that span(S) is a subspace of every subspace which contains S).
Proof. We first note that 0 ∈ span(S), since we may take each scalar to be zero. Proposition 1 tells us
that every linear combination of S is a vector in V , so span(S) is a subset of V .
To prove that span(S) is a subspace we will use the Subspace Theorem, so we set out to prove
that span(S) is closed under addition and under multiplication by scalars. Let S be the set
S = {v1, . . . ,vn}
where all vj belong to V .
To show closure under addition, suppose u,w ∈ span (S). Then
u = λ1v1 + · · ·+ λnvn for some λ1, . . . , λn ∈ F and
w = µ1v1 + · · ·+ µnvn for some µ1, . . . , µn ∈ F,
so u+w = (λ1 + µ1)v1 + · · ·+ (λn + µn)vn with λ1 + µ1, . . . , λn + µn ∈ F.
This shows that u+w belongs to span (S), so span (S) is closed under addition. To prove closure
under multiplication by a scalar, suppose u ∈ span (S) and λ ∈ F. Then
λu = λ(λ1v1 + · · · + λnvn)
= (λλ1)v1 + · · ·+ (λλn)vn,
where λλ1, . . . , λλn ∈ F. This shows that λu belongs to span (S), so span (S) is closed under
multiplication by scalars.
We have now proved that span (S) is a subspace of V . To show that it is the smallest subspace
of V containing S, suppose W is any subspace of V containing S. Then W is itself a vector space
containing S and, by what we have just proved, span (S) is a subspace of W . This completes the
proof by showing that span (S) is a subspace of every subspace of V containing S.
Example 4. If v is a non-zero vector in R3, then the line span (v) is a subspace of every vector
space which contains v. In particular, it is a subspace of R3 and of every plane through the origin
parallel to v. Further, no proper subset of the line span (v) both contains v and is a vector space. Thus, for example, a line segment cannot be a vector space. We have already seen a special
case of this result in Example 1 of Section 6.3 and Figure 1. ♦
We often need to find a set S in a vector space V such that span (S) is the whole of V .
Definition 3. A finite set S of vectors in a vector space V is called a spanning
set for V if span (S) = V or equivalently, if every vector in V can be expressed as
a linear combination of vectors in S.
Note also that we often say that “S spans V ” instead of “S is a spanning set for V ”.
Example 5. As shown in Chapter 1, every geometric vector a in three dimensions can be written
as a linear combination
a = a1i+ a2j+ a3k for a1, a2, a3 ∈ R,
where i, j and k are the unit vectors along the directions of the three coordinate axes. Therefore
{i, j,k} is a spanning set for the vector space of all geometric vectors in three dimensions. ♦
Example 6. The set
S = { \begin{pmatrix} 1 \\ 1 \\ 3 \end{pmatrix}, \begin{pmatrix} -1 \\ 2 \\ -2 \end{pmatrix} }
is a spanning set of the vector space
{ x ∈ R3 : x = λ \begin{pmatrix} 1 \\ 1 \\ 3 \end{pmatrix} + µ \begin{pmatrix} -1 \\ 2 \\ -2 \end{pmatrix}, λ, µ ∈ R }.
The set
S′ = { \begin{pmatrix} 1 \\ 1 \\ 3 \end{pmatrix}, \begin{pmatrix} -1 \\ 2 \\ -2 \end{pmatrix}, \begin{pmatrix} 0 \\ 3 \\ 1 \end{pmatrix} }
also spans the above vector space. Obviously, the third vector in S′ is the sum of the other two, so span (S) = span (S′). Thus the third vector in S′ is somewhat redundant. ♦
Example 7. Let v be a fixed non-zero vector in Rn. The spanning set of the vector space
{x ∈ Rn : x = λv, λ ∈ R} is {v}. ♦
Example 8. Every vector x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} ∈ Rn can be written as x = x1e1 + · · ·+ xnen. This expresses x as a linear combination of the set {e1, . . . , en}, where
e1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, e2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, . . . , en = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}.
Thus Rn = span (e1, . . . , en) and the set {e1, . . . , en} spans Rn. ♦
Example 9. Let Pn denote the space of polynomials of degree less than or equal to n. Every polynomial p ∈ Pn can be written as a linear combination of the polynomials {1, x, x2, . . . , xn}, so Pn = span (1, x, x2, . . . , xn). We shall see later that there is no finite set of vectors whose span is all of P (the vector space of all polynomials). ♦
6.4.1 Matrices and spans in Rm
We want to have an effective way to tell whether or not a given vector in Rm belongs to the span
of a set S = {v1, . . . ,vn}. From the definition of span, we know that b belongs to span (S) if and
only if there are λ1, . . . , λn ∈ R such that
b = λ1v1 + · · · + λnvn.
This is equivalent to the condition that there is at least one solution to the vector equation
x1v1 + · · ·+ xnvn = b,
where x1, . . . , xn are the unknowns. This vector equation represents a system of m simultaneous linear equations in the n unknowns. Therefore the question of whether b belongs to span (S) is a question of whether or not a particular system of linear equations has a solution. This is the sort of question which we studied in detail in MATH1131/41.
Furthermore, suppose that
v1 = \begin{pmatrix} a_{11} \\ \vdots \\ a_{m1} \end{pmatrix}, v2 = \begin{pmatrix} a_{12} \\ \vdots \\ a_{m2} \end{pmatrix}, . . . , vn = \begin{pmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{pmatrix}, and x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.
If A is the m× n matrix whose columns are the vectors v1, . . . ,vn then
Ax = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} a_{11}x_1 + \cdots + a_{1n}x_n \\ \vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n \end{pmatrix} = x_1 \begin{pmatrix} a_{11} \\ \vdots \\ a_{m1} \end{pmatrix} + \cdots + x_n \begin{pmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{pmatrix} = x1v1 + · · ·+ xnvn.
As a result, we have the following proposition.
Proposition 3 (Matrices, Linear Combinations and Spans). If S = {v1, . . . ,vn} is a set of
vectors in Rm and A is the m× n matrix whose columns are the vectors v1, . . . ,vn then
a) a vector b in Rm can be expressed as a linear combination of S if and only if it can be
expressed in the form Ax for some x in Rn,
b) a vector b in Rm belongs to span (S) if and only if the equation Ax = b has a solution x in
Rn.
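Note. As a purely illustrative aside (Python and the numpy library are assumptions of this sketch, not part of the course materials), the identity in Proposition 3 is easy to check numerically. The sketch uses the vectors of Example 10 below.

import numpy as np

# The vectors v1, v2, v3 of Example 10 below.
v1 = np.array([0, 5, 3, 6])
v2 = np.array([1, 3, 4, 5])
v3 = np.array([-2, -3, -5, -6])

# A is the 4 x 3 matrix whose columns are v1, v2, v3.
A = np.column_stack([v1, v2, v3])

x = np.array([1, 1, 3])   # an arbitrary choice of scalars x1, x2, x3

# The matrix-vector product Ax equals the linear combination x1*v1 + x2*v2 + x3*v3.
print(np.array_equal(A @ x, x[0]*v1 + x[1]*v2 + x[2]*v3))   # prints True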
Example 10. For the set of three vectors
v1 = \begin{pmatrix} 0 \\ 5 \\ 3 \\ 6 \end{pmatrix}, v2 = \begin{pmatrix} 1 \\ 3 \\ 4 \\ 5 \end{pmatrix}, v3 = \begin{pmatrix} -2 \\ -3 \\ -5 \\ -6 \end{pmatrix} ∈ R4, we let A = \begin{pmatrix} 0 & 1 & -2 \\ 5 & 3 & -3 \\ 3 & 4 & -5 \\ 6 & 5 & -6 \end{pmatrix} and x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.
By expanding each side, it can easily be checked that
Ax = \begin{pmatrix} x_2 - 2x_3 \\ 5x_1 + 3x_2 - 3x_3 \\ 3x_1 + 4x_2 - 5x_3 \\ 6x_1 + 5x_2 - 6x_3 \end{pmatrix} = x_1 \begin{pmatrix} 0 \\ 5 \\ 3 \\ 6 \end{pmatrix} + x_2 \begin{pmatrix} 1 \\ 3 \\ 4 \\ 5 \end{pmatrix} + x_3 \begin{pmatrix} -2 \\ -3 \\ -5 \\ -6 \end{pmatrix}.
In particular,
A \begin{pmatrix} 1 \\ 1 \\ 3 \end{pmatrix} = 1 \begin{pmatrix} 0 \\ 5 \\ 3 \\ 6 \end{pmatrix} + 1 \begin{pmatrix} 1 \\ 3 \\ 4 \\ 5 \end{pmatrix} + 3 \begin{pmatrix} -2 \\ -3 \\ -5 \\ -6 \end{pmatrix} = \begin{pmatrix} -5 \\ -1 \\ -8 \\ -7 \end{pmatrix}.
Let us denote the vector \begin{pmatrix} -5 \\ -1 \\ -8 \\ -7 \end{pmatrix} by b. Hence x = \begin{pmatrix} 1 \\ 1 \\ 3 \end{pmatrix} is a solution of Ax = b, and accordingly b can be written as the linear combination v1 + v2 + 3v3. ♦
Example 11. If A is an m× n matrix and ej is the jth standard basis vector in Rn then
Aej = aj,
where aj is the jth column of A. In the case of the matrix A of the last example, we find by direct matrix multiplication that
Ae1 = \begin{pmatrix} 0 & 1 & -2 \\ 5 & 3 & -3 \\ 3 & 4 & -5 \\ 6 & 5 & -6 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 5 \\ 3 \\ 6 \end{pmatrix} = a1;  A \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \\ 5 \end{pmatrix} = a2;  A \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -2 \\ -3 \\ -5 \\ -6 \end{pmatrix} = a3. ♦
When applying the results of Proposition 3, it is convenient to have a special name for the
subspace of Rm spanned by the columns of a given m× n matrix.
Definition 4. The subspace of Rm spanned by the columns of an m×n matrix A
is called the column space of A and is denoted by col(A).
6.4.2 Solving problems about spans
By Proposition 3, a vector b in Rm lies in the span of a set S = {a1, . . . ,an} in Rm if and only if
the equation Ax = b has a solution, where A is the matrix with columns a1, . . . ,an. The following
examples show how to apply this knowledge to problems about spans in Rm.
Example 12. Is the vector b = \begin{pmatrix} 1 \\ 4 \\ 1 \\ 2 \end{pmatrix} in the span of the set S = { \begin{pmatrix} 1 \\ 3 \\ 4 \\ 2 \end{pmatrix}, \begin{pmatrix} -4 \\ -8 \\ -12 \\ 6 \end{pmatrix} }?
In geometric terms, the question is asking whether the point (1, 4, 1, 2) lies on the plane through the origin parallel to \begin{pmatrix} 1 \\ 3 \\ 4 \\ 2 \end{pmatrix} and \begin{pmatrix} -4 \\ -8 \\ -12 \\ 6 \end{pmatrix}.
Solution. Let A be the matrix whose columns are the members of S, so
A = \begin{pmatrix} 1 & -4 \\ 3 & -8 \\ 4 & -12 \\ 2 & 6 \end{pmatrix}.
As a consequence of Proposition 3, we know that b belongs to span (S) if and only if the equation Ax = b has a solution. We form the augmented matrix (A|b) for this system and reduce it to row-echelon form.
\begin{pmatrix} 1 & -4 & 1 \\ 3 & -8 & 4 \\ 4 & -12 & 1 \\ 2 & 6 & 2 \end{pmatrix}
→ (R2 = R2 − 3R1, R3 = R3 − 4R1, R4 = R4 − 2R1)
\begin{pmatrix} 1 & -4 & 1 \\ 0 & 4 & 1 \\ 0 & 4 & -3 \\ 0 & 14 & 0 \end{pmatrix}
→ (R3 = R3 − R2, R4 = R4 − (7/2)R2)
\begin{pmatrix} 1 & -4 & 1 \\ 0 & 4 & 1 \\ 0 & 0 & -4 \\ 0 & 0 & -7/2 \end{pmatrix}
→ (R4 = R4 − (7/8)R3)
\begin{pmatrix} 1 & -4 & 1 \\ 0 & 4 & 1 \\ 0 & 0 & -4 \\ 0 & 0 & 0 \end{pmatrix}.
Since the right hand column is a leading column, the system has no solution. Therefore b does not
belong to the span of S. ♦
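Note. For illustration only (Python with the sympy library is an assumption of this aside, not part of the course software), the reduction above can be reproduced mechanically: row-reduce the augmented matrix and ask whether the right-hand column is leading.

from sympy import Matrix

# The columns of A are the members of S; b is the vector being tested.
A = Matrix([[1, -4],
            [3, -8],
            [4, -12],
            [2, 6]])
b = Matrix([1, 4, 1, 2])

# rref returns the reduced row-echelon form and the indices of the leading columns.
R, pivots = A.row_join(b).rref()
print(pivots)        # (0, 1, 2): the right-hand column (index 2) is leading,
print(2 in pivots)   # so Ax = b has no solution and b is not in span(S).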
Example 13. Find conditions which are necessary and sufficient to ensure that a vector b in R3 belongs to the span of the set S = {v1,v2,v3} where
v1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, v2 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, v3 = \begin{pmatrix} -1 \\ 0 \\ 5 \end{pmatrix}.
Hence, determine if the vector v = \begin{pmatrix} 2 \\ 1 \\ -1 \end{pmatrix} belongs to span (S). Then give a geometric interpretation of the span.
Solution. By Proposition 3, the vector b belongs to span (S) if and only if there is a solution to the system of equations Ax = b, where the three columns of A are the vectors v1, v2, v3. We reduce the augmented matrix (A|b) to row-echelon form.
\begin{pmatrix} 1 & 1 & -1 & b_1 \\ 2 & 1 & 0 & b_2 \\ 3 & -1 & 5 & b_3 \end{pmatrix}
→ (R2 = R2 − 2R1, R3 = R3 − 3R1)
\begin{pmatrix} 1 & 1 & -1 & b_1 \\ 0 & -1 & 2 & b_2 - 2b_1 \\ 0 & -4 & 8 & b_3 - 3b_1 \end{pmatrix}
→ (R3 = R3 − 4R2)
\begin{pmatrix} 1 & 1 & -1 & b_1 \\ 0 & -1 & 2 & b_2 - 2b_1 \\ 0 & 0 & 0 & 5b_1 - 4b_2 + b_3 \end{pmatrix}.
The system represented by this augmented matrix has a solution if and only if
5b1 − 4b2 + b3 = 0.
Therefore b belongs to span (S) if and only if this condition is satisfied.
To check whether v is in the span, we substitute the components of v into the condition. Since 5(2) − 4(1) + (−1) = 5 ≠ 0, v is not in span (S).
To get a geometric interpretation of this result, note that a vector b is in the span if and only if its components satisfy the Cartesian equation
5x1 − 4x2 + x3 = 0,
which is a plane through the origin with normal \begin{pmatrix} 5 \\ -4 \\ 1 \end{pmatrix}. Therefore span (S) is this plane. ♦
Note. As a check, note that each of the vectors v1, v2, v3 belongs to span (v1,v2,v3) and should satisfy the above condition. On substituting v1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} for \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} we find 5(1) − 4(2) + 3 = 0, so v1 does satisfy the above condition. You can check for yourself that v2 and v3 also satisfy this condition.
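Note. [H] For the curious, the condition 5b1 − 4b2 + b3 = 0 can also be found computationally: b lies in col(A) exactly when b is orthogonal to every solution n of ATn = 0 (a standard fact about column spaces that is not needed elsewhere in this chapter). A sketch in Python with the sympy library (both assumed purely for illustration):

from sympy import Matrix

# The columns of A are v1, v2, v3 of Example 13.
A = Matrix([[1, 1, -1],
            [2, 1, 0],
            [3, -1, 5]])

# Solutions n of A^T n = 0; b belongs to col(A) exactly when n.b = 0.
n = A.T.nullspace()[0]
print(n.T)        # Matrix([[5, -4, 1]]) up to scaling: the condition 5b1 - 4b2 + b3 = 0

v = Matrix([2, 1, -1])
print(n.dot(v))   # 5, which is non-zero, so v is not in span(S)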
Example 14. Determine whether or not the set S = {v1,v2,v3,v4} is a spanning set for R3, where
v1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, v2 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, v3 = \begin{pmatrix} -1 \\ 0 \\ 5 \end{pmatrix} and v4 = \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix}.
Solution. S is a spanning set for R3 if and only if every vector b ∈ R3 belongs to span (S). By
Proposition 3, every vector b ∈ R3 belongs to span (S) if and only if the system Ax = b has a
solution for every b in R3, where A is the matrix whose columns are the members of S.
By row operations we can reduce the augmented matrix of the system to row-echelon form.
\begin{pmatrix} 1 & 1 & -1 & 2 & b_1 \\ 2 & 1 & 0 & 3 & b_2 \\ 3 & -1 & 5 & 5 & b_3 \end{pmatrix}
→ (R2 = R2 − 2R1, R3 = R3 − 3R1)
\begin{pmatrix} 1 & 1 & -1 & 2 & b_1 \\ 0 & -1 & 2 & -1 & b_2 - 2b_1 \\ 0 & -4 & 8 & -1 & b_3 - 3b_1 \end{pmatrix}
→ (R3 = R3 − 4R2)
\begin{pmatrix} 1 & 1 & -1 & 2 & b_1 \\ 0 & -1 & 2 & -1 & b_2 - 2b_1 \\ 0 & 0 & 0 & 3 & b_3 - 4b_2 + 5b_1 \end{pmatrix}.
For every b ∈ R3, the right-hand column is non-leading which means that this system has a
solution. This implies that every vector b ∈ R3 belongs to span (S). Hence S is a spanning set for
R3. ♦
Note. The equations would still have a solution for all b ∈ R3 if the non-leading column (column 3) were dropped from the row-echelon form matrix. This means that the vector v3 can be dropped from S and the set {v1, v2, v4} will still span R3. Thus, in this case,
span (v1,v2,v3,v4) = span (v1,v2,v4) = R3.
We shall see in Section 6.5 Example 6 that in place of v3 we could drop either v1 or v2 from S and obtain the same span. That is,
span (v1,v2,v3,v4) = span (v2,v3,v4) = span (v1,v3,v4) = R3.
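Note. In computational terms, S spans R3 exactly when the matrix with the members of S as columns has three leading columns, that is, rank 3. A quick check of Example 14 in Python with the numpy library (assumed here for illustration only):

import numpy as np

# The columns are v1, v2, v3, v4 of Example 14.
A = np.array([[1, 1, -1, 2],
              [2, 1, 0, 3],
              [3, -1, 5, 5]])

# Rank 3 means every row of a row-echelon form has a leading entry,
# so Ax = b is solvable for every b in R^3.
print(np.linalg.matrix_rank(A))   # prints 3, so S spans R^3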
However, the removal of the vector corresponding to a non-leading column in a row-echelon form
matrix gives us a simple criterion to get a subset of S which spans the same subspace span (S). In
general, we have the following result.
Suppose that S = {v1,v2, . . . ,vn} is a subset of Rm and that A, the matrix with the n vectors in S as columns, reduces to a row-echelon form matrix U. If the ith column of U is non-leading, then {v1, . . . ,vi−1,vi+1, . . . ,vn} spans the same set as S.
The following example shows that matrix methods can also be used to solve problems about
spans in some vector spaces other than Rn.
Example 15. Find conditions on the coefficients of p ∈ P3(R) so that p ∈ span (1 + x, 1− x2).
Solution. Let p(x) = b0 + b1x+ b2x2 + b3x3 be a polynomial in P3(R). From the definition of span, we know that p ∈ span (1 + x, 1− x2) if and only if there exist λ1, λ2 ∈ R such that, for all x ∈ R,
p(x) = λ1(1 + x) + λ2(1− x2) = (λ1 + λ2) + λ1x− λ2x2.
By comparing coefficients, we must have
λ1 + λ2 = b0
λ1 = b1
−λ2 = b2
0 = b3.
This is a system of linear equations in the variables λ1 and λ2 and we have to find out which choices
of b0, b1, b2, b3 make it into a system which does have a solution. The augmented matrix for the
system is
\begin{pmatrix} 1 & 1 & b_0 \\ 1 & 0 & b_1 \\ 0 & -1 & b_2 \\ 0 & 0 & b_3 \end{pmatrix}.
This augmented matrix can be reduced to the row-echelon form
\begin{pmatrix} 1 & 1 & b_0 \\ 0 & -1 & b_1 - b_0 \\ 0 & 0 & b_2 - b_1 + b_0 \\ 0 & 0 & b_3 \end{pmatrix}.
This system has a solution if and only if b2 − b1 + b0 = 0 and b3 = 0, so these are the conditions under which p belongs to span (1 + x, 1− x2). ♦
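Note. The comparison of coefficients in Example 15 can be automated. The following sketch (Python with the sympy library, both assumed for illustration only) tests the particular polynomial p(x) = 3 + 2x − x2, which satisfies b2 − b1 + b0 = −1 − 2 + 3 = 0 and b3 = 0.

from sympy import symbols, solve, expand, Poly

x, l1, l2 = symbols('x lambda1 lambda2')

p = 3 + 2*x - x**2                   # a polynomial satisfying both conditions
combo = l1*(1 + x) + l2*(1 - x**2)   # a general linear combination of the set

# Setting every coefficient of combo - p to zero gives the linear system for l1, l2.
eqs = Poly(expand(combo - p), x).all_coeffs()
print(solve(eqs, [l1, l2], dict=True))   # [{lambda1: 2, lambda2: 1}]: p = 2(1+x) + (1-x^2)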
6.5 Linear independence
Suppose that v1, v2 are non-zero vectors. In Chapter 1 we saw that span (v1, v2) represents a plane if v1 and v2 are not parallel to each other, but only a line if they are parallel. Similarly, if v1, v2 and v3 are given non-zero vectors in R3 then span (v1,v2,v3) represents
i) a line if the three vectors are all parallel to each other,
ii) a plane if they are coplanar, or
iii) the whole of R3 otherwise.
In this section we shall show how these results can be understood through the ideas of linear
independence and linear dependence of a set of vectors.
Definition 1. Suppose that S = {v1, . . . , vn} is a subset of a vector space. The
set S is a linearly independent set if the only values of the scalars λ1, λ2, . . . , λn
for which
λ1v1 + · · ·+ λnvn = 0 are λ1 = λ2 = · · · = λn = 0.
Definition 2. Suppose that S = {v1, . . . , vn} is a subset of a vector space. The
set S = {v1, . . . ,vn} is a linearly dependent set if it is not a linearly independent
set, that is, if there exist scalars λ1, . . . , λn, not all zero, such that
λ1v1 + · · ·+ λnvn = 0.
Note. The linear combination
λ1v1 + · · · + λnvn
is equal to 0 when all the scalars are zero. The essential point of the definition of linear independence is that the only way this linear combination can equal 0 is for all the scalars to be zero.
Example 1. Show that the vectors \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} and \begin{pmatrix} -3 \\ -6 \\ -9 \\ 5 \end{pmatrix} form a linearly independent set.
Solution. Applying the definition, we look for scalars λ1, λ2 such that
λ1 \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} + λ2 \begin{pmatrix} -3 \\ -6 \\ -9 \\ 5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.
In order to satisfy this vector equation the scalars must satisfy the four equations
λ1 − 3λ2 = 0, 2λ1 − 6λ2 = 0, 3λ1 − 9λ2 = 0, 4λ1 + 5λ2 = 0.
Each of the first three equations is satisfied if and only if λ1 = 3λ2. By substituting this formula
for λ1 in the fourth equation we get 17λ2 = 0. Thus the only solution is λ1 = λ2 = 0 and this
shows that the two given vectors form a linearly independent set. ♦
The vectors in the above example are not parallel because neither is a scalar multiple of the
other. This is a special case of the following geometric interpretation of linear dependence for pairs
of vectors.
Example 2. Show that two non-zero vectors in Rn are parallel if and only if they form a linearly
dependent set.
Solution. We first show that two parallel vectors form a linearly dependent set. Two non-zero vectors {v1, v2} are parallel if one is a (non-zero) scalar multiple of the other, that is, v2 = λv1 for some λ ∈ R. We can rewrite this equation as λv1 − v2 = 0. The coefficient of v2 in this expression is −1 (which is non-zero), so this equation proves that {v1, v2} is a linearly dependent set.
We next show that two linearly dependent non-zero vectors are parallel. If {v1, v2} is a linearly
dependent set then there exist λ1 and λ2, not both zero, such that
λ1v1 + λ2v2 = 0.
Without loss of generality, we can assume that λ1 ≠ 0. Dividing by λ1 and rearranging gives
v1 = −(λ2/λ1)v2.
This shows that v1 is a scalar multiple of v2. It also implies that λ2 ≠ 0 (otherwise v1 would be 0). Hence the two vectors are parallel. ♦
Example 3. It is easy to verify that
3 \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} + 2 \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix} + (−1) \begin{pmatrix} 5 \\ 4 \\ 7 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
By Definition 2, the set { \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}, \begin{pmatrix} 5 \\ 4 \\ 7 \end{pmatrix} } is a linearly dependent set. ♦
Note that no two of the three vectors in the previous example are parallel. The following
example gives a geometric interpretation (in terms of coplanarity) for linear dependence of sets of
three vectors.
Example 4. Show that three non-zero vectors in Rn are coplanar if and only if they form a linearly
dependent set.
Solution. Suppose that v1, v2, v3 are three non-zero vectors in Rn.
We first show that three coplanar vectors form a linearly dependent set. Suppose that v1, v2, v3 are coplanar. We first consider the case that two of the three vectors are parallel. Without loss of generality, we can assume that v2 and v3 are parallel. By Example 2, there exist λ2, λ3, not both zero, such that
λ2v2 + λ3v3 = 0.
Hence, we have
0v1 + λ2v2 + λ3v3 = 0.
Since λ2 and λ3 are not both zero, the vectors v1, v2, v3 form a linearly dependent set.
Otherwise, we may assume that v2 is not parallel to v3. Hence, v1 lies on the plane through the origin parallel to v2 and v3. This means that there are scalars λ2, λ3 such that
v1 = λ2v2 + λ3v3.
We can rearrange this to get
−v1 + λ2v2 + λ3v3 = 0.
At least one coefficient is non-zero (the coefficient of v1 is −1), so we have shown that the set is
linearly dependent.
We now show that if three vectors form a linearly dependent set then they must be coplanar.
If the set {v1,v2,v3} is linearly dependent then there exist λ1, λ2, λ3, not all zero, such that
λ1v1 + λ2v2 + λ3v3 = 0.
Without loss of generality we can assume that λ1 ≠ 0. Then the above equation can be rearranged as
v1 = −(λ2/λ1)v2 − (λ3/λ1)v3.
This shows that v1 satisfies the parametric vector form
x = λv2 + µv3,
which represents either a line or a plane through the origin. In both cases, the three vectors are
coplanar. ♦
6.5.1 Solving problems about linear independence
We have seen that questions about spans in Rm can be answered by relating them to questions
about the existence of solutions for systems of linear equations. The same is true for questions
about linear dependence in Rm.
Proposition 1. If S = {a1, . . . ,an} is a set of vectors in Rm and A is the m × n matrix whose
columns are the vectors a1, . . . ,an then the set S is linearly dependent if and only if the system
Ax = 0 has at least one non-zero solution x ∈ Rn.
Proof. As shown in Section 6.4.1,
Ax = x1a1 + · · ·+ xnan for any vector x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} ∈ Rn.
Therefore Ax = 0 has at least one non-zero solution x = \begin{pmatrix} λ_1 \\ \vdots \\ λ_n \end{pmatrix} if and only if there are scalars λ1, . . . , λn, not all zero, such that
λ1a1 + · · ·+ λnan = 0,
in other words, if and only if the set {a1, . . . , an} is linearly dependent.
Example 5. Is the set S = { \begin{pmatrix} 1 \\ 3 \\ 2 \\ 4 \end{pmatrix}, \begin{pmatrix} -2 \\ -1 \\ 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ 2 \end{pmatrix} } a linearly independent set?
Solution. Let A be the matrix whose columns are the vectors in S. By Proposition 1, the set S
is linearly dependent if and only if the system Ax = 0 has at least one non-zero solution. We then
reduce the augmented matrix (A|0) to row-echelon form (U |0).
\begin{pmatrix} 1 & -2 & 0 & 0 \\ 3 & -1 & 0 & 0 \\ 2 & 0 & 1 & 0 \\ 4 & 2 & 2 & 0 \end{pmatrix}
→ (R2 = R2 − 3R1, R3 = R3 − 2R1, R4 = R4 − 4R1)
\begin{pmatrix} 1 & -2 & 0 & 0 \\ 0 & 5 & 0 & 0 \\ 0 & 4 & 1 & 0 \\ 0 & 10 & 2 & 0 \end{pmatrix}
→ (R2 = (1/5)R2, R4 = (1/2)R4)
\begin{pmatrix} 1 & -2 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 4 & 1 & 0 \\ 0 & 5 & 1 & 0 \end{pmatrix}
→ (R3 = R3 − 4R2, R4 = R4 − 5R2)
\begin{pmatrix} 1 & -2 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
→ (R4 = R4 − R3)
\begin{pmatrix} 1 & -2 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
There are no non-leading columns in U , so the system has a unique solution, namely x = 0. Since
there are no non-zero solutions, the set S is linearly independent. ♦
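Note. Proposition 1 translates directly into a computation: S is linearly independent exactly when Ax = 0 has only the zero solution. A sketch of Example 5 in Python with the sympy library (assumed for illustration only):

from sympy import Matrix

# The columns are the three vectors of Example 5.
A = Matrix([[1, -2, 0],
            [3, -1, 0],
            [2, 0, 1],
            [4, 2, 2]])

# nullspace() returns a basis for the solutions of Ax = 0;
# an empty list means x = 0 is the only solution.
print(A.nullspace())         # [] : no non-zero solutions
print(A.rank() == A.cols)    # True: every column is leading, so S is independent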
Example 6. Suppose that v1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, v2 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, v3 = \begin{pmatrix} -1 \\ 0 \\ 5 \end{pmatrix} and v4 = \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix}.
a) Prove that the set S = {v1,v2,v3,v4} is a linearly dependent set.
b) Find all possible ways of writing 0 as a linear combination of the vectors in S.
c) Find a linearly independent subset of S with the same span as S.
Solution.
a) Let A be the matrix with v1, v2, v3, v4 as columns. As seen in the previous example,
elementary row operations do not affect the zero right-hand column. To see whether or not
there is any non-zero solution for the equation Ax = 0, we can simply reduce the matrix A
to an equivalent row-echelon form U.
\begin{pmatrix} 1 & 1 & -1 & 2 \\ 2 & 1 & 0 & 3 \\ 3 & -1 & 5 & 5 \end{pmatrix}
→ (R2 = R2 − 2R1, R3 = R3 − 3R1)
\begin{pmatrix} 1 & 1 & -1 & 2 \\ 0 & -1 & 2 & -1 \\ 0 & -4 & 8 & -1 \end{pmatrix}
→ (R3 = R3 − 4R2)
\begin{pmatrix} 1 & 1 & -1 & 2 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & 0 & 3 \end{pmatrix}
Since the row-echelon form matrix U has a non-leading column, the homogeneous system Ax = 0 has infinitely many solutions, so there must be solutions other than the zero solution. Therefore the given set S is linearly dependent.
b) The complete solution of Ax = 0 can be found by back substitution, and it is x4 = 0, x3 = λ,
x2 = 2λ, and x1 = −λ. By substituting for x1, x2, x3, x4 in the original linear combination
we get
λ(−v1 + 2v2 + v3) + 0v4 = 0,
which gives all possible ways of writing 0 as a linear combination of the vectors in S.
c) Choosing λ = 1, we have
−v1 + 2v2 + v3 = 0, i.e. v3 = −v1 + 2v2.
This shows that span (v1, v2, v4) = span (S). On the other hand, if we remove the third
column from A, we can reduce the matrix
\begin{pmatrix} 1 & 1 & 2 \\ 2 & 1 & 3 \\ 3 & -1 & 5 \end{pmatrix} to \begin{pmatrix} 1 & 1 & 2 \\ 0 & -1 & -1 \\ 0 & 0 & 3 \end{pmatrix}.
Hence the set {v1, v2, v4} is a linearly independent subset of S with the same span as S.
Note also that v1 = 2v2 + v3 and v2 = (1/2)v1 − (1/2)v3, so both sets {v2,v3,v4} and {v1,v3,v4} have the same span as S. It is not difficult to check that these two sets are also linearly independent. ♦
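Note. The dependency relations used in parts (b) and (c) can be read off from a null space computation; each basis vector of the null space encodes one way of writing 0 as a linear combination of the columns. A sketch in Python with the sympy library (assumed for illustration only):

from sympy import Matrix

# The columns are v1, v2, v3, v4 of Example 6.
A = Matrix([[1, 1, -1, 2],
            [2, 1, 0, 3],
            [3, -1, 5, 5]])

ns = A.nullspace()
print(ns)         # [Matrix([[-1], [2], [1], [0]])], i.e. -v1 + 2v2 + v3 + 0v4 = 0
print(A * ns[0])  # the zero vector, confirming the dependency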
The next two examples illustrate the fact that matrix methods can be used in vector spaces
other than Rm.
Example 7. Show that the set {1 + x, 2 − x} is linearly independent in the vector space P(R) of
all polynomials.
Solution. Suppose that λ1(x + 1) + λ2(2 − x) = 0 for all x ∈ R. Expanding the left-hand-side
gives
(λ1 + 2λ2) + (λ1 − λ2)x = 0.
Comparing coefficients shows that we must have
λ1 + 2λ2 = 0
λ1 − λ2 = 0.
The augmented matrix for this system is
\begin{pmatrix} 1 & 2 & 0 \\ 1 & -1 & 0 \end{pmatrix},
which can be reduced to the row-echelon form
\begin{pmatrix} 1 & 2 & 0 \\ 0 & -3 & 0 \end{pmatrix}.
There are no non-leading variables here, so the only solution is λ1 = λ2 = 0. Therefore this set of polynomials is linearly independent. ♦
Example 8. Is the set {1 + x, 2 − x,−1 + 2x} a linearly independent subset of the vector space
P(R)?
Solution. Suppose that λ1(x + 1) + λ2(2 − x) + λ3(−1 + 2x) = 0 for all x ∈ R. Expanding the
left-hand-side gives
(λ1 + 2λ2 − λ3) + (λ1 − λ2 + 2λ3)x = 0.
Comparing coefficients shows that we must have
λ1 + 2λ2 − λ3 = 0
λ1 − λ2 + 2λ3 = 0.
The augmented matrix for this system is
\begin{pmatrix} 1 & 2 & -1 & 0 \\ 1 & -1 & 2 & 0 \end{pmatrix},
which can be reduced to the row-echelon form
(U |0) = \begin{pmatrix} 1 & 2 & -1 & 0 \\ 0 & -3 & 3 & 0 \end{pmatrix}.
The third column in U is non-leading so there must be some solutions with λ1, λ2 and λ3 not all zero. Therefore the given set of polynomials is linearly dependent. ♦
Note. Although it is not required here, it is a good idea to find a specific nonzero solution and check
that the appropriate linear combination is zero. In this example we could use back substitution to
get a solution λ1 = −1, λ2 = 1, λ3 = 1. Checking shows that indeed
−1(1 + x) + 1(2 − x) + 1(−1 + 2x) = 0
for all x ∈ R.
6.5.2 Uniqueness and linear independence
The following theorem gives one reason why the idea of linear independence is important.
Theorem 2 (Uniqueness of Linear Combinations). Let S be a finite, non-empty set of vectors
in a vector space and let v be a vector which can be written as a linear combination of S. Then
the values of the scalars in the linear combination for v are unique if and only if S is a linearly
independent set.
Proof. It turns out to be easier to prove the theorem by proving the equivalent result that:
The values of the scalars in the linear combination are non-unique if and only if S is a linearly
dependent set.
Let S = {v1, . . . ,vn}. Suppose first that v has two expressions as a linear combination of S, namely
v = λ1v1 + · · · + λnvn and v = µ1v1 + · · ·+ µnvn.
Subtracting the second equation from the first equation gives
(λ1 − µ1)v1 + · · · + (λn − µn)vn = 0. (#)
If the two sets of scalars are not identical then there must be at least one value of j (with 1 ≤ j ≤ n) such that λj − µj ≠ 0. This means that there is at least one non-zero scalar coefficient in (#) and
therefore the set S is a linearly dependent set. Conversely, if S is a linearly dependent set then
there are scalars α1, . . . , αn, not all zero, such that
α1v1 + · · ·+ αnvn = 0.
If v = λ1v1 + · · · + λnvn is any expression for v as a linear combination of S then we can get a
second expression by saying
v = v + 0
= (λ1v1 + · · ·+ λnvn) + (α1v1 + · · ·+ αnvn)
= (λ1 + α1)v1 + · · ·+ (λn + αn)vn.
The coefficients in this expression are not all the same as the coefficients in the first expression
because α1, . . . , αn are not all zero. Therefore the coefficients in an expression for v as a linear
combination of v1, . . . ,vn are not unique.
The following is a numerical example of the result of Theorem 2.
Example 9. As shown in Example 6, the set of vectors S = {v1,v2,v3,v4} is a linearly dependent set, where
v1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, v2 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, v3 = \begin{pmatrix} -1 \\ 0 \\ 5 \end{pmatrix}, and v4 = \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix}.
Show that b = \begin{pmatrix} 7 \\ 7 \\ -4 \end{pmatrix} belongs to span (S) and then check that the linear combination for b in terms of S is not unique.
Solution. We know that b ∈ span (S) if and only if the system Ax = b has a solution, where A is the matrix with the vectors in S as columns. We then reduce (A|b) to a row-echelon form (U |y).
\begin{pmatrix} 1 & 1 & -1 & 2 & 7 \\ 2 & 1 & 0 & 3 & 7 \\ 3 & -1 & 5 & 5 & -4 \end{pmatrix}
→ (R2 = R2 − 2R1, R3 = R3 − 3R1)
\begin{pmatrix} 1 & 1 & -1 & 2 & 7 \\ 0 & -1 & 2 & -1 & -7 \\ 0 & -4 & 8 & -1 & -25 \end{pmatrix}
→ (R3 = R3 − 4R2)
\begin{pmatrix} 1 & 1 & -1 & 2 & 7 \\ 0 & -1 & 2 & -1 & -7 \\ 0 & 0 & 0 & 3 & 3 \end{pmatrix}
The right-hand-side column is not a leading column, so the system does have a solution and b must be in span (S). The third column in the row-echelon form is non-leading, so the system has infinitely many solutions and hence there are infinitely many expressions for b as a linear combination of S. ♦
Note. If we want to find these expressions explicitly, we can apply back substitution to the row-echelon form and find the general solution x4 = 1, x3 = λ, x2 = 6 + 2λ and x1 = −1− λ, where λ is an arbitrary real parameter. Therefore
\begin{pmatrix} 7 \\ 7 \\ -4 \end{pmatrix} = (−1− λ) \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + (6 + 2λ) \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} + λ \begin{pmatrix} -1 \\ 0 \\ 5 \end{pmatrix} + \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix}.
We found in Example 6 that if v3 (which corresponds to the non-leading column in this system) is dropped then the resulting set {v1,v2,v4} is linearly independent.
In this case we find that \begin{pmatrix} 7 \\ 7 \\ -4 \end{pmatrix} has a unique expression as a linear combination of v1, v2 and v4. In fact, by putting x3 = λ = 0 in the above expression we get
\begin{pmatrix} 7 \\ 7 \\ -4 \end{pmatrix} = −1 \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + 6 \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} + 1 \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix}.
6.5.3 Spans and linear independence
Suppose that v1 and v2 are non-zero vectors in Rn. We have seen in Chapter 1 that span (v1,v2)
represents a plane if the vectors are not parallel (in other words, if they are linearly independent),
whereas the span represents a line if they are parallel (in other words, linearly dependent). An
equivalent way of expressing this result is to say that if {v1,v2} is a linearly dependent set then
span (v1,v2) = span (v1) = span (v2) .
For three non-zero vectors in Rn, we have also seen that span (v1,v2,v3) reduces to either a
plane or a line when the three vectors form a linearly dependent set. In either of these cases it is
possible to drop at least one vector from {v1,v2,v3} without changing the span of the set.
In this section we show that similar results are true for the span of any number of vectors in
any vector space. To do this, we need two important results. The first, which is a generalisation of
the results in Examples 2 and 4, is as follows.
Theorem 3. A set of vectors S is a linearly independent set if and only if no vector in S can be
written as a linear combination of the other vectors in S, that is, if and only if no vector in S is
in the span of the other vectors in S.
Note. The theorem is equivalent to:
A set of vectors S is a linearly dependent set if and only if at least one vector in S is in the
span of the other vectors in S.
Proof. It is easier to prove the alternative statement because in that case the method of proof
can closely follow the solution given for Example 4.
If some vector vi in S = {v1, . . . , vn} is in the span of the other vectors in S then there are
scalars µ1, . . . , µi−1, µi+1, . . . , µn such that
vi = µ1v1 + · · ·+ µi−1vi−1 + µi+1vi+1 + · · ·+ µnvn.
An obvious rearrangement then gives
µ1v1 + · · · − vi + · · ·+ µnvn = 0.
At least one coefficient in this expression (the coefficient −1 for vi) is non-zero, so the set is linearly
dependent.
Conversely, if S is a linearly dependent set then there are scalars λ1, . . . , λn, not all zero, such
that
λ1v1 + · · ·+ λnvn = 0.
Let i be such that λi ≠ 0. Then we can solve for vi in the preceding equation to obtain
vi = −(1/λi)(λ1v1 + · · ·+ λi−1vi−1 + λi+1vi+1 + · · ·+ λnvn).
This shows that vi is in the span of the other vectors {v1, . . . ,vi−1,vi+1, . . . ,vn}, so the proof is complete.
Example 10. The set {v1,v2,v3,v4} in Example 6 is a linearly dependent set, and we found that
−v1 + 2v2 + v3 = 0.
We saw that v3 = v1 − 2v2. Hence, v3 ∈ span (v1,v2), which implies v3 ∈ span (v1,v2,v4).
Furthermore, we have v2 ∈ span (v1,v3,v4) and v1 ∈ span (v2,v3,v4). However, v4 is not in
the span of the other three. A geometric interpretation of this result is that v1, v2, v3 lie in the
same plane, whereas v4 does not lie in this plane. ♦
The second key result that we need is as follows.
If a vector is added to a set then the span of the new set is equal to the span of the original
set if and only if the additional vector is in the span of the original set.
Formally, we have the following theorem.
Theorem 4. If S is a finite subset of a vector space V and the vector v is in V , then
span (S ∪ {v}) = span (S) if and only if v ∈ span (S).
[X] Proof. Let S = {v1, . . . ,vn} so that S ∪ {v} = {v1, . . . ,vn,v}.
Obviously v ∈ span (S ∪ {v}), so if span (S ∪ {v}) = span (S) then v ∈ span (S).
To prove the converse, we assume that v ∈ span (S) and prove span (S) = span (S ∪ {v}) by
proving firstly that if a vector u ∈ span (S) then u ∈ span (S ∪ {v}), and secondly that if a vector
u ∈ span (S ∪ {v}) then u ∈ span (S).
For the first proof, suppose that u ∈ span (S). Then
u = λ1v1 + · · ·+ λnvn, for some λ1, . . . , λn ∈ F.
Then u = λ1v1 + · · ·+ λnvn + 0v. Hence u ∈ span (S ∪ {v}).
For the second proof, suppose that u ∈ span (S ∪ {v}). Then
u = λv + λ1v1 + · · ·+ λnvn for some λ, λ1, . . . , λn ∈ F.
But v ∈ span (S), and hence
v = µ1v1 + · · ·+ µnvn for some µ1, . . . , µn ∈ F.
On substituting this linear combination for v into the previous linear combination for u, we find u
is a linear combination of {v1, . . . ,vn}. Hence, u ∈ span (S) and the proof is complete.
Example 11. For the linearly dependent set of four vectors {v1,v2,v3,v4} of Example 6, from Example 10 we have
span (v1,v2,v3,v4) = span (v1,v2,v4) = span (v1,v3,v4) = span (v2,v3,v4),
which agrees with Theorem 4. ♦
By combining Theorems 3 and 4 we get the following.
If S is a linearly dependent set of vectors then it is possible to drop at least one vector from S to obtain a new set with the same span as S, whereas if S is a linearly independent set then dropping any vector from S results in a new set with a smaller span than span (S).
In formal terms, we have the following theorem.
Theorem 5. Suppose that S is a finite subset of a vector space. The span of every proper subset
of S is a proper subspace of span (S) if and only if S is a linearly independent set.
Example 12. For the linearly dependent set S = {v1,v2,v3,v4} considered in Example 6, we have seen that span (v1,v2,v4) = span (v1,v2,v3,v4) = R3, and the set {v1,v2,v4} is a linearly independent set.
We have also seen in Example 14 of Section 6.4 that S spans R3. If we now drop any vector
from the linearly independent set {v1,v2,v4}, we obtain a set whose span is only a plane in R3
and not all of R3. This illustrates Theorem 5. ♦
For use in the next section, we need one more result involving spanning and linear independence.
Theorem 6. If S is a finite linearly independent subset of a vector space V and v is in V but not
in span (S) then S ∪ {v} is a linearly independent set.
Proof. Let S = {v1, . . . ,vn} and S′ = S ∪ {v}. If S′ is not linearly independent then there exist
scalars λ, λ1, . . . , λn, not all zero, such that
λv+ λ1v1 + · · ·+ λnvn = 0. (*)
If λ = 0 then we must have λi ≠ 0 for some 1 ≤ i ≤ n and this would contradict the assumption that S is linearly independent. Therefore we must have λ ≠ 0 and this means that the equation
(∗) can be rearranged to express v as a linear combination of the set S. This contradicts the fact that v is not in span (S), so the supposition that S′ was linearly dependent must be false and hence S′ is linearly independent.
6.6 Basis and dimension
In Chapter 1 we called the set of vectors {e1, . . . , en} the standard “basis” for Rn. We have also
talked of a plane as being “two-dimensional” and ordinary space as being “three-dimensional”. The
main aims of this section are to give precise definitions of the ideas of basis and dimension and to
show that these ideas apply to every vector space.
Recall that S is a spanning set for a vector space V when span (S) = V , that is, when every
vector in V can be written as a linear combination of the vectors in S. We have also shown that
if S is a linearly independent set then all linear combinations formed from S are unique. Hence,
if S is a linearly independent spanning set for V then every vector in V can be written as a
unique linear combination of S.
We have also shown (Theorem 5 of Section 6.5) that a proper subset of a linearly independent
set S spans a proper subspace of span (S). This means, in particular, that if S is a linearly
independent spanning set for a vector space V then dropping any vector from S gives a set
which is not a spanning set for V .
Because of these two properties, linearly independent spanning sets are of fundamental impor-
tance in both the theoretical development and the practical applications of vector spaces.
6.6.1 Bases
Definition 1. A set of vectors B in a vector space V is called a basis for V if:
1. B is a linearly independent set, and
2. B is a spanning set for V (that is, span (B) = V ).
Note. We exclude from our discussion the vector space consisting of only the zero vector.
Example 1. The set {e1, e2, . . . , en}, where
e1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, e2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, . . . , en = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix},
is a linearly independent spanning set for Rn, so this set is a basis for Rn. It is called the standard basis for Rn. Each vector a = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} can be written as a unique linear combination a1e1 + · · ·+ anen. ♦
Example 2. Show that the set S = { \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix} } is a basis for R3.
Solution. To show that S is a spanning set for R3, we need to show that every vector b ∈ R3
belongs to span (S). This will be true if and only if the system Ax = b, where
A = \begin{pmatrix} 2 & -1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & -1 \end{pmatrix},
has a solution for every right hand side b ∈ R3. The augmented matrix (A|b) can be reduced to row-echelon form.
\begin{pmatrix} 2 & -1 & 0 & b_1 \\ 1 & 0 & 1 & b_2 \\ 0 & 1 & -1 & b_3 \end{pmatrix}
→ (R1 ↔ R2)
\begin{pmatrix} 1 & 0 & 1 & b_2 \\ 2 & -1 & 0 & b_1 \\ 0 & 1 & -1 & b_3 \end{pmatrix}
→ (R2 = R2 − 2R1)
\begin{pmatrix} 1 & 0 & 1 & b_2 \\ 0 & -1 & -2 & b_1 - 2b_2 \\ 0 & 1 & -1 & b_3 \end{pmatrix}
→ (R3 = R3 + R2)
\begin{pmatrix} 1 & 0 & 1 & b_2 \\ 0 & -1 & -2 & b_1 - 2b_2 \\ 0 & 0 & -3 & b_1 - 2b_2 + b_3 \end{pmatrix}.
For all b ∈ R3, the row-echelon matrix has a non-leading right hand column and hence the equation
Ax = b has a solution. Therefore span (S) = R3.
Moreover, the left side of the row-echelon matrix has no non-leading columns, so the only solution for a zero right hand side is x1 = x2 = x3 = 0. This shows that S is a linearly independent set. We have now proved that S is a linearly independent spanning set for R3 and is therefore a basis for R3. ♦
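Note. For n vectors in Rn there is a shortcut worth knowing: the set is a basis exactly when the square matrix of columns has a non-zero determinant (determinants were treated in MATH1131/41). A check of Example 2 in Python with the sympy library (assumed for illustration only):

from sympy import Matrix

# The columns are the three vectors of Example 2.
A = Matrix([[2, -1, 0],
            [1, 0, 1],
            [0, 1, -1]])

# A non-zero determinant means Ax = b has a unique solution for every b,
# so the columns span R^3 and are linearly independent.
print(A.det())   # -3, non-zero, so S is a basis for R^3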
Example 3. Let v1 = \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}, v2 = \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}, v3 = \begin{pmatrix} 2 \\ 4 \\ 2 \end{pmatrix}, v4 = \begin{pmatrix} 1 \\ 5 \\ 0 \end{pmatrix} and S = {v1,v2,v3,v4}. Find a subset of S which is a basis for span (S).
Solution. From the result in Section 6.4.2, we only need to reduce the matrix with the vectors in S as columns to row-echelon form; then the vectors in S corresponding to the leading columns will span the same set as S.
 1 2 2 1−1 1 4 5
2 3 2 0

 R2 = R2 +R1−−−−−−−−−−−−−−→
R3 = R3 − 2R1

 1 2 2 10 3 6 6
0 −1 −2 −2


R3 = R3 +
1
3R2−−−−−−−−−−−−−−−→

 1 2 2 10 3 6 6
0 0 0 0


The vectors corresponding to the leading columns are v1 and v2. Hence {v1, v2} is a spanning set
for span (S). Furthermore, if we remove the third and fourth columns in the matrices in the above
reduction, we can see that {v1, v2} is a linearly independent set. Therefore, this set is a basis for
span (S). ♦
Suppose that B is a basis for a finite-dimensional vector space V . Since B is a spanning set,
every vector in V can be written as a linear combination of B. Since B is also independent, the
linear combination is unique. In formal terms:
Let B = {v1, . . . , vn} be a basis for a vector space V over F. Every vector v ∈ V can be
uniquely written as
v = λ1v1 + · · ·+ λnvn, where λ1, . . . , λn ∈ F.
Example 4. Write the vector b = \begin{pmatrix} -1 \\ 0 \\ 5 \end{pmatrix} as the unique linear combination of the ordered basis
{ v1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, v2 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} } of span (v1,v2).
Solution. We first write b as a linear combination of v1 and v2. From Proposition 3 of Section 6.4 we know that the required scalars are the components of a solution to the system Ax = b, where A is the matrix whose columns are the given vectors v1, v2. The augmented matrix (A|b) can be reduced to row-echelon form.
\begin{pmatrix} 1 & 1 & -1 \\ 2 & 1 & 0 \\ 3 & -1 & 5 \end{pmatrix}
→ (R2 = R2 − 2R1, R3 = R3 − 3R1)
\begin{pmatrix} 1 & 1 & -1 \\ 0 & -1 & 2 \\ 0 & -4 & 8 \end{pmatrix}
→ (R3 = R3 − 4R2)
\begin{pmatrix} 1 & 1 & -1 \\ 0 & -1 & 2 \\ 0 & 0 & 0 \end{pmatrix}
This system has the unique solution x = \begin{pmatrix} 1 \\ -2 \end{pmatrix}, so b = 1 \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} − 2 \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}. ♦
Example 5. Show that the set {i, j,k} is a basis for R3.
Solution. We have seen in Section 6.4 Example 5 that {i, j,k} is a spanning set. To prove that
the set is linearly independent, we use the fact that {i, j,k} is an orthonormal set of vectors. That
is,
i · i = j · j = k · k = 1 and i · j = j · k = i · k = 0.
If we assume that
0 = a1i+ a2j+ a3k (#)
then by taking the dot product of (#) with i we get
0 = i · 0 = i · (a1i+ a2j+ a3k)
= a1(i · i) + a2(i · j) + a3(i · k)
= a1,
and hence a1 = 0. Similarly, by taking the dot products of (#) with j and k in turn, we find that
a2 = 0 and a3 = 0. Therefore {i, j,k} is a linearly independent set.
We have now proved that {i, j,k} is a linearly independent spanning set for, and hence a basis
for, R3. ♦
The set {i, j,k} is an example of an orthonormal basis. An orthonormal basis is a basis whose elements are all of length 1 and are mutually orthogonal. The advantage of using an orthonormal basis is that we can easily write any vector as the unique linear combination of the basis vectors by taking dot products.
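Note. Because coordinates with respect to an orthonormal basis are just dot products, they are very cheap to compute. A numerical sketch in Python with the numpy library (assumed for illustration only), using the basis of Example 6 below:

import numpy as np

s = 1 / np.sqrt(2)
u1 = np.array([s, 0.0, -s])
u2 = np.array([s, 0.0, s])
u3 = np.array([0.0, -1.0, 0.0])

a = np.array([3.0, -2.0, 1.0])   # an arbitrary vector to expand

# Each coordinate is a single dot product with the corresponding basis vector.
x1, x2, x3 = u1 @ a, u2 @ a, u3 @ a
print(np.allclose(x1*u1 + x2*u2 + x3*u3, a))   # True: a is recovered exactly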
Example 6. The set of vectors B = {u1,u2,u3}, where
u1 = \begin{pmatrix} 1/\sqrt{2} \\ 0 \\ -1/\sqrt{2} \end{pmatrix}, u2 = \begin{pmatrix} 1/\sqrt{2} \\ 0 \\ 1/\sqrt{2} \end{pmatrix}, u3 = \begin{pmatrix} 0 \\ -1 \\ 0 \end{pmatrix},
is an orthonormal basis for R3. Write a = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} as the unique linear combination of this basis.
Solution. Suppose that a = x1u1 + x2u2 + x3u3. We could find x1, x2, x3 in the same way as in Example 4, but there is a simpler method which uses the orthonormality properties of the basis. We are given that B is orthonormal, that is,
u1 · u1 = u2 · u2 = u3 · u3 = 1 and u1 · u2 = u1 · u3 = u2 · u3 = 0.
We have
x1 = u1 · (x1u1 + x2u2 + x3u3) = u1 · a = (1/√2)a1 − (1/√2)a3,
and similarly
x2 = u2 · a = (1/√2)a1 + (1/√2)a3 and x3 = u3 · a = −a2.
Therefore, the unique linear combination is given by
a = (1/√2)(a1 − a3)u1 + (1/√2)(a1 + a3)u2 − a2u3. ♦
Example 7. Show that the set {1, x, x2, . . . , xn} is a basis for Pn(R). (This is called the standard
basis for Pn(R).)
Solution. As we have seen, {1, x, x2, . . . , xn} is a spanning set for Pn(R). This set is also linearly
independent because if
λ1 · 1 + λ2 · x+ · · ·+ λn+1 · xn = 0
for all x ∈ R then λ1 = λ2 = · · · = λn+1 = 0. This follows from the theorem that we proved
in Chapter 3 which states that two polynomials agree for all values of x ∈ R if and only if their
corresponding coefficients are equal. ♦
6.6.2 Dimension
In this section we show that “dimension” can be defined for every vector space which is spanned
by a finite set of vectors.
To do this we need two results.
Theorem 1. The number of vectors in any spanning set for a vector space V is always greater
than or equal to the number of vectors in any linearly independent set in V .
[X] Sketch Proof. Suppose I = {v1, · · · ,vm} is a linearly independent set and S = {w1, · · · ,wn}
is a spanning set. Then we can write
v1 = a11w1 + · · · + a1nwn
...
vm = am1w1 + · · · + amnwn.
(1)
Suppose m > n and consider the matrix A = (aij). A has more rows than columns, so if we reduce
A to row-echelon form by the Gaussian elimination algorithm then we must end up with at least
one row of zeros. Apart from a row swap to get it into the right position, this row of zeros will
have been obtained from some row of our original matrix A, say row Rk, by subtracting from
it multiples of other rows or linear combinations of other rows of A. Therefore there are scalars
α1, . . . , αk−1, αk+1, . . . , αm such that
Rk − (α1R1 + · · ·+ αk−1Rk−1 + αk+1Rk+1 + · · ·+ αmRm) (2)
is an all-zero row.
If we use the equations in (1) to express the vector
vk − (α1 v1 + · · ·+ αk−1 vk−1 + αk+1 vk+1 + · · ·+ αm vm) (3)
as a linear combination of w1, . . . ,wn then the coefficient of each wi will be the same as the ith
entry in the row which is defined by (2). But we know that this row is all zeros, so the vector
defined by (3) must be the zero vector. We now have a linear combination of v1, . . . ,vm which
equals zero and at least one coefficient in this combination (the coefficient of vk) is non-zero. This
is not compatible with the fact that the set {v1, . . . ,vm} is linearly independent. By assuming that
m is greater than n we have reached a contradiction, therefore m must be less than or equal to n.

The second important theorem which guarantees the existence of a dimension for a vector space
is as follows:
Theorem 2. If a vector space V has a finite basis then every set of basis vectors for V contains
the same number of vectors, that is, if B1 = {u1, . . . ,um} and B2 = {v1, . . . ,vn} are two bases for
the same vector space V then m = n.
Proof. Using the results of Theorem 1 and the fact that a basis is a linearly independent spanning
set, we have
m ≥ n, since B1 spans V and B2 is linearly independent, and
n ≥ m, since B2 spans V and B1 is linearly independent.
Therefore m = n and the proof is complete.
Since every basis for a vector space with a finite basis contains exactly the same number of
vectors, the following definition makes sense for every vector space with a finite basis.
Definition 2. If V is a vector space with a finite basis, then the dimension of V ,
denoted by dim(V ), is the number of vectors in any basis for V . Such a V is called
a finite dimensional vector space.
Example 8. a) Rn has a basis {e1, . . . , en} of n vectors, and hence dim(Rn) = n.
b) The space of geometric vectors in ordinary physical space has a basis {i, j,k} of three vectors
and therefore its dimension is 3.
c) The subspace span (S) in Example 3 has a basis of two vectors. The dimension of span (S) is
2.
d) We define the dimension of the vector space consisting only of the zero vector to be 0.
e) The space Pn of polynomials of degree less than or equal to n has a basis {1, x, x2, . . . , xn},
so dim(Pn) = n+ 1.
The following theorem summarises some useful results connecting spanning sets, linearly indepen-
dent sets and dimension.
Theorem 3. Suppose that V is a finite dimensional vector space.
1. the number of vectors in any spanning set for V is greater than or equal to the dimension of
V ;
2. the number of vectors in any linearly independent set in V is less than or equal to the dimen-
sion of V ;
3. if the number of vectors in a spanning set is equal to the dimension then the set is also a
linearly independent set and hence a basis for V ;
4. if the number of vectors in a linearly independent set is equal to the dimension then the set
is also a spanning set and hence a basis for V .
Proof. Assume that V is a vector space of dimension n.
The dimension of a vector space is equal to the number of vectors in a basis and a basis is a
linearly independent spanning set. Therefore there is a linearly independent set in V which contains
n vectors and there is also a spanning set for V which contains n vectors.
(1) and (2) follow from Theorem 1.
3. Assume that a spanning set S contains n vectors. Then, as no spanning set for V can contain
fewer than n vectors, there is no proper subset of S which is a spanning set for V . Hence no proper
subset of S has the same span as S. Thus, by Theorem 5 of Section 6.5, S is a linearly independent
set, and is a basis of V .
4. Assume that I = {v1, . . . ,vn} is a linearly independent set of n vectors in V and let v be any
vector in V . If v does not belong to span (I) then, by Theorem 6 of Section 6.5, the set I ∪ {v} is
linearly independent. This implies that V contains a linearly independent set with n+1 > dim(V )
vectors. This would contradict the result of (2), so we must have v ∈ span (I) for all v ∈ V .
Therefore I is a spanning set for V and hence a basis for V.
Some of the uses of Theorem 3 are illustrated in the next example.
Example 9. a) Obviously, the two vectors \begin{pmatrix} 1 \\ -1 \end{pmatrix} and \begin{pmatrix} 4 \\ 5 \end{pmatrix} are non-parallel and so they are linearly independent. Hence, the set of two vectors { \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 4 \\ 5 \end{pmatrix} } is a basis for the two-dimensional space R2.
b) The set of 3 linearly independent vectors { \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix} } in Example 2 is a basis
for the three-dimensional space R3.
c) A set of three vectors is not a spanning set for R4 as dim(R4) = 4 > 3.
d) Any set of 10 vectors which spans R10 is a basis for R10 since dim(R10) = 10.
e) Any linearly independent set of 325 vectors in R325 is a basis for R325 since dim(R325) = 325.
f) A set of 1200 vectors in R1209 is not a spanning set as dim(R1209) = 1209 > 1200. ♦
Example 10. Show that the only subspaces in R3 are (1) the origin, (2) lines through the origin,
(3) planes through the origin, and (4) R3 itself.
Solution. By part 2 of Theorem 3, no subspace of R3 can have dimension greater than 3, otherwise
a basis for the subspace would be a linearly independent set with more than three members. It
follows that the only possible dimensions for subspaces (other than the subspace {0}) are 1, 2 and
3.
A subspace of dimension 1 must be of the form span (v), where v is non-zero. We know that
this represents a line through the origin.
A subspace of dimension 2 must be of the form span (v1,v2), where the set {v1,v2} is linearly
independent. We know that this represents a plane through the origin.
If a subspace is of dimension 3, it must be the whole of R3 because a basis for it will be a set
of 3 linearly independent vectors in R3 and hence a basis for R3 itself. ♦
6.6.3 Existence and construction of bases
In this section we examine the following two problems. Firstly, is it always possible to find a basis
for a given vector space V ? Secondly, if we know that a basis does exist for a given vector space
V , how can we find a basis for V ?
For the existence of a basis, we have seen in Example 3 that a finite set of vectors in R3 contains
a subset which is a basis for span (S). This is generally true for any vector space spanned by a
finite set of vectors.
Theorem 4. If S is a finite non-empty subset of a vector space then S contains a subset which is
a basis for span (S).
In particular, if V is any non-zero vector space which can be spanned by a finite set of vectors
then V has a basis.
Proof. Let S be a finite non-empty set of vectors. If S is linearly independent then it is a basis
for span (S) and there is nothing more to be done. If not, then (by Theorem 5 of Section 6.5)
there must be a vector which can be dropped from S without changing the span. This gives a
new set with fewer vectors which still spans span (S). If this new set is linearly independent, we
have a basis. If not, we can again remove a vector and get a smaller set which still spans the same
subspace. If we continue in this way we must eventually get a set which is linearly independent
and spans span (S) (the process cannot continue indefinitely because the original set S had only a
finite number, say n, of members and after n steps we would have no vectors left in the set).
A non-zero vector space V contains the zero vector, so it cannot be spanned by an empty set.
Suppose that S is a non-empty finite spanning set for V . By the above result, there exists a subset
of S which is a basis for span (S) = V .
This theorem shows that a spanning set can always be converted into a basis by removing some
vectors from it. The next theorem proves a result about going in the opposite direction.
Theorem 5. Suppose that V is a vector space which can be spanned by a finite set of vectors. If S
is a linearly independent subset of V then there exists a basis for V which contains S as a subset.
In other words, every linearly independent subset of V can be extended to a basis for V .
Proof. Suppose S is a linearly independent set in V and that V can be spanned by a set of n
vectors. If S spans V then there is nothing more to be done. If not, then there is a vector v ∈ V
which is not in span (S). If we add v to S then we get a new set S ∪ {v} which is still linearly
independent (by Theorem 6 of Section 6.5). If this new set spans V then we can stop. Otherwise,
we can repeat the previous step and add another vector to get a larger linearly independent set.
This process cannot continue beyond n steps, otherwise we would have a linearly independent set
with more than n members (which is more than the number of members in a spanning set) and
this would contradict Theorem 1. But the process only ends when we get a set which does span V .
So eventually we must have a linearly independent spanning set or, in other words, a basis.
Note carefully that the last two theorems only apply to vector spaces which can be spanned
by a finite set of vectors. Any vector space which cannot be spanned by any finite set of vectors
is said to be an infinite-dimensional vector space. The vector space P of all polynomials is an
example of an infinite-dimensional vector space (see Example 22 of Section 6.8).
In the proofs of the last two theorems we used step-by-step procedures to reduce a spanning set
to a basis and to extend a linearly independent set to a basis. These procedures could be translated
into algorithms for finding a basis but they would not be efficient because they involve re-testing
(for linear independence or spanning) each time a vector is added or deleted. In practice, at least
in Rm, we can do all the adding or all the deleting at the same time. We form a suitable matrix,
reduce it to echelon form and examine the echelon form to see which vectors we should add or
delete. The details of the procedures are given in the next two theorems.
Theorem 6 (Reducing a spanning set to a basis in Rm). Suppose that S = {v1, . . . ,vn} is any
subset of Rm and A is the matrix whose columns are the members of S. If U is a row-echelon form
for A and S′ is created from S by deleting those vectors which correspond to non-leading columns
in U then S′ is a basis for span (S).
Proof. Let U ′ be the matrix created by deleting any non-leading columns from U and let A′ be
created by deleting the same-numbered columns from A (so that the columns of A′ are the members
of S′). The matrix U ′ has no non-leading columns, so the homogeneous system A′y = 0 has no
solutions other than the zero solution. This implies (by Proposition 1 of Section 6.5) that the set
S′ is linearly independent. In removing non-leading columns from U , we cannot create any new
all-zero rows, so the system A′y = b has a solution whenever Ax = b has a solution. This implies
(by Proposition 3 of Section 6.4) that S′ spans the same subspace as S. This completes the proof
that S′ is a basis for span (S).
Example 11. Find a basis and the dimension for the subspace of R4 spanned by the set
S = { \begin{pmatrix} 1 \\ 1 \\ 2 \\ 2 \end{pmatrix}, \begin{pmatrix} 2 \\ 3 \\ 4 \\ 5 \end{pmatrix}, \begin{pmatrix} -3 \\ 1 \\ -6 \\ -2 \end{pmatrix}, \begin{pmatrix} 1 \\ 3 \\ 3 \\ 6 \end{pmatrix}, \begin{pmatrix} -2 \\ -1 \\ -4 \\ -3 \end{pmatrix} }.
Solution. As an exercise, show that the matrix A with the members of S as its columns can be reduced by elementary row operations to the echelon form
U = \begin{pmatrix} 1 & 2 & -3 & 1 & -2 \\ 0 & 1 & 4 & 2 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
The third and fifth columns of U are non-leading, so we remove the third and fifth members from S and get
S′ = { \begin{pmatrix} 1 \\ 1 \\ 2 \\ 2 \end{pmatrix}, \begin{pmatrix} 2 \\ 3 \\ 4 \\ 5 \end{pmatrix}, \begin{pmatrix} 1 \\ 3 \\ 3 \\ 6 \end{pmatrix} }
as a basis for span (S), which is 3-dimensional. ♦
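Note. Theorem 6 is exactly what the rref command of a computer algebra system reports: the leading (pivot) columns. A sketch of Example 11 in Python with the sympy library (assumed for illustration only):

from sympy import Matrix

# The columns are the five vectors of Example 11.
A = Matrix([[1, 2, -3, 1, -2],
            [1, 3, 1, 3, -1],
            [2, 4, -6, 3, -4],
            [2, 5, -2, 6, -3]])

# rref returns the reduced row-echelon form together with the pivot columns.
R, pivots = A.rref()
print(pivots)                        # (0, 1, 3): the first, second and fourth columns lead
basis = [A.col(j) for j in pivots]   # the corresponding members of S form the basis S'
print(len(basis))                    # 3 = dim span(S)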
Note. Do not confuse the dimension of a subspace with the dimension of the vector space it lies in. In the above example, span (S) is a 3-dimensional subspace of R4. This has nothing to do with R3.
Example 12. Show that the vectors v1 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}, v2 = \begin{pmatrix} 2 \\ -1 \\ -2 \end{pmatrix}, v3 = \begin{pmatrix} 3 \\ 2 \\ 4 \end{pmatrix}, v4 = \begin{pmatrix} 5 \\ 4 \\ 2 \end{pmatrix} span R3, and find a basis for R3 which is a subset of S = {v1,v2,v3,v4}.
Solution. Suppose that A is the matrix whose columns are the four given vectors. One way to show that S spans R3 is to reduce the augmented matrix (A|b) to echelon form as in Section 6.4.1; this involves carrying a column of indeterminates b.
By Theorem 6, we can instead first find a basis for span (S) simply by reducing A to a row-echelon form. If the dimension of the span is 3 then the result follows.
0 2 3 51 −1 2 4
2 −2 4 2

 R1 ↔ R2−−−−−−−−−−−−−−→

1 −1 2 40 2 3 5
2 −2 4 2


R3 = R3 − 2R1−−−−−−−−−−−−−−−→

1 −1 2 40 2 3 5
0 0 0 −6


The matrix U has one non-leading column, the third, so we delete the third member from the
given set and find that the subset B = {v1,v2,v4} is a basis for span (S). However B is a linearly
independent set of 3 vectors in R3, so it is also a basis for R3. ♦
Theorem 7 (Extending a linearly independent set to a basis in Rm).
Suppose that S = {v1, . . . ,vn} is a linearly independent subset of Rm and A is the matrix whose
columns are the members of S followed by the members of the standard basis for Rm. If U is a row-
echelon form for A and S′ is created by choosing those columns of A which correspond to leading
columns in U then S′ is a basis for Rm containing S as a subset.
Proof. Let S′′ be the set of n+m vectors forming the columns of A. Since S′′ includes all the standard basis vectors for Rm, we have Rm = span (S′′). By Theorem 6, the set S′ is a basis for Rm.
To see that S′ contains S we need to prove that the first n columns of U (which correspond
to the members of S in A) are leading columns. Let B be the submatrix formed from the first n
columns of A and P be the submatrix formed from the first n columns of U . Since A reduces to U , we also have that B reduces to P . By Proposition 1 of Section 6.5, the linear independence of S implies that Bx = 0 has the unique solution x = 0. Hence all columns of P , i.e. the first n columns of U , are leading, and the result follows.
Example 13. Find a basis for R4 containing the members of the linearly independent set
S = { \begin{pmatrix} 1 \\ 2 \\ 4 \\ -2 \end{pmatrix}, \begin{pmatrix} 2 \\ 5 \\ 10 \\ -5 \end{pmatrix} }.
Solution. We form the matrix A whose columns are the members of S followed by the standard basis vectors for R4 and reduce it to row-echelon form.
A = \begin{pmatrix} 1 & 2 & 1 & 0 & 0 & 0 \\ 2 & 5 & 0 & 1 & 0 & 0 \\ 4 & 10 & 0 & 0 & 1 & 0 \\ -2 & -5 & 0 & 0 & 0 & 1 \end{pmatrix}
→ (R2 = R2 − 2R1, R3 = R3 − 4R1, R4 = R4 + 2R1)
\begin{pmatrix} 1 & 2 & 1 & 0 & 0 & 0 \\ 0 & 1 & -2 & 1 & 0 & 0 \\ 0 & 2 & -4 & 0 & 1 & 0 \\ 0 & -1 & 2 & 0 & 0 & 1 \end{pmatrix}
→ (R3 = R3 − 2R2, R4 = R4 + R2)
\begin{pmatrix} 1 & 2 & 1 & 0 & 0 & 0 \\ 0 & 1 & -2 & 1 & 0 & 0 \\ 0 & 0 & 0 & -2 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 1 \end{pmatrix}
→ (R4 = R4 + (1/2)R3)
\begin{pmatrix} 1 & 2 & 1 & 0 & 0 & 0 \\ 0 & 1 & -2 & 1 & 0 & 0 \\ 0 & 0 & 0 & -2 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1/2 & 1 \end{pmatrix} = U.
The first, second, fourth and fifth columns are the leading columns in U , so we take the corresponding columns in A and get a basis
S′ = { \begin{pmatrix} 1 \\ 2 \\ 4 \\ -2 \end{pmatrix}, \begin{pmatrix} 2 \\ 5 \\ 10 \\ -5 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} }
for R4. ♦
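Note. The construction in Theorem 7 is equally mechanical: append the standard basis vectors and keep the columns that turn out to be leading. A sketch of Example 13 in Python with the sympy library (assumed for illustration only):

from sympy import Matrix, eye

# The two vectors of S as the columns of a 4 x 2 matrix.
S = Matrix([[1, 2],
            [2, 5],
            [4, 10],
            [-2, -5]])

# Append the 4 x 4 identity, whose columns are the standard basis vectors.
A = S.row_join(eye(4))

R, pivots = A.rref()
print(pivots)                          # (0, 1, 3, 4): keep columns 1, 2, 4 and 5 of A
S_prime = [A.col(j) for j in pivots]   # the basis S' found in Example 13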
Note. The procedure stated in the last theorem can also be used in situations where we have
a set S which is neither linearly independent nor spanning and we want to find a basis which
contains as many members of S as possible. We still form a matrix A from the members of S
followed by the standard basis vectors, reduce to echelon form U and pick out from A the columns
which correspond to leading columns in U . The only difference is that the first n columns are not
necessarily all leading columns (because S is not necessarily linearly independent), so the new set
S′ does not necessarily include all the members of S.
Example 14. Find a basis for R4 containing as many as possible of the members of the set
S = { (1, 2, 4, −2)^T, (2, 5, 1, 4)^T, (1, 3, −3, 6)^T }.
Solution. We form the matrix A whose columns are the members of S followed by the standard
basis vectors for R4 and reduce it to row-echelon form.

[  1  2  1  1  0  0  0 ]
[  2  5  3  0  1  0  0 ]
[  4  1 −3  0  0  1  0 ]
[ −2  4  6  0  0  0  1 ]

R2 = R2 − 2R1, R3 = R3 − 4R1, R4 = R4 + 2R1:
[ 1  2  1  1  0  0  0 ]
[ 0  1  1 −2  1  0  0 ]
[ 0 −7 −7 −4  0  1  0 ]
[ 0  8  8  2  0  0  1 ]

R3 = R3 + 7R2, R4 = R4 − 8R2:
[ 1  2  1   1  0  0  0 ]
[ 0  1  1  −2  1  0  0 ]
[ 0  0  0 −18  7  1  0 ]
[ 0  0  0  18 −8  0  1 ]

R4 = R4 + R3:
[ 1  2  1   1  0  0  0 ]
[ 0  1  1  −2  1  0  0 ]
[ 0  0  0 −18  7  1  0 ]
[ 0  0  0   0 −1  1  1 ]

The first, second, fourth and fifth columns are the leading columns in the row-echelon form matrix,
so we take the corresponding columns in A and get a basis
S′ = { (1, 2, 4, −2)^T, (2, 5, 1, 4)^T, (1, 0, 0, 0)^T, (0, 1, 0, 0)^T }
for R4.
Note that this basis contains only two of the three vectors in the original set because S was not linearly independent. (Question: what is dim(span(S))?) ♦
In Chapter 7 we shall need the following proposition which follows from Theorems 3 and 4.
Proposition 8. If V is a finite-dimensional vector space, W is a subspace of V and dim(W) = dim(V), then W = V.
Proof. By Theorem 4, there exists a basis B for W. Then B is a linearly independent set in V. By Theorem 3 part 4, B is also a basis for V, and so V = span(B) = W.
6.7 [X] Coordinate vectors
In Chapter 1, we have seen that the position vector a of a point in an n-dimensional space could
be represented by a (column) coordinate vector (a1, . . . , an)^T ∈ Rn, where the coordinates are the scalars
in the linear combination which expresses a in terms of the standard basis vectors {e1, . . . , en}.
As remarked in the last section, any basis B for a finite-dimensional vector space V over F has the property that every vector v ∈ V can be written as a unique linear combination of the vectors in B. If we specify a fixed ordering of the basis B = {v1, . . . , vn}, then we can associate with every vector v the unique n-vector (x1, . . . , xn)^T ∈ Fn, where v = x1v1 + · · · + xnvn.
Note that the order of vectors in B is important because changing the order of the vectors also
changes the order of the coefficients and therefore changes the n-vector corresponding to v.
This generalises the notion of coordinates of a point in an n-dimensional real space to coordinates of a vector in any finite-dimensional vector space: we can represent a vector in any such space by a coordinate vector, which is an n-vector in Fn. Consequently, we can use all the techniques learnt in the previous sections to study these vector spaces via matrices over F.
Now, we introduce the definition of coordinate vectors.
Definition 1. Let V be an n-dimensional vector space and let the ordered set of
vectors B = {v1, . . . , vn} be a basis for V . If
v = x1v1 + · · ·+ xnvn
then the vector
[v]B = x = (x1, . . . , xn)^T
is called the coordinate vector of v with respect to the ordered basis B.
Example 1. With respect to the ordered basis B = { (0, 1, 3, −1)^T, (2, 5, −3, 1)^T, (4, −1, 0, 2)^T, (−6, 2, 1, 4)^T } of R4, a vector v ∈ R4 has the coordinate vector [v]B = (1, −3, 4, 2)^T. Find v.
Solution.
v = 1 (0, 1, 3, −1)^T − 3 (2, 5, −3, 1)^T + 4 (4, −1, 0, 2)^T + 2 (−6, 2, 1, 4)^T = (−2, −14, 14, 12)^T. ♦
Example 2. Find the coordinate vector for the vector b = (−1, 0, 5)^T with respect to the ordered basis { v1 = (1, 2, 3)^T, v2 = (1, 1, −1)^T } of span(v1, v2).
Solution. From the result of Example 4 in Section 6.6, the coordinate vector for b with respect to the ordered basis {v1, v2} is (1, −2)^T. ♦
Example 3. The set of vectors B = {u1, u2, u3}, where
u1 = (1/√2, 0, −1/√2)^T, u2 = (1/√2, 0, 1/√2)^T, u3 = (0, −1, 0)^T,
is an orthonormal basis for R3. Find the coordinate vector of a = (a1, a2, a3)^T with respect to this basis.
Solution. From the result of Example 6 in Section 6.6, the required coordinate vector is
[a]B = ( (a1 − a3)/√2, (a1 + a3)/√2, −a2 )^T. ♦
One of the most important aspects of coordinate vectors is that by using them we can reduce
problems in any finite-dimensional real vector space to problems in Rn, where of course we have
powerful matrix techniques available.
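For an orthonormal basis such as the one in Example 3, each coordinate is just a dot product, which Maple can compute symbolically. A sketch, assuming the LinearAlgebra package:

with(LinearAlgebra):
u1 := <1/sqrt(2), 0, -1/sqrt(2)>:  u2 := <1/sqrt(2), 0, 1/sqrt(2)>:  u3 := <0, -1, 0>:
a := <a1, a2, a3>:
DotProduct(a, u1, conjugate = false);   # gives (a1 - a3)/sqrt(2), the first coordinate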
Example 4. Find the coordinate vector of p(x) = a0 + a1x + · · · + anx^n ∈ Pn(R) with respect to the ordered basis B = {1, x, x^2, . . . , x^n}.
Solution. Of course there is no need to do any calculation in this case — we already know how
to write p as a linear combination of elements of B. The coordinate vector of p with respect to B
is [p]B = (a0, a1, . . . , an)^T ∈ R^(n+1). ♦
Things are more difficult if we are given a nonstandard basis for Pn(R).
Example 5. As a special case of Example 17 in Section 6.8.3, the set P2(R) of all real polynomials which have degree two or less is a vector space. You are given an ordered basis B = {1 + x, x + x^2, 1 + x^2} for P2(R). Find the coordinate vector for p(x) = 1 − x^2 with respect to the ordered basis B.
Solution. We need to find scalars α1, α2, α3 ∈ R such that
p(x) = 1 − x^2 = α1(1 + x) + α2(x + x^2) + α3(1 + x^2).
Expanding the right-hand-side and comparing coefficients shows that we must have
α1 + α3 = 1
α1 + α2 = 0
α2 + α3 = −1.
This reduces the problem to that of solving a system of linear equations in the unknowns α1, α2 and
α3. Solving these equations by Gaussian elimination gives the unique solution α1 = 1, α2 = −1,
α3 = 0. Therefore [p]B = (1, −1, 0)^T. ♦
Because coordinate vectors can be obtained in any finite-dimensional vector space, we can turn
problems involving vectors in the vector space into problems involving coordinate vectors in Fn.
There are three important results which are fundamental to this approach.
Theorem 1. If B is an ordered basis for a vector space V over a field F and u,v ∈ V and λ ∈ F,
then
(a) u = v if and only if [u]B = [v]B , that is, two vectors are equal if and only if the corresponding
coordinate vectors are equal.
(b) [u+v]B = [u]B + [v]B , that is, the coordinate vector of the sum of two vectors is equal to the
sum of the two corresponding coordinate vectors.
(c) [λv]B = λ[v]B , that is, the coordinate vector of a scalar multiple of a vector is equal to the
same scalar multiple of the corresponding coordinate vector.
Proof. Equality. If u and v have the same coordinates then they are equal to the same linear
combination of B and must therefore be equal to each other. Conversely, if u = v then, because B
is a basis and (by Theorem 2 of Section 6.5) no vector can have two different expressions as a linear
combination of a given basis, any expressions for u and v as linear combinations of B must be the
same. The coefficients in these linear combinations are the coordinates of u and v, so [u]B = [v]B .
Addition. Let B = {b1, . . . , bn}, [u]B = (λ1, . . . , λn)^T and [v]B = (µ1, . . . , µn)^T. By the definition of coordinate vectors, we have
u = λ1b1 + · · ·+ λnbn
and
v = µ1b1 + · · ·+ µnbn.
By adding these two equations we get
u+ v = (λ1 + µ1)b1 + · · ·+ (λn + µn)bn
and this implies (by the definition of coordinate vectors) that
[u + v]B = (λ1 + µ1, . . . , λn + µn)^T = (λ1, . . . , λn)^T + (µ1, . . . , µn)^T = [u]B + [v]B.
Multiplication by a scalar. The proof is similar to that for addition, and is omitted.
6.8 [X] Further important examples of vector spaces
All of the material in this section is regarded as being more difficult than the material in previous
sections of this chapter.
In the preceding sections we have developed a general theory of vector spaces. However, for
simplicity, we have concentrated on examples which illustrate the applications of the theory to
the particular vector space Rn. In this section we shall examine three other important examples of
vector spaces, namely the vector spaces of matrices, of real-valued functions, and of polynomials. We
shall show how the vector space ideas of subspace, linear combination and span, linear independence,
basis, dimension, and coordinate vector apply to these spaces.
Before the discussion of these vector spaces, we introduce an alternative way of proving a subset
to be a subspace.
Theorem 1 (Alternative Subspace Theorem). A subset S of a vector space V over a field F is a
subspace of V if and only if S contains the zero vector and it satisfies the closure condition:
If v1,v2 ∈ S, then λ1v1 + λ2v2 ∈ S for all λ1, λ2 ∈ F. (#)
Proof. We prove that the closure condition (#) is satisfied if and only if both closure under
addition and closure under scalar multiplication are also satisfied.
We first assume that (#) is satisfied. Then, for the special case of λ1 = λ2 = 1, condition (#)
becomes
v1 + v2 ∈ S for all v1,v2 ∈ S,
and hence S is closed under addition. Further, for the special case of λ2 = 0, condition (#) becomes
λ1v1 ∈ S for all v1 ∈ S and for all λ1 ∈ F,
and hence S is closed under multiplication by a scalar. Thus, if (#) is satisfied then closure under
addition and closure under scalar multiplication are also satisfied.
We now assume that both closure conditions are satisfied. Then, from closure under multipli-
cation by a scalar, we have, for all v1,v2 ∈ S and for all λ1, λ2 ∈ F , that
λ1v1 ∈ S and λ2v2 ∈ S.
Then, on adding these scalar multiples and using closure under addition, we have that
λ1v1 + λ2v2 ∈ S,
and hence (#) is satisfied.
Thus, (#) is satisfied if and only if closure under addition and under scalar multiplication are
both satisfied. We then use the Subspace Theorem of Section 6.3 to complete the proof.
6.8.1 Vector spaces of matrices
We have seen in Example 3 in Section 6.1 that Mmn(R), the set of all m × n real matrices, is a vector space over R, and that Mmn(C), the set of m × n complex matrices, is a vector space over C. In this
section, we are going to see some examples of their subspaces, their bases and coordinate vectors
with respect to these bases.
Example 1. Show that the set
S = {A ∈M22(R) : [A]11 = [A]22 = 1}
is not a vector subspace of M22(R).
Solution. Since the zero matrix is not in S, vector space axiom 4 (Existence of Zero) is not satisfied. Therefore the set S is not a vector subspace. ♦
Example 2. Prove that the set of n× n real symmetric matrices is a subspace of Mnn(R).
Solution. Recall that a square matrix A is called symmetric if A = A^T. Let S be the set of n × n symmetric matrices. Obviously, the zero matrix is symmetric, and so belongs to S.
Suppose that A, B ∈ S and λ, µ ∈ R. By a property of transpose in Chapter 5 and the fact
that A, B are symmetric, we have
(λA+ µB)T = λAT + µBT = λA+ µB.
Therefore, λA + µB is symmetric and so it is in S. Hence by the Alternative Subspace Theorem,
S is a subspace. ♦
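The closure computation can be checked symbolically in Maple for a particular size, say 2 × 2. A sketch, assuming the LinearAlgebra package:

with(LinearAlgebra):
A := Matrix([[a, b], [b, d]]):   # a symbolic symmetric matrix
B := Matrix([[p, q], [q, s]]):   # another symbolic symmetric matrix
Transpose(lambda*A + mu*B) - (lambda*A + mu*B);   # returns the 2 x 2 zero matrix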
Example 3. For 1 ≤ i ≤ m, 1 ≤ j ≤ n, let Eij be the m × n matrix with all entries 0 except that the ijth entry is 1. Show that the set B = {Eij : 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis for Mmn(C).
Solution. For any A = (aij) ∈ Mmn(C), we have
A = a11E11 + · · · + a1nE1n + a21E21 + · · · + amnEmn = ∑_{i=1}^{m} ∑_{j=1}^{n} aij Eij.
Thus, B is a spanning set for Mmn(C).
Suppose that ∑_{i=1}^{m} ∑_{j=1}^{n} λij Eij = 0, where 0 is the m × n zero matrix. Note that
∑_{i=1}^{m} ∑_{j=1}^{n} λij Eij =
[ λ11 · · · λ1n ]
[  .   . .   .  ]
[ λm1 · · · λmn ]
is the zero matrix. Hence λ11 = · · · = λmn = 0, and so B is an independent set. Therefore B is a basis for Mmn(C). ♦
Note. In general, the set B is a basis, called the standard basis, for Mmn = Mmn(F) for F = Q, R or C. As a result, the dimension of Mmn is mn.
Example 4. The coordinate vector of the matrix
[ a  b ]
[ c  d ]
with respect to the standard basis of M22 is (a, b, c, d)^T.
We can also solve problems about independent sets, spanning sets and bases by Gaussian elimination. Be careful not to mix up the matrix formed from the coefficients of the linear combinations with the elements of Mmn themselves.
Example 5. Show that the set
[ 1 1 ]  [ 1 1 ]  [ 1 0 ]  [ 0 1 ]
[ 1 0 ], [ 0 2 ], [ 1 2 ], [ 1 2 ]
is a basis for M22.
Solution. Note that the dimension of M22 is 4 and the number of vectors in the given set is also
4. To prove this set is a basis, by Theorem 3 part 4 in Section 6.6.2 we only need to show that this
set is independent.
Suppose that
   [ 1 1 ]      [ 1 1 ]      [ 1 0 ]      [ 0 1 ]   [ 0 0 ]
λ1 [ 1 0 ] + λ2 [ 0 2 ] + λ3 [ 1 2 ] + λ4 [ 1 2 ] = [ 0 0 ].
By equating the corresponding entries in both sides, we have
λ1 + λ2 + λ3 + 0 = 0
λ1 + λ2 + 0 + λ4 = 0
λ1 + 0 + λ3 + λ4 = 0
0 + 2λ2 + 2λ3 + 2λ4 = 0
Since the right hand sides are all zeros, we can simply reduce the coefficient matrix to row-echelon
form.
[ 1  1  1  0 ]
[ 1  1  0  1 ]
[ 1  0  1  1 ]
[ 0  2  2  2 ]

R2 = R2 − R1, R3 = R3 − R1:
[ 1  1  1  0 ]
[ 0  0 −1  1 ]
[ 0 −1  0  1 ]
[ 0  2  2  2 ]

R2 ↔ R3:
[ 1  1  1  0 ]
[ 0 −1  0  1 ]
[ 0  0 −1  1 ]
[ 0  2  2  2 ]

R4 = R4 + 2R2 + 2R3:
[ 1  1  1  0 ]
[ 0 −1  0  1 ]
[ 0  0 −1  1 ]
[ 0  0  0  6 ]
Thus the system of equations has the unique solution λ1 = λ2 = λ3 = λ4 = 0. Hence the set of matrices
is a basis. ♦
Note. The four columns of the coefficient matrix are the coordinate vectors of the four matrices
with respect to the standard basis.
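A Maple sketch of this check, assuming the LinearAlgebra package:

with(LinearAlgebra):
A := Matrix([[1,1,1,0], [1,1,0,1], [1,0,1,1], [0,2,2,2]]):  # coordinate vectors as columns
GaussianElimination(A);   # all four columns are leading
Rank(A);                  # returns 4, so the four matrices are independent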
6.8.2 Vector spaces associated with real-valued functions
Before reading this section, you might like to quickly read the brief review of function notation
given in Appendix 6.9.
We know how to add two functions and how to multiply a function by a real number, so it is
natural to ask whether or not these operations satisfy the axioms of a vector space. In this section
we shall see that they do.
The Vector Space of Real-Valued Functions. Let X be a non-empty set. Consider the set
(which we call R[X]) of all possible real-valued functions with domain X, that is,
R[X] = {f : X → R}.
We also let + be the usual rule for adding real functions, i.e., f + g is the function given by
(f + g)(x) = f(x) + g(x) for all x ∈ X,
and we let ∗ represent the usual rule for multiplying a real function by a real number, i.e., λ ∗ f is
the function given by
(λ ∗ f)(x) = (λf)(x) = λf(x) for all x ∈ X.
We then have the following result:
Proposition 2. The system (R[X],+, ∗,R) is a vector space over the real-number field R.
Proof. The proof of this proposition follows the usual procedure of proving that all ten of the vector
space axioms are satisfied. We give proofs of two of the axioms and leave the proofs of the other
ones as exercises.
Closure under Addition. If f, g ∈ R[X], then f(x) and g(x) are defined and are real numbers
for all x ∈ X. Then, using the usual rule for function addition, we have (f + g)(x) = f(x) + g(x)
for all x ∈ X.
Thus, (f + g)(x) is also defined and is a real number for all x ∈ X. Hence f + g : X → R and
therefore f + g ∈ R[X].
Closure under Multiplication by Scalars. If f ∈ R[X] and λ ∈ R, then, from the usual rule
for multiplication of a function by a real number, we have
(λf)(x) = λf(x) for all x ∈ X.
Thus (λf)(x) is defined and is a real number for all x ∈ X. Hence, λf : X → R and therefore
λf ∈ R[X].
In the next examples we briefly mention some subspaces of the vector space of real-valued
functions. These subspaces are of importance in many areas of modern mathematics, science and
engineering. The most important one, the subspace of polynomials, will be discussed in the next
subsection.
Example 6. Let (a, b) be an interval of R, and let C[(a, b)] be the set of all continuous real-valued
functions on (a, b). Show that C[(a, b)] is a subspace of the vector space R[(a, b)] of all real-valued
functions with domain (a, b).
Solution. We use the Alternative Subspace Theorem.
The set C[(a, b)] contains the zero function, since the zero function is continuous on (a, b).
From calculus, we know that if f and g are continuous on an interval, then λ1f + λ2g is also
continuous on the same interval for all real λ1 and λ2. Also, λ1f + λ2g is a real function, and
hence C[(a, b)] is a subset of R[(a, b)] which satisfies the condition (#) of the Alternative Subspace
Theorem. Thus C[(a, b)] is a subspace of R[(a, b)]. ♦
Calculus provides a very rich source of subspaces of the vector space of real-valued functions.
Example 7. Let C(1)[(a, b)] be the set of all real-valued functions whose first derivative exists and
is continuous on an interval (a, b) of R. Show that C(1)[(a, b)] is a subspace of the vector space
R[(a, b)].
Solution. We use the Alternative Subspace Theorem.
The set C(1)[(a, b)] contains the zero function, as the zero function has a continuous first derivative on (a, b).
From calculus, we know that if the first derivatives of the real functions f and g exist and are
continuous on an interval, then, for all λ1, λ2 ∈ R, the function λ1f + λ2g is also a real-valued
function whose first derivative exists and is continuous on the same interval. Thus, C(1)[(a, b)] is a
subset of R[(a, b)] which satisfies condition (#) of the Alternative Subspace Theorem, and hence
it is a subspace of R[(a, b)].
Note that C(1)[(a, b)] is also a subspace of the vector space C[(a, b)] of real-valued, continuous
functions on (a, b) given in Example 6. Can you see why? ♦
An important class of function subspaces is defined by the solutions of homogeneous, linear
differential equations.
Example 8. Let S be the subset of the vector space R[R] of real-valued functions on R defined by
S = { f ∈ R[R] : d²f/dx² − 6 df/dx + 5f = 0 }.
Show that S is a subspace of R[R].
Show that S is a subspace of R[R].
Solution. As the zero function satisfies the equation, it belongs to S. For f1, f2 ∈ S and λ1, λ2 ∈ R, we have, on using the properties of derivatives, that
d²/dx² (λ1f1 + λ2f2) − 6 d/dx (λ1f1 + λ2f2) + 5(λ1f1 + λ2f2)
  = λ1 (d²f1/dx² − 6 df1/dx + 5f1) + λ2 (d²f2/dx² − 6 df2/dx + 5f2)
  = λ1 · 0 + λ2 · 0 = 0.
Hence, λ1f1 + λ2f2 ∈ S, and therefore S is a subspace. ♦
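Maple can verify that particular functions lie in S. For instance (anticipating Example 10 below), a sketch checking an arbitrary combination of e^(5x) and e^x:

f := x -> lambda1*exp(5*x) + lambda2*exp(x):
simplify(diff(f(x), x, x) - 6*diff(f(x), x) + 5*f(x));   # returns 0, so f is in S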
Subspaces can also be defined by integration.
Example 9. Let S be the subset of the vector space C[[−π, π]] of all real-valued continuous functions on the interval [−π, π] defined by
S = { f ∈ C[[−π, π]] : ∫_{-π}^{π} f(x)g(x) dx = 0 },
where g is a fixed continuous function. Clearly the product f(x)g(x) is integrable on [−π, π], since f and g are continuous. Prove that S is a subspace of C[[−π, π]].
Solution. The zero function is in S. For all f1, f2 ∈ S and for all λ1, λ2 ∈ R, we have λ1f1 + λ2f2 ∈ C[[−π, π]] and
∫_{-π}^{π} (λ1f1(x) + λ2f2(x)) g(x) dx = λ1 ∫_{-π}^{π} f1(x)g(x) dx + λ2 ∫_{-π}^{π} f2(x)g(x) dx = λ1 · 0 + λ2 · 0 = 0.
Hence λ1f1 + λ2f2 ∈ S, and thus S is a subspace of C[[−π, π]]. ♦
Example 10. As shown in courses on differential equations, the homogeneous, linear differential equation in Example 8 has the solutions
f(x) = λ1 e^(5x) + λ2 e^x for λ1, λ2 ∈ R.
Hence the set S of solutions is given by S = span(e^(5x), e^x), and thus {e^(5x), e^x} is a spanning set for S. ♦
Example 11. Show that the set S = {sin(x), cos(x)} is a linearly independent subset of the vector
space R [[−π, π]] of all real-valued functions on the interval [−π, π].
Solution. We have to show that if
f(x) = λ1 sin(x) + λ2 cos(x) = 0 for all x ∈ [−π, π]
then λ1 and λ2 are zero.
We first note that if the linear combination f(x) is zero for all x ∈ [−π, π] then f(x) must also be
zero for the values x = 0 and x = π/2. We then obtain
0 = f(0) = λ2 and 0 = f(π/2) = λ1.
Thus the scalars are zero, and hence the set is linearly independent. ♦
Example 12. Show that the subset Sn of the vector space R[R] defined by
Sn = {sin(kx) : k = 1, . . . , n and x ∈ R}
is a linearly independent set.
Solution. We will use a trick based on extending the idea of orthogonality of geometric vectors and vectors in Rn (see Examples 5 and 6 of Section 6.6) to the vector space of real functions.
We first prove that if k and m are integers then
∫_{0}^{π} sin(kx) sin(mx) dx = 0 for k ≠ m, and = π/2 for k = m.
From trigonometry, we have
sin(kx) sin(mx) = (1/2) ( cos((k − m)x) − cos((k + m)x) ).
Then, for k ≠ m,
∫_{0}^{π} sin(kx) sin(mx) dx = (1/2) ∫_{0}^{π} cos((k − m)x) dx − (1/2) ∫_{0}^{π} cos((k + m)x) dx
  = [ sin((k − m)x) / (2(k − m)) ]_{0}^{π} − [ sin((k + m)x) / (2(k + m)) ]_{0}^{π} = 0
as sin(0) = 0, sin((k − m)π) = 0, and sin((k + m)π) = 0 for integers k and m. We leave the proof of the result for k = m as a simple exercise.
Now suppose that
∑_{k=1}^{n} λk sin(kx) = 0.
On multiplying this expression by sin(mx) and integrating from 0 to π, we have
0 = ∑_{k=1}^{n} λk ∫_{0}^{π} sin(kx) sin(mx) dx = λm (π/2).
Hence, λm = 0 for 1 ≤ m ≤ n, and thus the set Sn is linearly independent. ♦
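The orthogonality integrals are easy to confirm in Maple. A sketch for one pair of values:

int(sin(2*x)*sin(3*x), x = 0..Pi);   # returns 0      (case k <> m)
int(sin(2*x)^2, x = 0..Pi);          # returns Pi/2   (case k = m)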
Example 13. Using the previous example, we can show that the vector space R[R] cannot be spanned by a finite set, and hence that it is an infinite-dimensional vector space. If it could be spanned by a finite set of m elements then, by Theorem 1 in Section 6.6, every independent set would have at most m elements. But if we choose an integer n with n > m then, by the previous example, Sn is a linearly independent set of more than m elements. Hence R[R] cannot be spanned by a finite set. ♦
6.8.3 Vector spaces associated with polynomials
Polynomials can be added, subtracted and multiplied. From the point of view of vector spaces,
only addition (and subtraction) and multiplication of a polynomial by a scalar are relevant.
We will be concerned with polynomials defined over either the real or complex fields. Although
it is possible to generalise all of the results in this section to the rational field Q and to also generalise
many of the results to the finite fields Zp, (p a prime), we will not do so here. In the following,
therefore, the field F should be taken as either the real numbers R or the complex numbers C.
We begin by quickly reviewing the definitions of polynomial function, polynomial addition,
multiplication of a polynomial by a scalar, and equality of polynomials.
Definition 1. A function p : F → F is called a polynomial function over F if
there is a natural number n ∈ N and numbers a0, a1, . . . , an ∈ F such that
p(z) = a0 + a1z + · · · + anz^n = ∑_{k=0}^{n} ak z^k for all z ∈ F.
For brevity we will usually refer to a polynomial function as a polynomial even though it is important
in advanced mathematics courses to distinguish between the two.
Polynomials may be added and multiplied by scalars in such a way as to produce other polyno-
mials. The formal definitions follow the usual definitions of addition and multiplication by a scalar
for functions (see Appendix 6.9).
Definition 2. If p and q are polynomials over the same field F given by
p(z) = ∑_{k=0}^{n} ak z^k and q(z) = ∑_{k=0}^{m} bk z^k for all z ∈ F,
then the sum polynomial is the polynomial p + q given by
(p + q)(z) = p(z) + q(z) = ∑_{k=0}^{max(n,m)} (ak + bk) z^k for all z ∈ F.
That is, the rule is to add corresponding coefficients. The rule for subtraction of polynomials follows
immediately from the addition rule, and it is to subtract corresponding coefficients.
Definition 3. If λ ∈ F and p is a polynomial over F given by
p(z) = ∑_{k=0}^{n} ak z^k for all z ∈ F,
then the scalar multiple λp of p is the polynomial given by
(λp)(z) = λ(p(z)) = ∑_{k=0}^{n} (λak) z^k for all z ∈ F.
That is, the rule is to multiply each coefficient by the scalar.
The last main property of polynomials that we need is given by the following Uniqueness
Proposition (see Chapter 3 for a proof for complex polynomials).
Proposition 3 (Uniqueness Proposition for Real and Complex Polynomials). Let p and q be polynomials over F given by
p(z) = ∑_{k=0}^{n} ak z^k and q(z) = ∑_{k=0}^{n} bk z^k for all z ∈ F.
Then, if the field F is either R or C, we have that p(z) = q(z) for all z ∈ F if and only if ak = bk for all k = 0, 1, 2, . . . , n.
An immediate consequence of Proposition 3 is the following important result.
Corollary 4. If F is the field R or C, then a polynomial p over F has the property that p(z) = 0
for all z ∈ F if and only if all of its coefficients are zero.
Note that this is not true for other fields such as Zp, p a prime.
The unique polynomial, whose function values are all zero and which has all of its coefficients
equal to zero, is called the zero polynomial.
The fundamental vector space associated with polynomials is defined in the following example.
Example 14 (The Vector Space of Polynomials over F). The vector space of polynomials over a
field F is the system (P(F),+, ∗,F) defined as follows. The set of “vectors” is the set P(F) of all
possible polynomials over the field F, i.e.,
P(F) = { p : p(z) = a0 + a1z + · · · + anz^n for some n ∈ N, aj ∈ F, 0 ≤ j ≤ n, z ∈ F }.
The rule of “vector addition” is the polynomial addition rule given in Definition 2 and the rule of
multiplication is the scalar multiplication rule for polynomials given in Definition 3.
To prove that this system is a vector space it is necessary to check that all ten of the vector
space axioms are satisfied. We leave the checking of the axioms as an exercise. ♦
Example 15. As special cases of Example 14, we have the vector space of real polynomials
(P(R),+, ∗,R) (we have seen this in Example 4 in Section 6.1) and the vector space of complex
polynomials (P(C),+, ∗,C). ♦
Notation. We will usually talk of the polynomial vector space P instead of the more formal
(P(F),+, ∗,F) when there can be no possibility of confusion over the field (R or C) being used.
When necessary, the vector space of real polynomials will be referred to as P(R) and the vector
space of complex polynomials as P(C).
As shown in the following example, the field for the polynomials and the field for the scalars
must be, in some sense, compatible.
Example 16. Show that the system (P(C),+, ∗,R) of complex polynomials and real scalars is a
vector space, whereas the system (P(R),+, ∗,C) of real polynomials and complex scalars is not a
vector space.
Solution. It can be checked without too much difficulty that both systems satisfy nine of the
ten vector space axioms. The axioms satisfied by both are the five axioms of addition, the three
scalar-multiplication axioms of associativity, commutativity, multiplication by 1, and the scalar
and vector distributive axioms.
The remaining axiom to check is closure under scalar multiplication.
For the system of complex polynomials and real scalars, we have that if p ∈ P(C) is a complex
polynomial and λ ∈ R, then λp is also a complex polynomial. The closure under scalar multiplica-
tion axiom is therefore satisfied, and hence the system of complex polynomials and real scalars is
a vector space.
For the system of real polynomials and complex scalars, we note that x ∈ P(R) and i ∈ C, but ix ∉ P(R).
The closure under scalar multiplication axiom is therefore not satisfied, and hence the system of
real polynomials with complex scalars is not a vector space. ♦
We now consider some of the subspaces of the polynomial vector space P = P(F). The most
important subspaces of P are the vector spaces of polynomials of degree less than or equal to n, for
some n.
Example 17. Let P be the vector space of polynomials over F, and let Pn be the subset of P consisting of all polynomials of degree less than or equal to some fixed integer n ≥ 0, that is,
Pn = { p ∈ P : deg(p) ≤ n }.
Show that Pn is a subspace of P.
Solution. If p, q ∈ Pn, then there exist coefficients {a0, . . . , an} and {b0, . . . , bn} such that
p(z) = a0 + a1z + · · · + anz^n and q(z) = b0 + b1z + · · · + bnz^n
for all z ∈ F. Then, for λp + µq with λ, µ ∈ F, we have
(λp + µq)(z) = λp(z) + µq(z) = (λa0 + µb0) + · · · + (λan + µbn)z^n for all z ∈ F.
But the coefficients λaj + µbj, 0 ≤ j ≤ n, are also scalars in F, and hence λp + µq ∈ Pn. Thus the condition in the Alternative Subspace Theorem is satisfied, and Pn is a subspace of P. ♦
Note that the subset of all polynomials of degree exactly n is not a subspace, since this subset
does not contain the zero polynomial.
Subspaces of the space of all polynomials can also be formed by selecting all polynomials which
have their roots at given points.
Example 18. Let Pn be the vector space of polynomials of degree less than or equal to n over F.
Show that the subset S of Pn given by
S = {p ∈ Pn : p(5) = α}
is a subspace of Pn if and only if α = 0.
Solution. We use the Alternative Subspace Theorem.
If α ≠ 0, then the zero polynomial is not in S and so S is not a subspace.
If α = 0, then the zero polynomial is in S and so S is not empty. If p, q ∈ S and λ1, λ2 ∈ F,
then p(5) = 0 and q(5) = 0 and
(λ1p+ λ2q)(5) = λ1p(5) + λ2q(5) = 0.
Hence λ1p+ λ2q ∈ S, and S is a subspace.
Therefore, S is a subspace of Pn if and only if α = 0.
Note that this subspace is the set of all polynomials of degree less than or equal to n over F which
have a root at z = 5. ♦
As the ideas of linear combination, span and linear independence apply to all vector spaces,
they apply to spaces of polynomials. The methods used in Sections 6.4 and 6.5 for Rn can be used
to solve problems of spanning sets and independent sets.
Example 19. Does the complex polynomial p belong to span(p1, p2), where p, p1, p2 are complex polynomials defined by
p(z) = 4 + z + 2z^2, p1(z) = 1 + z − z^2 and p2(z) = 2 − z for all z ∈ C?
Solution. p ∈ span (p1, p2) if and only if p is a linear combination of p1 and p2, that is, if and
only if there are scalars x1, x2 such that
p(z) = x1p1(z) + x2p2(z) for all z ∈ C.
That is,
4 + z + 2z^2 = x1(1 + z − z^2) + x2(2 − z) = (x1 + 2x2) + (x1 − x2)z + (−x1)z^2 for all z ∈ C.
From the Uniqueness Proposition for Polynomials (Section 6.8.3), we know that polynomials
are equal if and only if coefficients of corresponding powers of z are equal. Equating coefficients of
equal powers then gives the system of linear equations
x1 + 2x2 = 4, x1 − x2 = 1, −x1 = 2,
with augmented matrix
(A|b) =
[  1  2 | 4 ]
[  1 −1 | 1 ]
[ −1  0 | 2 ].
Then, p is in the span of p1 and p2 if and only if these equations have a solution. It is easy to see
that these equations have no solution, and hence p is not in span(p1, p2). ♦
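A Maple sketch of this check, assuming the LinearAlgebra package:

with(LinearAlgebra):
A := Matrix([[1, 2], [1, -1], [-1, 0]]):   # columns: coefficients of p1 and p2
b := <4, 1, 2>:                            # coefficients of p
GaussianElimination(<A | b>);              # last row reads 0 = nonzero, so no solution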
Example 20. Is the set {p1, p2, p3} of polynomials, where
p1(z) = 1 + 2z − z^2, p2(z) = −3 − z + 2z^2, p3(z) = 2 + 3z + z^2,
a linearly independent subset of P2?
Solution. Following the usual test for linear independence, we look for scalars x1, x2, x3 such
that x1p1 + x2p2 + x3p3 = 0, that is, such that
x1(1 + 2z − z^2) + x2(−3 − z + 2z^2) + x3(2 + 3z + z^2) = 0 for all z ∈ C.
Thus, the polynomial on the left is the zero polynomial, and hence the coefficient of each power of
z is zero, that is,
x1 − 3x2 + 2x3 = 0; 2x1 − x2 + 3x3 = 0; −x1 + 2x2 + x3 = 0.
This system of equations corresponds to the homogeneous system Ax = 0, where the matrix A and
an equivalent row-echelon form U are
A =
[  1 −3  2 ]
[  2 −1  3 ]
[ −1  2  1 ]
and
U =
[ 1 −3    2 ]
[ 0  5   −1 ]
[ 0  0  14/5 ].
Then, as all columns of U are leading columns, the only solution for the scalars is
x1 = x2 = x3 = 0, and hence the set is linearly independent. ♦
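A Maple sketch of the same test, assuming the LinearAlgebra package:

with(LinearAlgebra):
A := Matrix([[1, -3, 2], [2, -1, 3], [-1, 2, 1]]):  # coefficient columns of p1, p2, p3
GaussianElimination(A);                             # all columns are leading, so the set is independent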
As we can discuss spans and independence of sets of polynomials, we also have the notion of
dimensions of subspaces of polynomials. In the next example, we construct a standard basis for
Pn.
Example 21. Show that the set {1, z, . . . , z^n} is a basis for Pn(F), where F = R or C.
Solution. A polynomial p of degree less than or equal to n over a field F is a function of the form
p(z) = a0 + a1z + · · · + anz^n with aj ∈ F for 0 ≤ j ≤ n and for z ∈ F,
where any or all of the coefficients may be zero. Hence span(1, z, . . . , z^n) = Pn(F).
Furthermore, if
a0 + a1z + · · · + anz^n = 0 for all z ∈ C,
then this linear combination is the zero polynomial, and hence, from the Uniqueness Proposition for Polynomials of Section 6.8.3, all of the coefficients are zero. Hence {1, z, . . . , z^n} is independent.
Therefore, this set is a basis for Pn(F). ♦
An important result which follows immediately from Example 21 is the following.
Proposition 5. The vector space Pn of polynomials of degree less than or equal to n has dimension
n+ 1.
As a consequence, P is not a finite dimensional space.
Example 22. The vector space P of all polynomials cannot be spanned by a finite set.
Solution. We can use the same argument used for R[R] in Example 13. However, we now use
another approach.
Assume, on the contrary, that some set S containing a finite number of polynomials is a spanning
set for P. Then, since the number of polynomials in S is finite, there must be a highest-degree
polynomial in S. Let N be the degree of this polynomial. Then, no polynomial p with deg(p) > N
is in span (S). Hence, P is not spanned by any finite set of polynomials. ♦
Example 23. Show that the set S = {2 + z, −1 + z^2, z − z^2} is a basis for P2.
Solution. If p ∈ P2, then p is given by
p(z) = a0 + a1z + a2z^2 for all z ∈ C.
Then p ∈ span(S) if there exist scalars x1, x2, x3 such that
a0 + a1z + a2z^2 = x1(2 + z) + x2(−1 + z^2) + x3(z − z^2) for all z ∈ C.
On equating coefficients of powers of z, we obtain the system of linear equations with augmented
matrix
(A|b) =
[ 2 −1  0 | a0 ]
[ 1  0  1 | a1 ]
[ 0  1 −1 | a2 ].
On using Gaussian elimination, we obtain the augmented matrix
(U|y) =
[ 2  −1   0 | a0 ]
[ 0  1/2  1 | −(1/2)a0 + a1 ]
[ 0   0  −3 | a0 − 2a1 + a2 ].
Then, as there are no zero rows in U , there is a solution for all right hand sides. Thus, for every
polynomial p ∈ P2, we have p ∈ span (S), and hence S is a spanning set for P2. Further, as U has
no non-leading columns, the only solution for a zero right hand side is x1 = x2 = x3 = 0, and hence
S is linearly independent. S is therefore a basis for P2. ♦
The coordinate-vector idea applies immediately to the finite-dimensional vector space Pn.
Example 24. For the standard basis of Pn consisting of powers of z, that is, {1, z, . . . , z^n}, the coordinate vector consists of the coefficients of the polynomial. For example, the polynomial p ∈ Pn defined by
p(z) = a0 + a1z + · · · + anz^n
has the coordinate vector (a0, a1, . . . , an)^T with respect to the standard basis.
Note. The order is important. For example, the coordinate vector of p with respect to the ordered basis {z^2, z, 1, z^3, . . . , z^n} would be (a2, a1, a0, a3, . . . , an)^T. ♦
Example 25. Find the coordinate vector for the polynomial p3(z) = −1 + 5z^2 with respect to the ordered basis {p1, p2} of span(p1, p2), where p1(z) = 1 + 2z + 3z^2 and p2(z) = 1 + z − z^2.
Solution. We must find the scalars in the expression for p3 as a linear combination of p1 and p2.
On writing p3 = x1p1 + x2p2 and equating coefficients of equal powers of z, we get the system of
equations with augmented matrix
(A|b) =
[ 1  1 | −1 ]
[ 2  1 |  0 ]
[ 3 −1 |  5 ].
The solution is x1 = 1, x2 = −2. Hence the coordinate vector of p3 with respect to the ordered basis {p1, p2} is (1, −2)^T, i.e.,
p3(z) = −1 + 5z^2 = 1 p1(z) − 2 p2(z) = 1(1 + 2z + 3z^2) − 2(1 + z − z^2). ♦
Example 26. A polynomial p has the coordinate vector (2, −1, 4)^T with respect to the ordered basis {p1 = 1, p2 = 1 + z, p3 = 2 − z + z^2} of P2. Find p.
Solution.
p = 2p1 − p2 + 4p3 = 9 − 5z + 4z^2. ♦
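As a quick Maple check of this expansion:

p1 := 1:  p2 := 1 + z:  p3 := 2 - z + z^2:
expand(2*p1 - p2 + 4*p3);   # returns 9 - 5*z + 4*z^2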
6.9 A brief review of set and function notation
6.9.1 Set notation.
A set is any collection of elements. Sets are usually defined either by listing their elements or by
giving a rule for selection of the elements. The elements of a set are usually enclosed in braces {}.
Example 1. S = {1, 4,−7} is the set whose elements are 1,4, and −7. ♦
Common notation for a set defined by a rule is shown in the following example.
Example 2. The notation
S = {x ∈ Rn : x1 ≥ 0, x3 ≤ 4}
is read as: the set S of vectors x in Rn such that x1 is greater than or equal to zero and x3 is less
than or equal to 4. Note that the colon (:) is read as “such that” and the comma (,) is read as
“and”. ♦
Definition 1. Two sets A and B are equal (notation A = B) if every element of
A is an element of B, and if every element of B is an element of A.
To prove that A = B it is necessary to prove that the two conditions:
1. if x ∈ A then x ∈ B, and
2. if x ∈ B then x ∈ A
are both satisfied.
Definition 2. A set A is a subset of another set B (notation A ⊆ B) if every
element of A is also an element of B.
To prove that A ⊆ B it is necessary to prove that the condition:
if x ∈ A then x ∈ B
is satisfied.
Definition 3. A is said to be a proper subset of B if A is a subset of B and at
least one element of B is not an element of A.
To prove that A is a proper subset of B it is necessary to prove that the two conditions:
1. if x ∈ A then x ∈ B, and
2. for some x ∈ B, x is not an element of A
are both satisfied.
Definition 4. The intersection of two sets A and B (notation: A∩B) is the set
of elements which are common to both sets.
That is,
A ∩B = {x : x ∈ A and x ∈ B}.
Definition 5. The union of two sets A and B (notation: A ∪ B) is the set of all
elements which are in either or both sets.
That is,
A ∪B = {x : x ∈ A or x ∈ B}.
6.9.2 Function notation
The notation f : X → Y (which is read as “f is a function (or map) from the set X to the set Y ”)
means that f is a rule which associates exactly one element y ∈ Y to each element x ∈ X. The y
associated with x is written as y = f(x) and is called the “value of the function f at x” or “the
image of x under f”. The set X is often called the domain of the function f and the set Y is often
called the codomain of the function f .
Equality of Functions. Two functions f : X → Y and g : X → Y are defined to be equal if and
only if f(x) = g(x) for all x ∈ X.
Addition of Functions. If f : X → Y and g : X → Y , and if elements of Y can be added, then
the sum function f + g is defined by
(f + g)(x) = f(x) + g(x) for all x ∈ X.
Multiplication by a Scalar. If f : X → Y and λ ∈ F, where F is a field, and if elements of Y
can be multiplied by elements of F, then the function λf is defined by
(λf)(x) = λ
(
f(x)
)
for all x ∈ X.
Multiplication of Functions. If f : X → Y and g : X → Y , and if elements of Y can be
multiplied, then the product function fg is defined by
(fg)(x) = f(x)g(x) for all x ∈ X.
Composition of Functions. If g : X → W and f : W → Y , then the composition function
f ◦ g : X → Y is defined by
(f ◦ g)(x) = f(g(x)) for all x ∈ X.
6.10 Vector spaces and MAPLE
Most of the problems in this chapter can be solved using matrix methods, which of course means
that Maple can be very helpful. You might look at your session 1 notes to refresh your memory as
to how Maple handles vectors and matrices. As usual, you should type
with(LinearAlgebra);
in order to load the linear algebra package.
The following commands, for example, can be used to put the vectors v1,v2, and v3 as the
columns in a 3× 3 matrix.
v1:=<1,2,3>;
v2:=<0,-1,2>;
v3:=<3,-1,3>;
A:=<v1|v2|v3>;
We could now test whether these three vectors are linearly independent by performing Gaussian
elimination on A.
GaussianElimination(A);
In this particular example, there are no non-leading columns so the vectors are linearly independent
(and hence form a basis for R3).
Most of the other problems concerning vectors in Rn can be solved using similar methods.
Actually, the LinearAlgebra package contains many ready-made commands for performing the
standard calculations. For example
Basis({v1,v2,v3});
returns a subset of {v1,v2,v3} which forms a basis for span (v1,v2,v3).
Problems in other vector spaces can often be solved by using coordinate vectors to convert the
problem to one in Rn. Finding the coordinate vector for a vector b ∈ Rn with respect to a new
basis is quite simple. For example, with the ordered basis {v1,v2,v3}, the commands
b:=<1,1,1>;
coordvect:=LinearSolve(A,b);
find the coordinate vector of b. The firstyear package contains commands to find the coordinate
vector of a polynomial in Pn with respect to the standard basis {1, x, x2, . . . , xn} (be careful with
the capital letters).
with(firstyear);
p:=3*x^2-x+4;
v:=polytoVect(p,2);
Vecttopoly(v,x);
Problems for Chapter 6
Questions marked with [R] are routine, [H] harder, [X] extra for MATH1241, [M] Maple. You
should try to solve some of the questions in Sections 6.4 to 6.7 with Maple.
Problems 6.1 : Definitions and examples of vector spaces
1. [R] Show that the set
S = { x ∈ R3 : x1 ≤ 0, x2 ≥ 0 },
with the usual rules for addition and multiplication by a scalar in R3 is not a vector space
by showing that at least one of the vector space axioms is not satisfied. Give a geometric
interpretation of this result.
2. [R] Show that the system S with the usual rules for addition and multiplication by a scalar
in R3, and where
S = { x ∈ R3 : 2x1 + 3x2^3 − 4x3^2 = 0 },
is not a vector space by showing that at least one of the vector space axioms is not satisfied.
3. [R] Let S = { (a, b, c)^T ∈ R3 : (a − b)c = 0 }.
a) Write down two non-zero elements of S.
b) Show that S is not closed under vector addition.
4. [H] The set Cn is a vector space over C (see Example 2 of Section 6.1). Check that axioms 1,
2, 6, 9 are satisfied by this system.
5. [X] Let Mmn(C) be the set of all m × n matrices with complex entries with addition the
usual rule for addition of complex matrices, and multiplication by a scalar the usual rule
for multiplication of a complex matrix by a complex scalar. Prove that the vector space
Mmn(C) satisfies axioms 1, 3, 6 and 10.
6. [H] Prove that the system (Cn,+, ∗,R) with “natural” definitions of + and ∗ is a vector space,
whereas the system (Rn,+, ∗,C) with “natural” definitions of + and ∗ is not a vector
space.
7. [X] Consider the system (R2, +′, ∗′, R) in which the usual operations of “addition” and “multiplication by a scalar” are replaced by the new definitions:
(a1, a2)^T +′ (b1, b2)^T = (a1 + b1, a2 − 3b2)^T and λ ∗′ (a1, a2)^T = (4λa1, λa2)^T.
.
Give a list of the vector space axioms satisfied by this system, and a list of any which are
not satisfied. Is this system a vector space?
Problems 6.2 : Vector arithmetic
8. [H] Prove that the following properties are true for every vector space.
a) 2v = v + v.
b) nv = v + · · ·+ v, where there are n terms on the right.
9. [H] Prove parts 2, 4 and 5 of Proposition 2 of Section 6.2.
Problems 6.3 : Subspaces
10. [R] Suppose v = (1, 2, 3)^T. Show that the line segment defined by
S = { x ∈ R3 : x = αv, for 0 ≤ α ≤ 10 }
is not a subspace of R3.
11. [R] Show that the set
S = { x ∈ R3 : 2x1 + 3x2 − 4x3 = 6 }
is not a subspace of R3. Give a geometric interpretation of this result.
12. [R] Let S be the set
S = { x ∈ R3 : 2x1 + 3x2 − 4x3 = 0 }.
a) Find three distinct members of S.
b) Show that S is a subspace of R3.
c) Give a geometric interpretation of this latter result.
13. [R] Show that
T = { (x, y, z)^T : −1 ≤ x + y + z ≤ 1 }
is not a vector subspace of R3.
14. [R] Show that the set
S = { x ∈ R3 : 2x1 + 3x2 − 4x3 = 4x1 − 2x2 + 3x3 = 0 }
is a subspace of R3.
15. [R] Show that the set
S = { x ∈ R3 : 2x1 + 3x2 − 4x3 = 0 or 4x1 − 2x2 + 3x3 = 0 }
is not a subspace of R3.
16. [R] Show that the set
S = { b ∈ R2 : b = Ax for some x ∈ R3 },
where
A =
[ 2 −3  1 ]
[ 4  5 −3 ],
is a subspace of R2. Explain why each column of the matrix belongs to the set S.
17. [R] For each of the following subsets of R3, either prove that the given subset is a subspace of
R3 or explain why it is not a subspace.
a) S = { (x1, 0, x3)^T ∈ R3 : x1 + 2x3 ≥ 0 }.
b) T = { ∑_{i=1}^{4} λi vi : λi ∈ R, 1 ≤ i ≤ 4 }, where v1, v2, v3, v4 are given fixed vectors in R3.
c) U = { Ax : x ∈ R5 }, where A is a fixed 3 × 5 matrix.
18. [H] Suppose that u = (1, 0, −2)^T and v = (1, 2, 3)^T. Show, by the Subspace Theorem, that the set
S = { x ∈ R3 : x = λu + µv, for λ, µ ∈ R }
is a subspace of R3.
19. [H] Prove that the set S in Example 6 on page 15 is closed under multiplication by a scalar.
20. [H] Let a and b be two fixed non-zero vectors in R5. Show that
W = { x ∈ R5 : x · a = x · b = 0 }
is a subspace of R5.
If a = e1 = (1, 0, 0, 0, 0)^T and b = e2 = (0, 1, 0, 0, 0)^T, describe W.
21. [R] Show that the set S = {p ∈ P2 : p(0) = 1} is NOT a subspace of P2.
22. [R] Show that the set
S = { p ∈ P3 : p′′(x) = 0 for all x ∈ R }
is a subspace of P3.
23. [H] Is the set
S = { p ∈ P3 : p′(x) + x + 1 = 0 for all x ∈ R }
a subspace of P3?
24. [H] Consider the set
S = { p ∈ P3 : (x + 1)p′(x) − 3p(x) = 0 for all x ∈ R }.
a) Show that S is a subspace of P3 (the set of all real polynomials of degree ≤ 3).
b) Find a polynomial in S where not all the coefficients are zero.
25. [H] By constructing a counterexample, show that the union of two subspaces is not, in general,
a subspace.
26. [H] Let W1 and W2 be two subspaces of a vector space V over the field F. Prove that the
intersection of W1 and W2 (i.e., the set W1 ∩W2) is a subspace of V .
27. [X] Let V be a vector space over the field F.
a) Let {Wk : 1 6 k 6 m} be m subspaces of V , and let W be the intersection of these
m subspaces. Prove that W is a subspace of V .
b) Let S be any set of vectors in V , and let W be the intersection of all subspaces of V
which contain S (that is, x ∈W if and only if x lies in every subspace which contains
S). Prove that W is the set of finite linear combinations of vectors from S.
Problems 6.4 : Linear combinations and spans
28. [R] Let a = (10, 11, 4)^T, v1 = (2, 1, 4)^T, v2 = (−1, −2, 1)^T and v3 = (3, 3, −1)^T.
a) Is a ∈ span(v1,v2,v3)? If so, express a as a linear combination of v1,v2 and v3.
b) Do the vectors v1, v2, v3 span R3? If not, find condition(s) on b = (b1, b2, b3)^T for b to belong to span(v1, v2, v3), and interpret your answer geometrically.
29. [R] Repeat the preceding question using
a = (9, −2, −4)^T, v1 = (0, 2, 3)^T, v2 = (5, −2, −3)^T and v3 = (15, −4, −6)^T.
30. [R] Repeat using a = (1, 1, −9, 1)^T, v1 = (1, 3, 0, 5)^T, v2 = (2, 2, 1, 2)^T, v3 = (−1, 0, 4, 1)^T and b = (b1, b2, b3, b4)^T.
[Replace R3 by R4, of course.]
31. [R] Is the vector b = (−2, −6, −4)^T ∈ span(v1, v2, v3, v4), where
v1 = (1, 3, 0)^T, v2 = (2, 2, 1)^T, v3 = (−1, 0, −1)^T, v4 = (1, −2, 1)^T?
32. [R] Is the set of vectors v1 = (1, 2, 3)^T, v2 = (1, 1, −1)^T, v3 = (−1, 0, 5)^T a spanning set for R3?
33. [R] Does v belong to the column space of A, col(A), where
v = (2, −5, 19, −13)^T and A =
[ 1  3 −1 ]
[ 0  1  2 ]
[ 2 −3 −5 ]
[ 1  2  7 ]?
If so, write v as a linear combination of the columns of A.
34. [R] Is the polynomial p(x) = 1 + x + x^2 in span(1 − x + 2x^2, −1 + x^2, −2 − x + 5x^2)?
35. [R] Is S = { 1 + x, 1 − x^2, x + 2x^2 } a spanning set for P2?
36. [X] Prove Proposition 1 of Section 6.4.
37. [X] Use the vector space axioms to prove that we do not need to use brackets when writing down the linear combination
∑_{k=1}^{n} λk vk = λ1v1 + · · · + λnvn.
That is, prove that the result of the operations is independent of the order in which the additions are performed.
Problems 6.5 : Linear independence
38. [R] Is the set of vectors v1 = (1, 2, 3)^T, v2 = (1, 1, −1)^T, v3 = (−1, 0, 5)^T linearly independent? Are these three vectors coplanar?
39. [R] Is the set S = {v1, v2, v3}, where v1 = (1, 1, −1)^T, v2 = (2, −1, 0)^T, v3 = (5, −4, 3)^T, a linearly independent set? Are these three vectors coplanar?
40. [R] Can a set of linearly independent vectors contain a zero vector? Explain your answer.
41. [R] Given the set S = {v1, v2, v3}, where v1 = (1, −3, −2)^T, v2 = (3, 2, 1)^T, v3 = (4, −1, −1)^T, do the following.
a) Show that S is a linearly dependent set.
b) Show that at least one of the vectors in S can be written as a linear combination of
the others, and find the corresponding linear combination.
c) Find all possible ways of writing the vector (8, 9, 5)^T as a linear combination of the vectors in the set.
d) Find a linearly independent subset of S with the same span as S, and then show that (8, 9, 5)^T can be written as a unique linear combination of this subset.
e) Give a geometric interpretation of span (S).
42. [R] Repeat the previous question for the set of four vectors S = {v1, v2, v3, v4}, where
v1 = (1, 3, 0)^T, v2 = (2, 2, 1)^T, v3 = (−1, 0, −1)^T, v4 = (1, −2, 1)^T.
43. [R] Is { 1 − x + 2x^2, −1 + x^2, −2 − x + 5x^2 } a linearly independent subset of P2? If the set is not linearly independent, express one of the polynomials as a linear combination of the others.
44. [H] (For discussion). Let the set S = {v1, . . . ,vn} be a linearly independent subset of a vector
space V . You are standing at the origin of V and set off in the direction of v1. After a
certain length of time, you turn and head in direction v2 — then in direction v3 and so
on. Is it possible for you to return to the origin? (Note: You may walk any distance that
you like along any of the directions, but you are not allowed to retrace your steps).
45. [H] What would happen in the previous question if the set S were a linearly dependent set?
46. [H] Assume that m ≤ n and that S = {v1, . . . , vm} is a set of mutually orthogonal, non-zero vectors in Rn, that is, the dot products satisfy (see Section 5.3.1)
vi · vj = 0 for i ≠ j, 1 ≤ i, j ≤ m, and vi · vi ≠ 0 for 1 ≤ i ≤ m.
Show that S is a linearly independent set.
Problems 6.6 : Basis and dimension
47. [R] Is the set S = {v1, v2, v3}, where v1 = (1, 1, −1)^T, v2 = (2, −1, 0)^T, v3 = (5, −4, 3)^T, a basis for R3?
48. [R] Find a basis for, and the dimension of, W = span(v1, v2, v3), where
v1 = (1, 2, 3)^T, v2 = (1, 1, −1)^T, v3 = (−1, 0, 5)^T.
.
49. [R] Without doing any calculation, explain why { (1, 3, 0, 5)^T, (2, 2, 1, 3)^T, (−1, 0, 4, 0)^T } is not a spanning set for R4.
Similarly, without doing any calculation, explain why { (1, 3, 0)^T, (2, 2, 1)^T, (−1, 0, 1)^T, (1, −3, 1)^T }
is a linearly dependent set.
50. [R] Which of the following statements are true and which are false? Explain your answer.
a) Any set of 6 vectors in R5 is linearly dependent.
b) Some sets of 6 vectors in R5 are linearly independent.
c) Any set of 6 vectors in R5 is a spanning set for R5.
d) Some sets of 6 vectors in R5 span R5.
e) Same as in (a) – (d), with 6 replaced by 4.
f) Any set of 5 vectors in R5 is a basis for R5.
g) Some sets of 5 vectors in R5 are bases for R5.
h) Any set of vectors which spans R5 is linearly independent.
i) Any set of 5 vectors which spans R5 is linearly independent.
j) Any 5 linearly independent vectors in R5 form a basis for R5.
51. [R] Let V be a finite dimensional real vector space, and let S = {v1, . . . ,vn} be a finite set
of vectors in V . Suppose also that the dimension of V is ℓ. State, with brief reasons, the
relationship, if any, between n and ℓ if
a) S is linearly independent.
b) S is linearly dependent.
c) S spans V .
d) S is a basis for V .
52. [H] Explain why it is impossible to have a set of m mutually orthogonal, non-zero vectors in
Rn with m > n.
53. [R] Consider the plane P in R3 whose equation is
x+ y + z = 0.
a) Prove that P is a subspace of R3.
b) Find a basis for P . Give reasons for your answer.
54. [R] Find a basis for, and the dimension of, the column space of the matrix
A =
[ 1 1 −1 −2  1 ]
[ 0 0  1  4 −1 ]
[ 0 0  0  2  2 ]
[ 0 0  0  0  0 ].
55. [R] Find a basis for, and the dimension of, col(A), where
A =
[  1  1  0  2  1 ]
[  0  0 −1 −2  2 ]
[ −1 −1  1  4 −1 ]
[  1  1  0  4  2 ].
56. [R] Show that the columns of the matrix A given below are not a spanning set for R4. Then find a basis for R4 which contains as many of the columns of A as possible.
A =
[ 1  3  3 −7  5 ]
[ 2  6  5 −8  1 ]
[ 3  9  5 −3 −2 ]
[ 4 12  5  2 −5 ].
57. [R] Consider the set T = {v1, v2, v3, v4, x} where
v1 = (1, 2, −1, 0)^T, v2 = (3, 6, −3, 0)^T, v3 = (2, 1, −1, 4)^T, v4 = (−1, −5, 2, 4)^T, x = (6, −3, −1, 20)^T.
a) Find a basis B for span (v1, v2, v3, v4, x).
b) Explain why x belongs to span (v1, v2, v3, v4). Write x as a linear combination of
B.
c) Suppose the matrix A has columns v1, v2, v3, v4. What is the dimension of the
column space of A?
58. [R] Show that the set { 1 − x^2 + x^3, x + 2x^2, 2 + x − x^2 + 2x^3, 2x − x^2 + x^3 } forms a basis for P3.
59. [H] Consider the set of polynomials S = {p1, p2, p3, p4} in P2, where
p1(z) = 1 + z − z^2, p2(z) = 2 − z, p3(z) = 5 − 4z + z^2, p4(z) = z^2.
Show that S is a linearly dependent spanning set for P2, and then find a subset of S which
is a basis for P2.
60. [H] You are given that V is a vector space and that S = {v1,v2,v3} is a subset of V . Suppose
that w ∈ span(S). Prove that the set
{v1,v2,v3,w}
is a linearly dependent set.
61. [X] Prove that the only subspaces of R4 are (i) {0}, (ii) lines through the origin, (iii) planes
through the origin, (iv) subspaces of the form span (S), where S is any set of three linearly
independent vectors in R4, and (v) R4 itself.
Problems 6.7 : [X] Coordinate vectors
62. [X] Show that the columns of the matrix
A =
[ 1 2 −1  1 ]
[ 3 2  0 −2 ]
[ 0 1 −1  1 ]
[ 5 3  0 −1 ]
are a basis for R4. Then find the coordinate vector of v = (−2, −6, −4, −2)^T with respect to the ordered basis given by the columns of A.
63. [X] A vector v ∈ R4 has the coordinate vector (1, 6, −1, 4)^T with respect to the ordered basis formed by the columns of the matrix A of the previous question. Find v.
64. [X] Find the vector v that has coordinate vector (2, −1, 1)^T with respect to the ordered basis { (1, 2, −2)^T, (3, 7, −5)^T, (2, 4, 9)^T } of R3.
65. [X] Find the coordinates of the following vectors with respect to the given ordered bases.
a) (1, 2, 3)^T with respect to { (1, 0, 1)^T, (1, 1, 1)^T, (−2, 0, −1)^T }.
b) (a1, a2, a3)^T with respect to { (1, 0, 1)^T, (1, 1, 1)^T, (−2, 0, −1)^T }.
66. [X] With respect to the basis B = { (1, −1, 2)^T, (3, 4, 6)^T, (−2, 3, −3)^T } of R3,
a) find the vector v with coordinate vector [v]B = (3, 1, −3)^T;
b) find the coordinate vector of w = (7, −3, 11)^T.
67. [X] Consider the set S = {v1, v2, v3}, where
v1 = (1/√2, −1/√2, 0)^T, v2 = (1/√3, 1/√3, 1/√3)^T, v3 = (−1/√6, −1/√6, 2/√6)^T.
Without solving systems of linear equations, do the following.
a) Show that S is an orthonormal set of vectors in R3.
b) Show that S is a basis for R3.
c) Find the coordinate vector of (−1, 3, 4)^T with respect to the ordered basis S.
HINT. See Example 6 of Section 6.6.
68. [X] Let S = {u1, . . . , un} be an orthonormal set of n vectors in Rn. Prove that S is a basis for Rn, and then show that the coordinate vector for any v ∈ Rn is given by
[v]S = (x1, . . . , xn)^T, where xj = uj · v.
Problems 6.8 : [X] Further important examples of vector spaces
69. [X] Let M22 be the vector space of all 2 × 2 matrices with real entries (see Example 3 of
Section 6.1). Let S be the set
S = {A ∈M22 : a11 + a22 = 5} .
a) Find three matrices in S.
b) Is S a subspace of M22? Give a reason.
70. [X] Let T be the set

T = {A ∈ M_{22} : a_{11} + a_{22} = 0}.
a) Find three matrices in T .
b) Is T a subspace of M22? Give a reason.
71. [X] Show that the four matrices

\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}

form a basis for M_{22}(R). This set is called the standard basis for M_{22}(R).
72. [X] Show that the four matrices of the previous question also form a basis for the vector space M_{22}(C) of all 2 × 2 matrices with complex entries.
HINT: Can you see why your proof for the previous question will also be valid for complex numbers?
73. [X] Show that the set of four matrices

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}

forms a basis for M_{22}(C). These matrices are called the Pauli spin matrices, and they are important in quantum physics and chemistry.
74. [X] Find the coordinates of the following vectors with respect to the given ordered bases.

a) The matrix A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} with respect to the standard basis for M_{22} given in question 71. Note that the result is the same whether the entries are real or complex.

b) Repeat part (a) for the basis of Pauli spin matrices given in question 73. In this case the entries of A should be regarded as complex numbers.
75. [X] Let R = \left\{ \begin{pmatrix} 1 & 0 \\ -2 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 3 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 5 & 1 \end{pmatrix} \right\}.

a) Express \begin{pmatrix} -4 & 2 \\ -1 & -3 \end{pmatrix} as a linear combination of elements of R.

b) Does R span M_{22}(R), the space of all 2 × 2 matrices? Give a brief reason for your answer.
76. [X] Complete the proof of Proposition 2 of Section 6.8 that the system (R[X],+, ∗,R) is a
vector space.
77. [X] Let C[X] be the set of all complex-valued functions with domain X. Show that the system
(C[X],+, ∗,C), where + and ∗ are the usual rules for addition and multiplication by a
scalar of functions, satisfies vector space axioms 2, 4, 7 and 9. This system is a vector
space over the complex field C.
78. [X] Show that the set

S = \left\{ y ∈ R[R] : \frac{d^2 y}{dx^2} + 3\frac{dy}{dx} + 4y = 0 \right\}

is a subspace of the vector space R[R] of all real-valued functions with domain R.
79. [X] Is the set

S = \left\{ y ∈ R[R] : \frac{d^2 y}{dx^2} + 3\frac{dy}{dx} + 4y = 5 \right\}

a subspace of R[R]? Prove your answer.
80. [X] Let C^{(k)}[R] be the set of all real-valued functions with domain R for which the first k derivatives exist and are continuous. Prove that C^{(k)}[R] is a subspace of the vector space R[R] of all real-valued functions with domain R.

81. [X] Show that the vector spaces C^{(k)}[R] defined in the previous question have the property that, if m > n, then C^{(m)}[R] is a subspace of C^{(n)}[R].
82. [X] Let S be the subset of R[[-π, π]] defined by

S = \left\{ f ∈ R[[-π, π]] : \int_{-\pi}^{\pi} \cos(x + t) f(t)\,dt = 0 \text{ for all } x ∈ [-π, π] \right\}.

Prove that S is a subspace of the vector space R[[-π, π]].
83. [X] This question generalises the results of question 46 to real-valued functions.
Let S = {f_1, . . . , f_n} be a set of real-valued functions defined on an interval [a, b] with the properties that

\int_a^b f_i(x) f_j(x)\,dx = 0 \quad for i ≠ j, 1 ≤ i, j ≤ n,
\int_a^b f_i^2(x)\,dx ≠ 0 \quad for 1 ≤ i ≤ n.

Prove that S is a linearly independent set.
Note. A set of functions with these properties is said to be mutually orthogonal on the interval [a, b].
84. [X] Show that the set

S = \left\{ p ∈ P_n(R) : 5p'(6) + 3p(6) = 0 \right\}, \quad where p'(x) = \frac{dp}{dx},

is a subspace of the vector space P_n(R) of all real polynomials of degree less than or equal to n.
85. [X] Is the set

S = \left\{ p ∈ P_n(R) : 5p'(6) + 3p(6) = 8 \right\}, \quad where p'(x) = \frac{dp}{dx},

a subspace of P_n(R)? Prove your answer.
86. [X] Let P be the set of all polynomials over the complex-number field C. Show that P is a
subspace of the vector space C[C] of all complex-valued functions with domain C.
87. [X] Is the polynomial p in span(p_1, p_2, p_3), where the polynomials are defined by

p(z) = -6 + 2z + 30z^2, \quad p_1(z) = 1 + 2z + 3z^2, \quad p_2(z) = -4 - z + 9z^2, \quad p_3(z) = -5 - z + 12z^2?
88. [X] Find conditions on the coefficients of the polynomial p ∈ P_2 for p to be a linear combination of the three polynomials p_1, p_2, p_3, where the polynomials are given by

p_1(z) = 2z + 3z^2, \quad p_2(z) = 5 - 2z - 3z^2, \quad p_3(z) = 15 - 4z - 6z^2.
89. [X] Is each of the sets {p_1, p_2, p_3} in the previous two questions a spanning set for P_2?
90. [X] Is the set of polynomials S = {p1, p2, p3} in P2, where
p_1(z) = 1 + z - z^2, \quad p_2(z) = 2 - z, \quad p_3(z) = 5 - 4z + z^2,
a linearly independent set? If not, express one of the polynomials as a linear combination
of the others.
91. [X] Show that

p_1(z) = -2 + 5z - 4z^2 + 15z^3 - 5z^4 + z^5,
p_2(z) = 3z + 4z^2 - 3z^3 + 6z^5,
p_3(z) = 2 + 3z^2 - 4z^3 + 10z^4 - 5z^5,
p_4(z) = 3 + 14z^2 - 5z^3 + 6z^4 - 3z^5,
p_5(z) = 3 + 8z + 17z^2 + 3z^3 + 11z^4 - z^5,
p_6(z) = -3 + 11z - 7z^2 + 10z^3 - z^4 + 11z^5

are not a spanning set for P_5, and then construct a basis for P_5 containing as many of the given polynomials as possible.
HINT. Check using Maple.
92. [X] Find the coordinate vector for p(x) = 1 + 2x + x^2 with respect to the ordered basis {1 + x, 1 - x^2, x + 2x^2} of P_2.
93. [X] Find the coordinate vector of 1 + 2z + 3z^2 with respect to the ordered basis of P_2 given by

\left\{ \frac{1}{8} z(z - 2), \ 1 - \frac{1}{4} z^2, \ \frac{1}{8} z(z + 2) \right\}.
Note. This question and the one that follows do not require Gaussian Elimination.
94. [X] Find the coordinate vector of a_0 + a_1 z + a_2 z^2 with respect to the ordered basis of P_2 given by

\left\{ \frac{1}{2} z(z - 1), \ 1 - z^2, \ \frac{1}{2} z(z + 1) \right\}.
95. [X] Let S = {p_1, . . . , p_n} be a set of n polynomials in P_{n-1}(R) with the property that

\int_a^b p_i(x) p_j(x)\,dx = \begin{cases} 0 & \text{for } i ≠ j \\ 1 & \text{for } i = j \end{cases} \quad for 1 ≤ i, j ≤ n.

A set of polynomials with this property is called an orthonormal set of polynomials on the interval [a, b].
Prove that S is a basis for P_{n-1}(R), and then show that the coordinate vector for any p ∈ P_{n-1}(R) is given by

[p]_S = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad where x_i = \int_a^b p_i(x) p(x)\,dx.
Chapter 7
LINEAR TRANSFORMATIONS
“But I don’t need a Sillygism, you know,
to prove that mathematical axiom you mentioned.”
“Nor to prove that ‘all angles are equal’, I suppose?”
“Why, of course not! One takes such a simple truth as that for granted!”
Lewis Carroll, Sylvie and Bruno.
The purpose of this chapter is to give an introduction to an extremely important class of
functions called “linear transformations” or “linear maps”. Mathematical examples of linear trans-
formations include geometric transformations such as stretching, reflection and rotation, algebraic
operations such as matrix multiplication, and calculus operations such as differentiation and integration. Linear transformations are also widely used in applications of mathematics; objects that are often modelled (either exactly or approximately) by linear transformations include radio and TV sets, amplifiers and hi-fi equipment, atomic spectra, molecular vibrations, sound waves, ocean waves, oil refineries, chemical plants, the profit of a company, the inventory of a factory or shop, and the state of an economy.
7.1 Introduction to linear maps
Before reading this chapter you should quickly read the brief review of function notation given in
Appendix 6.9.
As stated in Appendix 6.9, a function f with domain X and codomain Y (notation f : X → Y )
is a rule which associates exactly one element y = f(x) of Y to each element x ∈ X. Note that an
element x in the domain X is called an “argument” of the function and the corresponding element
y = f(x) in the codomain Y is usually called either “the function value of x” or the “image of x
under f”.
Linear maps are an important special class of functions, in which both the domain and the
codomain are vector spaces (that is, all arguments and values of the functions are vectors), and in
which the two vector-space operations of addition and scalar multiplication are “preserved” by the
function in the sense that:
Addition Condition. The function value of a sum of the two vectors is equal to the sum of the
function values of the vectors.
Scalar Multiplication Condition. The function value of a scalar multiple of a vector is equal
to the scalar multiple of the function value of the vector.
c©2020 School of Mathematics and Statistics, UNSW Sydney
80 CHAPTER 7. LINEAR TRANSFORMATIONS
A more formal mathematical definition of a linear map is as follows.
Definition 1. Let V and W be two vector spaces over the same field F. A function
T : V →W is called a linear map or linear transformation if the following two
conditions are satisfied.
Addition Condition. T (v + v′) = T (v) + T (v′) for all v,v′ ∈ V , and
Scalar Multiplication Condition. T (λv) = λT (v) for all λ ∈ F and v ∈ V .
The domain and codomain of a linear map can be any vector spaces. Until Section 7.5, we shall concentrate specifically on linear maps from R^n to R^m. Unless otherwise stated, the following propositions and theorems are true for all linear maps.
The adjective “linear” in “linear map” suggests that the idea of a linear map arose from the
geometric idea of a line. The connection between the equation of a line and a linear map is shown
in Figure 1 (a) and (b) and in Example 1 below.
Example 1. Show that the function T : R→ R defined by
T (x) = a0 + a1x for x ∈ R,
where a0, a1 ∈ R are constants, is a linear map if and only if a0 = 0.
Solution. We check the conditions of the definition of a linear map. Firstly, the domain R and
codomain R are both vector spaces as R is a vector space. Further, we have,
for x, x′ ∈ R,
T (x+ x′) = a0 + a1(x+ x′),
whereas,
T (x) + T (x′) = (a0 + a1x) + (a0 + a1x′) = 2a0 + a1(x+ x′).
Thus, the addition condition is satisfied if and only if a0 = 2a0, that is, if and only if a0 = 0.
For a0 = 0, we check the scalar multiplication condition, and obtain
T (λx) = a1(λx) = λ(a1x) = λT (x),
as required.
Thus, the conditions for T to be a linear map are satisfied if and only if a0 = 0. ♦
Note.
1. The equation y = T(x) = a_0 + a_1 x is the equation of a line in R^2. Example 1 shows that the equation of a line defines a linear map if and only if the line passes through the origin.

2. The function T(x) = a_0 + a_1 x is a polynomial of degree 1 and is often called a linear polynomial. Example 1 shows that a "linear polynomial" is a "linear map" if and only if the constant term in the polynomial is zero.
[Figure 1(a): A linear map. Figure 1(b): A linear polynomial which is NOT a linear map.]
Example 2. Show that the function T : R^3 → R^2 defined by

T(x) = \begin{pmatrix} -5x_2 + 4x_3 \\ x_1 + 2x_3 \end{pmatrix} \quad for x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} ∈ R^3,

is a linear map.
Solution. The domain R^3 and codomain R^2 are both vector spaces. We next check the addition and scalar multiplication conditions.

Addition condition. For x, x' ∈ R^3, we have x + x' = \begin{pmatrix} x_1 + x'_1 \\ x_2 + x'_2 \\ x_3 + x'_3 \end{pmatrix} ∈ R^3, and hence

T(x + x') = \begin{pmatrix} -5(x_2 + x'_2) + 4(x_3 + x'_3) \\ (x_1 + x'_1) + 2(x_3 + x'_3) \end{pmatrix} = \begin{pmatrix} -5x_2 + 4x_3 \\ x_1 + 2x_3 \end{pmatrix} + \begin{pmatrix} -5x'_2 + 4x'_3 \\ x'_1 + 2x'_3 \end{pmatrix} = T(x) + T(x').
Thus the addition condition is satisfied.
Scalar multiplication condition. For x ∈ R^3 and λ ∈ R, we have λx = \begin{pmatrix} λx_1 \\ λx_2 \\ λx_3 \end{pmatrix} ∈ R^3, and hence

T(λx) = \begin{pmatrix} -5(λx_2) + 4(λx_3) \\ λx_1 + 2(λx_3) \end{pmatrix} = λ \begin{pmatrix} -5x_2 + 4x_3 \\ x_1 + 2x_3 \end{pmatrix} = λT(x).
Thus, the scalar multiplication condition is also satisfied, and therefore T is a linear map. ♦
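Checks like this are also easy to do with software. The course's package is Maple, but as an informal illustration (this Python/SymPy sketch is our own assumption, not part of the official materials) the two linearity conditions for the map of Example 2 can be verified symbolically:

    # A sketch: symbolic check of the linearity conditions for the map
    # T of Example 2, using SymPy (illustrative; the course uses Maple).
    import sympy as sp

    x1, x2, x3, y1, y2, y3, lam = sp.symbols("x1 x2 x3 y1 y2 y3 lambda")

    def T(v):
        # T(x) = (-5 x2 + 4 x3, x1 + 2 x3), as in Example 2
        return sp.Matrix([-5*v[1] + 4*v[2], v[0] + 2*v[2]])

    x = sp.Matrix([x1, x2, x3])
    y = sp.Matrix([y1, y2, y3])

    # Both differences expand to the zero vector, confirming linearity.
    print((T(x + y) - T(x) - T(y)).expand())   # Matrix([[0], [0]])
    print((T(lam*x) - lam*T(x)).expand())      # Matrix([[0], [0]])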
We shall now summarise some useful properties that are true for all linear maps. In all of the
following propositions and theorems the domain V and the codomain W are assumed to be vector
spaces over the same field F.
Here is a useful proposition regarding linear maps.
Proposition 1. If T : V →W is a linear map, then
1. T (0) = 0 and
2. T (−v) = −T (v) for all v ∈ V .
An informal way of stating these results is that a linear map always:
1. transforms the zero vector in the domain into the zero vector in the codomain, and
2. transforms the negative of a vector v in the domain into the negative of the corresponding
function value T (v) in the codomain.
Proof. (1). Since V is a vector space, we have from Proposition 2 of Section 6.2 that 0v = 0 for
all v ∈ V . Thus,
T (0) = T (0v) = 0T (v) = 0,
where we have first used the scalar multiplication condition of a linear map and then applied
Proposition 2 of Section 6.2 to the vector T (v) ∈W .
(2). Since V is a vector space, we have from Proposition 2 of Section 6.2 that −v = (−1)v for all
v ∈ V . Hence,
T (−v) = T ((−1)v) = (−1)T (v) = −T (v),
where we have again used the scalar multiplication condition of a linear map, and then Proposition 2
of Section 6.2 applied to the vector T (v) ∈W .
Proposition 1 may often be used to provide a quick proof that some given function is not linear.
Example 3. Show that the function T : R^2 → R defined by

T \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 4x_1 + 3(x_2 - 6)

is not linear.

Solution. T \begin{pmatrix} 0 \\ 0 \end{pmatrix} = -18 ≠ 0, and hence T is not linear. ♦
Example 4. Show that the function T : R → R defined by

T(x) = x^2

is not linear.

Solution. T(3) = 9, but T(6) = 36 ≠ 2 × 9. Hence T is not linear. ♦
To prove that a given map is not linear, it is easiest to give a specific example that contravenes one of the conditions. A good first check is whether the given map takes the zero vector to the zero vector.
WARNING. The converses of the two results in Proposition 1 are not true in general as shown
in the following example.
Example 5. The function T(x) = x^3 satisfies both T(-x) = -T(x) and T(0) = 0. However, it is not a linear map, by the following counterexample. For x = 1,

T(2 × 1) = 8 ≠ 2(1)^3 = 2T(1).

[Figure 2: the graph of T(x) = x^3.]
The two conditions in the definition of a linear map are closely related to the two fundamental
vector operations of addition and scalar multiplication. We therefore expect there to be a close
relationship between linear combinations and linear maps. This relationship is given in Theorems 2
and 3 below.
Theorem 2. A function T : V →W is a linear map if and only if for all λ1, λ2 ∈ F and v1,v2 ∈ V
T (λ1v1 + λ2v2) = λ1T (v1) + λ2T (v2). (#)
Proof. Let T be a linear function. Then,
T (λ1v1 + λ2v2) = T (λ1v1) + T (λ2v2) (from the addition condition)
= λ1T (v1) + λ2T (v2) (using the scalar multiplication condition twice),
and hence (#) is satisfied.
Conversely, suppose (#) is satisfied. Then, for λ1 = λ2 = 1, condition (#) becomes the addition condition, while for λ2 = 0 it reduces to the scalar multiplication condition.
The proof is complete.
Theorem 2 can be used to simplify the test for linearity, since it means that only one condition
must be checked instead of the two separate conditions of the original definition.
Example 6. Show that the function T : R^2 → R^3, defined by

T(x) = \begin{pmatrix} 3x_1 - x_2 \\ 4x_2 \\ 5x_1 + 6x_2 \end{pmatrix} \quad for x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} ∈ R^2,
is a linear map.
Solution. For x, x' ∈ R^2 and λ, λ' ∈ R, we have

λx + λ'x' = \begin{pmatrix} λx_1 + λ'x'_1 \\ λx_2 + λ'x'_2 \end{pmatrix} ∈ R^2,

and hence

T(λx + λ'x') = \begin{pmatrix} 3(λx_1 + λ'x'_1) - (λx_2 + λ'x'_2) \\ 4(λx_2 + λ'x'_2) \\ 5(λx_1 + λ'x'_1) + 6(λx_2 + λ'x'_2) \end{pmatrix} = λ \begin{pmatrix} 3x_1 - x_2 \\ 4x_2 \\ 5x_1 + 6x_2 \end{pmatrix} + λ' \begin{pmatrix} 3x'_1 - x'_2 \\ 4x'_2 \\ 5x'_1 + 6x'_2 \end{pmatrix} = λT(x) + λ'T(x').

Thus, from Theorem 2, T is a linear map. ♦
A generalisation of Theorem 2 is also of considerable importance in the theory and applications
of linear maps.
Theorem 3. If T is a linear map with domain V and S is a set of vectors in V , then the function
value of a linear combination of S is equal to the corresponding linear combination of the function
values of S, that is, if S = {v1, . . . ,vn} and λ1,. . .,λn are scalars, then
T (λ1v1 + · · ·+ λnvn) = λ1T (v1) + · · · + λnT (vn).
Proof. The proof, which is left as an exercise (see question 5), is based on an easy inductive argument.
Theorem 3 has many uses. Some examples of its use are as follows.
Example 7. Let T : R^3 → R^2 be a linear map with values

T \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \end{pmatrix}, \quad T \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -5 \\ 6 \end{pmatrix}, \quad T \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -2 \\ 8 \end{pmatrix}.

Find the function value at x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.
Solution. We have

x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = x_1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + x_2 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + x_3 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.

From Theorem 3, the function value at x is

T \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = x_1 T \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + x_2 T \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + x_3 T \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = x_1 \begin{pmatrix} 3 \\ 7 \end{pmatrix} + x_2 \begin{pmatrix} -5 \\ 6 \end{pmatrix} + x_3 \begin{pmatrix} -2 \\ 8 \end{pmatrix} = \begin{pmatrix} 3x_1 - 5x_2 - 2x_3 \\ 7x_1 + 6x_2 + 8x_3 \end{pmatrix}. ♦
Example 8. Show that the function T : R^3 → R^2 with function values

T \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \end{pmatrix}, \quad T \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -5 \\ 6 \end{pmatrix}, \quad T \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -2 \\ 8 \end{pmatrix}, \quad and \quad T \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -4 \\ 20 \end{pmatrix},

is not a linear map.
Solution. We have

\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.

Hence, if T is a linear map, we have from Theorem 3 that

T \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = T \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + T \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + T \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \end{pmatrix} + \begin{pmatrix} -5 \\ 6 \end{pmatrix} + \begin{pmatrix} -2 \\ 8 \end{pmatrix} = \begin{pmatrix} -4 \\ 21 \end{pmatrix}.

But T \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -4 \\ 20 \end{pmatrix} ≠ \begin{pmatrix} -4 \\ 21 \end{pmatrix}, and hence T is not a linear map. ♦
Examples 7 and 8 are actually special cases of the following extremely important result.
Theorem 4. For a linear map T : V →W , the function values for every vector in the domain are
known if and only if the function values for a basis of the domain are known.
Further, if B = {v1, . . . ,vn} is a basis for the domain V then for all v ∈ V we have
T (v) = x1T (v1) + · · ·+ xnT (vn),
where x1, . . . , xn are the scalars in the unique linear combination v = x1v1+ · · ·+xnvn of the basis
B.
Proof. It follows from Theorem 3 that
T (v) = x1T (v1) + · · ·+ xnT (vn).
The theorem follows immediately.
7.2 Linear maps from R^n to R^m and m × n matrices

If you look at the examples of functions with domain R^n and codomain R^m given in the previous section, you will see that if T : R^n → R^m and x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} ∈ R^n, then we can write T(x) as Ax, where A is an m × n matrix. In this section we are going to show that every matrix A represents a linear map and, conversely, that every linear map with domain R^n and codomain R^m can be represented by a matrix. Because of the close relation between T and the corresponding A, and because we prefer to write Ax for T(x) rather than xA, the vector x must be a column vector.
We begin with the following theorem.
Theorem 1. For each m × n matrix A, the function T_A : R^n → R^m, defined by

T_A(x) = Ax \quad for x ∈ R^n,

is a linear map.

Proof. We check the addition and scalar multiplication conditions, using the properties A(x + x') = Ax + Ax' and A(λx) = λAx from Chapter 5.

Addition Condition. For all x, x' ∈ R^n, we have

T_A(x + x') = A(x + x') = Ax + Ax' = T_A(x) + T_A(x').

Scalar Multiplication Condition. For all λ ∈ R and x ∈ R^n, we have

T_A(λx) = A(λx) = λ(Ax) = λT_A(x).

Thus, since both the addition and scalar multiplication conditions are satisfied, T_A is a linear map.
The matrix equation Ax = y therefore has the interpretation that y = TA(x) = Ax is the
function value of TA at the point x, or, for linear equations, the vector y may be regarded as the
function value of the vector x.
Example 1. Find a linear map T_A such that T_A(x) = Ax for the matrix

A = \begin{pmatrix} 3 & 4 \\ -1 & 0 \\ -5 & 6 \end{pmatrix}.

Solution. Since A has 3 rows and 2 columns, the domain is R^2, the codomain is R^3, and the map T_A : R^2 → R^3 is given by

T_A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = Ax = \begin{pmatrix} 3x_1 + 4x_2 \\ -x_1 \\ -5x_1 + 6x_2 \end{pmatrix}. ♦
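In software, the map T_A is nothing more than matrix-vector multiplication. A small numerical sketch (Python/NumPy is assumed here purely for illustration; the notes themselves use Maple):

    # Sketch: T_A(x) = Ax for the matrix of Example 1.
    import numpy as np

    A = np.array([[ 3, 4],
                  [-1, 0],
                  [-5, 6]])

    x = np.array([2, 1])   # any vector in R^2
    print(A @ x)           # [10 -2 -4], the value T_A(x) in R^3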
Theorem 1 and Example 1 show that a matrix can be used to define a linear map. We shall
now show that every linear map with domain Rn and codomain Rm can be represented by an m×n
matrix with real entries. The basic theorem which establishes this result is the following.
Theorem 2 (Matrix Representation Theorem). Let T : R^n → R^m be a linear map and let the vectors e_j for 1 ≤ j ≤ n be the standard basis vectors for R^n. Then the m × n matrix A whose columns are given by

a_j = T(e_j) \quad for 1 ≤ j ≤ n

has the property that

T(x) = Ax \quad for all x ∈ R^n.
Proof. Every vector x ∈ R^n can be written as a unique linear combination of the standard basis vectors, that is,

x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x_1 e_1 + · · · + x_n e_n.

Then, from Theorem 3 of Section 7.1, we have

T(x) = T(x_1 e_1 + · · · + x_n e_n) = x_1 T(e_1) + · · · + x_n T(e_n) = x_1 a_1 + · · · + x_n a_n,

where a_j = T(e_j). Now a_j ∈ R^m for 1 ≤ j ≤ n, and hence from Proposition 3 of Section 6.4 the linear combination can be rewritten in the matrix form Ax, where A is the matrix with the a_j as its columns. Thus, T(x) = Ax and the proof is complete.
The Representation Theorem can be used to construct a matrix for any given linear map with
domain Rn and codomain Rm, or more generally from Fn to Fm for any field F.
Example 2. Find a matrix A such that T(x) = Ax for the linear map T : R^3 → R^2 defined by

T \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 3x_1 - 5x_2 + 6x_3 \\ 5x_2 + 31x_3 \end{pmatrix}.

[Notice that we are using columns.]

Solution. The first column of the matrix A is the vector given by

T(e_1) = T \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ 0 \end{pmatrix},

the second column is given by

T(e_2) = T \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} -5 \\ 5 \end{pmatrix},

and the third column is given by

T(e_3) = T \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 6 \\ 31 \end{pmatrix}.

Thus, the matrix A is

A = \begin{pmatrix} 3 & -5 & 6 \\ 0 & 5 & 31 \end{pmatrix}. ♦
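The construction in Example 2 is completely mechanical: apply T to each standard basis vector and use the results as columns. A sketch of the same construction (in Python, assumed here only as an illustration):

    # Sketch: building the matrix of Example 2 column by column from
    # T(e_1), T(e_2), T(e_3), as in the Matrix Representation Theorem.
    import numpy as np

    def T(x):
        return np.array([3*x[0] - 5*x[1] + 6*x[2],
                         5*x[1] + 31*x[2]])

    # the rows of the identity matrix are the standard basis vectors e_j
    A = np.column_stack([T(e) for e in np.eye(3, dtype=int)])
    print(A)   # [[ 3 -5  6]
               #  [ 0  5 31]]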
An alternative, often simpler, method of writing down the matrix for a given linear function is shown in the following example.
Example 3. Find a matrix A such that T(x) = Ax for the linear map T : R^4 → R^3 defined by

T \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 2x_1 - 3x_2 + 4x_3 - 5x_4 \\ -2x_1 + 3x_4 \\ x_1 - 5x_2 + 6x_3 - 8x_4 \end{pmatrix}.

Solution. As usual for a linear map with domain R^n and codomain R^m (here n = 4 and m = 3), the components of the function value look like the left-hand side of a system of linear equations. This system of equations is given by

T \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 2x_1 - 3x_2 + 4x_3 - 5x_4 \\ -2x_1 + 3x_4 \\ x_1 - 5x_2 + 6x_3 - 8x_4 \end{pmatrix} = \begin{pmatrix} 2 & -3 & 4 & -5 \\ -2 & 0 & 0 & 3 \\ 1 & -5 & 6 & -8 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}.

Then the coefficient matrix, namely

A = \begin{pmatrix} 2 & -3 & 4 & -5 \\ -2 & 0 & 0 & 3 \\ 1 & -5 & 6 & -8 \end{pmatrix},

has the required property that T(x) = Ax for all x ∈ R^4. ♦
In this section, we have shown that a matrix always defines a linear map and that a linear map between the vector spaces R^n and R^m can always be represented by a matrix. This result can easily be generalised to linear maps between any two finite-dimensional vector spaces (see Section 7.6), and the resulting theorem is of fundamental importance both in the mathematical theory of linear maps and in applying the ideas of linear maps to practical problems.
7.3 Geometric examples of linear transformations
In this section we shall examine some of the geometric mappings which can be represented by linear
maps and matrices. These mappings include stretching and compression, reflections, rotations,
projections, and the dot and cross products with a fixed vector.
We shall begin by looking at geometric interpretations which can be given to simple types of
matrices.
Example 1 (Reflection in R^2). The simplest examples of reflections in R^2 are reflections in one of the coordinate axes. An example of a reflection in the x_1-axis is shown in Figure 3. Note that the reflection of the point with position vector x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} is the point represented by the position vector x' = \begin{pmatrix} x_1 \\ -x_2 \end{pmatrix}. This reflection can be represented by the 2 × 2 diagonal matrix with a negative diagonal entry given by

A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},

since

Ax = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ -x_2 \end{pmatrix} = x'.

Note that we know that this reflection is a linear map, since we have found a matrix that describes the effect of the reflection. ♦

[Figure 3: A reflection in the x_1-axis.]
Note that a linear transformation from R^n to R^m maps the position vector of a point in an n-dimensional space to the position vector of a point in an m-dimensional space. The following proposition tells us that a linear transformation maps a line to a line or a point.
Proposition 1. Suppose that T : R^n → R^m is a linear map. Then T maps a line in R^n to either a line or a point in R^m.

Proof. In Chapter 1, a line in R^n through the point with position vector a, parallel to v ≠ 0, is defined to be the set

{x ∈ R^n : x = a + λv for some λ ∈ R}.

By Theorem 2 in Section 7.1, T(a + λv) = T(a) + λT(v). Hence T maps the line to the following subset of R^m:

{y ∈ R^m : y = T(a) + λT(v) for some λ ∈ R}.

This set is a line when T(v) ≠ 0, and otherwise it contains only the single vector T(a).
REMARK. Using a similar argument, we can show that T maps a line segment with endpoints having position vectors a and b to a line segment with endpoints having position vectors T(a) and T(b).
Example 2 (Stretching and compression in R^2). Let A be a 2 × 2 diagonal matrix with positive diagonal entries, that is, a matrix of the form

A = \begin{pmatrix} λ_1 & 0 \\ 0 & λ_2 \end{pmatrix} \quad with λ_1 > 0, λ_2 > 0.
Then the function value of x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} is y = T_A(x) = Ax, where

y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} λ_1 & 0 \\ 0 & λ_2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} λ_1 x_1 \\ λ_2 x_2 \end{pmatrix}.

Thus, y_1 = λ_1 x_1 and y_2 = λ_2 x_2, and hence the effect of the matrix is simply to multiply the first component x_1 by the scalar λ_1 and the second component x_2 by the scalar λ_2. Note that the first standard basis vector e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} is transformed into \begin{pmatrix} λ_1 \\ 0 \end{pmatrix} = λ_1 e_1, that is, its direction remains the same but it is either stretched (if λ_1 > 1) or compressed (if λ_1 < 1). Similarly, the second standard basis vector e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix} is transformed into \begin{pmatrix} 0 \\ λ_2 \end{pmatrix} = λ_2 e_2, with a resulting stretching if λ_2 > 1 or compression if λ_2 < 1.
Figure 4(a) shows a picture of a 5-point star with vertices A(1, 5), B(4, 3), C(3, -1), D(-1, -1) and E(-2, 3). Suppose X is the point (1, 3) on the line segment BE. Hence the position vectors of these points are respectively

a = \begin{pmatrix} 1 \\ 5 \end{pmatrix}, \quad b = \begin{pmatrix} 4 \\ 3 \end{pmatrix}, \quad c = \begin{pmatrix} 3 \\ -1 \end{pmatrix}, \quad d = \begin{pmatrix} -1 \\ -1 \end{pmatrix}, \quad e = \begin{pmatrix} -2 \\ 3 \end{pmatrix} \quad and \quad x = \begin{pmatrix} 1 \\ 3 \end{pmatrix}.

[Figure 4(a): A 5-point star.]
When λ_1 = λ_2 = 2, the matrix A is \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}. The points in Figure 4(a) will be "transformed" to A', B', C', D', E' and X' according to

Aa = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 5 \end{pmatrix} = \begin{pmatrix} 2 \\ 10 \end{pmatrix},

and similarly,

Ab = \begin{pmatrix} 8 \\ 6 \end{pmatrix}, \quad Ac = \begin{pmatrix} 6 \\ -2 \end{pmatrix}, \quad Ad = \begin{pmatrix} -2 \\ -2 \end{pmatrix}, \quad Ae = \begin{pmatrix} -4 \\ 6 \end{pmatrix} \quad and \quad Ax = \begin{pmatrix} 2 \\ 6 \end{pmatrix}.

By Proposition 1 and the remark following it, the line segment AB will be transformed to A'B' and so on. The star will be transformed to the one shown in Figure 4(b).
When λ_1 = λ_2 = 0.5, the matrix A is \begin{pmatrix} 0.5 & 0 \\ 0 & 0.5 \end{pmatrix}. The points in Figure 4(a) will be "transformed" according to

Aa = \begin{pmatrix} 0.5 \\ 2.5 \end{pmatrix}, \quad Ab = \begin{pmatrix} 2 \\ 1.5 \end{pmatrix}, \quad Ac = \begin{pmatrix} 1.5 \\ -0.5 \end{pmatrix}, \quad Ad = \begin{pmatrix} -0.5 \\ -0.5 \end{pmatrix}, \quad Ae = \begin{pmatrix} -1 \\ 1.5 \end{pmatrix} \quad and \quad Ax = \begin{pmatrix} 0.5 \\ 1.5 \end{pmatrix}.

The star will be transformed to the one shown in Figure 4(c).

[Figure 4(b): Image under A = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}. Figure 4(c): Image under A = \begin{pmatrix} 0.5 & 0 \\ 0 & 0.5 \end{pmatrix}.]

Figure 4(d) shows the image of the 5-point star when A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}: the star is stretched to twice its width horizontally.

[Figure 4(d): Image under A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}.] ♦
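Since matrix multiplication acts on all the vertices at once when their position vectors are stored as the columns of a matrix, transformations like these are easy to reproduce numerically. A small sketch (NumPy, for illustration only):

    # Sketch: applying the dilation of Figure 4(b) to the star's vertices.
    import numpy as np

    # columns are the position vectors a, b, c, d, e of the star
    star = np.array([[1, 5], [4, 3], [3, -1], [-1, -1], [-2, 3]]).T
    A = np.array([[2, 0],
                  [0, 2]])      # lambda_1 = lambda_2 = 2

    print(A @ star)             # columns are the images a', b', c', d', e'
    # [[ 2  8  6 -2 -4]
    #  [10  6 -2 -2  6]]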
Example 3 (Rotation in a plane). Suppose that X is an arbitrary point in a plane and that X is rotated about the origin O anticlockwise by an angle α to a new position X'. Let x and x' be the position vectors of the points X and X' respectively. Show that the function R_α : R^2 → R^2 such that R_α(x) = x' is a linear transformation, and find the matrix A_α such that A_α x = x'.
Solution. We know that R_α will be linear if, for all vectors a, b ∈ R^2 and scalars λ ∈ R, the following two conditions are satisfied.

Addition condition. R_α(a + b) = R_α(a) + R_α(b).
Scalar multiplication condition. R_α(λa) = λR_α(a).

We can see the addition condition from Figure 5: the vector formed by adding a and b first and then rotating the sum a + b is the same as the vector formed by first rotating a and b and then adding the rotated vectors.

[Figure 5: The geometry of the addition condition for rotations.]

You should attempt to draw a picture to illustrate the scalar multiplication condition. In any case, since both the addition condition and the scalar multiplication condition hold, R_α is a linear transformation.
By Theorem 2 in Section 7.2, the columns of the matrix A_α are R_α(e_1) and R_α(e_2). From Figure 6 and the fact that both R_α(e_1) and R_α(e_2) have length 1, we have

R_α(e_1) = \begin{pmatrix} \cos α \\ \sin α \end{pmatrix} \quad and \quad R_α(e_2) = \begin{pmatrix} -\sin α \\ \cos α \end{pmatrix}.

[Figure 6: R_α(e_1) = \vec{OP} and R_α(e_2) = \vec{OQ}, where OM = ON = \cos α and PM = QN = \sin α.]

The matrix

A_α = \begin{pmatrix} \cos α & -\sin α \\ \sin α & \cos α \end{pmatrix}

is called the rotation matrix for angle α. ♦
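A quick numerical check of the rotation matrix (an illustrative NumPy sketch, not part of the notes' Maple workflow): rotating e_1 anticlockwise by a right angle should give e_2.

    # Sketch: the rotation matrix A_alpha in action.
    import numpy as np

    def rotation(alpha):
        return np.array([[np.cos(alpha), -np.sin(alpha)],
                         [np.sin(alpha),  np.cos(alpha)]])

    e1 = np.array([1.0, 0.0])
    print(np.round(rotation(np.pi/2) @ e1, 10))   # [0. 1.], i.e. e_2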
Example 4 (Projections). The projection of a vector x ∈ R^n on a fixed, non-zero vector b ∈ R^n is given by

proj_b x = \frac{x · b}{|b|^2} b.

Show that the function T : R^n → R^n defined by

T(x) = proj_b x \quad for x ∈ R^n

is a linear map.

Solution. Clearly, the domain and codomain are vector spaces. Instead of proving that T is linear by geometric properties, we use the algebraic properties of the dot product. For all x, x' ∈ R^n,

T(x + x') = proj_b(x + x') = \frac{(x + x') · b}{|b|^2} b = \frac{x · b + x' · b}{|b|^2} b = T(x) + T(x'),

and hence the addition condition is satisfied.

Finally, for all x ∈ R^n and λ ∈ R,

T(λx) = \frac{(λx) · b}{|b|^2} b = λ \left( \frac{x · b}{|b|^2} b \right) = λT(x).

Thus, the scalar multiplication condition is also satisfied, and therefore T is a linear map. ♦
Example 5 (Dot Product). Let b be a fixed vector in Rn. Show that the function T : Rn → R,
defined by
T (x) = b · x for x ∈ Rn,
is a linear map.
The proof that T is a linear map is similar to the previous example and is left as an exercise. ♦
The following examples show the importance of the linear maps defined by the dot product.

Example 6. From Example 5, for each 1 ≤ i ≤ n the function P_i : R^n → R defined by

P_i(x) = e_i · x \quad for x ∈ R^n,

where e_i is the ith standard basis element, is a linear map. It is not difficult to see that if x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, then the value P_i(x) is simply x_i, the ith component of x. ♦
This example can be generalised to any orthonormal basis of R^n.

Example 7. Suppose that {v_1, v_2, . . . , v_n} is an orthonormal basis for R^n and 1 ≤ i ≤ n. The function P_i : R^n → R is defined by

P_i(x) = v_i · x \quad for x ∈ R^n.

By the argument used in Example 6 in Section 6.6, we can prove that if x = λ_1 v_1 + · · · + λ_n v_n, then the value P_i(x) is simply λ_i, the coefficient of v_i in the unique way of writing x as a linear combination of the basis vectors. ♦
7.4 Subspaces associated with linear maps
There are two important subspaces associated with a linear map. These subspaces are called the
kernel (or null space, which is the name Maple uses) of the linear map and the image (or range)
of the map. Informally, the kernel is the set of zeroes of the function, and the image is the set of
all function values. This is shown diagrammatically in Figure 7. You should of course not take this
picture too literally — all the sets drawn as discs are vector spaces!
[Figure 7: Kernel and image of a linear map T : V → W.]
7.4.1 The kernel of a map

You will be familiar with the fact that one of the important properties of a function (for example, of a quadratic or a polynomial) is the set of its zeroes. The set of zeroes of a linear map is also of importance.
Definition 1. Let T : V → W be a linear map. Then the kernel of T (written
ker(T )) is the set of all zeroes of T , that is, it is the subset of the domain V defined
by
ker(T ) = {v ∈ V : T (v) = 0}.
Example 1. Showing that a vector v is in the kernel of a linear map T is simply a verification
that T (v) = 0. In particular, 0 ∈ ker(T ) for any linear map T , since T (0) = 0. ♦
Example 2 (Dot Product). In Example 5 of Section 7.3, we showed that the function T : Rn → R
defined by T (x) = b · x for x ∈ Rn is a linear map. The kernel of T is
ker(T ) = {x ∈ Rn : b · x = 0} ,
that is, ker(T ) is the set of vectors which are orthogonal to the given fixed vector b.
For the special case that x ∈ R3, the equation b ·x = 0 is the point-normal form of the equation
of a plane in R3, and hence ker(T ) corresponds to the points on a plane with normal b which passes
through the origin. ♦
For the important special case of a linear map T_A : R^n → R^m associated with an m × n matrix A, the kernel has a simple interpretation. For matrices, the definition of kernel becomes:
Definition 2. For an m×n matrix A, the kernel of A is the subset of Rn defined
by
ker(A) = {x ∈ Rn : Ax = 0} ,
that is, it is the set of all solutions of the homogeneous equation Ax = 0.
Example 3. Suppose that A = \begin{pmatrix} 1 & 2 \\ 3 & 6 \end{pmatrix} and x = \begin{pmatrix} 2 \\ -1 \end{pmatrix}. Since

Ax = \begin{pmatrix} 1 & 2 \\ 3 & 6 \end{pmatrix} \begin{pmatrix} 2 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 × 2 + 2 × (-1) \\ 3 × 2 + 6 × (-1) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},

we have x ∈ ker(A) by definition.
To find the kernel of a matrix A, we need to find the solution set of the equation Ax = 0.
Example 4. Find the kernel of the matrix A, where

A = \begin{pmatrix} 1 & 4 & 2 & 7 \\ 3 & 6 & 0 & 15 \\ 2 & -4 & -8 & 2 \end{pmatrix}.

Solution. The kernel is the set of all solutions of the homogeneous system of equations Ax = 0. An equivalent row-echelon form U for A is

U = \begin{pmatrix} 1 & 4 & 2 & 7 \\ 0 & -6 & -6 & -6 \\ 0 & 0 & 0 & 0 \end{pmatrix}.

We then assign parameters to the variables of the non-leading columns: x_3 = λ_1 and x_4 = λ_2. By back substitution, we obtain the solution of Ax = 0 as

x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = λ_1 \begin{pmatrix} 2 \\ -1 \\ 1 \\ 0 \end{pmatrix} + λ_2 \begin{pmatrix} -3 \\ -1 \\ 0 \\ 1 \end{pmatrix},

and hence,

ker(A) = \left\{ x ∈ R^4 : x = λ_1 \begin{pmatrix} 2 \\ -1 \\ 1 \\ 0 \end{pmatrix} + λ_2 \begin{pmatrix} -3 \\ -1 \\ 0 \\ 1 \end{pmatrix} \text{ for } λ_1, λ_2 ∈ R \right\}.

In this example, the kernel can be interpreted geometrically as a plane in R^4 through the origin parallel to \begin{pmatrix} 2 \\ -1 \\ 1 \\ 0 \end{pmatrix} and \begin{pmatrix} -3 \\ -1 \\ 0 \\ 1 \end{pmatrix}. ♦
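Maple's NullSpace command carries out exactly this computation; as a hedged illustration, the same kernel can be sketched in Python with SymPy:

    # Sketch: computing ker(A) for the matrix of Example 4.
    import sympy as sp

    A = sp.Matrix([[1,  4,  2,  7],
                   [3,  6,  0, 15],
                   [2, -4, -8,  2]])

    for v in A.nullspace():    # a basis for ker(A)
        print(v.T)
    # Matrix([[2, -1, 1, 0]]) and Matrix([[-3, -1, 0, 1]])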
A very important property of the kernel of a linear map is given in the following theorem.
Theorem 1. If T : V → W is a linear map, then ker(T ) is a subspace of the domain V .
Proof. We use the Subspace Theorem (Theorem 1) of Section 6.3 and prove that ker(T ) is a
non-empty subset of V which is closed under addition and scalar multiplication.
It is not the empty set, since T (0) = 0 and so 0 ∈ ker(T ).
Suppose that v,v′ ∈ ker(T ) and λ ∈ F. Since T is linear, it satisfies the addition and scalar
multiplication conditions, so we have
T (v + v′) = T (v) + T (v′) = 0 and T (λv) = λT (v) = 0,
and hence both v + v′ and λv are in ker(T ). Thus, ker(T ) is closed under addition and scalar
multiplication, and the proof is complete.
The dimension of the kernel is important and is given a special name.
Definition 3. The nullity of a linear map T is the dimension of ker(T ). The
nullity of a matrix A is the dimension of ker(A).
Proposition 2. Let A be an m × n matrix with real entries and TA : Rn → Rm the associated
linear transformation. Then
ker(TA) = ker(A)
Proof. TA(x) = 0⇔ Ax = 0.
The nullity of a matrix A can be easily obtained from the properties of row-echelon forms by
using the following result.
Proposition 3. For a matrix A:
nullity(A) = maximum number of independent vectors in the solution space of Ax = 0
= number of parameters in the solution of Ax = 0 obtained by Gaussian
elimination and back substitution
= number of non-leading columns in an equivalent row-echelon form U for A.
Although a general proof of this proposition is not difficult to construct, we shall restrict
ourselves to looking at an example.
Example 4 (continued). Find nullity(A), and a basis for ker(A), for

A = \begin{pmatrix} 1 & 4 & 2 & 7 \\ 3 & 6 & 0 & 15 \\ 2 & -4 & -8 & 2 \end{pmatrix}.
Solution. We have found that any vector x ∈ ker(A) can be written as

x = λ_1 \begin{pmatrix} 2 \\ -1 \\ 1 \\ 0 \end{pmatrix} + λ_2 \begin{pmatrix} -3 \\ -1 \\ 0 \\ 1 \end{pmatrix}.

This is a linear combination of the two vectors in the parametric vector form for the solution of Ax = 0. These two vectors are linearly independent, since if x = 0 then the parameters λ_1 and λ_2 are both zero (look at the third and fourth rows of the linear combination). Thus, we obtain a basis for ker(A),

\left\{ \begin{pmatrix} 2 \\ -1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -3 \\ -1 \\ 0 \\ 1 \end{pmatrix} \right\},

and hence nullity(A) = dim(ker(A)) = 2. This illustrates Proposition 3. ♦
For matrices, there is a close relationship between linear independence of the columns and the
nullity of the matrix.
Proposition 4. The columns of a matrix A are linearly independent if and only if nullity(A) = 0.
Proof. From Proposition 1 of Section 6.5, the columns of A are linearly independent if and only
if x = 0 is the only solution of Ax = 0. That is, if and only if 0 is the only element of ker(A), in
which case, nullity(A) = dim(ker(A)) = 0.
7.4.2 Image
The range or image of a function is the set of all function values (see, for example, Appendix 7.10).
In this course we will usually use the term image rather than range. A formal definition of the
image of a linear map is as follows.
Definition 4. Let T : V → W be a linear map. Then the image of T is the set
of all function values of T , that is, it is the subset of the codomain W defined by
im(T ) = {w ∈W : w = T (v) for some v ∈ V }.
For the special case of a linear map associated with a real m×n matrix, the definition becomes:
Definition 5. The image of an m× n matrix A is the subset of Rm defined by
im(A) = {b ∈ Rm : b = Ax for some x ∈ Rn} .
We have met this set several times before. In the language of linear equations, this set im(A) is
just the set of all right-hand-side vectors b for which the equation Ax = b has a solution, and in
vector-space language it is just the span of the columns of the matrix A, that is, the column space
of A. Thus, we have
range(A) = im(A) = col(A) = span (columns of A)
= {b ∈ Rm : Ax = b has a solution} .
These connections mean that any questions about the image of a matrix can be solved by
the methods previously given for linear equations and spans in Chapter 6. It is useful to give an
example, as it will serve to review some of the previous results on vector spaces and linear equations.
Example 4 (continued). Find conditions on a vector b for b to be in im(A), where

A = \begin{pmatrix} 1 & 4 & 2 & 7 \\ 3 & 6 & 0 & 15 \\ 2 & -4 & -8 & 2 \end{pmatrix}.

Solution. We look for conditions on b = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} ∈ R^3 for Ax = b to have a solution.

For hand calculations on a small system of equations, the simplest method of solution is as follows. Instead of putting b_1, b_2, b_3 on the right-hand side, we let the three right-hand columns of the following augmented matrix carry the coefficients of b_1, b_2, b_3:

(A|b) = \begin{pmatrix} 1 & 4 & 2 & 7 & 1 & 0 & 0 \\ 3 & 6 & 0 & 15 & 0 & 1 & 0 \\ 2 & -4 & -8 & 2 & 0 & 0 & 1 \end{pmatrix}.

On reduction to row-echelon form using Gaussian elimination, we find

(U|y) = \begin{pmatrix} 1 & 4 & 2 & 7 & 1 & 0 & 0 \\ 0 & -6 & -6 & -6 & -3 & 1 & 0 \\ 0 & 0 & 0 & 0 & 4 & -2 & 1 \end{pmatrix}.

This system of equations has a solution if and only if the components of b satisfy

4b_1 - 2b_2 + b_3 = 0.

In this case, im(A) has a geometric interpretation as a plane through the origin in R^3 with normal \begin{pmatrix} 4 \\ -2 \\ 1 \end{pmatrix}. In vector-space language, im(A) is a two-dimensional subspace of R^3. ♦
Note that for larger matrices it is preferable to use computer packages such as Maple to solve
the equations.
An extremely important property of the image of a linear map is as follows.
Theorem 5. Let T : V → W be a linear map between vector spaces V and W . Then im(T ) is a
subspace of the codomain W of T .
Proof. We use the Subspace Theorem (Theorem 1 of Section 6.3) and show that im(T ) is a
non-empty subset of W which is closed under vector addition and scalar multiplication.
We note first that im(T ) is a subset of W . Since, from Proposition 1 of Section 7.1, T (0) = 0,
we see that 0 ∈ im(T ).
Closure under addition. If w,w′ ∈ im(T ), then
w = T (v) for some v ∈ V and w′ = T (v′) for some v′ ∈ V,
and hence,
w +w′ = T (v) + T (v′) = T (v + v′).
But, since V is a vector space, v + v′ ∈ V , and therefore w + w′ ∈ im(T ) and im(T ) is closed
under addition.
Closure under scalar multiplication. If w ∈ im(T ) and λ ∈ F , then
w = T (v) for some v ∈ V, and hence λw = λT (v) = T (λv).
But, since V is a vector space, λv ∈ V , and therefore λw ∈ im(T ) and im(T ) is closed under
scalar multiplication.
The proof is complete.
The next result is obvious.
Proposition 6. Let A be an m × n matrix with real entries and TA : Rn → Rm the associated
linear transformation. Then
im(A) = im(TA)
We have shown in Theorem 5 that im(T ) is always a subspace of the codomain of T . Thus, the
fundamental vector-space properties of basis and dimension must apply to im(T ). The dimension
of the image is very important and it has therefore been given a special name.
Definition 6. The rank of a linear map T is the dimension of im(T ). The rank
of a matrix A is the dimension of im(A).
The rank is usually regarded as one of the most important properties of a matrix, since it is the
maximum number of linearly independent right-hand-side vectors for which a solution to Ax = b
can be found. Some important properties of the rank of a matrix are summarised in the following
proposition.
Proposition 7. For a matrix A:
rank(A) = maximal number of linearly independent columns of A
= number of leading columns in a row-echelon form U for A
Proof. From before, im(A) = col(A) = span (columns of A). A basis for span(columns of A)
is a maximal set of linearly independent columns of A. One maximal set of linearly independent
columns of A are the columns which reduce to leading columns in a row-echelon form U . Hence,
number of leading columns = number of linearly independent columns of A
= number of vectors in basis for col(A)
= dim(col(A)) = dim(im(A)) = rank(A).
Example 4 (continued). Find rank(A), and a basis for im(A), for

A = \begin{pmatrix} 1 & 4 & 2 & 7 \\ 3 & 6 & 0 & 15 \\ 2 & -4 & -8 & 2 \end{pmatrix}.

Solution. There are two leading columns (columns 1 and 2) in the row-echelon form

U = \begin{pmatrix} 1 & 4 & 2 & 7 \\ 0 & -6 & -6 & -6 \\ 0 & 0 & 0 & 0 \end{pmatrix},

and hence rank(A) = 2.

A basis for im(A) therefore contains two vectors. One maximal set of linearly independent columns of A is columns 1 and 2 of A, and hence a basis for im(A) is

\left\{ \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}, \begin{pmatrix} 4 \\ 6 \\ -4 \end{pmatrix} \right\}. ♦
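Both quantities are one-line computations in a computer algebra system. A sketch (SymPy here; Maple's Rank and ColumnSpace commands behave similarly):

    # Sketch: rank and a basis for im(A) for the matrix of Example 4.
    import sympy as sp

    A = sp.Matrix([[1,  4,  2,  7],
                   [3,  6,  0, 15],
                   [2, -4, -8,  2]])

    print(A.rank())                # 2
    for c in A.columnspace():      # the leading columns of A
        print(c.T)                 # Matrix([[1, 3, 2]]), Matrix([[4, 6, -4]])
    # rank + nullity = number of columns (see Theorem 8 below)
    print(A.rank() + len(A.nullspace()) == A.cols)   # True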
7.4.3 Rank, nullity and solutions of Ax = b
Example 4 illustrates the following important fact about the rank and nullity of a matrix.
Theorem 8 (Rank-Nullity Theorem for Matrices). For any matrix A,
rank(A) + nullity(A) = number of columns of A.
Proof. Let U be an equivalent row-echelon form for A obtained by the Gaussian elimination
algorithm. Then, from Proposition 7, rank(A) = number of leading columns in U . Also, from
Proposition 3, nullity(A) = number of non-leading columns in U . But, of course, number of leading
columns + number of non-leading columns = total number of columns in U = total number of
columns in A, and the result is proved.
The above theorem is equivalent to the following result for linear maps between finite dimen-
sional vector spaces.
Theorem 9 (Rank-Nullity Theorem). Suppose V and W are finite dimensional vector spaces and
T : V →W is linear. Then
rank(T ) + nullity(T ) = dim(V ).
A proof of Theorem 9, using a suitably constructed basis of V , is given in Section 7.9.
For matrices, a very common use of rank and nullity is to classify the types of solution of a
system of linear equations Ax = b. The basic results are summarised in the following proposition.
Theorem 10. The equation Ax = b has:

1. no solution if rank(A) ≠ rank([A|b]), and
2. at least one solution if rank(A) = rank([A|b]). Further,
   i) if nullity(A) = 0, the solution is unique, whereas,
   ii) if nullity(A) = ν > 0, then the general solution is of the form
      x = x_p + λ_1 k_1 + · · · + λ_ν k_ν \quad for λ_1, . . . , λ_ν ∈ R,
   where x_p is any solution of Ax = b, and where {k_1, . . . , k_ν} is a basis for ker(A).
Proof. Let U and (U |y) be equivalent row-echelon forms for A and (A|b) obtained by the Gaus-
sian elimination algorithm.
Now, from Chapter 4, we know that Ax = b has a solution if and only if the right-hand-side
column y is a non-leading column, and hence if and only if the numbers of leading columns in U
and [U |y] are equal. But, from Proposition 7, rank(A) = number of leading columns in U , and
rank([A|b]) = number of leading columns in (U |y), and thus Ax = b has a solution if and only if
the ranks of A and (A|b) are equal.
The proof of parts 2(i) and 2(ii) follows immediately from the relation between solutions of
a non-homogeneous system and the corresponding homogeneous system (see Chapter 4) and the
fact that nullity(A) is equal to the maximum number of linearly independent solutions of the
homogeneous equation Av = 0.
Note. A similar type of solution to the general solution in 2(ii) above is also obtained as the solution
of a linear differential equation. In this differential equation case, xp is called a “particular solution”
and the parametric terms are called the “complementary function”. The similarity between the
two types of solution is due to the fact that both the matrix and differential equation problems
involve linear functions.
Example 5. Illustrate the above rules with the system of equations given by the augmented matrix

(A|b) = \begin{pmatrix} 0 & 0 & 2 & -1 & 3 & 1 \\ 1 & -2 & \frac{1}{2} & 3 & 4 & -1 \\ 3 & 2 & 1 & 4 & 6 & 0 \end{pmatrix}.

Solution. Gaussian elimination gives

(U|y) = \begin{pmatrix} 1 & -2 & \frac{1}{2} & 3 & 4 & -1 \\ 0 & 8 & -\frac{1}{2} & -5 & -6 & 3 \\ 0 & 0 & 2 & -1 & 3 & 1 \end{pmatrix}.

The system has a solution as y is a non-leading column. The number of leading columns in U is 3, and hence rank(A) = 3. Similarly, rank(A|b) = 3.
On back substitution, the parametric vector form of the solution is found to be

x = \begin{pmatrix} -\frac{7}{16} \\ \frac{13}{32} \\ \frac{1}{2} \\ 0 \\ 0 \end{pmatrix} + λ_1 \begin{pmatrix} -\frac{31}{16} \\ \frac{21}{32} \\ \frac{1}{2} \\ 1 \\ 0 \end{pmatrix} + λ_2 \begin{pmatrix} -\frac{31}{16} \\ \frac{21}{32} \\ -\frac{3}{2} \\ 0 \\ 1 \end{pmatrix} = x_p + λ_1 k_1 + λ_2 k_2.

Note that

A x_p = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \quad A k_1 = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad A k_2 = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},

and that the number of non-leading columns of U = 2 = nullity(A) = the number of parameters in the solution. This parametric vector form is the general solution of Ax = b, and as expected it is the sum of a "particular solution" of Ax = b and a "complementary function" which is a linear combination of two linearly independent solutions of Ax = 0 (a basis for ker(A)). ♦
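The same solution can be reproduced in software. A sketch (SymPy, for illustration; the parametrisation the system chooses may look different from the one above but it describes the same solution set):

    # Sketch: solving the system of Example 5.
    import sympy as sp

    A = sp.Matrix([[0,  0, 2, -1, 3],
                   [1, -2, sp.Rational(1, 2), 3, 4],
                   [3,  2, 1,  4, 6]])
    b = sp.Matrix([1, -1, 0])

    sol, params = A.gauss_jordan_solve(b)
    print(sol)       # x_p plus a two-parameter family (nullity(A) = 2)
    print(params)    # the two free parameters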
7.5 Further applications and examples of linear maps
Although the theory that we have developed so far in this chapter applies to all linear maps, most
examples have been restricted to maps for which the domain is Rn and the codomain is Rm. In
this section we will give some examples of linear maps in which the domain and codomain are other
kinds of vector spaces.
A simple, but useful, map in any vector space is the map which takes a vector to itself.
Example 1. The identity map idV : V → V on a vector space V is defined by
idV (v) = v for all v ∈ V.
This map is linear since, for all v, v' ∈ V and all scalars λ,

id_V(v + v') = v + v' = id_V(v) + id_V(v') \quad and \quad id_V(λv) = λv = λ id_V(v). ♦
In Theorems 2 and 3 of Section 7.1 we showed that linear maps had the important property
that they preserved linear combinations. As the next example shows, every linear combination can
also be regarded as the image of a linear map.
[H] Example 2. Let S = {v_1, . . . , v_n} be a subset of a vector space V and let x_1, . . . , x_n ∈ R. Show that the map T given by

T(x) = x_1 v_1 + · · · + x_n v_n, \quad where x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} ∈ R^n,

is a linear map.

Solution. The rule obviously defines a function, since T(x) is uniquely determined for each x ∈ R^n. To prove that T is linear we use Theorem 2 of Section 7.1. Suppose x, x' ∈ R^n and λ, λ' ∈ R. Then

λx + λ'x' = (λx_1 + λ'x'_1, . . . , λx_n + λ'x'_n)^T,

and hence

T(λx + λ'x') = (λx_1 + λ'x'_1)v_1 + · · · + (λx_n + λ'x'_n)v_n = λ(x_1 v_1 + · · · + x_n v_n) + λ'(x'_1 v_1 + · · · + x'_n v_n) = λT(x) + λ'T(x').

Thus T is a linear map. ♦
This example shows that all properties of linear combinations discussed in Chapter 6 can in
fact be restated in the language of linear maps.
Example 3. Let V be a vector space over the real numbers, and let B = {v_1, . . . , v_n} be an ordered basis for V. For any v ∈ V, we can write the vector uniquely as a linear combination of B,

v = x_1 v_1 + · · · + x_n v_n.

Show that the rule T : V → R^n defined by

T(v) = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \quad for v ∈ V

is a linear map.

Solution. Obviously the function T has domain V and codomain R^n, which are vector spaces. To prove that this function is a linear map, we check the addition and scalar multiplication conditions. For all λ ∈ R and v, v' ∈ V, we can write in a unique way

v = x_1 v_1 + · · · + x_n v_n \quad and \quad v' = x'_1 v_1 + · · · + x'_n v_n.

Since

v + v' = (x_1 + x'_1)v_1 + · · · + (x_n + x'_n)v_n \quad and \quad λv = (λx_1)v_1 + · · · + (λx_n)v_n,
we have

T(v + v') = \begin{pmatrix} x_1 + x'_1 \\ \vdots \\ x_n + x'_n \end{pmatrix} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} x'_1 \\ \vdots \\ x'_n \end{pmatrix} = T(v) + T(v')

and

T(λv) = \begin{pmatrix} λx_1 \\ \vdots \\ λx_n \end{pmatrix} = λ \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = λT(v),

and hence T is a linear map. ♦
[X] REMARK: The above example simply says that the function which maps a vector to its coor-
dinate vector with respect to a basis is linear.
We shall now give some examples of linear maps associated with the vector spaces of polynomials
and real-valued functions.
Example 4. Show that the function T : C^3 → P_2(C) defined by T \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix} = p, where a_0, a_1, a_2 ∈ C and

p(z) = a_0 + a_1 + (a_2 + 3a_0)z + a_1 z^2 \quad for z ∈ C,

is a linear map.

Before we solve this problem, note that an argument of T is a complex vector a = \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix} ∈ C^3, while the corresponding function value p = T(a) is a complex polynomial of degree less than or equal to 2. Some function values are the polynomials given by

\left( T \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \right)(z) = 1 + 2 + (3 + 3)z + 2z^2 = 3 + 6z + 2z^2,

\left( T \begin{pmatrix} -2 \\ 0 \\ i \end{pmatrix} \right)(z) = -2 + (i - 6)z,

\left( T \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \right)(z) = 0.
Solution. We use Theorem 2 of Section 7.1. Let λ, λ' ∈ C, a = \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix} ∈ C^3, a' = \begin{pmatrix} a'_0 \\ a'_1 \\ a'_2 \end{pmatrix} ∈ C^3, and let s = T(λa + λ'a'), p = T(a), and q = T(a'). Then

s = T \begin{pmatrix} λa_0 + λ'a'_0 \\ λa_1 + λ'a'_1 \\ λa_2 + λ'a'_2 \end{pmatrix},

and hence

s(z) = λa_0 + λ'a'_0 + λa_1 + λ'a'_1 + \left( λa_2 + λ'a'_2 + 3(λa_0 + λ'a'_0) \right) z + (λa_1 + λ'a'_1)z^2
     = λ \left( a_0 + a_1 + (a_2 + 3a_0)z + a_1 z^2 \right) + λ' \left( a'_0 + a'_1 + (a'_2 + 3a'_0)z + a'_1 z^2 \right)
     = λp(z) + λ'q(z).

Thus, T(λa + λ'a') = s = λp + λ'q = λT(a) + λ'T(a'), and hence T is a linear map. ♦
As the next examples show, calculus provides many important applications of linear maps as
differentiation and integration are both associated with linear maps.
Example 5 (Differentiation of polynomials). Let P_n(R) be the vector space of real polynomials of degree less than or equal to n. Show that the function D : P_n(R) → P_{n-1}(R), defined by

D(p) = p', \quad where p'(x) = \frac{dp}{dx} for p ∈ P_n(R) and x ∈ R,

is a linear map.

Solution. Firstly, we note that if p is a polynomial of degree k then the derivative p' exists and is a polynomial of degree k - 1. Hence, if p ∈ P_n(R) then p' ∈ P_{n-1}(R), and thus D : P_n(R) → P_{n-1}(R) is a function.

We now prove D is linear by checking the addition and scalar multiplication conditions of the definition of a linear map. For all p, q ∈ P_n(R), we have from the properties of derivatives that

(p + q)'(x) = \frac{d}{dx}\left( p(x) + q(x) \right) = \frac{d}{dx}p(x) + \frac{d}{dx}q(x) = p'(x) + q'(x).

Hence, D(p + q) = (p + q)' = p' + q' = D(p) + D(q), and the addition condition is satisfied. Further, for all p ∈ P_n(R) and λ ∈ R, we have that

\frac{d}{dx}(λp(x)) = λ\frac{d}{dx}p(x),

and hence D(λp) = λD(p) and the scalar multiplication condition is also satisfied. Thus, D is a linear map. (Note that nullity(D) = 1 and rank(D) = n.) ♦
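Since D is a linear map between finite-dimensional spaces, it can itself be represented by a matrix once bases are chosen (this idea is taken up in Section 7.6). As a hedged sketch, here is the matrix of D : P_3(R) → P_2(R) with respect to the monomial bases {1, x, x^2, x^3} and {1, x, x^2}, built column by column in SymPy (an illustrative tool choice, not the course's Maple):

    # Sketch: column j of the matrix holds the coordinates of D(x^j).
    import sympy as sp

    x = sp.symbols("x")
    cols = []
    for j in range(4):                    # basis polynomials 1, x, x^2, x^3
        d = sp.Poly(sp.diff(x**j, x), x)  # D(x^j) = j*x^(j-1)
        cols.append([d.coeff_monomial(x**i) for i in range(3)])

    D = sp.Matrix(cols).T
    print(D)   # Matrix([[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]])
    # rank 3 and nullity 1, as noted above: the constants form ker(D).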
Example 6 (Integration of Polynomials). Show that the function I : P_n(R) → P_{n+1}(R), defined by I(p) = q, where

q(x) = \int_0^x p(t)\,dt \quad for p ∈ P_n(R) and x ∈ R,

is a linear map.

Before solving this problem, we give some examples of function values of I. If p is the zero polynomial we have

I(p) = \int_0^x 0\,dt = 0,

whereas if p is the polynomial of degree 2 defined by p(x) = 1 - 3x + 4x^2, then

I(p) = \int_0^x (1 - 3t + 4t^2)\,dt = x - \frac{3}{2}x^2 + \frac{4}{3}x^3

is a polynomial of degree 3.
Solution. We will prove that I is a linear map by using Theorem 2 of Section 7.1. For ease of writing, we let q_1 = I(p_1), q_2 = I(p_2) and q = I(λ_1 p_1 + λ_2 p_2), where p_1, p_2 ∈ P_n(R) and λ_1, λ_2 ∈ R. Then, from the properties of integration,

q(x) = \int_0^x \left( λ_1 p_1(t) + λ_2 p_2(t) \right) dt = λ_1 \int_0^x p_1(t)\,dt + λ_2 \int_0^x p_2(t)\,dt = λ_1 q_1(x) + λ_2 q_2(x).

Thus, I(λ_1 p_1 + λ_2 p_2) = q = λ_1 q_1 + λ_2 q_2 = λ_1 I(p_1) + λ_2 I(p_2), and hence I is a linear map. (Note that nullity(I) = 0 and rank(I) = n + 1.) ♦
The next example is one of a class of so-called integral transforms which have many uses in
mathematics, science, engineering and economics.
[X] Example 7. The Laplace transform. Let s and a be real numbers, and let V_a be the set of real-valued functions on the interval (0, ∞) defined by

V_a = \left\{ f ∈ R[(0, ∞)] : \int_0^∞ e^{-st} f(t)\,dt \text{ exists for } a < s < ∞ \right\}.

Now, from the theory of integration, if f, g ∈ V_a and λ, μ ∈ R then

\int_0^∞ e^{-st}(λf(t) + μg(t))\,dt \quad exists for a < s < ∞,

and thus λf + μg ∈ V_a. Hence, from the Alternative Subspace Theorem of Section 6.8, V_a is a subspace of the vector space R[(0, ∞)] of all real-valued functions with domain (0, ∞).

We now define a function L : V_a → R[(a, ∞)] with function values L(f) = f_L, where f_L is the function from the domain (a, ∞) to the codomain R defined by

f_L(s) = \int_0^∞ e^{-st} f(t)\,dt \quad for a < s < ∞.

f_L is called the Laplace transform of the function f. ♦
We shall now prove that L is a linear map.
Proof. Let f, g ∈ V_a and λ, μ ∈ R. Then, as noted above, the function h = λf + μg is an element of V_a, and its Laplace transform h_L = L(λf + μg) satisfies

h_L(s) = \int_0^∞ e^{-st} h(t)\,dt = \int_0^∞ e^{-st} \left( λf(t) + μg(t) \right) dt = λ \int_0^∞ e^{-st} f(t)\,dt + μ \int_0^∞ e^{-st} g(t)\,dt = λf_L(s) + μg_L(s).

Thus,

L(λf + μg) = h_L = λf_L + μg_L = λL(f) + μL(g),

and hence L is a linear map.
The Laplace transform is widely used in, for example, the solution of linear differential equations and the theory of dynamical systems; to understand the technique, work through question 54. The transform has extensive applications in electrical engineering, computer science, physics, and applied and pure mathematics.
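SymPy (one possible alternative to the course's Maple, used here only as an illustration) has a built-in Laplace transform, so the linearity just proved is easy to see in action:

    # Sketch: L(3f + 5g) = 3L(f) + 5L(g) for two sample functions.
    import sympy as sp

    t, s = sp.symbols("t s", positive=True)

    F = sp.laplace_transform(sp.exp(-2*t), t, s, noconds=True)   # 1/(s + 2)
    G = sp.laplace_transform(sp.sin(t), t, s, noconds=True)      # 1/(s**2 + 1)
    H = sp.laplace_transform(3*sp.exp(-2*t) + 5*sp.sin(t), t, s, noconds=True)

    print(sp.simplify(H - (3*F + 5*G)))   # 0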
To finish this section, we shall describe some applications of linear maps in the areas of optics,
chemical engineering, electrical engineering, and population dynamics.
[H] Example 8 (Optics). White light is made up of the seven colours: red, orange, yellow, green,
blue, indigo, and violet. Assume that a green filter transmits 0% of red and violet, 5% of orange
and indigo, 20% of yellow and blue, and 90% of the green light that falls on it. This green filter can
be represented by a linear map, and the kernel and image of the map have a simple interpretation.
We let a1, a2, a3, a4, a5, a6 and a7 be the intensities of the red, orange, yellow, green, blue,
indigo and violet light respectively in the incoming light. Then the filter can be represented by the
very simple linear map T : R7 → R7 given by
T(a) = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0.05 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0.2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0.9 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0.2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0.05 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \\ a_6 \\ a_7 \end{pmatrix}.
The kernel of the filter is the set of all possible incident light such that there is no transmitted
light. The filter will transmit no light if the incoming light contains only red and violet light. Thus,
a basis for the kernel of the map T which models the filter is [1 0 0 0 0 0 0]T (red light only) and
[0 0 0 0 0 0 1]T (violet light only), and the nullity is 2.
The image of the filter is the transmitted light. The transmitted light can only contain orange,
yellow, green, blue, and indigo, and these colours may be taken as a basis for the image. As five
basic colours are transmitted, the rank is 5. Mathematically, a basis for the image of the map T is
the set of 5 vectors [0 1 0 0 0 0 0]T , [0 0 1 0 0 0 0]T , [0 0 0 1 0 0 0]T , [0 0 0 0 1 0 0]T and [0 0 0 0 0 1 0]T . ♦
[X] Example 9 (Population Dynamics). As a simple model of the growth of a human population
in a given country, we neglect males, and divide females into the six age groups of 0–14, 15–29,
30–44, 45–59, 60–74 and 75–89. It is found that, on average, 5% of the 0–14 group, 3% of the
15–29 group, 5% of the 30–44 group, 10% of the 45–59 group, 40% of the 60–74 group and 100%
of the 75–89 group die in a fifteen-year period, whereas, on average, 0% of the 0–14 group, 50% of
the 15–29 group, 45% of the 30–44 group, 6% of the 45–59 group and 0% of the 60–74 and 75–89
groups give birth to a female baby in a fifteen-year period.
The population at any time can be represented by a linear-transformation model as follows.
Starting at some convenient time, say January 1 1970, at which the population of the country is
known, we divide time into intervals of length 15 years. Let k represent the kth of these 15-year
periods, starting with k = 0 in the period 1970–1984. Thus, k = 1 represents 1985–1999, k=2
represents 2000–2014, etc. We then let
x1(k) = number of females of age 0–14 in interval k,
x2(k) = number of females of age 15–29 in interval k,
x3(k) = number of females of age 30–44 in interval k,
x4(k) = number of females of age 45–59 in interval k,
x5(k) = number of females of age 60–74 in interval k,
x6(k) = number of females of age 75–89 in interval k.
Now, if we know the values of xj(k), 1 ≤ j ≤ 6, in a given interval k, we can calculate the values
of xj(k + 1), 1 ≤ j ≤ 6, in the (k + 1)th interval from the given data. For example, the females in
the age group 15–29 in interval k + 1 are the survivors of those in the age group 0–14 in interval k.
Thus, x2(k + 1) = 0.95x1(k). The numbers in all groups other than the 0–14 group can be obtained
in a similar fashion. In our model, the only way that females can enter the 0–14 group is to be
born to mothers in the 15–29, 30–44 and 45–59 groups. Thus,
\[ x_1(k+1) = 0.50\,x_2(k) + 0.45\,x_3(k) + 0.06\,x_4(k). \]
We therefore obtain the model
\[ \mathbf{x}(k+1) = \begin{pmatrix} x_1(k+1)\\ x_2(k+1)\\ x_3(k+1)\\ x_4(k+1)\\ x_5(k+1)\\ x_6(k+1) \end{pmatrix} = \begin{pmatrix} 0 & 0.5 & 0.45 & 0.06 & 0 & 0\\ 0.95 & 0 & 0 & 0 & 0 & 0\\ 0 & 0.97 & 0 & 0 & 0 & 0\\ 0 & 0 & 0.95 & 0 & 0 & 0\\ 0 & 0 & 0 & 0.90 & 0 & 0\\ 0 & 0 & 0 & 0 & 0.60 & 0 \end{pmatrix} \begin{pmatrix} x_1(k)\\ x_2(k)\\ x_3(k)\\ x_4(k)\\ x_5(k)\\ x_6(k) \end{pmatrix} = A\mathbf{x}(k). \]
Thus, the population vector in time interval k+1 is the image under the linear map whose matrix
is given above of the population vector in interval k. ♦
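The model is easy to iterate numerically. In the Maple sketch below, only the matrix A comes
from the example; the starting population vector x0 is an invented illustration (say, thousands
of females in each age group in 1970–1984).

with(LinearAlgebra):
A := Matrix([[0, 0.5, 0.45, 0.06, 0, 0],
             [0.95, 0, 0, 0, 0, 0],
             [0, 0.97, 0, 0, 0, 0],
             [0, 0, 0.95, 0, 0, 0],
             [0, 0, 0, 0.90, 0, 0],
             [0, 0, 0, 0, 0.60, 0]]):
x0 := <100, 100, 100, 100, 100, 100>:   # invented starting population
x1 := A . x0;    # population vector for interval k = 1 (1985-1999)
x2 := A . x1;    # population vector for interval k = 2 (2000-2014)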
7.6 [X] Representation of linear maps by matrices
We have seen in Section 7.2 that every linear map between the vector spaces Rn and Rm can be
represented by a matrix. The next theorem shows that this result can be generalised to any linear
map between any finite-dimensional vector spaces.
Theorem 1 (General Matrix Representation Theorem). Let T : V →W be a linear map from an
n-dimensional vector space V to an m-dimensional vector space W , and let BV = {v1, . . . ,vn} be
an ordered basis for V and BW = {w1, . . . ,wm} be an ordered basis for W . Then, there is a unique
m× n matrix A such that
[T (v)]BW = A[v]BV .
Further, A is the matrix whose columns are
aj = [T (vj)]BW for 1 ≤ j ≤ n.
Proof. Let w = T (v) ∈ W and
\[ [\mathbf{v}]_{B_V} = \begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix}, \quad\text{i.e.}\quad \mathbf{v} = x_1\mathbf{v}_1 + \cdots + x_n\mathbf{v}_n. \]
By Theorem 3 of Section 7.1, we have
w = T (v) = T (x1v1 + · · ·+ xnvn) = x1T (v1) + · · · + xnT (vn).
Now, we have shown (Example 3 of Section 7.5) that taking coordinate vectors is a linear map,
and hence, on using Theorem 3 of Section 7.1, we have
[w]BW = [T (v)]BW = x1[T (v1)]BW + · · · + xn[T (vn)]BW = x1a1 + · · ·+ xnan,
where aj = [T (vj)]BW . Finally, we note that aj ∈ Rm for 1 ≤ j ≤ n, and hence, from Proposition 3
of Section 6.4, the linear combination can be rewritten in the matrix form Ax, and the theorem is
proved.
Note.
1. This theorem says that if T maps v to T (v) then the matrix A transforms the coordinate
vector [v]BV for v with respect to an ordered basis BV of the domain into the coordinate
vector [T (v)]BW for T (v) with respect to an ordered basis BW of the codomain.
2. It is important to note that the matrix A depends only on the map T , the ordered basis BV
and the ordered basis BW . It does not depend on the particular vector v of V whose image
is being found.
Theorem 1 provides a straightforward algorithm for finding a matrix representation.
Algorithm 1. Constructing a matrix representation for a linear map.
1. Find a basis BV = {v1, . . . ,vn} for the domain V and a basis BW = {w1, . . . ,wm} for the
codomain W .
2. Find the function values T (vj), 1 ≤ j ≤ n, of the domain basis vectors.
3. Find the coordinate vectors [T (vj)]BW of the function values T (vj) with respect to the
codomain basis BW .
4. Construct the m × n matrix A with the coordinate vectors [T (vj)]BW , 1 ≤ j ≤ n, as its
columns.
Example 1. Construct the matrix representation of the derivative map D : P3(R) → P2(R),
defined by
\[ D(p) = p', \quad\text{where } p'(x) = \frac{dp}{dx} \text{ for } x \in \mathbb{R}, \]
with respect to the standard bases of P3(R) and P2(R).
Solution. As shown in Example 5 of Section 7.5, D is a linear map, and hence it can be represented
by a matrix. We again follow Algorithm 1.
For the domain P3(R), the standard basis is {1, x, x^2, x^3}. The function values of the basis vectors
are
\[ D(1) = 0, \quad D(x) = 1, \quad D(x^2) = 2x, \quad D(x^3) = 3x^2. \]
The coordinate vectors of these function values with respect to the standard basis {1, x, x^2} of
the codomain are
\[ \begin{pmatrix} 0\\ 0\\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1\\ 0\\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0\\ 2\\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0\\ 0\\ 3 \end{pmatrix} \]
respectively. Hence the matrix is
\[ A = \begin{pmatrix} 0 & 1 & 0 & 0\\ 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 3 \end{pmatrix}. \]
As an example of the use of this matrix, we find D(p) for
\[ p(x) = 1 - 3x + 4x^2 + 7x^3. \]
The coordinate vector for p with respect to the standard basis {1, x, x^2, x^3} of the domain P3(R)
is (1, −3, 4, 7)^T. Then the coordinate vector for D(p) with respect to the standard basis {1, x, x^2}
of the codomain P2(R) is
\[ A\begin{pmatrix} 1\\ -3\\ 4\\ 7 \end{pmatrix} = \begin{pmatrix} -3\\ 8\\ 21 \end{pmatrix}, \]
and hence D(p) is given by
\[ (D(p))(x) = -3 + 8x + 21x^2. \]
This is clearly the derivative of the polynomial p. ♦
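The final calculation in Example 1 is an ordinary matrix-vector product, so it can be reproduced
in Maple; a sketch using the matrix A just found:

with(LinearAlgebra):
A := Matrix([[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]]):
p := <1, -3, 4, 7>:    # coordinate vector of p(x) = 1 - 3x + 4x^2 + 7x^3
A . p;                 # <-3, 8, 21>, the coordinates of -3 + 8x + 21x^2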
A similar procedure to that given in Example 1 can be used to find a matrix representation for
definite integration of polynomials.
In the simple case described in Example 1, it is obviously a waste of time to go through the
matrix formalism. However, in more complicated examples, it is often useful to be able to use the
powerful and efficient algorithms of matrix algebra to solve problems involving differentiation of
polynomials.
Another example involving polynomials is as follows.
Example 2. Find the matrix with respect to standard bases in domain and codomain for the
linear transformation T : P2 → C3 defined by
\[ T(a_0 + a_1 z + a_2 z^2) = \begin{pmatrix} 2a_0 + 3a_2\\ -a_2\\ 4a_1 + 6a_2 \end{pmatrix}. \]
Solution. For the domain, the standard basis is {1, z, z^2} and hence the images of the domain
basis vectors are
\[ T(1) = \begin{pmatrix} 2\\ 0\\ 0 \end{pmatrix}, \quad T(z) = \begin{pmatrix} 0\\ 0\\ 4 \end{pmatrix}, \quad T(z^2) = \begin{pmatrix} 3\\ -1\\ 6 \end{pmatrix}. \]
For the codomain, the standard basis for C3 is
\[ \left\{ \begin{pmatrix} 1\\ 0\\ 0 \end{pmatrix}, \begin{pmatrix} 0\\ 1\\ 0 \end{pmatrix}, \begin{pmatrix} 0\\ 0\\ 1 \end{pmatrix} \right\}. \]
The coordinate vectors with respect to this basis are just the three image vectors given above,
and hence the matrix is
\[ A = \begin{pmatrix} 2 & 0 & 3\\ 0 & 0 & -1\\ 0 & 4 & 6 \end{pmatrix}. \] ♦
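As a quick check of the matrix just constructed, applying A to the coordinate vector of a sample
polynomial should agree with the rule defining T ; the test polynomial p(z) = 1 + z + z^2 below is
invented for illustration.

with(LinearAlgebra):
A := Matrix([[2, 0, 3], [0, 0, -1], [0, 4, 6]]):
A . <1, 1, 1>;   # <5, -1, 10>, the coordinate vector of T(1 + z + z^2)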
In the above examples, we have restricted the choice of bases to standard bases in domain
and codomain. However, it is frequently possible to achieve great simplifications in calculations
involving linear maps by using special choices of bases. Two examples of the simplification which
can be achieved in this way are given below.
Example 3. A linear map TA : R3 → R3 is defined by
\[ T_A(\mathbf{x}) = A\mathbf{x}, \quad\text{where}\quad A = \begin{pmatrix} 3.12 & 0.16 & -0.32\\ 4.76 & -1.32 & -7.36\\ 2.8 & -1.6 & -1.8 \end{pmatrix}. \]
Find the matrix which represents TA with respect to the bases in both domain and codomain given
by the columns of the matrix
\[ B = \begin{pmatrix} 1 & 0 & 4\\ -3 & 2 & 1\\ 2 & 1 & 2 \end{pmatrix}. \]
Solution. The images of the basis vectors for the domain basis are
\[ T_A\begin{pmatrix} 1\\ -3\\ 2 \end{pmatrix} = A\begin{pmatrix} 1\\ -3\\ 2 \end{pmatrix} = \begin{pmatrix} 2\\ -6\\ 4 \end{pmatrix}, \quad T_A\begin{pmatrix} 0\\ 2\\ 1 \end{pmatrix} = \begin{pmatrix} 0\\ -10\\ -5 \end{pmatrix}, \quad T_A\begin{pmatrix} 4\\ 1\\ 2 \end{pmatrix} = \begin{pmatrix} 12\\ 3\\ 6 \end{pmatrix}. \]
From Algorithm 1, the columns of the matrix representing TA with respect to the given codomain
basis are just the coordinate vectors for the above images with respect to the codomain basis.
These are
\[ \begin{pmatrix} 2\\ 0\\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0\\ -5\\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0\\ 0\\ 3 \end{pmatrix} \]
respectively. The required matrix has these three coordinate vectors as its columns, and it is
therefore
\[ X = \begin{pmatrix} 2 & 0 & 0\\ 0 & -5 & 0\\ 0 & 0 & 3 \end{pmatrix}. \] ♦
Note that, in this example, we have found a diagonal matrix to represent a transformation
TA. A general theory which shows the conditions under which a diagonal matrix can be found to
represent a linear map will be developed in Chapter 8.
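Since Example 3 uses the columns of B as the basis in both domain and codomain, the representing
matrix can also be computed as B⁻¹AB, which gives a one-line Maple check (a sketch):

with(LinearAlgebra):
A := Matrix([[3.12, 0.16, -0.32], [4.76, -1.32, -7.36], [2.8, -1.6, -1.8]]):
B := Matrix([[1, 0, 4], [-3, 2, 1], [2, 1, 2]]):
MatrixInverse(B) . A . B;   # the diagonal matrix diag(2, -5, 3), up to rounding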
Example 4. A linear map TA : R3 → R4 is defined by
\[ T_A(\mathbf{x}) = A\mathbf{x}, \quad\text{where}\quad A = \begin{pmatrix} 1 & 4 & 2\\ 3 & 4 & -1\\ -2 & 0 & 5\\ 3 & -4 & 4 \end{pmatrix}. \]
Find the matrix U which represents TA for the standard basis in the domain R3 and for the basis
in the codomain R4 which consists of the column vectors of the matrix
\[ L = \begin{pmatrix} 1 & 0 & 0 & 0\\ 3 & 1 & 0 & 0\\ -2 & -1 & 1 & 0\\ 3 & 2 & 6 & 1 \end{pmatrix}. \]
Solution. The standard basis for R3 is {e1, e2, e3}, and hence the images of the domain basis
vectors are
\[ T(\mathbf{e}_1) = A\mathbf{e}_1 = \begin{pmatrix} 1\\ 3\\ -2\\ 3 \end{pmatrix}, \quad T(\mathbf{e}_2) = A\mathbf{e}_2 = \begin{pmatrix} 4\\ 4\\ 0\\ -4 \end{pmatrix}, \quad T(\mathbf{e}_3) = A\mathbf{e}_3 = \begin{pmatrix} 2\\ -1\\ 5\\ 4 \end{pmatrix}. \]
These images are just the columns of the matrix A.
Now, following Algorithm 1, we must find the coordinate vectors of these three images with respect
to the codomain basis given by the columns of L, using the standard Gaussian elimination algorithm.
They turn out to be
\[ \begin{pmatrix} 1\\ 0\\ 0\\ 0 \end{pmatrix}, \quad \begin{pmatrix} 4\\ -8\\ 0\\ 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 2\\ -7\\ 2\\ 0 \end{pmatrix}. \]
Thus
\[ U = \begin{pmatrix} 1 & 4 & 2\\ 0 & -8 & -7\\ 0 & 0 & 2\\ 0 & 0 & 0 \end{pmatrix}. \]
Note that L is “lower triangular”, U is “upper triangular”, and that LU = A. ♦
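In Maple, the three coordinate vectors can be found in one step by solving L·U = A column by
column, which simultaneously verifies the factorisation noted above (a sketch):

with(LinearAlgebra):
A := Matrix([[1, 4, 2], [3, 4, -1], [-2, 0, 5], [3, -4, 4]]):
L := Matrix([[1, 0, 0, 0], [3, 1, 0, 0], [-2, -1, 1, 0], [3, 2, 6, 1]]):
U := LinearSolve(L, A);   # the columns are the three coordinate vectors
L . U - A;                # the zero matrix, confirming LU = A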
Both of the above examples illustrate the importance of suitable choices of bases in solving
problems about linear maps. If the bases are suitably chosen, a matrix representation for a map
will usually take on a very simple form from which the properties of the map can be immediately
read off. We have, of course, used a special case of this approach repeatedly throughout these
notes, where we have solved most problems about linear equations, vector spaces, and linear maps
by constructing row-echelon form matrices from which the solutions can be immediately read off.
7.7 [X] Matrix arithmetic and linear maps
We have seen in Sections 7.2 and 7.6 that every matrix defines a linear map and that every linear
map between finite-dimensional vector spaces can be represented by a matrix. We would therefore
expect that properties of matrices — such as matrix addition, multiplication of a matrix by a
scalar, matrix multiplication, inverse of a matrix — should also have an interpretation in terms
of properties of linear maps. In this section, we shall show that matrix addition corresponds to
addition of linear maps, that multiplication of a matrix by a scalar corresponds to multiplication of
a linear map by a scalar, and that matrix multiplication corresponds to composition of linear maps.
The relationship between inverses of linear maps and matrix inverses is discussed in Section 7.8.
Proposition 1 (Addition). Let A and B be real m × n matrices and let TA : Rn → Rm and
TB : Rn → Rm be the linear maps given by TA(x) = Ax and TB(x) = Bx for all x ∈ Rn. Then the
sum T = TA + TB is the linear map T : Rn → Rm given by T (x) = (A+B)x for all x ∈ Rn.
Proof. By definition of function addition, the sum T = TA + TB is the linear map
T : Rn → Rm given by
T (x) = (TA + TB)(x) = TA(x) + TB(x) for all x ∈ Rn.
Hence, on using the definition of TA and TB and the distributive law for matrix addition and
multiplication, we have
T (x) = Ax+Bx = (A+B)x for all x ∈ Rn.
Proposition 2 (Multiplication by a Scalar). Let A be a real m× n matrix and TA : Rn → Rm be
the linear map defined by TA(x) = Ax for all x ∈ Rn. Then the scalar multiple T = λTA is the
linear map T : Rn → Rm given by T (x) = (λA)x for all x ∈ Rn.
Proof. By definition of multiplication of a function by a scalar, the function T = λTA is given by
T (x) = (λTA)(x) = λTA(x) for all x ∈ Rn.
Hence, on using the definition of TA, we have
T (x) = λ(Ax) = (λA)x for all x ∈ Rn.
Proposition 3 (Multiplication and Composition). Let A be a real m × n matrix and B be a real
n × p matrix and let TA : Rn → Rm be the linear map defined by TA(x) = Ax for all x ∈ Rn and
let TB : Rp → Rn be the linear map defined by TB(x) = Bx for all x ∈ Rp. Then the composite
TA ◦ TB is the linear map T : Rp → Rm defined by T (x) = (AB)x for all x ∈ Rp.
Proof. The composite TA ◦ TB is the linear map T : Rp → Rm defined by
T (x) = (TA ◦ TB)(x) = TA(TB(x)) for all x ∈ Rp.
Hence, on substituting the function values for TA and TB and using the associative law of matrix
multiplication, we have
T (x) = TA(Bx) = A(Bx) = (AB)x for all x ∈ Rp.
Note. B is acting first, then A.
Similar relationships hold between linear transformations on general finite-dimensional vector
spaces and their corresponding matrices. We shall give just one example of these more general
theorems, whose proof we shall leave to the exercises.
Proposition 4. Let U , V and W be finite-dimensional vector spaces with bases BU , BV and BW
respectively, and let T : U → V and S : V → W be linear maps. Then
a) S ◦ T is a linear transformation from U to W .
b) If the matrix for T with respect to BU and BV is AT , the matrix for S with respect to BV and
BW is AS , and the matrix for S ◦ T with respect to BU and BW is AST , then AST = ASAT .
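Proposition 3 is easy to see in action: multiplying by AB gives the same result as applying TB
and then TA. In the Maple sketch below, the matrices A and B and the vector x are invented
examples.

with(LinearAlgebra):
A := Matrix([[1, 0, 2], [0, 1, -1]]):    # T_A maps R^3 to R^2
B := Matrix([[1, 2], [0, 1], [3, 0]]):   # T_B maps R^2 to R^3
x := <5, 7>:
A . (B . x);    # T_A(T_B(x))
(A . B) . x;    # the same vector, computed via the single matrix AB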
7.8 [X] One-to-one, onto and invertible linear maps and matrices
In this section we shall discuss the main results involving the ideas of one-to-one, onto and inverses
for linear maps and matrices and we shall show how these results for linear maps compare with the
results for general functions summarised in Appendix 7.10.
7.8.1 Linear maps
The definitions of one-to-one, onto and inverse for linear maps are virtually identical to the
definitions for general functions. However, for ease of reading it is convenient to restate them here. On
applying the definitions of one-to-one, onto and inverse given in Appendix 7.10 to linear maps, we
obtain the following definitions.
Definition 1. A linear map T : V →W is said to be:
a) one-to-one if for all v1,v2 ∈ V , T (v1) = T (v2) only if v1 = v2.
b) onto if for all w ∈ W there exists v ∈ V such that w = T (v), that is, if
im(T ) =W .
Definition 2. Let T : V → W be a linear map. Then a function S : W → V is
called an inverse of T if it satisfies the two conditions:
a) S ◦T = idV , where idV is the identity map on V defined by idV (v) = v for all
v ∈ V .
b) T ◦S = idW , where idW is the identity map on W defined by idW (w) = w for
all w ∈W .
For linear maps, there is a simple connection between a map being one-to-one and the kernel
of the map. This connection, which is not in general true for all functions, is as follows.
Proposition 1. A linear map T : V →W is one-to-one if and only if ker(T ) = {0}, that is, if and
only if nullity(T ) = 0.
Proof. We first prove that if T is one-to-one then ker(T ) = {0}. Now, as T is a linear map
T (0) = 0, and as T is one-to-one T (v) = T (0) only if v = 0. Thus, T (v) = 0 only if v = 0, and
hence ker(T ) = {0}. We next prove that if ker(T ) = {0} then T is one-to-one. We let v1,v2 satisfy
T (v1) = T (v2), that is, T (v1)−T (v2) = 0. But, as T is a linear map, T (v1)−T (v2) = T (v1−v2).
Then as ker(T ) = {0}, we have T (v1 − v2) = 0 only if v1 − v2 = 0. Thus, T (v1) = T (v2) only if
v1 = v2, and hence T is one-to-one.
The result stated in Proposition 1 is not true in general for non-linear functions — for example,
it is not true for the function of Example 1 of Appendix 7.10 which is neither one-to-one nor onto
even though f(x) = x² = 0 only for x = 0.
Proposition 2. If the codomain W of a linear function T : V → W is finite dimensional then T
is onto if and only if rank(T ) = dim(W ).
Proof. We know that im(T ) is a subspace of W and rank(T ) = dim(im(T )) so, by Theorem 8
of Section 6.6, rank(T ) = dim(W ) if and only if im(T ) =W , that is, if and only if T is onto.
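For a map TA given by a matrix A, Propositions 1 and 2 turn the one-to-one and onto questions
into routine kernel and rank computations. A Maple sketch, with an invented 3 × 3 matrix:

with(LinearAlgebra):
A := Matrix([[1, 2, 0], [0, 1, 1], [1, 0, 1]]):
NullSpace(A);   # the empty set, so nullity(TA) = 0 and TA is one-to-one
Rank(A);        # 3 = dim(R^3), so TA is onto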
The following result is also of importance for linear maps.
Proposition 3. If the domain and codomain of a linear map T : V → W are finite-dimensional
then:
a) If T is one-to-one and onto then dim(V ) = dim(W ).
b) If dim(V ) = dim(W ) and T is one-to-one then T is onto.
c) If dim(V ) = dim(W ) and T is onto then T is one-to-one.
Proof. The proofs of the three parts are based on the Rank-Nullity Theorem and Propositions 1
and 2 above.
a) If T is one-to-one then nullity(T ) = 0, and if T is onto then rank(T ) = dim(W ). Therefore,
if T is one-to-one and onto, we have from the Rank-Nullity Theorem that
dim(V ) = rank(T ) + nullity(T ) = dim(W ) + 0 = dim(W ).
b) If dim(V ) = dim(W ) and T is one-to-one then nullity(T ) = 0, and so
rank(T ) = dim(V )− nullity(T ) = dim(W ),
and therefore T is onto.
c) If dim(V ) = dim(W ) and T is onto then rank(T ) = dim(W ), and so
nullity(T ) = dim(V )− rank(T ) = dim(V )− dim(W ) = 0,
and therefore T is one-to-one.
Proposition 4. If a linear map has an inverse then the inverse is also a linear map.
Proof. Let T : V →W be a linear map with inverse S : W → V . Then, for all v ∈ V and for all
w ∈W , we have v = S(w) if and only if w = T (v).
To prove S is linear, we must show that for all w1,w2 ∈W and all scalars λ1, λ2 ∈ F
S(λ1w1 + λ2w2) = λ1S(w1) + λ2S(w2).
Now if S(w1) = v1 and S(w2) = v2, we have w1 = T (v1) and w2 = T (v2), and hence on using
the fact that T is linear we obtain
λ1w1 + λ2w2 = λ1T (v1) + λ2T (v2) = T (λ1v1 + λ2v2).
Thus, by definition of S,
S(λ1w1 + λ2w2) = λ1v1 + λ2v2
= λ1S(w1) + λ2S(w2).
The proof is complete.
A fundamental result which summarises the main properties of inverses of linear maps is as
follows.
Theorem 5. If V and W are finite-dimensional vector spaces and if T : V → W is a linear map
then the following statements are equivalent:
1. T is invertible, that is, there exists a linear map S : W → V such that (S ◦ T )(v) = v for all
v ∈ V and such that (T ◦ S)(w) = w for all w ∈W .
2. T is one-to-one and onto.
3. dim(V ) = dim(W ) and there exists S : W → V such that (T ◦ S)(w) = w for all w ∈W .
4. dim(V ) = dim(W ) and there exists S : W → V such that (S ◦ T )(v) = v for all v ∈ V .
Further, the map S in statements 1, 3 and 4 is the inverse of T .
Note. The phrase “The statements are equivalent” means that if one statement is true then all
statements are true, and also that if one statement is false then all statements are false.
Proof. The equivalence of statements 1 and 2 follows immediately from Theorem 1 of Section 7.10.
We shall now show that statements 1 and 3 are equivalent by first showing that statement 1 implies
statement 3 and then by showing that statement 3 implies statement 1.
1 implies 3. If T is invertible, then there exists S : W → V such that (T ◦ S)(w) = w for all
w ∈ W . Further, as T is invertible it is also one-to-one and onto, and hence, from part (a) of
Proposition 3, dim(V ) = dim(W ).
3 implies 1. We first prove that T is onto. Let w ∈ W . Then, as S is a function from W to V ,
there exists v ∈ V such that S(w) = v. Then, on first taking the function value of v under T and
next using the property of T ◦ S given in statement 3, we have
T (v) = T (S(w)) = (T ◦ S)(w) = w.
Thus, we have shown that for all w ∈ W , there exists v ∈ V such that T (v) = w, and hence T is
onto. Then, from part (c) of Proposition 3, T is also one-to-one, and thus T is one-to-one and onto
and therefore invertible.
To complete the proof we prove that 1 and 4 are equivalent by proving first that 1 implies 4 and
then that 4 implies 1.
1 implies 4. The proof of this is virtually identical to the proof given above that 1 implies 3 and
hence we omit it.
4 implies 1. We first prove T is one-to-one. Let T (v1) = T (v2) for v1,v2 ∈ V . On taking the
composite of both sides with S, we have
(S ◦ T )(v1) = (S ◦ T )(v2).
Then, using the property of S ◦ T given in 4, we have
v1 = (S ◦ T )(v1) = (S ◦ T )(v2) = v2.
Thus, T (v1) = T (v2) only if v1 = v2, and hence T is one-to-one. Then, from part (b) of Proposi-
tion 3, T is also onto, and thus T is one-to-one and onto and therefore invertible.
The proof is complete.
7.9 [X] Proof of the Rank-Nullity Theorem
In this section we give a proof of the general Rank-Nullity Theorem stated in Section 7.4.3.
Theorem 1 (Rank-Nullity Theorem). If V is a finite dimensional vector space and T : V → W
is linear then
rank(T ) + nullity(T ) = dim(V ).
Proof. Recall that rank(T ) = dim(im(T )) = number of vectors in a basis for im(T ), and that
nullity(T ) = dim(ker(T )) = number of vectors in a basis for ker(T ).
Let {w1, . . . ,wr}, where r = rank(T ), be a basis for im(T ). Then, since wj ∈ im(T ), there exists
an element vj ∈ V such that T(vj) = wj.
Let {vr+1, . . . ,vr+ν}, where ν = nullity(T ), be a basis for ker(T ).
We shall now prove that the combined set S = {v1, . . . ,vr+ν} is a basis for the domain V , and
hence that r + ν = dim(V ).
We first prove that S is linearly independent.
Suppose,
λ1v1 + · · ·+ λr+νvr+ν = 0. (#)
Taking the image of this linear combination and using the fact that linear maps preserve linear
combinations, we have
λ1T (v1) + · · · + λr+νT (vr+ν) = T (0) = 0.
Now, for j > r, vj ∈ ker(T ), and hence T (vj) = 0 for j > r. Further, for j ≤ r, T (vj) = wj.
On substituting these results in the previous equation, we have
λ1w1 + · · ·+ λrwr = 0.
But the set {w1, . . . ,wr} is linearly independent (since it is a basis for im(T )), and thus λj = 0
for j ≤ r. On substituting these values of the scalars in (#) we have
λr+1vr+1 + · · ·+ λr+νvr+ν = 0.
But the set {vr+1, . . . ,vr+ν} is linearly independent (it is a basis for ker(T )), and thus λj = 0
for j > r. We have therefore proved that (#) is satisfied only if λj = 0 for 1 ≤ j ≤ r + ν, and hence
S is linearly independent.
Now we show that S is a spanning set for V .
Since S ⊆ V , span(S) is a subset of V . To complete the proof that span(S) = V we must prove
that V ⊆ span (S), that is, if v ∈ V then v ∈ span (S).
Suppose v ∈ V . Then w = T (v) exists and is an element of im(T ).
Now the set {w1, . . . ,wr} is a basis for im(T ), and hence there exist scalars such that
w = λ1w1 + · · ·+ λrwr.
Using these scalars we now form the linear combination
vI = λ1v1 + · · ·+ λrvr.
Now, by definition of vI and the vj for j ≤ r, and on using the fact that linear maps preserve
linear combinations, we have T (vI) = w. We now define
vR = v − vI .
Now vR satisfies
T (vR) = T (v − vI) = T (v)− T (vI) = w −w = 0,
and hence vR ∈ ker(T ). Then, since {vr+1, . . . ,vr+ν} is a basis for ker(T ), there exist scalars such
that
vR = λr+1vr+1 + · · · + λr+νvr+ν .
Then, on adding the linear combinations for vI and vR, we have
v = vI + vR = λ1v1 + · · ·+ λrvr + λr+1vr+1 + · · ·+ λr+νvr+ν ,
and hence v ∈ span (S).
We have therefore established that S is a linearly independent spanning set for V , and hence
dim(V ) = r + ν = rank(T ) + nullity(T ) as asserted in the theorem.
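For a linear map given by a matrix, the theorem can be checked mechanically, since rank(A) plus
nullity(A) must equal the number of columns of A. A Maple sketch, using the matrix that appears
in Problem 71 at the end of this chapter:

with(LinearAlgebra):
A := Matrix([[2, -3, 1, -1, 1], [4, -6, 2, -3, -1],
             [-2, 3, -2, 2, -1], [4, -6, 1, -2, -1]]):
Rank(A);               # 3
nops(NullSpace(A));    # 2, and 3 + 2 = 5 = dim(R^5), as the theorem asserts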
7.10 One-to-one, onto and inverses for functions
A fundamental problem about functions is the relationship between the points in the codomain
and the points in the domain. Now, we know that by the definition of function each point in the
domain of a function has exactly one point in the codomain as its function value. However, for a
given point in the codomain it is possible in general that either (i) it is not the function value of
any point in the domain, (ii) it is the function value of exactly one point in the domain, or (iii) it is
the function value of more than one point in the domain. To cover these possibilities the following
definitions are introduced.
Definition 1. The range or image of a function is the set of all function values,
that is, for a function f : X → Y ,
im(f) = {y ∈ Y : y = f(x) for some x ∈ X}.
Definition 2. A function is said to be onto (or surjective) if the codomain is
equal to the image of the function, that is, a function f : X → Y is onto if for all
y ∈ Y there exists an x ∈ X such that y = f(x).
Definition 3. A function is said to be one-to-one (or injective) if no point in
the codomain is the function value of more than one point in the domain, that is, a
function f : X → Y is one-to-one if f(x1) = f(x2) if and only if x1 = x2.
Note that a function is onto when every point in the codomain is a function value and that it
is one-to-one when each function value corresponds to exactly one point in the domain. Further, a
function is one-to-one and onto if and only if every point in the codomain is the function value of
exactly one point in the domain.
Example 1. The function f : R → R, defined by
f(x) = x² for x ∈ R,
is neither one-to-one nor onto. It is not one-to-one, since, for example, x₁² = x₂² is true for x₁ = 3
and x₂ = −3 ≠ x₁. It is not onto, since there are some y ∈ R (y < 0) for which no x ∈ R exists
such that y = x². ♦
Example 2. The function f : [0,∞) → R defined by
f(x) = x² for x ≥ 0
is one-to-one but not onto. It is one-to-one, since if x₁² = x₂² (and x₁ ≥ 0 and x₂ ≥ 0) then x₁ = x₂.
However, it is not onto for the reason given in Example 1. ♦
Example 3. The function f : [0,∞) → [0,∞) defined by
f(x) = x² for x ≥ 0
is both one-to-one and onto. It is one-to-one for the reason given in Example 2. The function is
also onto, since for all y ≥ 0 there is an x ≥ 0 such that y = f(x) = x². ♦
The definition of inverse of a function is as follows.
Definition 4. Let f : X → Y be a function. Then a function g : Y → X is called
an inverse of f if it satisfies the two conditions:
a) g ◦ f = idX , where idX is the identity function in X with the property that
idX(x) = x for all x ∈ X.
b) f ◦ g = idY , where idY is the identity function in Y with the property that
idY (y) = y for all y ∈ Y .
An alternative way of stating (a) is that (g ◦ f)(x) = g(f(x)) = x for all x ∈ X, and an alternative
way of stating (b) is that (f ◦ g)(y) = f(g(y)) = y for all y ∈ Y .
An important connection between the existence of an inverse and the properties of one-to-one
and onto is given in the following proposition.
Theorem 1. A function has an inverse if and only if the function is both one-to-one and onto.
[X] Proof. Let f : X → Y be a function with domain X and codomain Y .
We first prove that if f has an inverse g : Y → X then f is one-to-one and onto.
To prove one-to-one, we note that if f(x1) = f(x2) then on taking the composition with the
inverse g we obtain g(f(x1)) = g(f(x2)), and hence from condition (a) of the definition of inverse
x1 = g(f(x1)) = g(f(x2)) = x2.
Thus, for all x1, x2 ∈ X, f(x1) = f(x2) only if x1 = x2 and f is one-to-one.
We now prove that f is onto. From condition (b) of the definition of inverse, y = f(g(y)) for all
y ∈ Y . But g is a function with codomain X, and hence g(y) = x for some x ∈ X. Thus, for all y ∈ Y
there exists an x = g(y) ∈ X such that y = f(x), and hence f is onto.
To complete the proof of the theorem, we must prove that if f is one-to-one and onto then f
has an inverse function g : Y → X.
Now, as f is an onto function, for each y ∈ Y there exists x ∈ X such that y = f(x). Further,
as f is one-to-one, y1 = f(x1) = f(x2) = y2 only if x1 = x2, and hence for each y ∈ Y there exists
a unique x ∈ X such that y = f(x). We can therefore define a function g : Y → X by the rule:
For each y ∈ Y define g(y) = x where x is the unique x ∈ X such that y = f(x).
This function g then has the property that y = f(x) = f(g(y)) = (f ◦g)(y) for all y ∈ Y , and hence
g satisfies condition (b) of the definition of inverse.
To complete the proof we must prove that the function g also satisfies condition (a). Now, as
f is a function with domain X and codomain Y , for each x ∈ X there exists a unique y ∈ Y
such that y = f(x). But, as y ∈ Y , we have from the definition of g that x = g(y). Therefore
x = g(y) = g(f(x)) = (g ◦ f)(x) for all x ∈ X. The proof is complete.
7.11 Linear transformations and MAPLE
The main result of this chapter is the Matrix Representation Theorem. Accordingly, we can do
calculations with linear transformations by getting Maple to manipulate the corresponding matrices.
Usually you will have to calculate the appropriate matrix by hand, but Maple can handle some
linear transformations directly. [Be aware that Maple displays vectors as rows but treats them as
columns.] Consider for example the linear transformation of projecting onto a fixed vector b ∈ Rn
(see Section 7.3). In the following, the LinearAlgebra command Norm(b,2) calculates ||b||.
with(LinearAlgebra):
b := <1,2,3>;
T := a -> (b . a / Norm(b, 2)^2) * b;   # projection of a onto b
a := <1,0,-1>;
T(a);
We could then get Maple to find the matrix for T (with respect to the standard basis) by using
A:=< T(e1) | T(e2) | T(e3) >;
where e1, e2, e3 have been previously defined as the standard basis vectors in R3. Indeed, you can
even get Maple to check that the operator T defined above is linear.
x:=Vector(3,i->X[i]);
y:=Vector(3,i->Y[i]);
T(x+y)-T(x)-T(y);
simplify(%);
This last calculation will give zero, showing that T (x + y) = T (x) + T (y). You should have a go
at showing that T preserves scalar multiplication as well (this turns out to be just a little more
complicated for Maple!). You can also check that the matrix defined above actually does give the
linear transformation T by using:
simplify(A.x - T(x));
Many of the standard calculations concerning linear transformations have special commands in the
LinearAlgebra package. In particular you should look at the procedures NullSpace, ColumnSpace,
and Rank. As usual, the details of these commands are available using the on-line help facility.
For example, to find out about NullSpace (which calculates the kernel), enter the command
?NullSpace.
Problems for Chapter 7
Problems 7.1 : Introduction to linear maps
1. [R] Explain why the function S : [−1, 1] → R defined by S(x) = 5x for x ∈ [−1, 1] is not a
linear map. Then show that the function T : R → R defined by T (x) = 5x for x ∈ R is a
linear map.
2. [R] For the following examples, determine whether T is a linear map by using Definition 1.
a) T : R2 → R4 defined by
\[ T(\mathbf{x}) = \begin{pmatrix} 3x_1 - x_2\\ 2x_1 + 4x_2\\ -3x_1 - 3x_2\\ x_2 \end{pmatrix} \quad\text{for } \mathbf{x} = \begin{pmatrix} x_1\\ x_2 \end{pmatrix} \in \mathbb{R}^2. \]
b) T : R4 → R3 defined by
\[ T(\mathbf{x}) = \begin{pmatrix} -2x_1 + 5x_3\\ 6x_1 - 8x_2 + 2x_4\\ -2x_1 + 4x_2 - 3x_3 \end{pmatrix} \quad\text{for } \mathbf{x} = \begin{pmatrix} x_1\\ x_2\\ x_3\\ x_4 \end{pmatrix} \in \mathbb{R}^4. \]
c) T : R3 → R2 defined by
\[ T(\mathbf{x}) = \begin{pmatrix} 3x_1 + 4\\ -2x_1 + 3x_2 - x_3 \end{pmatrix} \quad\text{for } \mathbf{x} = \begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix} \in \mathbb{R}^3. \]
d) T : R4 → R3 defined for x = (x1, x2, x3, x4)^T ∈ R4 by
\[ T(\mathbf{x}) = x_1\begin{pmatrix} 1\\ 3\\ -2 \end{pmatrix} + x_2\begin{pmatrix} 0\\ -4\\ 2 \end{pmatrix} + x_3\begin{pmatrix} -2\\ -3\\ 4 \end{pmatrix} + x_4\begin{pmatrix} -4\\ 1\\ 0 \end{pmatrix}. \]
e) T : R3 → R2 defined by
\[ T(\mathbf{x}) = \begin{pmatrix} 3x_2^2 - x_3\\ x_1 - 4x_2 \end{pmatrix} \quad\text{for } \mathbf{x} = \begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix} \in \mathbb{R}^3. \]
3. [X] Consider the complex numbers as a real vector space. Specify the “natural” domain and
codomain for each of the following functions of a complex number and determine if the
function is a linear function with F = R.
a) T (z) = Re(z), b) T (z) = Im(z), c) T (z) = |z|, d) T (z) = Arg(z), e) T (z) = z̄.
4. [R] Show that the sine function T : R→ R, defined by
T (x) = sin(x) for x ∈ R,
satisfies parts 1 and 2 of Proposition 1 of Section 7.1 but that it is not a linear map.
5. [H] Use proof by induction to prove Theorem 3 of Section 7.1.
6. [R] If {v1,v2} are linearly independent in a real vector space V and v3 = 2v1 + v2, is there a
linear map T : W → R2 where W = span(v1,v2) such that
\[ T(\mathbf{v}_1) = \begin{pmatrix} 1\\ 2 \end{pmatrix}, \quad T(\mathbf{v}_2) = \begin{pmatrix} -3\\ 2 \end{pmatrix}, \quad T(\mathbf{v}_3) = \begin{pmatrix} -1\\ 3 \end{pmatrix}? \]
7. [R] A linear map T : R3 → R4 has function values given by
\[ T\begin{pmatrix} 1\\ 0\\ 0 \end{pmatrix} = \begin{pmatrix} 1\\ 2\\ 3\\ 4 \end{pmatrix}, \quad T\begin{pmatrix} 0\\ 1\\ 0 \end{pmatrix} = \begin{pmatrix} -3\\ 0\\ 1\\ 4 \end{pmatrix}, \quad T\begin{pmatrix} 0\\ 0\\ 1 \end{pmatrix} = \begin{pmatrix} 4\\ 0\\ -5\\ 6 \end{pmatrix}. \]
Find \( T\begin{pmatrix} 2\\ -1\\ 4 \end{pmatrix} \) and \( T\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix} \).
8. [R] Show that any function T : R3 → R4 with function values given by
\[ T\begin{pmatrix} 1\\ 0\\ 0 \end{pmatrix} = \begin{pmatrix} 1\\ 2\\ 3\\ 4 \end{pmatrix}, \quad T\begin{pmatrix} 0\\ 1\\ 0 \end{pmatrix} = \begin{pmatrix} -3\\ 0\\ 1\\ 4 \end{pmatrix}, \quad T\begin{pmatrix} 0\\ 0\\ 1 \end{pmatrix} = \begin{pmatrix} 4\\ 0\\ -5\\ 6 \end{pmatrix}, \quad\text{and}\quad T\begin{pmatrix} 1\\ -3\\ 1 \end{pmatrix} = \begin{pmatrix} 2\\ 4\\ 3\\ 1 \end{pmatrix} \]
is not a linear map.
9. [R] A linear function T : R3 → R2 has function values given by
\[ T\begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix} = \begin{pmatrix} 4\\ 1 \end{pmatrix}, \quad T\begin{pmatrix} -2\\ 1\\ -4 \end{pmatrix} = \begin{pmatrix} -1\\ 2 \end{pmatrix}, \quad T\begin{pmatrix} 1\\ -1\\ 2 \end{pmatrix} = \begin{pmatrix} -4\\ -2 \end{pmatrix}. \]
Write \( \begin{pmatrix} 0\\ 2\\ 1 \end{pmatrix} \) as a linear combination of the vectors in the basis
\[ \left\{ \begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix}, \begin{pmatrix} -2\\ 1\\ -4 \end{pmatrix}, \begin{pmatrix} 1\\ -1\\ 2 \end{pmatrix} \right\} \]
of R3. Hence find \( T\begin{pmatrix} 0\\ 2\\ 1 \end{pmatrix} \).
HINT. Use Theorem 3 of Section 7.1.
10. [H] Given that
\[ T\begin{pmatrix} 1\\ 2\\ 3 \end{pmatrix} = \begin{pmatrix} 4\\ 1 \end{pmatrix}, \quad T\begin{pmatrix} -2\\ 1\\ 4 \end{pmatrix} = \begin{pmatrix} -1\\ 2 \end{pmatrix} \quad\text{and}\quad T\begin{pmatrix} 1\\ 7\\ 13 \end{pmatrix} = \begin{pmatrix} 4\\ -2 \end{pmatrix}, \]
show that T is not a linear map.
Problems 7.2 : Linear maps from Rn to Rm and m× n matrices
11. [R] For any function in question 2 which is a linear map, find a matrix A such that T (x) =
Ax for x in the domain by using the results of the Matrix Representation Theorem of
Section 7.2.
12. [R] For any function in question 2 which is a linear map, write a system of linear equations for
T (x) for x in the domain, and hence find a matrix A such that T (x) = Ax. Check that
the matrices you obtain in this question are the same as the matrices that you obtained
in the previous question.
Problems 7.3 : Geometric examples of linear transformations
13. [R] For each of the following 2 × 2 matrices, draw a picture to show Ae1, Ae2, Ab, where e1
and e2 are the standard basis vectors in R2 and where b = 2e1 + 3e2.
\[ \text{a) } \begin{pmatrix} 2 & 0\\ 0 & 0.7 \end{pmatrix}, \quad \text{b) } \begin{pmatrix} -2 & 0\\ 0 & 2 \end{pmatrix}, \quad \text{c) } \begin{pmatrix} -2 & 0\\ 0 & -3 \end{pmatrix}, \quad \text{d) } \begin{pmatrix} 6 & -2\\ 6 & -1 \end{pmatrix}, \quad \text{e) } \begin{pmatrix} 4 & -4\\ 3 & -4 \end{pmatrix}. \]
14. [R] Draw the image of the star in Figure 4(a) on page 90 under each of the transformations
defined by the matrices in Question 13.
15. [R] Let T be the rotation in the plane R2 through angle π/3 in the anti-clockwise direction.
Find the matrix which represents the linear transformation T .
16. [H] Let x be the position vector of a point X in R2, and let x′ be the position vector of the
point X′ which is the reflection of X in the x2-axis. (That is, assume that a mirror is
placed along the x2-axis and that X′ is the reflection of X in the mirror.) Show that the
function T : R2 → R2 defined by T (x) = x′ is a linear map. Find a matrix A which
transforms x = (x1, x2)^T into x′ = (x′1, x′2)^T.
17. [H] Let x be the position vector of a point X in R3 and let x′ be the position vector of the point
X′ which is the reflection of X in the (x1, x2)-plane. Show that the function T : R3 → R3
defined by T (x) = x′ is a linear map. Find a matrix A which transforms the position
vector (x1, x2, x3)^T of X into the position vector (x′1, x′2, x′3)^T of X′.
18. [X] Let p be the position vector of a point P in Rn and let q be the position vector of the
point Q which is the reflection of P in the line x = λd, λ ∈ R.
Show that the function T : Rn → Rn defined by T (p) = q is a linear map. Find a matrix
A which transforms p = (p1, . . . , pn)^T into q = (q1, . . . , qn)^T.
19. [R] Let b be a fixed vector in R3. Is the function T : R3 → R3 defined by
\[ T(\mathbf{x}) = \mathbf{b} \times \mathbf{x} \quad\text{for } \mathbf{x} \in \mathbb{R}^3, \]
where b × x is the cross product, a linear map? Prove your answer. Find a matrix A
which transforms the vector x = (x1, x2, x3)^T into its function value T (x) = (x′1, x′2, x′3)^T.
20. [H] Suppose that b = (1, 0, 2)^T. Prove that the projection function T : R3 → R3 of vectors onto
b, which is defined by
\[ T(\mathbf{a}) = \mathrm{proj}_{\mathbf{b}}(\mathbf{a}) = \frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{b}|^2}\,\mathbf{b} \quad\text{for } \mathbf{a} \in \mathbb{R}^3, \]
is a linear map. Find a matrix A which transforms the vector a into its projection T (a).
21. [H] If a is regarded as a fixed non-zero vector, is the function S : Rn → Rn defined by
\[ S(\mathbf{b}) = \begin{cases} \mathrm{proj}_{\mathbf{b}}\,\mathbf{a} & \text{for } \mathbf{b} \in \mathbb{R}^n\setminus\{\mathbf{0}\},\\ \mathbf{0} & \mathbf{b} = \mathbf{0}, \end{cases} \]
a linear map? Prove your answer.
22. [X] Let Aφ, Aθ and Aφ+θ be the matrices for rotations in the plane by angles φ, θ and φ+ θ
respectively (see Example 3 of Section 7.3). Prove that
AθAφ = Aφ+θ.
What is this saying geometrically?
23. [X] Let B = {i, j,k} be an ordered orthonormal basis (Cartesian coordinate system) for a
three-dimensional geometric vector space. Let a be a three-dimensional geometric vector
and let a′ = Rα(a) be the vector obtained by rotating a anticlockwise by an angle α
about an axis parallel to j. If the coordinate vectors of a and a′ are [a]B = (a1, a2, a3)^T
and [a′]B = (a′1, a′2, a′3)^T, find the rule
\[ R_\alpha\begin{pmatrix} a_1\\ a_2\\ a_3 \end{pmatrix} = \begin{pmatrix} a'_1\\ a'_2\\ a'_3 \end{pmatrix} \]
and show that it defines a linear map from R3 to R3. Also find the matrix A such that a′ = Aa.
Problems 7.4 : Subspaces associated with linear maps
24. [R] Show that the set \( \left\{ \begin{pmatrix} \lambda\\ -2\lambda\\ \lambda \end{pmatrix} : \lambda \in \mathbb{R} \right\} \) is the kernel of the matrix
\[ \begin{pmatrix} 3 & 1 & -1\\ 8 & 3 & -2 \end{pmatrix}. \]
25. [R] Find the kernel and the nullity of each of the following matrices.
\[ \text{a) } A = \begin{pmatrix} 2 & -1 & 3\\ 1 & -2 & 3\\ 4 & 1 & -1 \end{pmatrix}, \quad \text{b) } B = \begin{pmatrix} 0 & 5 & 15\\ 2 & -2 & -4\\ 3 & -3 & -6 \end{pmatrix}, \quad \text{c) } C = \begin{pmatrix} 1 & 2 & -1 & 1\\ 3 & 2 & 0 & -2\\ 0 & 1 & -1 & 1 \end{pmatrix}. \]
Where possible, give a geometric interpretation of the kernels.
26. [R] Find a basis for the kernel, and the nullity, of each of the following matrices.
\[ \text{a) } D = \begin{pmatrix} 1 & 2 & 5 & 0\\ 1 & -1 & -4 & 0\\ -1 & 0 & 1 & 1 \end{pmatrix}, \quad \text{b) } E = \begin{pmatrix} 1 & 1 & -1\\ 2 & -1 & 0\\ 5 & -4 & 1\\ 0 & 0 & 1 \end{pmatrix}. \]
27. [H] Let W = { (x1, x2, x3, x4)^T : x1 + x2 + x3 + x4 = x1 + 2x2 + 3x3 + 4x4 = 0 }. Find a
matrix A such that W = ker A.
28. [R] Find ker(T ) and nullity(T ) for the linear functions of question 2.
29. [R] Find ker(T ) and nullity(T ) for the linear functions of questions 16 through 20. Give a
geometric interpretation of the kernels.
30. [H] Suppose that b = (1, 2, 3)^T.
a) Prove that the mapping T : R3 → R3, given by T (x) = b × x for all x ∈ R3, is a
linear mapping.
b) Find the dimension of the kernel of this mapping.
31. [R] For each given vector b and matrix A, determine if b ∈ im(A).
\[ \text{a) } \mathbf{b} = \begin{pmatrix} 11\\ 10\\ 4 \end{pmatrix}, \quad A = \begin{pmatrix} 1 & -2 & 3\\ 2 & -1 & 3\\ 4 & 1 & -1 \end{pmatrix}. \]
\[ \text{b) } \mathbf{b} = \begin{pmatrix} 9\\ -2\\ -4 \end{pmatrix}, \quad A = \begin{pmatrix} 0 & 5 & 15\\ 2 & -2 & -4\\ 3 & -3 & -6 \end{pmatrix}. \]
\[ \text{c) } \mathbf{b} = \begin{pmatrix} -2\\ -6\\ -4 \end{pmatrix}, \quad A = \begin{pmatrix} 1 & 2 & -1 & 1\\ 3 & 2 & 0 & -2\\ 0 & 1 & -1 & 1 \end{pmatrix}. \]
32. [R] Find conditions on b1, b2, b3 for the vector b = (b1, b2, b3)^T ∈ R3 to belong to im(A) for
the matrices in the preceding question.
33. [R] Find a basis for the image, and the rank, of each of the matrices in questions 25 and 26.
34. [R] By comparing the answers to questions 25, 26 and 33, verify the conclusion of the Rank-
Nullity Theorem.
35. [R] Find a basis for the image, and the rank, of each of the matrices
\[ A = \begin{pmatrix} 1 & 1 & -1 & -2 & 1\\ 0 & 0 & 1 & 4 & -1\\ 0 & 0 & 0 & 2 & 2\\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 1 & 0 & 2 & 1\\ 0 & 0 & -1 & -2 & 2\\ -1 & -1 & 1 & 4 & -1\\ 1 & 1 & 0 & 4 & 2 \end{pmatrix}. \]
36. [R] Find a basis for R3 which contains a basis of im(C), where
\[ C = \begin{pmatrix} 1 & 2 & 3 & 4\\ 2 & -4 & 6 & -2\\ -1 & 2 & -3 & 1 \end{pmatrix}. \]
37. [R] Find a basis for R4 which contains a basis of im(D), where
\[ D = \begin{pmatrix} 1 & 3 & 3 & -1 & 7\\ 2 & 6 & 3 & 1 & 8\\ 3 & 9 & 3 & 4 & 7\\ 4 & 12 & 0 & 8 & 4 \end{pmatrix}. \]
38. [R] A linear map T : R4 → R4 has the property that
\[ T\begin{pmatrix} 1\\ 0\\ 0\\ 0 \end{pmatrix} = \begin{pmatrix} 3\\ 0\\ 0\\ 0 \end{pmatrix}, \quad T\begin{pmatrix} 0\\ 1\\ 0\\ 0 \end{pmatrix} = \begin{pmatrix} 4\\ 0\\ 0\\ 0 \end{pmatrix}, \quad T\begin{pmatrix} 0\\ 0\\ 1\\ 0 \end{pmatrix} = \begin{pmatrix} -1\\ 3\\ 0\\ 0 \end{pmatrix}, \quad T\begin{pmatrix} 0\\ 0\\ 0\\ 1 \end{pmatrix} = \begin{pmatrix} 0\\ -3\\ 0\\ 1 \end{pmatrix}. \]
a) Write down the matrix representation of T with respect to the standard basis (in
both domain and co-domain).
b) Find a basis for the image of T and find the rank of T .
c) State the dimension of the kernel of T .
d) Does the vector (8, −3, 1, 2)^T belong to the image of T ? Give reasons.
39. [H] Let A ∈Mnn(R). Show that the following statements are equivalent, that is, show that if
any statement is true then all are true, whereas if any statement is false then all are false.
a) For all x and y in Rn, Ax = Ay if and only if x = y.
b) ker(A) = {0}.
c) nullity(A) = 0.
d) rank(A) = n.
e) im(A) = Rn.
f) The columns of A form a basis for Rn.
40. [H] Let A ∈ Mmn(R), and let {ej : 1 ≤ j ≤ m} be the set of m standard basis vectors of Rm.
If A is of rank r, explain why at most r of the m equations Axj = ej can have solutions.
41. [H] Let A and ej be as in the previous question. If nullity(A) = ν, explain why at least
m− n+ ν of the m equations Axj = ej do not have solutions.
42. [H] Let A ∈ Mnn(R), rank(A) = n, and ej be the standard basis vectors for Rn. Prove that
each of the n equations Axj = ej , 1 ≤ j ≤ n, has a unique solution.
43. [X] Let T : V → V be a linear map and assume that dim(V ) = n. Show that the following
statements are equivalent.
a) T (v) = T (w) if and only if v = w for all v,w ∈ V .
b) ker(T ) = {0}.
c) nullity(T ) = 0.
d) rank(T ) = n.
e) im(T ) = V .
Problems 7.5 : Further applications and examples of linear maps
44. [X] Show that the function T : R4 → M22(R) defined by
\[ T(\mathbf{a}) = \begin{pmatrix} a_1 & a_2\\ a_3 & a_4 \end{pmatrix} \quad\text{for } \mathbf{a} = (a_1, a_2, a_3, a_4) \in \mathbb{R}^4 \]
is a linear map.
45. [X] Show that the function T : R4 → M22(R) defined by
\[ T(a_1, a_2, a_3, a_4) = \begin{pmatrix} 3a_1 - 2a_4 & a_4 + 2a_3\\ -5a_2 + 3a_3 & a_1 \end{pmatrix} \]
is a linear map.
46. [X] Is the function T : M23(R) → R6 defined by
\[ T(A) = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{21} & a_{22} & a_{23} \end{pmatrix}^T \quad\text{for } A = \begin{pmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23} \end{pmatrix} \in M_{23}(\mathbb{R}) \]
a linear map?
47. [R] Show that the function T : P2 → C3 defined by
\[ T(a_0 + a_1 z + a_2 z^2) = \begin{pmatrix} a_0\\ a_1\\ a_2 \end{pmatrix} \]
is a linear map.
[X] Note that T maps a polynomial in P2 into its coordinate vector with respect to the
standard basis {1, z, z^2}.
48. [H] Show that the function T : C4 → P4 defined by T (a) = p for a = (a1, a2, a3, a4)^T ∈ C4, where
\[ p(z) = (a_1 - 3a_2) + (2a_3 - 3a_4)z + a_2 z^3 + (3a_1 - a_2 + 2a_3 + 4a_4)z^4 \quad\text{for all } z \in \mathbb{C}, \]
is a linear map.
49. [R] Show that the function T : P3(R) → P3(R) defined by
\[ T(p) = 4p' + 3p, \quad\text{where } p'(x) = \frac{dp}{dx}, \]
is a linear map.
50. [R] Is the function T : P3(R)→ P3(R) defined by T (p) = q, where
q(x) = 4xp′(x)− 8p(x) for x ∈ R,
a linear map? Prove your answer.
51. [H] Show that the function T : P3(R) → P4(R) defined by T (p) = q, where
\[ q(x) = \int_0^x p(t)\,dt \quad\text{for } x \in \mathbb{R}, \]
is a linear map.
52. [X] Let V be the subset of the vector space R[R] of all real-valued functions on R defined by
\[ V = \Big\{ f \in \mathbb{R}[\mathbb{R}] : \int_0^x f(t)\,dt \text{ exists for all } x \in \mathbb{R} \Big\}. \]
Show that V is a subspace of R[R], and then show that the rule T : V → R[R] defined by
T (f) = g, where
\[ g(x) = \int_0^x f(t)\,dt \quad\text{for } f \in V \text{ and } x \in \mathbb{R}, \]
is a linear map.
53. [X] A function S : R→ Z is defined by S(x) = y, where y is the integer obtained on rounding
x to the nearest integer. Is S a linear map? A function T : R→ R is defined by T (x) = y,
where y is the integer obtained on rounding x to the nearest integer. Is T a linear map?
54. [X] Let y be a real-valued function with domain R such that y and its first two derivatives y′
and y′′ exist, and such that the Laplace transforms (see Example 7 of Section 7.5) of y,
y′ and y′′ also exist on the interval (0,∞). Given that y(0) = 1 and y′(0) = 2 and that y
satisfies the differential equation
\[ y''(x) + 4y'(x) + 3y(x) = e^{-3x}, \]
find an explicit formula for the Laplace transform yL(s) of y in terms of s.
HINT. Take the Laplace transform of the differential equation and use integration by parts
to find formulae for the Laplace transforms of y′ and y′′ in terms of yL(s).
55. [H] Consider the mapping T : P3(R) → R2 defined by
\[ T(p(x)) = \begin{pmatrix} a - b\\ c - d \end{pmatrix} \quad\text{where } p(x) = a + bx + cx^2 + dx^3. \]
a) Prove that T is linear.
b) Show that p(x) = 3x3 + 3x2 − 2x− 2 is in the kernel of T .
56. [H] Consider the function T : R4 → P1 defined by
\[ T\begin{pmatrix} a\\ b\\ c\\ d \end{pmatrix} = (a - 2b) + (c + d)x. \]
a) Find \( T\begin{pmatrix} 1\\ -3\\ 2\\ -4 \end{pmatrix} \).
b) Show T is a linear transformation.
c) Write down a non-zero vector in R4 which lies in ker(T ).
57. [H] A linear map T : C3 → P3 has function values given by
\[ T\begin{pmatrix} 1\\ 0\\ 0 \end{pmatrix} = 1 + (2+i)z - 3z^3, \quad T\begin{pmatrix} 0\\ 1\\ 0 \end{pmatrix} = (4-3i)z + z^2, \quad T\begin{pmatrix} 0\\ 0\\ 1 \end{pmatrix} = -2 \quad\text{for } z \in \mathbb{C}. \]
Find \( T\begin{pmatrix} i\\ 2\\ -1 \end{pmatrix} \) and \( T\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix} \).
58. [X] Let Pn be the real vector space of polynomials of degree less than or equal to n, and take
its standard basis to be
{1, x, x2, . . . , xn}.
For p(x) ∈ P3, let (T (p))(x) ∈ P4 be defined by
\[ (T(p))(x) = \int_0^x p(t)\,dt. \]
a) Show that T is a linear transformation from P3 to P4.
b) Calculate the matrix A of this linear transformation with respect to the standard
bases of P3 and P4.
c) Find a basis for the image of T , im(T ).
d) Find a basis for the kernel of T , ker(T ).
59. [R] A car manufacturer produces a station wagon, a four-wheel drive, a hatchback and a sedan
model. Each model is made from steel, plastics, rubber and glass, and it also requires a
number of hours of labour to produce. The requirements per car of these inputs for each
model are as shown in the following table.
steel plastics rubber glass labour
(tonnes) (tonnes) (tonnes) (tonnes) (hours)
station wagon 1 0.5 0.1 0.2 1
4-wheel drive 1.5 0.6 0.2 0.15 1.5
hatchback 0.8 0.7 0.2 0.2 1.1
sedan 0.9 0.6 0.25 0.3 0.9
Construct a matrix which can be used to express the factory input as a linear function of
the factory output.
Problems 7.6 : [X] Representation of linear maps by matrices
60. [R] Let idR2 be the identity map for R2. Find a matrix representation of this map with respect
to standard bases in domain and codomain.
61. [X] Let idR2 be the identity map for R2. Find a matrix representation of this map with respect
to the domain basis
\[ \left\{ \begin{pmatrix} 1\\ 1 \end{pmatrix}, \begin{pmatrix} 1\\ -1 \end{pmatrix} \right\} \quad\text{and the codomain basis}\quad \left\{ \begin{pmatrix} 1\\ 3 \end{pmatrix}, \begin{pmatrix} 1\\ 2 \end{pmatrix} \right\}. \]
62. [X] A linear mapping G : P2 → P2 has matrix representation
\[ A = \begin{pmatrix} 1 & 2 & -1\\ -1 & 1 & 0\\ 3 & 4 & 1 \end{pmatrix} \]
with respect to the standard basis {1, x, x^2} in both domain and co-domain. Find G(p),
where p(x) = −3 + x + 5x^2.
63. [X] For each of the linear maps in questions 48 to 51, find a matrix which represents the linear
map for standard bases in the domain and codomain.
64. [X] Using your results of the previous question or otherwise, find the kernel, nullity, image and
rank of the linear maps in questions 48 to 51.
65. [X] For the linear map T : Pn(R) → Pn(R) defined by T (p) = q, where
\[ q(x) = x^2\,\frac{d^2p}{dx^2} - 3x\,\frac{dp}{dx} + 3p(x) \quad\text{for } x \in \mathbb{R}, \]
find a matrix which represents T with respect to standard bases in domain and codomain.
Hence, or otherwise, find the kernel and nullity of T .
66. [X] Let V be a vector space and let B = {u1,u2,u3} be an orthonormal basis for V . Let
a ∈ V be a vector whose coordinate vector with respect to B is (a1, a2, a3)^T, and let
(a′1, a′2, a′3)^T be the coordinate vector of a with respect to the basis B′ = {u′1,u′2,u′3}
given by
\[ \mathbf{u}'_1 = \tfrac{1}{\sqrt{2}}\mathbf{u}_1 + \tfrac{1}{\sqrt{2}}\mathbf{u}_3, \qquad \mathbf{u}'_2 = -\tfrac{1}{\sqrt{2}}\mathbf{u}_1 + \tfrac{1}{\sqrt{2}}\mathbf{u}_3, \qquad \mathbf{u}'_3 = -\mathbf{u}_2. \]
Show that B′ is an orthonormal basis, then show that the rule
\[ T\begin{pmatrix} a_1\\ a_2\\ a_3 \end{pmatrix} = \begin{pmatrix} a'_1\\ a'_2\\ a'_3 \end{pmatrix} \]
is a linear map from R3 to R3, and find a matrix representation for this function with respect
to standard bases for R3 in domain and codomain. Finally, show that the matrix you have
constructed is a matrix representation of the identity map idV : V → V with respect to
the basis B in domain and B′ in codomain.
Problems 7.7 : [X] Matrix arithmetic and linear maps
67. [X] Let T : V → W and S : V → W be linear maps. Let BV be a basis for V and BW be a
basis for W and let A and B be the matrices representing T and S with respect to the
bases BV and BW . Prove that A+ B is the matrix representing the sum function T + S
with respect to the bases BV and BW .
68. [X] Let T : U → V and S : V → W be linear maps. Let BU , BV and BW be bases for U , V
and W respectively. Let A be the matrix representing T with respect to bases BU and BV
and let B be the matrix representing S with respect to bases BV and BW . Prove that the
matrix product BA is the matrix which represents the composition function S◦T : U →W
with respect to the bases BU and BW .
Problems 7.8 : [X] One-to-one, onto and invertible linear maps and matrices
69. [X] Let V = C[R], the vector space of all continuous real-valued functions on R. Let
B = {e^x, (x − 1)e^x, (x − 1)(x − 2)e^x} ⊆ V.
a) Prove that B is linearly independent.
b) Let W = span (B), and let D : W → W denote the linear transformation
D(f) = f ′,
where f ′ is the derivative of f.
i) Find the matrix for D with respect to the ordered basis B of W.
ii) Find the matrix for the linear transformation T = D ◦D.
iii) Hence or otherwise, prove that for every g ∈W, there exists f ∈W such that
f ′′ = g.
70. [X] For a field F, define T : M22(F) → F3 by
\[ T\!\left(\begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix}\right) = \begin{pmatrix} a_{22}\\ a_{12} - a_{21}\\ 3a_{11} + a_{12} \end{pmatrix}. \]
a) Show that T is a linear transformation.
b) Find the kernel of T and the nullity of T .
c) Find the rank of T .
d) Is T one-to-one (injective)? Give a brief reason for your answer.
e) Find the matrix of T with respect to the standard bases of M22(F) and F3.
Problems 7.11 : Linear transformations and MAPLE
71. [M] Consider the following MAPLE output.
> with(LinearAlgebra):
> A:=<<2,4,-2,4>|<-3,-6,3,-6>|<1,2,-2,1>|<-1,-3,2,-2>|<1,-1,-1,-1>>;
\[ A := \begin{pmatrix} 2 & -3 & 1 & -1 & 1\\ 4 & -6 & 2 & -3 & -1\\ -2 & 3 & -2 & 2 & -1\\ 4 & -6 & 1 & -2 & -1 \end{pmatrix} \]
> b:=<a1,a2,a3,a4>;
\[ b := \begin{pmatrix} a_1\\ a_2\\ a_3\\ a_4 \end{pmatrix} \]
> GaussianElimination(<A|b>);
\[ \begin{pmatrix} 2 & -3 & 1 & -1 & 1 & a_1\\ 0 & 0 & -1 & 1 & 0 & a_3 + a_1\\ 0 & 0 & 0 & -1 & -3 & a_2 - 2a_1\\ 0 & 0 & 0 & 0 & 0 & a_4 - a_1 - a_3 - a_2 \end{pmatrix} \]
a) Find a basis for col(A), the column space of A.
b) What is the dimension of col(A)?
c) Under what conditions does (a1, a2, a3, a4)^T belong to col(A)?
d) Find a basis for the kernel, or null space, of A.
e) What are the values of the rank and nullity of A?
Chapter 8
EIGENVALUES AND
EIGENVECTORS
. . . she set to work very carefully, nibbling first at one and then at the other,
and growing sometimes taller, and sometimes shorter,. . .
Lewis Carroll, Alice in Wonderland.
Eigenvalues and eigenvectors are of great theoretical and practical importance. Some practical
applications of eigenvalues and eigenvectors include the following.
1. Oscillations. For example, vibrating strings, organ pipes, wing flutter on an aircraft, vibrations
of buildings and bridges, etc.
2. Quantum Physics and Chemistry. Structure of atoms, molecules, nuclei, solids etc.
3. Electronics and Electrical Engineering. Microwave oscillators, amplifiers, signal transmission,
communications networks, etc.
4. Economics. Stability of economic systems, dynamic econometric models, Leontief input-output
models, inventory models, stock market models, etc.
5. Biological and Ecological Systems. Solution of population models, stability of ecological
systems etc.
In this chapter we shall only be able to give a brief introduction to this extremely important topic.
A general theory of eigenvalues and eigenvectors and some applications of them is given in the
second year mathematics courses.
8.1 Definitions and examples
We are concerned with linear maps in which the domain and the codomain are the same vector
space, that is, with linear maps of the form T : V → V . The fundamental questions asked are:
1. Given a map T , are there vectors v ∈ V which are related in a very simple way to their
images T (v) ∈ V ?
2. [X] Is there a choice of basis for V such that the matrix representing T for this basis takes
on a very simple form?
The answer to both of these questions is yes.
For question 1, we look for vectors for which T (v) is a multiple of v. Formally, we have
Definition 1. Let T : V → V be a linear map. Then if a scalar λ and non-zero
vector v ∈ V satisfy
T (v) = λv,
then λ is called an eigenvalue of T and v is called an eigenvector of T for the
eigenvalue λ.
Note. An eigenvector is non-zero, but zero can be an eigenvalue.
Example 1. For infinitely differentiable real-valued functions f , the derivative
\[ D(f) = f', \quad\text{where } f'(x) = \frac{df}{dx} \text{ for } x \in \mathbb{R}, \]
defines a linear map D. The exponential function satisfies
\[ D(e^{\lambda x}) = \lambda e^{\lambda x}, \]
and hence e^{λx} is an eigenvector of D with eigenvalue λ. It should be noted that the great importance
of exponential functions in calculus is due to the fact that they are the only functions f for which
f ′ is a multiple of f . ♦
Calculus and its applications provides a very rich source of eigenvalue and eigenvector problems.
However, we are mainly concerned in this course with algebraic problems involving linear maps
between finite-dimensional vector spaces. These linear maps can always be represented by matrices,
and hence we will be concerned in the remainder of this chapter with eigenvalues and eigenvectors
of matrices.
When dealing with eigenvalues and eigenvectors of matrices we will be forced to use complex
numbers for our scalar field. The fundamental reason for this is that the eigenvalues of a matrix
are actually zeroes of some polynomial and, as we have seen in Chapter 3, we can only be certain of
finding zeroes when the polynomials are complex polynomials. Thus, the “natural” field of scalars
for eigenvalues and eigenvectors is the set of complex numbers C, and the “natural” vector spaces
are the complex vector spaces Cn (see Example 2 of Section 6.1).
For the special case of a matrix, Definition 1 becomes:
Definition 2. Let A ∈ Mnn(C) be a square matrix. Then if a scalar λ ∈ C and
non-zero vector v ∈ Cn satisfy
Av = λv,
then λ is called an eigenvalue of A and v is called an eigenvector of A for the
eigenvalue λ.
Example 2. For the diagonal 2 × 2 matrix
\[ A = \begin{pmatrix} \lambda_1 & 0\\ 0 & \lambda_2 \end{pmatrix}, \]
the standard basis vectors e1 = (1, 0)^T and e2 = (0, 1)^T satisfy
\[ A\begin{pmatrix} 1\\ 0 \end{pmatrix} = \begin{pmatrix} \lambda_1\\ 0 \end{pmatrix} = \lambda_1\begin{pmatrix} 1\\ 0 \end{pmatrix} \quad\text{and}\quad A\begin{pmatrix} 0\\ 1 \end{pmatrix} = \begin{pmatrix} 0\\ \lambda_2 \end{pmatrix} = \lambda_2\begin{pmatrix} 0\\ 1 \end{pmatrix}. \]
Thus, e1 is an eigenvector of A with eigenvalue λ1 and e2 is an eigenvector of A with eigenvalue λ2.
A picture of this result is shown in Figure 1 for the special case of λ1 = 3 and λ2 = −2. ♦
[Figure 1: The eigenvectors of the diagonal matrix \(\begin{pmatrix} 3 & 0\\ 0 & -2 \end{pmatrix}\); e1 is mapped to Ae1 = 3e1 and e2 to Ae2 = −2e2.]
Example 3. For the matrix
\[ A = \begin{pmatrix} 0 & 1 & 0\\ 0 & 0 & 1\\ 20 & -24 & 9 \end{pmatrix} \quad\text{and the vector}\quad \mathbf{v} = \begin{pmatrix} 1\\ 5\\ 25 \end{pmatrix}, \]
by matrix multiplication we have Av = 5v, and hence v is an eigenvector of A for the eigenvalue
λ = 5. ♦
8.1.1 Some fundamental results
The fundamental theoretical results for eigenvalues and eigenvectors draw on results given in
previous chapters on linear equations, polynomials, vector spaces, linear maps, and determinants.
The following theorem is extremely important.
Theorem 1. A scalar λ is an eigenvalue of a square matrix A if and only if det(A − λI) = 0, and
then v is an eigenvector of A for the eigenvalue λ if and only if v is a non-zero solution of the
homogeneous equation (A − λI)v = 0, i.e., if and only if v ∈ ker(A − λI) and v ≠ 0.
Proof. From Definition 2, A is a square matrix, and an eigenvalue λ and corresponding eigenvector
v of A satisfy the equation
Av = λv, where v ≠ 0.
This equation can be rearranged in the form
0 = Av − λv = Av − λIv = (A− λI)v,
where I is an identity matrix of the same size as A.
Now, A− λI is a square matrix, and hence (by a proposition in Chapter 5) the equation
(A− λI)v = 0
can have a non-zero solution if and only if det(A− λI) = 0. Thus, λ is an eigenvalue if and only if
det(A− λI) = 0 and the first part of the theorem is proved.
Then, if λ is an eigenvalue, v is an eigenvector if and only if it is a non-zero solution of the
above homogeneous equation, that is, if and only if v ∈ ker(A − λI) and v ≠ 0. The proof is
complete.
Note. The set of all eigenvectors of A for eigenvalue λ is therefore equal to ker(A − λI) with 0
removed. Also, there are infinitely many eigenvectors corresponding to a single eigenvalue.
A second fundamental result for the theory of eigenvalues is the following.
Theorem 2. If A is an n × n matrix and λ ∈ C, then det(A − λI) is a complex polynomial of
degree n in λ.
This theorem can be proved in a straightforward, but tedious, fashion by direct expansion of
the determinant det(A− λI). For example, for n = 3, we have
    A − λI = \begin{pmatrix} a11 & a12 & a13 \\ a21 & a22 & a23 \\ a31 & a32 & a33 \end{pmatrix} − λ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} a11 − λ & a12 & a13 \\ a21 & a22 − λ & a23 \\ a31 & a32 & a33 − λ \end{pmatrix}.
Then, by direct evaluation of det(A− λI) by expansion along the first column, we obtain
    det(A − λI) = (a11 − λ)[(a22 − λ)(a33 − λ) − a32 a23] − a21 [a12 (a33 − λ) − a32 a13] + a31 [a12 a23 − (a22 − λ) a13]
                = −λ³ + terms containing λ², λ and constants.
Hence, for n = 3, det(A − λI) is a complex polynomial of degree 3, as stated in Theorem 2.
Definition 3. For a square matrix A, the polynomial p(λ) = det(A− λI) is called
the characteristic polynomial for the matrix A.
Example 4. For the 3 × 3 matrix

    A = \begin{pmatrix} 1 & −1 & 2 \\ 3 & −4 & −1 \\ 5 & 1 & 2 \end{pmatrix},

the characteristic polynomial is the cubic

    p(λ) = \begin{vmatrix} 1 − λ & −1 & 2 \\ 3 & −4 − λ & −1 \\ 5 & 1 & 2 − λ \end{vmatrix} = −λ³ − λ² + 16λ + 50. ♦
We can now apply the theory of roots of complex polynomials developed in Chapter 3 to obtain
the following fundamental result.
Theorem 3. An n × n matrix A has exactly n eigenvalues in C (counted according to their
multiplicities). These eigenvalues are the zeroes of the characteristic polynomial p(λ) = det(A−λI).
Proof. From Theorem 2, the characteristic polynomial p(λ) = det(A − λI) is a polynomial of
degree n over the complex field. Thus, from the Factorisation Theorem of Chapter 3, p has exactly
n zeroes (counted according to their multiplicities) which from Theorem 1 are the eigenvalues of
A.
Example 5. For the matrix A of Example 4, the roots of the cubic characteristic polynomial are
(to 4-figure accuracy) 4.688, −2.844 + 1.605i, −2.844− 1.605i, and these are the three eigenvalues
of A. ♦
Note.
1. The equation p(λ) = 0 is called the characteristic equation for A.
2. Theorem 3 is of fundamental theoretical importance, as it proves the existence of eigenvalues
of a matrix. However, with the exception of 2× 2 and specially constructed larger matrices,
modern methods of finding eigenvalues of matrices do not make use of this theorem. These efficient modern methods are available in standard matrix software packages such as MAPLE and MATLAB.
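For instance, the eigenvalues in Example 5 can be found numerically with the Maple commands described in Section 8.4 (a minimal sketch):

with(LinearAlgebra):
A := <<1|-1|2>, <3|-4|-1>, <5|1|2>>:     # the matrix of Example 4
p := CharacteristicPolynomial(A, t):     # det(tI - A); same roots as det(A - tI)
fsolve(p = 0, t, complex);               # 4.688, -2.844 - 1.605i, -2.844 + 1.605i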
8.1.2 Calculation of eigenvalues and eigenvectors
As stated above, Theorem 3 provides a practical method for finding eigenvalues of 2×2 or specially
constructed larger matrices. The corresponding eigenvectors can then be obtained from Theorem 1.
Some examples of the calculation of eigenvalues and eigenvectors for simple matrices are as follows:
Example 6. For a diagonal matrix the diagonal entries are eigenvalues and the standard basis
vectors are eigenvectors. For example, if
    A = \begin{pmatrix} λ1 & 0 \\ 0 & λ2 \end{pmatrix}

then

    det(A − λI) = det\left( \begin{pmatrix} λ1 & 0 \\ 0 & λ2 \end{pmatrix} − \begin{pmatrix} λ & 0 \\ 0 & λ \end{pmatrix} \right) = \begin{vmatrix} λ1 − λ & 0 \\ 0 & λ2 − λ \end{vmatrix} = (λ1 − λ)(λ2 − λ) = 0

has the solutions λ = λ1 and λ = λ2.
Then for the eigenvector corresponding to λ = λ1, we solve

    (A − λ1 I)v = \begin{pmatrix} 0 & 0 \\ 0 & λ2 − λ1 \end{pmatrix} \begin{pmatrix} v1 \\ v2 \end{pmatrix} = 0.

It is obvious that one of the solutions is v = \begin{pmatrix} 1 \\ 0 \end{pmatrix} = e1.

Similarly, e2 is a solution of the homogeneous equation (A − λ2 I)v = 0, and hence e2 is one of the eigenvectors corresponding to the eigenvalue λ2. ♦
Example 7. Find eigenvalues and eigenvectors of

    A = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}.

Solution. The first step is to find the eigenvalues from the characteristic polynomial. We have

    p(λ) = det(A − λI) = \begin{vmatrix} 2 − λ & 2 \\ 2 & 2 − λ \end{vmatrix} = λ² − 4λ.

Note that A − λI is obtained from A by subtracting λ from each diagonal element of A, and that the characteristic polynomial is a quadratic. The roots of the characteristic equation are

    λ1 = 0 and λ2 = 4.

Note that, as asserted in Theorem 3, there are two eigenvalues for the 2 × 2 matrix A.

The next step is to find an eigenvector for each eigenvalue by finding ker(A − λI), first for λ = 0, and then for λ = 4.

For eigenvalue λ1 = 0, the eigenvectors are the non-zero vectors in ker(A). By row reduction,

    \begin{pmatrix} 2 & 2 & 0 \\ 2 & 2 & 0 \end{pmatrix} → \begin{pmatrix} 2 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}   (R2 = R2 − R1)

and then back substitution gives ker(A) = span(v1), where v1 = \begin{pmatrix} −1 \\ 1 \end{pmatrix}. The set of eigenvectors corresponding to the eigenvalue 0 is then

    { t \begin{pmatrix} −1 \\ 1 \end{pmatrix} : t ≠ 0 }.

For λ2 = 4, the required eigenvectors are ker(A − 4I) (with 0 deleted), where

    A − 4I = \begin{pmatrix} −2 & 2 \\ 2 & −2 \end{pmatrix}.

Solving (A − 4I)v = 0 in the same way, we find that v2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} is a basis for ker(A − 4I), and the set of eigenvectors corresponding to the eigenvalue 4 is then

    { t \begin{pmatrix} 1 \\ 1 \end{pmatrix} : t ≠ 0 }.

[Note that the scalar field is assumed to be C, so t ∈ C.] ♦
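This calculation is easily checked in Maple, mirroring the command style of Section 8.4 (a sketch):

with(LinearAlgebra):
A := <<2|2>, <2|2>>:
Eigenvalues(A);        # returns the eigenvalues 0 and 4
NullSpace(A - 4);      # a basis for the eigenspace ker(A - 4I)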
Example 8. Find eigenvalues and eigenvectors of

    A = \begin{pmatrix} 2 & 1 \\ −1 & 4 \end{pmatrix}.

Solution. The eigenvalues are solutions of the characteristic equation

    \begin{vmatrix} 2 − λ & 1 \\ −1 & 4 − λ \end{vmatrix} = (2 − λ)(4 − λ) + 1 = (λ − 3)² = 0.

Hence, there is one eigenvalue λ = 3 with multiplicity 2.

Eigenvectors. The eigenvectors are the non-zero vectors v ∈ ker(A − 3I), where

    A − 3I = \begin{pmatrix} −1 & 1 \\ −1 & 1 \end{pmatrix}.

On solving (A − 3I)v = 0, we find that the only solutions are v = t \begin{pmatrix} 1 \\ 1 \end{pmatrix} for t ∈ C. A matrix with fewer linearly independent eigenvectors than columns, as in this example, is called a defective matrix (poor thing). ♦
As the next example shows, it is also possible to have a 2 × 2 matrix A with one eigenvalue
(with multiplicity 2) and two linearly independent eigenvectors.
Example 9. The matrix

    A = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix}

has eigenvalue λ = 3 (with multiplicity 2), and ker(A − 3I) is span{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} }. ♦
Example 10. Find all eigenvalues and eigenvectors of the matrix

    A = \begin{pmatrix} 1 & 2 \\ −2 & 1 \end{pmatrix}.

Solution. As usual, the eigenvalues are solutions of the characteristic equation det(A − λI) = 0, that is, of

    \begin{vmatrix} 1 − λ & 2 \\ −2 & 1 − λ \end{vmatrix} = λ² − 2λ + 5 = 0.

In this case the roots of the quadratic are the complex numbers

    λ1 = 1 + 2i and λ2 = 1 − 2i.

For the eigenvectors for λ1 = 1 + 2i, we have

    A − (1 + 2i)I = \begin{pmatrix} −2i & 2 \\ −2 & −2i \end{pmatrix}.

An equivalent row-echelon form is

    U = \begin{pmatrix} −2i & 2 \\ 0 & 0 \end{pmatrix},

and the eigenvectors are v = t \begin{pmatrix} −i \\ 1 \end{pmatrix} with t ∈ C, t ≠ 0.

For λ2 = 1 − 2i the eigenvectors are v = t \begin{pmatrix} i \\ 1 \end{pmatrix} with t ∈ C, t ≠ 0. ♦
The characteristic polynomial of a real 2× 2 matrix has real coefficients, so has two real roots,
one real root with multiplicity 2, or a pair of distinct conjugate complex roots, so the matrix has
two real eigenvalues, one real eigenvalue with multiplicity 2, or two distinct conjugate complex
eigenvalues. Examples 7 – 10 above show all these possibilities.
If A is a complex 2 × 2 matrix, its characteristic polynomial has complex coefficients, and
either two distinct complex roots or one complex root with multiplicity 2, and the matrix has two
eigenvalues or one eigenvalue with multiplicity 2.
For each eigenvalue the space spanned by its corresponding eigenvector(s) is called the eigenspace
for that eigenvalue. Thus, when we write down the eigenvectors for a given eigenvalue, we are really
recording the basis vectors for the corresponding eigenspace.
8.2 Eigenvectors, bases, and diagonalisation
In the examples of the preceding section, we have seen that, with one exception (Example 8), we obtain two linearly independent eigenvectors for a 2 × 2 matrix. Since a 2 × 2 matrix A represents a linear map whose domain is C², these two eigenvectors form a basis for the domain. The matrix of Example 8 has only one linearly independent eigenvector, which cannot by itself form a basis for the domain.
These results can be generalised to matrices of arbitrary size.
Theorem 1. If an n × n matrix has n distinct eigenvalues then it has n linearly independent
eigenvectors.
[X] Proof. Let the set of n distinct eigenvalues of the n × n matrix A be {λ1, . . . , λn} and let vi be a corresponding eigenvector for λi, 1 ≤ i ≤ n. We shall now prove that

    S = {v1, . . . , vn}

is linearly independent. Suppose

    µ1 v1 + · · · + µn vn = 0.     (#)

We show that µ1 = 0; in similar fashion, µ2 = · · · = µn = 0, so v1, . . . , vn are linearly independent. Apply the matrix (A − λ2 I)(A − λ3 I) · · · (A − λn I) to both sides of (#). Since

    (A − λ2 I) · · · (A − λn I)vj = (λj − λ2)(λj − λ3) · · · (λj − λn)vj = 0   for j ≠ 1,

while

    (A − λ2 I) · · · (A − λn I)v1 = (λ1 − λ2)(λ1 − λ3) · · · (λ1 − λn)v1 ≠ 0,

this gives µ1(λ1 − λ2) · · · (λ1 − λn)v1 = 0. Since v1 ≠ 0 and the eigenvalues are distinct, it follows that µ1 = 0.
Note. Even if the n × n matrix does not have n distinct eigenvalues, it may have n linearly
independent eigenvectors.
In Examples 2 and 6 of Section 8.1, we have seen that it is very easy to write down eigenvalues
and eigenvectors of diagonal matrices. The next theorem shows that it is sometimes possible to transform a given matrix into a diagonal matrix.
Theorem 2. If an n × n matrix A has n linearly independent eigenvectors, then there exists an
invertible matrix M and a diagonal matrix D such that
M−1AM = D.
Further, the diagonal elements of D are the eigenvalues of A and the columns of M are the eigen-
vectors of A, with the jth column of M being the eigenvector corresponding to the jth element of
the diagonal of D.
Conversely if M−1AM = D with D diagonal then the columns of M are n linearly independent
eigenvectors of A.
[X] Proof. Let the n linearly independent eigenvectors of A be {v1, . . . , vn}. We now form the matrix M with these vectors as its columns, i.e.,

    M = ( v1  v2  · · ·  vn ).

Then, from the usual rules of matrix multiplication, we have

    AM = ( Av1  Av2  · · ·  Avn ),

and from Avi = λi vi we have

    AM = ( λ1 v1  λ2 v2  · · ·  λn vn ).

Following the usual rules of matrix multiplication, we can rewrite this equation in the matrix form

    AM = ( v1  v2  · · ·  vn ) \begin{pmatrix} λ1 & 0 & \cdots & 0 \\ 0 & λ2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λn \end{pmatrix} = MD,

where

    D = \begin{pmatrix} λ1 & 0 & \cdots & 0 \\ 0 & λ2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λn \end{pmatrix}

is the diagonal matrix of eigenvalues. Thus, AM = MD. Further, since the columns of M are a basis for Cn, the equation Mx = b has a unique solution for all b ∈ Cn, and hence M is invertible. Then, on multiplying the equation AM = MD on the left by M−1, we have M−1AM = D.

Conversely, if M−1AM = D then AM = MD and M is invertible. Let

    M = ( v1  v2  · · ·  vn )   and   D = \begin{pmatrix} λ1 & & \\ & \ddots & \\ & & λn \end{pmatrix};

then, from the first columns of the matrix products on the two sides of AM = MD, we have Av1 = λ1 v1. Similarly, Avi = λi vi for 1 ≤ i ≤ n.

Finally, the columns of an invertible matrix are linearly independent.
Definition 1. A square matrix A is said to be a diagonalisable matrix if there
exists an invertible matrix M and diagonal matrix D such that M−1AM = D.
Example 1. Show that the matrix

    A = \begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix}

is diagonalisable, and find an invertible matrix M and diagonal matrix D such that M−1AM = D.

Solution. We first find the eigenvalues and eigenvectors of A in the usual way. The eigenvalues are λ1 = 5, λ2 = 1, and corresponding eigenvectors are v1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} and v2 = \begin{pmatrix} 1 \\ −1 \end{pmatrix}. Clearly, v1 and v2 are linearly independent. (Theorem 1 guarantees this, since λ1 ≠ λ2.) Thus we may apply Theorem 2, letting D be a diagonal matrix with the eigenvalues as its diagonal elements, and M be the matrix with corresponding eigenvectors as its columns. For example,

    D = \begin{pmatrix} 5 & 0 \\ 0 & 1 \end{pmatrix}   and   M = \begin{pmatrix} 1 & 1 \\ 1 & −1 \end{pmatrix}

are the required diagonal matrix D and a suitable invertible matrix M. ♦
Note.

1. The results can be checked by direct multiplication of M−1AM. In the above example, we readily obtain

    M−1 = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & −1/2 \end{pmatrix},

and then

    M−1AM = M−1 \begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & −1 \end{pmatrix} = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & −1/2 \end{pmatrix} \begin{pmatrix} 5 & 1 \\ 5 & −1 \end{pmatrix} = \begin{pmatrix} 5 & 0 \\ 0 & 1 \end{pmatrix} = D.

2. The choice of D and M is not unique. For example, we could reverse the order of the eigenvalues and set

    D = \begin{pmatrix} 1 & 0 \\ 0 & 5 \end{pmatrix},   M = \begin{pmatrix} 1 & 1 \\ −1 & 1 \end{pmatrix}.

Also, non-zero multiples of eigenvectors are eigenvectors, so multiplying any column of M by a non-zero scalar would produce another valid diagonalising matrix.
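The check in Note 1 can also be done in Maple (a sketch, with A and M as above):

with(LinearAlgebra):
A := <<3|2>, <2|3>>:
M := <<1|1>, <1|-1>>:
MatrixInverse(M) . A . M;    # returns the diagonal matrix with entries 5, 1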
8.3 Applications of eigenvalues and eigenvectors
Some important practical applications have already been noted at the beginning of this chapter.
Many of these applications arise from the study of dynamical systems. A dynamical system is
essentially any system which changes in time. Some examples of such systems include an electrical
power network, a bridge oscillating in a wind, the population of a city or country, an ant colony, a
forest, the Australian economy, an atom, an atomic nucleus.
8.3.1 Powers of A
A typical problem in, for example, the study of dynamical systems is to find A^k for positive integers
k. There are two results which enable us to easily solve this problem.
Proposition 1. Let D be the diagonal matrix

    D = \begin{pmatrix} λ1 & 0 & \cdots & 0 \\ 0 & λ2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λn \end{pmatrix}.

Then, for k ≥ 1,

    D^k = \begin{pmatrix} λ1^k & 0 & \cdots & 0 \\ 0 & λ2^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λn^k \end{pmatrix}.
Proof. The proof is by induction.
The result is obviously true for k = 1.
Now assume that the result is true for k = m. Then, on multiplying out,

    D^{m+1} = D D^m = \begin{pmatrix} λ1 & & \\ & \ddots & \\ & & λn \end{pmatrix} \begin{pmatrix} λ1^m & & \\ & \ddots & \\ & & λn^m \end{pmatrix} = \begin{pmatrix} λ1^{m+1} & & \\ & \ddots & \\ & & λn^{m+1} \end{pmatrix}.

Hence, if the result is true for m it is also true for m + 1. But we have already seen that it is true for m = 1, and hence it is true for all positive integers k.
The second result that we need is as follows:
Proposition 2. If A is diagonalisable, that is, if there exists an invertible matrix M and diagonal matrix D such that M−1AM = D, then

    A^k = M D^k M−1   for all integers k ≥ 1.
Proof. The proof is by induction.
On multiplying M−1AM = D on the left by M and on the right by M−1, we obtain

    A = M D M−1,
and hence the statement of the proposition is true for k = 1.
Now suppose the statement of the proposition is true for k = m. Then
    A^{m+1} = A A^m = M D M−1 M D^m M−1 = M D D^m M−1 = M D^{m+1} M−1,
and hence the statement of the proposition is also true for m + 1. Thus, the statement of the
proposition is true for all positive integers k.
Example 1. Find A^k for

    A = \begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix}.

Solution. The first step is to check that A is diagonalisable, and, if it is, to find the matrix M of eigenvectors and diagonal matrix D of eigenvalues such that A = M D M−1. From Example 1 of Section 8.2, suitable matrices are:

    D = \begin{pmatrix} 5 & 0 \\ 0 & 1 \end{pmatrix};   M = \begin{pmatrix} 1 & 1 \\ 1 & −1 \end{pmatrix};   M−1 = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & −1/2 \end{pmatrix}.

Then,

    A^k = M D^k M−1 = \begin{pmatrix} 1 & 1 \\ 1 & −1 \end{pmatrix} \begin{pmatrix} 5^k & 0 \\ 0 & 1^k \end{pmatrix} M−1 = \begin{pmatrix} 5^k & 1 \\ 5^k & −1 \end{pmatrix} \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & −1/2 \end{pmatrix} = \begin{pmatrix} (5^k + 1)/2 & (5^k − 1)/2 \\ (5^k − 1)/2 & (5^k + 1)/2 \end{pmatrix}.

As a check on this solution, note that we obtain I if we substitute k = 0, and A if we substitute k = 1. ♦
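The same computation can be written in Maple with k left symbolic (a sketch; the name D is protected in Maple, so a different name is used for the diagonal matrix):

with(LinearAlgebra):
M := <<1|1>, <1|-1>>:
Dk := DiagonalMatrix([5^k, 1]):      # D^k for symbolic k
Ak := M . Dk . MatrixInverse(M);     # reproduces the formula for A^k above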
Note. [X] Given a diagonalisable matrix A, we can give meaning to its exponential, using the power series expansion of e^x. We substitute A into the expansion

    e^x = 1 + x + x²/2! + x³/3! + · · · ,

replacing 1 by I. Since A is diagonalisable, we can write A = PDP−1, with D diagonal, and a simple calculation shows that

    I + PDP−1 + (PDP−1)²/2! + (PDP−1)³/3! + · · · = P(I + D + D²/2! + D³/3! + · · ·)P−1.

We define this to be the exponential of the matrix.

In the case of a 2 × 2 matrix with distinct eigenvalues λ1, λ2, by adding the entries in the matrix, we have

    e^A = P \begin{pmatrix} 1 + λ1 + λ1²/2! + λ1³/3! + · · · & 0 \\ 0 & 1 + λ2 + λ2²/2! + λ2³/3! + · · · \end{pmatrix} P−1 = P \begin{pmatrix} e^{λ1} & 0 \\ 0 & e^{λ2} \end{pmatrix} P−1.

In a similar way, one can define the sine and cosine (etc.) of a matrix.
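Maple can compute matrix exponentials directly via the LinearAlgebra command MatrixExponential (a sketch):

with(LinearAlgebra):
A := <<3|2>, <2|3>>:
MatrixExponential(A);    # entries are combinations of exp(5) and exp(1), as predicted above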
8.3.2 Solution of first-order linear differential equations
A typical problem in many applications is to find the solution of a pair of first-order linear differential
equations with constant coefficients of the form
    dy1/dt = a11 y1 + a12 y2
    dy2/dt = a21 y1 + a22 y2,

with initial conditions y1(0) and y2(0) given.
If t represents time, this pair of equations represents a simple “continuous-time dynamical
system.” For example, in a model of a population, y1(t) might be the number of females at time t
and y2(t) the number of males at time t. The system of equations then describes how the numbers
of females and males change with time.
One method of solution of this system is as follows. We first write the equations in matrix form,
with
    y = \begin{pmatrix} y1 \\ y2 \end{pmatrix},   A = \begin{pmatrix} a11 & a12 \\ a21 & a22 \end{pmatrix},

and obtain

    dy/dt = Ay,   with   y(0) = \begin{pmatrix} y1(0) \\ y2(0) \end{pmatrix}.
In this matrix form it is clear that there is no real restriction on the number of components of
the vector y. Equations of this type are important in the study of dynamical systems, where they
are given the special name of state-space equations. The vector y in the equation is then called
the state vector and t represents time.
This type of equation is a generalisation of the one-dimensional, first-order, linear differential
equation with constant coefficients that you have met in calculus, and which is of the form
    dy/dt = ay;   y(0) = y0 = constant.

This equation has the solution

    y(t) = y0 e^{at}.
It is therefore plausible to guess that the n-dimensional equation will have a similar exponential type of solution. We therefore guess an exponential solution of the form

    y = u(t) = v e^{λt},

where λ is a constant scalar and v is a constant vector. On substituting this guess or “trial solution” into the matrix equation we obtain

    dy/dt = λ v e^{λt} = Ay = A v e^{λt},
which can be rearranged to give

    e^{λt}(Av − λv) = 0.

Now, e^{λt} ≠ 0 for all t, and hence our guess is actually a solution only if (A − λI)v = 0. We therefore arrive at the result:
Proposition 3. y(t) = v e^{λt} is a solution of

    dy/dt = Ay

if and only if λ is an eigenvalue of A and v is an eigenvector for the eigenvalue λ.
Example 2. Find solutions of

    dy/dt = Ay   where   A = \begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix}.

Solution. We first find the eigenvalues and eigenvectors of A. We have obtained these previously, and they are:

    λ1 = 5, v1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}   and   λ2 = 1, v2 = \begin{pmatrix} 1 \\ −1 \end{pmatrix}.

Hence, two solutions of the equation are

    u1(t) = e^{5t} \begin{pmatrix} 1 \\ 1 \end{pmatrix}   and   u2(t) = e^{t} \begin{pmatrix} 1 \\ −1 \end{pmatrix}. ♦
The next point to notice is that the linearity of the differential equation leads to the following
proposition.
Proposition 4. If u1(t) and u2(t) are two solutions of the equation

    dy/dt = Ay,

then any linear combination of u1 and u2 is also a solution.

Proof. Let

    y(t) = α1 u1(t) + α2 u2(t),

where α1 and α2 are scalars. Then

    d/dt (α1 u1(t) + α2 u2(t)) = α1 du1/dt + α2 du2/dt = α1 A u1 + α2 A u2 = A(α1 u1 + α2 u2),

and the result is proved.
Example 2 (continued). In our example, we therefore have that

    y(t) = α1 e^{5t} \begin{pmatrix} 1 \\ 1 \end{pmatrix} + α2 e^{t} \begin{pmatrix} 1 \\ −1 \end{pmatrix}

is a solution of the linear differential equation.

Although we have not proved it, the above solution is the general solution of the original differential equation, that is, every solution of the differential equation is of the above form. Now, since there are two unknown scalars in the general solution, two extra conditions must be specified in order to completely determine the solution. Typical conditions are that the value of y(t) = \begin{pmatrix} y1(t) \\ y2(t) \end{pmatrix} is given at some t, for example, at t = 0. ♦
Example 2 (continued). Find the solution of

    dy/dt = Ay   for   A = \begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix},   given that   y(0) = \begin{pmatrix} 1 \\ −2 \end{pmatrix}.

Solution. On substituting t = 0 into our general solution of the differential equation, and equating y(0) to the given vector, we obtain

    y(0) = α1 \begin{pmatrix} 1 \\ 1 \end{pmatrix} + α2 \begin{pmatrix} 1 \\ −1 \end{pmatrix} = \begin{pmatrix} 1 \\ −2 \end{pmatrix}.

We can now obtain α1 and α2 by solving this pair of linear equations in the usual way. We find α1 = −1/2 and α2 = 3/2, and hence the solution of the differential equation is

    y(t) = −(1/2) e^{5t} \begin{pmatrix} 1 \\ 1 \end{pmatrix} + (3/2) e^{t} \begin{pmatrix} 1 \\ −1 \end{pmatrix}. ♦
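This answer can be confirmed with Maple's dsolve command (a sketch, writing y1 and y2 for the components of y):

sys := diff(y1(t), t) = 3*y1(t) + 2*y2(t), diff(y2(t), t) = 2*y1(t) + 3*y2(t):
ics := y1(0) = 1, y2(0) = -2:
dsolve({sys, ics});
# returns y1(t) = -exp(5*t)/2 + 3*exp(t)/2, y2(t) = -exp(5*t)/2 - 3*exp(t)/2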
One reason for considering these linear first-order systems of differential equations is that every
linear differential equation can be written as a system of linear first-order differential equations.
We will illustrate the method with an example.
Example 3. Convert the second-order differential equation

    d²y/dt² + 4 dy/dt − 5y = 0

to a system of first-order differential equations.

Solution. First define new variables by

    y1 = y   and   y2 = dy1/dt = dy/dt.

Then, on differentiating y2 and using the differential equation, we find

    dy2/dt = d²y/dt² = 5y − 4 dy/dt = 5y1 − 4y2.

The original second-order equation is therefore equivalent to the pair of first-order equations

    dy1/dt = y2
    dy2/dt = 5y1 − 4y2.

This pair of equations can then be rewritten in matrix form as

    dy/dt = Ay,   where   A = \begin{pmatrix} 0 & 1 \\ 5 & −4 \end{pmatrix}. ♦
It is useful to compare the matrix method of solution of a second-order linear differential
equation with the method of solution usually taught in calculus courses. The final results obtained
by the two methods are, of course, the same.
Example 4 (Matrix method). Guess a solution of the form y(t) = u(t) = e^{λt} v and substitute in the differential equation. Then u is a solution if λ and v satisfy the eigenvector equation Av = λv. The eigenvalues are solutions of the characteristic equation det(A − λI) = 0, that is, of

    det(A − λI) = \begin{vmatrix} −λ & 1 \\ 5 & −4 − λ \end{vmatrix} = λ² + 4λ − 5 = 0.

The roots of the quadratic give the eigenvalues λ1 = −5 and λ2 = 1. A solution to (A + 5I)v1 = 0 is the eigenvector v1 = \begin{pmatrix} −1 \\ 5 \end{pmatrix}, and a solution to (A − I)v2 = 0 is the eigenvector v2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. The general solution is therefore

    y(t) = α1 e^{−5t} \begin{pmatrix} −1 \\ 5 \end{pmatrix} + α2 e^{t} \begin{pmatrix} 1 \\ 1 \end{pmatrix}.

Since y(t) = y1(t), the solution for y(t) in the original second-order equation is

    y(t) = y1(t) = −α1 e^{−5t} + α2 e^{t}. ♦
Example 5 (Calculus method). We first guess a solution y(t) = eλt, and substitute in the original
second-order differential equation to obtain the so-called characteristic equation
    λ² + 4λ − 5 = 0.
Note that this characteristic equation is identical to the characteristic equation det(A− λI) = 0 of
the matrix method. See question 24 of the problems for a generalisation of this result.
The roots of the quadratic are λ1 = −5 and λ2 = 1, and hence the general solution is

    y(t) = α1 e^{−5t} + α2 e^{t},

which is identical to the solution from the matrix method. ♦
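Maple confirms the answer given by both methods (a sketch):

dsolve(diff(y(t), t, t) + 4*diff(y(t), t) - 5*y(t) = 0);
# returns y(t) = _C1*exp(-5*t) + _C2*exp(t)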
In the above example, it is clear that the calculus method gives a much quicker solution than
the matrix method. However, the matrix method has the great advantage that it works for a much
larger class of differential equations than does the calculus method. One reason that the matrix
method works for a larger class of differential equations is that any single higher-order differential
equation can be easily converted into a system of first-order equations, but it is extremely difficult
to convert a given system of first-order equations into a single higher-order differential equation. It
should also be pointed out that the matrix method described above will not work if the matrix A
is not diagonalisable. However, an extension of the matrix method which uses “Jordan forms” can
be developed to handle this case.
Example 6. The atoms in a laser can exist in two states, an “excited state” and a “ground state”.
The laser is initially pumped so that it has 80% of its atoms in the excited state and the remaining
20% in the ground state. When the laser is operating, 70% of the excited atoms decay to the
ground state per second, whereas 40% of the ground state atoms are pumped up to the excited
state per second.
Find the percentage of atoms in each state at a time t seconds after the laser starts to operate.
Solution. Let
x1(t) = % of atoms in excited state at time t
x2(t) = % of atoms in ground state at time t.
During operation the laser is described by the pair of differential equations

    dx1/dt = −0.70 x1(t) + 0.40 x2(t)
    dx2/dt = 0.70 x1(t) − 0.40 x2(t),

since the rates of 70% and 40% per second correspond to rate constants 0.70 and 0.40. That is, in matrix form,

    dx/dt = A x(t),   where   A = \begin{pmatrix} −0.70 & 0.40 \\ 0.70 & −0.40 \end{pmatrix}.
The eigenvalues of A are λ1 = 0 and λ2 = −1.1, and corresponding eigenvectors are v1 = \begin{pmatrix} 4 \\ 7 \end{pmatrix} and v2 = \begin{pmatrix} −1 \\ 1 \end{pmatrix}. The general solution is therefore

    x(t) = α1 \begin{pmatrix} 4 \\ 7 \end{pmatrix} + α2 e^{−1.1t} \begin{pmatrix} −1 \\ 1 \end{pmatrix}.
The initial condition of the laser is given as x(0) = \begin{pmatrix} 80 \\ 20 \end{pmatrix}, and hence the values of α1 and α2 can be determined from the equations

    x(0) = \begin{pmatrix} 80 \\ 20 \end{pmatrix} = α1 \begin{pmatrix} 4 \\ 7 \end{pmatrix} + α2 \begin{pmatrix} −1 \\ 1 \end{pmatrix},

for which the solution is α1 = 100/11 = 9 1/11 and α2 = −480/11 = −43 7/11.
Thus, the complete solution is

    x1(t) = (1/11)(400 + 480 e^{−1.1t})
    x2(t) = (1/11)(700 − 480 e^{−1.1t}).

Note that, as t → ∞, e^{−1.1t} → 0, and hence the laser settles into a “steady-state” operation in which there are 400/11 = 36 4/11 % of the atoms in the excited state and 700/11 = 63 7/11 % of the atoms in the ground state. The “steady-state” solution for large t is a scalar multiple of the eigenvector v1 = \begin{pmatrix} 4 \\ 7 \end{pmatrix} corresponding to the eigenvalue λ1 = 0. ♦
8.3.3 [X] Markov chains
Matrices are very useful in studying many discrete-time dynamical systems, that is, systems in which the state at stage k + 1 depends solely on the state at stage k.
Example 7. In a certain experiment, a psychologist was testing the learning abilities of rats by
getting them to run a maze. The experimenter started with 100 rats, none of which had previously
run the maze. She then set each of the 100 rats in turn at the maze and noted whether it successfully
ran the maze. She then repeated the process several more times.
She found that, on average, 10% of the rats which failed at one attempt were successful on their
next attempt, whereas 95% of the rats which were successful at one attempt were also successful
at their next attempt. (These numbers are meant for illustration only. They are not taken from
actual experimental data).
For this experiment, calculate the approximate number of rats which successfully run the maze
on the 3rd run, the 20th run and the 50th run.
Solution. Let

    x1(k) = number of rats successfully completing the maze at the kth run,
    x2(k) = number of rats failing the maze at the kth run.

Then, in the (k + 1)th run, we have

    x1(k + 1) = 0.95 x1(k) + 0.10 x2(k)
    x2(k + 1) = 0.05 x1(k) + 0.90 x2(k),

which can be written in matrix form as

    x(k + 1) = A x(k),   where   A = \begin{pmatrix} 0.95 & 0.10 \\ 0.05 & 0.90 \end{pmatrix}.

We note that the unique solution of the equation is

    x(k) = A^k x(0),

as can easily be checked by direct substitution in x(k + 1) = A x(k).
In our problem, x(0) = \begin{pmatrix} 0 \\ 100 \end{pmatrix}, since at the beginning there were 100 rats, none of which had successfully run the maze.

We now calculate A^k to complete the solution. The eigenvalues of A are λ1 = 1 and λ2 = 0.85, and corresponding eigenvectors are v1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix} and v2 = \begin{pmatrix} −1 \\ 1 \end{pmatrix}. Thus, A is diagonalisable, and suitable choices for M, D and M−1 are

    M = \begin{pmatrix} 2 & −1 \\ 1 & 1 \end{pmatrix},   D = \begin{pmatrix} 1 & 0 \\ 0 & 0.85 \end{pmatrix},   and   M−1 = \begin{pmatrix} 1/3 & 1/3 \\ −1/3 & 2/3 \end{pmatrix}.

Thus,

    x(k) = M D^k M−1 x(0) = \begin{pmatrix} 2 & −1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1^k & 0 \\ 0 & (0.85)^k \end{pmatrix} \begin{pmatrix} 1/3 & 1/3 \\ −1/3 & 2/3 \end{pmatrix} \begin{pmatrix} 0 \\ 100 \end{pmatrix} = (100/3) \begin{pmatrix} 2(1 − (0.85)^k) \\ 1 + 2(0.85)^k \end{pmatrix}.
Note: As a check, if we substitute k = 0 in this expression, we obtain x(0) = \begin{pmatrix} 0 \\ 100 \end{pmatrix}. Further, for k = 1, we have

    x(1) = (100/3) \begin{pmatrix} 2(1 − 0.85) \\ 1 + 2(0.85) \end{pmatrix} = \begin{pmatrix} 10 \\ 90 \end{pmatrix},

which equals

    A x(0) = \begin{pmatrix} 0.95 & 0.10 \\ 0.05 & 0.90 \end{pmatrix} \begin{pmatrix} 0 \\ 100 \end{pmatrix} = \begin{pmatrix} 10 \\ 90 \end{pmatrix}.
Then, for k = 3 the solution is x(3) = (25.72, 74.28), and hence approximately 26 rats will successfully complete the maze on the third run. For k = 20, the solution is x(20) = (64.08, 35.92), and hence approximately 64 rats will successfully complete the maze on the 20th run. For k = 50, the solution is x(50) = (66.65, 33.35), and hence approximately 67 rats will successfully complete the maze on the 50th run. Note that, for large values of k, x(k) is approximately equal to (66 2/3, 33 1/3), which corresponds to approximately 67 rats successfully completing the maze on a given run. Note that this solution for large values of k is a scalar multiple of the eigenvector \begin{pmatrix} 2 \\ 1 \end{pmatrix} corresponding to the eigenvalue λ1 = 1, which is the eigenvalue of A with largest magnitude. ♦
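These numbers are easily reproduced in Maple (a sketch):

with(LinearAlgebra):
A := <<0.95|0.10>, <0.05|0.90>>:
x0 := <0, 100>:
A^20 . x0;    # approximately (64.08, 35.92)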
Systems such as the one in Example 7 are called Markov chains. In these systems the objects
or individuals can be in one of a certain number of states and the system is modelled by the matrix
equation
x(k + 1) = Ax(k)
where x(k) = \begin{pmatrix} x1(k) \\ \vdots \\ xn(k) \end{pmatrix} gives the number of individuals in each of the n states at time k. The n × n matrix A has the property that all its entries are non-negative and, for j = 1, . . . , n,

    Σ_{i=1}^{n} aij = 1.

In other words, each column sums to 1. The number aij is the probability that an individual changes from state j to state i. Usually what we are interested in is finding the long-term behaviour of such a system, that is, how A^k behaves as k → ∞. It turns out that the behaviour exhibited in Example 7 is typical of these systems. For any such matrix, λ = 1 is an eigenvalue, and indeed is the eigenvalue of largest magnitude. In almost all cases, A^k x(0) converges to a multiple of the eigenvector corresponding to λ = 1. The limit vector in these cases depends only on the number of individuals involved, and not on the initial distribution of the individuals into the particular states.

We conclude this section by proving that λ = 1 is always an eigenvalue of such a matrix, i.e. a matrix whose column sums are all one.
Lemma 5. If λ is an eigenvalue of A, then λ is also an eigenvalue of A^T.
Proof. Question 13 in the problems for this chapter.
Theorem 6. Suppose that A is an n × n matrix and that the sum of each of the columns of A is 1. Then A has 1 as an eigenvalue.
Proof. The hypothesis on A = (aij) is that

    a11 + a21 + · · · + an1 = 1
    a12 + a22 + · · · + an2 = 1
        ⋮
    a1n + a2n + · · · + ann = 1,

or equivalently,

    A^T \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}.

Thus, 1 is an eigenvalue of A^T, and by the preceding lemma, is thus an eigenvalue of A. (Note that, in general, (1, 1, . . . , 1)^T will not be its eigenvector.)
8.4 Eigenvalues and MAPLE
The LinearAlgebra package in Maple has procedures for doing all the calculations described in
this section. The command
with(LinearAlgebra):
loads the LinearAlgebra commands. If A is a square matrix then
Determinant(A-t);
produces the characteristic polynomial for A. Actually, Maple has a command which will also do this directly. A slight complication here is that

CharacteristicPolynomial(A,t);

gives the polynomial det(tI − A). This of course has the same roots as the polynomial we use. You can (and should at least once!) use solve or fsolve to find the roots of this equation, or you can use the Eigenvalues command directly. You can then use NullSpace to find the eigenvectors.
evals:=Eigenvalues(A);
NullSpace(A-evals[1]);
This will give you a set containing a basis for the eigenspace. Use op to strip off the braces if
necessary.
You can also get Maple to do all this for you at once, and more. The command
EV:=Eigenvectors(A);
returns a sequence with two elements. The first is a Vector with the eigenvalues as entries and the
second is a Matrix whose columns are the eigenvectors in the same order. This matrix is thus a
diagonalising matrix for A, if one exists. Thus if you then do
EV[2]^(-1).A.EV[2]
you will get a diagonal matrix — the same matrix that
DiagonalMatrix(EV[1]);
would give.
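Putting these commands together for the matrix of Example 1 of Section 8.2 gives a short session like the following (a sketch; the order in which Maple lists the eigenvalues may differ):

with(LinearAlgebra):
A := <<3|2>, <2|3>>:
EV := Eigenvectors(A):
EV[1];                       # Vector of eigenvalues, e.g. 5 and 1
EV[2];                       # Matrix whose columns are corresponding eigenvectors
EV[2]^(-1) . A . EV[2];      # the diagonal matrix DiagonalMatrix(EV[1])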
Problems for Chapter 8
Problems 8.1 : Definitions and examples
1. [R] Let

    A = \begin{pmatrix} 3 & 0 \\ 0 & −4 \end{pmatrix},   B = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix},   C = \begin{pmatrix} −3 & 0 \\ 0 & 0 \end{pmatrix},

and let e1 and e2 be the standard basis vectors for R².
a) Write down the eigenvalues and eigenvectors of A, B and C.
b) Draw a sketch of e1, Ae1, e2, Ae2. Then, for some vector x which is not parallel to
either e1 or e2 draw a sketch of x and Ax.
c) Repeat part (b) for the matrix B. Comment on any differences you observe between
the results for A and B.
d) Repeat part (b) for the matrix C. Again comment on any differences you observe
between the results for A and C.
e) For x ≠ 0, prove algebraically that Ax is parallel to x if and only if x is parallel to either e1 or e2, that Bx is parallel to x for all x, and that Cx is parallel to e1 for all x.
2. [R] Show that the vector \begin{pmatrix} 1 \\ 1 \end{pmatrix} is an eigenvector of the matrix \begin{pmatrix} 5 & −3 \\ 2 & 0 \end{pmatrix}, and find the corresponding eigenvalue.
3. [X] Let A be a fixed 3× 3 matrix and define a linear map T :M33 →M33 by T (X) = AX. If
λ is a real eigenvalue of T corresponding to an invertible eigenvector X, find λ in terms of
det(A).
4. [H] Let T be the linear map which reflects vectors in R² about the line y = x.

a) Explain why \begin{pmatrix} 1 \\ 1 \end{pmatrix} and \begin{pmatrix} 1 \\ −1 \end{pmatrix} are eigenvectors of T and give their corresponding eigenvalues.

b) Find the matrix A such that Tx = Ax for all x ∈ R².
5. [R] Find the eigenvalues and eigenvectors for

a) A = \begin{pmatrix} 6 & −2 \\ 6 & −1 \end{pmatrix},   b) A = \begin{pmatrix} −5 & 2 \\ −6 & 3 \end{pmatrix}.
6. [X] For each of the matrices in the preceding question find two independent eigenvectors v1 and v2. On one diagram sketch the lines

    ℓ1 = {x : x = µ v1, µ ∈ R}
    ℓ2 = {x : x = µ v2, µ ∈ R}

and the parallelogram

    P = {x : x = µ1 v1 + µ2 v2, for 0 ≤ µ1 ≤ 1, 0 ≤ µ2 ≤ 1}.

Then identify and sketch (on a separate diagram)

    {y : y = Ax, x ∈ ℓ1}
    {y : y = Ax, x ∈ ℓ2}
    {y : y = Ax, x ∈ P}.

Describe the linear mapping Tx = Ax geometrically.
7. [R] Find the eigenvalues and eigenvectors of the following matrices. In each case, note if the eigenvalues are real, occur in complex conjugate pairs, or are general complex numbers. Also note if the eigenvectors form a basis for C².

a) \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix},   b) \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix},   c) \begin{pmatrix} 3 & 5 \\ 0 & −6 \end{pmatrix},

d) \begin{pmatrix} 0 & −2 \\ 1 & 2 \end{pmatrix},   e) \begin{pmatrix} 4 & 2i \\ 2i & 6 \end{pmatrix},   f) \begin{pmatrix} 4 & −2i \\ 2i & 6 \end{pmatrix}.
8. [H] Show that the eigenvalues of a square row-echelon form matrix U are equal to the diag-
onal elements of the matrix. (A square row-echelon form matrix is also called an upper
triangular matrix).
9. [R] Find the eigenvalues and eigenvectors of the row-echelon matrix

    U = \begin{pmatrix} 2 & −4 & 1 & 3 \\ 0 & −2 & 1 & −3 \\ 0 & 0 & 3 & 3 \\ 0 & 0 & 0 & 5 \end{pmatrix}.
10. [R] Find the eigenvalues and eigenvectors of the following matrices.

a) A = \begin{pmatrix} 1 & 3 & 0 \\ 2 & 2 & 0 \\ 0 & 0 & 6 \end{pmatrix}.   b) B = \begin{pmatrix} 3 & 0 & 0 \\ 0 & −4 & −1 \\ 0 & 6 & 3 \end{pmatrix}.
Problems 8.2 : Eigenvectors, bases, and diagonalisation
11. [R] For each of the matrices in Questions 7, 9 and 10, decide if the matrix is diagonalisable,
and if it is find an invertible matrix M and a diagonal matrix D such that D =M−1AM .
12. [H] Show that if λ is an eigenvalue of A then λ is also an eigenvalue of the matrix A′ = B−1AB,
where B is any invertible matrix. Also show that if v is an eigenvector of A for eigenvalue
λ then B−1v is an eigenvector of A′ for eigenvalue λ.
13. [H] Show that if λ is an eigenvalue of A then λ is also an eigenvalue of A^T.
HINT: Use the characteristic equation and the properties of determinants.
14. [X] Let A be an n × n matrix. Let TA : Cn → Cn be the linear transformation defined by TA(x) = Ax for x ∈ Cn. Let the columns of an n × n matrix B be an ordered basis for Cn. Show that the matrix representing TA with respect to the basis formed by the columns of B is B−1AB.

HINT. The method used in Example 3 of Section 7.6 might be helpful.

NOTE. Modern methods of finding eigenvalues search for a change of basis which makes A′ = B−1AB into an upper triangular matrix. As shown in Question 8, the eigenvalues are then the diagonal elements of the upper triangular matrix. The actual algorithms for finding the change of basis are complicated.
15. [X] Let T : V → V be linear. Show that if B is any basis for V and A is the matrix representing
T with respect to the basis B in both domain and codomain of T then the eigenvalues of
T and A are the same. What is the relation between the eigenvectors of T and A?
Problems 8.3 : Applications of eigenvalues and eigenvectors
16. [R] Let A = \begin{pmatrix} 0 & 6 \\ 1 & −1 \end{pmatrix}. Diagonalise A and hence find A^5.
17. [R] Let A = \begin{pmatrix} 0 & 3 \\ 8 & 2 \end{pmatrix}.

a) Find the eigenvalues and eigenvectors of A.

b) Find matrices P and D such that A = PDP−1.

c) Write down an expression for A^n in terms of P and D. Hence evaluate A^n P.
18. [R] A first-order linear difference equation (often called a first-order linear recurrence relation) is an equation of the form

    x_{k+1} = A x_k,   where k = 0, 1, 2, . . . ,

and where A is a fixed matrix. The solution of this equation is

    x_k = A^k x_0,

as you can check by direct substitution. For the diagonalisable matrices of Questions 7 a), c), find A^k and hence evaluate x_k.
19. [R] For each of the diagonalisable matrices of Questions 5, 9 and 10, find general solutions of the differential equations

    dy/dt = Ay.
20. [R] a) Find the eigenvalues and eigenvectors of \begin{pmatrix} 2 & 3 \\ 1 & 4 \end{pmatrix}.

b) Hence solve the system of differential equations:

    dx1/dt = 2x1 + 3x2,
    dx2/dt = x1 + 4x2.
21. [R] Solve the following systems of differential equations, given that x(0) = y(0) = 100.

a)  dx/dt = 5x − 8y,   dy/dt = x − y

b)  dx/dt = 3x − 15y,   dy/dt = x − 5y
22. [R] Solve the following second-order linear differential equations with constant coefficients by the “calculus method” and by the matrix method and compare your answers.

a) 5 d²y/dt² − 6 dy/dt + y = 0.

b) d²y/dt² − 16y = 0.
23. [R] What happens if you try to solve the second-order equation

    d²y/dt² + 4 dy/dt + 4y = 0

by the matrix method?
24. [X] Consider the second-order linear differential equation

    a d²y/dt² + b dy/dt + c y = 0,

where a, b, c ∈ R and a ≠ 0.

a) Assume that the solutions to the characteristic equation aλ² + bλ + c = 0 for this second-order differential equation are distinct. By making the substitutions y1 = y and y2 = dy1/dt, convert the differential equation into a system of first-order linear differential equations

    dy/dt = Ay,   where   y = \begin{pmatrix} y1 \\ y2 \end{pmatrix}.

b) Using matrix methods, show that the general solution of this system is

    y = α1 e^{λ1 t} \begin{pmatrix} 1 \\ λ1 \end{pmatrix} + α2 e^{λ2 t} \begin{pmatrix} 1 \\ λ2 \end{pmatrix}.

Compare this solution with that obtained using the usual “calculus method” of solving the original second-order linear differential equation.
25. [X] A radioactive isotope A decays at the rate of 2% per century into a second radioactive
isotope B, which in turn decays at a rate of 1% per century into a stable isotope C.
a) Find a system of linear differential equations to describe the decay process. If we
start with pure A, what are the proportions of A, B, and C after 500 years, after
1000 years, and after 1000000 years?
First solve this problem using matrix methods, and then try to solve the problem
directly by solving the original two differential equations in the right order.
b) Explain how the problem would be different if the rates of decay of A and B were
both 2% per century.
26. [X] There are 3 mathematics lecturers A, B and C who are teaching parallel streams in al-
gebra to a total of 900 students. At the first lecture equal numbers go to each lecture
group. After each lecture a certain percentage of the students in each group decide to stay
with the same lecturer while the remaining percentage divide evenly among the other two
lecturers for the next lecture. If 98% of A’s students stay with A each time, 96% of B’s
students stay with B and 94% of C’s students stay with C, find the numbers of students
in each group in the 12th lecture and in the 24th lecture. Make the assumption that no
students stop attending lectures.
HINT. Set up a model as a difference equation of the type given in Question 18. You may
use MAPLE to find all eigenvalues and eigenvectors. Alternatively, if you wish to solve
the problem by hand calculations, you will need to know that one of the eigenvalues is 1.
NOTE. This problem is an example of a Markov chain process. Markov chain processes are
important in many areas of mathematics and its applications, such as statistics, psychol-
ogy, finance, economics, operations research, queueing theory, inventory theory, diffusion
processes, theory of epidemics etc.
27. [X] Repeat the previous question on the following assumptions. After each lecture, 1% of each
group stop attending lectures altogether, and the remaining percentage either stay with
the same lecturer or divide equally among the other two lecturers for the next lecture. If
97% of A’s students stay with A each time, 95% of B’s students stay with B and 93% of
C’s students stay with C, find the numbers of students in each group in the 12th lecture
and in the 24th lecture. Also find the total number of students attending lectures in the
12th lecture and in the 24th lecture.
28. [X] Consider a modified version of the population dynamics model of Example 9 of Section 7.5,
in which all females are assumed to die at age 74 instead of at age 89, as in the model
given. Use eigenvalues and eigenvectors to solve this modified model, given that there are
one million females in each age group at January 1, 1970. What happens to the population
for large values of k?
NOTE. You will need to use Maple to find the eigenvalues and eigenvectors of the matrix
A, and you may also use Maple, if you wish, to carry out all other matrix manipulations
required to solve the problem.
29. [X] Let A be a 2 × 2 matrix with the property that all its entries are non-negative and both
its columns sum to 1. Show that λ1 = 1 is always an eigenvalue for A, and that if λ2 is
another eigenvalue of A then −1 ≤ λ2 ≤ 1.
Problems 8.4 : Eigenvalues and MAPLE
30. [M] Show that the matrix of the original population dynamics model of Example 9 of Section 7.5
is not diagonalisable.
HINT. Use Maple to find the eigenvalues, and then show that the eigenvalue λ = 0 has
multiplicity 2 and that dim(ker(A)) = 1, i.e., a basis for the kernel of A − 0I consists of
one vector.
NOTE. The original population dynamics model can be solved by a generalisation of the
eigenvalue-eigenvector methods which makes use of “Jordan forms”, and is covered in our
second year linear algebra subjects.
31. [M] Using the following Maple session,

> with(LinearAlgebra):
> M:=<<6|2|2>,<-2|8|4>,<0|1|7>>;

    M := \begin{pmatrix} 6 & 2 & 2 \\ −2 & 8 & 4 \\ 0 & 1 & 7 \end{pmatrix}

> I3:=IdentityMatrix(3);

    I3 := \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}

> p:=Determinant(M-t*I3);

    p := 336 − 146 t + 21 t² − t³

> solve(p,t);

    6, 7, 8

> NullSpace(M-6*I3);

    { \begin{pmatrix} 1 \\ −1 \\ 1 \end{pmatrix} }

> NullSpace(M-7*I3);

    { \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix} }

> NullSpace(M-8*I3);

    { \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} }

a) state the eigenvalues and corresponding eigenvectors for the matrix M,

b) find a matrix A such that A−1MA is a diagonal matrix D and write down D,

c) calculate A−1 and hence find an explicit formula for M^k where k is a positive integer.
Chapter 9
INTRODUCTION TO
PROBABILITY AND STATISTICS
“What IS the use of repeating all that stuff?”
the Mock Turtle interrupted . . .
Lewis Carroll, Alice in Wonderland.
This chapter introduces mathematical probability, random variables, and probability distri-
butions. The concepts, methods, and applications are required in statistics courses that include
MATH2801/MATH2901 – (Higher) Theory of Statistics, a core subject for the mathematics and
statistics majors, and MATH2089/MATH2099, which are compulsory courses for many second year
engineering students.
Statistics is the science of turning raw data into reliable information on which decisions can be
made, given randomness or variation in the original data. As a science, it aims to uncover patterns
in observations that can be described by mathematical or heuristic models. It is also concerned
with formulating and testing various hypotheses about the context from which the data are drawn.
For instance in order to predict voting patterns in an election, opinions are sought from voters
by carrying out opinion polls. It might not be possible to obtain the views of all voters, so the
preferences of a relatively small sample of voters are obtained. As a result, the opinion poll can
only provide an estimate of the true proportion of voters who favour a particular political party. It
is important to quantify how accurate this estimate can be expected to be.
How many voters must be polled in order for this estimate to be reasonably accurate? How
big is the measurement error? Statistical science has the answers to these frequently asked and
important questions. These answers have had immediate utility. It traditionally cost time and
resources to poll voters, so it mattered whether 1,000 voters formed an adequate sample or whether
10,000 voters were required.
This simple but typical example illustrates nicely the three essential aspects of statistical science:
data production, data analysis, and statistical inference.
Data production
How many sample units should be taken, how should they be selected, and what data should
be measured on each unit? An important part of data production is controlling the measurement
error that invariably arises.
Data Analysis
In order to be presented transparently, data must be organised into easily understood forms,
often as graphics, tables, and summary “statistics”, such as sample means or sample proportions.
For opinion polling, a simple summary of the sample proportions provides the information sought.
Statistical Inference
This is the process of drawing valid conclusions about a whole population based on information
obtained from a part of the population. An essential ingredient here is a random sample from
the population. Given the data obtained from a random sample of voters, what can one infer about
the general voting patterns?
The transition from population to random sample is one instance in which the notion of proba-
bility becomes important. To create a random sample, we must know the probability that a given
member of the population will be selected in the sample. Conversely, probability allows us to model
and forecast real-world behaviour in terms of random processes. In these ways, probability theory
and statistics play important roles in countless contexts, such as clinical trials, weather forecasting,
finance, or traffic control, to name just a few.
First, let us recollect some background set theory and notation.
9.1 Some Preliminary Set Theory
A probability model consists of two components:
1. A set of possible outcomes;
2. The probability of each outcome or set of outcomes.
In this section, we present basic set theory as background material for the first of these components.
The second component will be addressed in Section 9.2.
Definition 1. A set is a collection of objects. These objects are called elements.
We write x ∈ A to express that x is an element of a set A. If x is not an element of A, then we
write x /∈ A.
Example 1. The set A = {1, 2, 3} has elements 1, 2, and 3. Thus, 1 ∈ A but 4 /∈ A, say. ♦
The above definition is circular and imprecise. For instance, it is vulnerable to Russell’s Paradox
(briefly discussed in MATH1081 Discrete Mathematics). One could improve the definition by
insisting that each set must have the property that each conceivable element is either completely
in the set or completely outside of the set, but not both. However, one must improve the definition
further in order to guard it from contradiction, and this is in fact difficult. Fortunately for our
purposes, the above naive definition suffices.
Definition 2.
• A set A is a subset of a set B (written A ⊆ B) if and only if
each element of A is also an element of B; that is, if x ∈ A, then x ∈ B.
• The power set P(A) of A is the set of all subsets of A.
• The universal set S is the set that denotes all objects of given interest.
• The empty set ∅ (or {}) is the set with no elements.
Example 2. The set A = {1, 2, 3} has eight subsets. For instance, {2, 3} ⊆ A. The power set of
A is the set of these eight subsets, namely
P(A) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}} . ♦
Example 3. For problems in 3-dimensional vector geometry, the universal set is usually S = R3.
Points, lines, and planes are then subsets of S. ♦
Definition 3. A set S is countable if its elements can be listed as a sequence.
More formally, S is countable if and only if there is a one-to-one function from S to N.
Example 4.
• Every finite set is countable.
• The integers are countable since we can list them as follows: 0, 1,−1, 2,−2, . . ..
• The rationals are countable. (Challenge: can you list them as a sequence?)
• The reals are not countable; this can be shown by a simple and elegant proof known as
Cantor’s Diagonal Argument.
Sets are often visualised by a Venn diagram as regions in the plane. For instance, here is a
Venn diagram of a universal set S containing a set A:
[Venn diagram: a region A inside the universal set S.]
Definition 4. For all subsets A,B ⊆ S, define the following set operations:
• complement of A: Ac = {x ∈ S : x /∈ A}
• intersection of A and B: A ∩B = {x ∈ S : x ∈ A and x ∈ B}
• union of A and B: A ∪B = {x ∈ S : x ∈ A or x ∈ B}
• difference: A−B = {x ∈ S : x ∈ A but x /∈ B} = A ∩Bc
Following mathematical convention, “or” in the union definition means “one or the other or both”.
Venn diagrams describing the above set operations are given below.
[Venn diagrams: the shaded regions show Ac, A ∩ B, A ∪ B, and A − B respectively.]
Example 5. Let S be all students enrolled in MATH1231, let A be those who are 20 years or
older, and let B be those who are engineering students. Then
Ac are MATH1231 students at most 19 years old
A ∩B are MATH1231 students studying engineering and who are 20 years or older
A ∪B are MATH1231 students who study engineering or are 20 years or older
A−B are MATH1231 students who do not study engineering but who are 20 years or older.♦
Definition 5. Sets A and B are disjoint (or mutually exclusive) if and only if
A ∩B = ∅
A Venn diagram showing two disjoint sets A and B is given below:
[Venn diagram: two non-overlapping regions A and B inside S, illustrating A ∩ B = ∅.]
Example 6. Let S be the set of people in Australia, let A be the set of people enrolled in
MATH1231, and let B be the set of people aged under 10. Then A and B are disjoint. ♦
Definition 6. Disjoint subsets A1, . . . , Ak partition a set B if and only if
A1 ∪ · · · ∪Ak = B
Note that A and Ac partition the universal set S for each subset A of S.
Example 7. The sets A1 = {1, 3}, A2 = {2}, A3 = {4, 5} partition the set B = {1, 2, 3, 4, 5}. ♦
The following simple result will often be used in the rest of the chapter, sometimes implicitly.
Lemma 1. If A1, . . . , An partition S and B is a subset of S, then A1 ∩B, . . . , An ∩B partition B.
This result is illustrated below.
[Venn diagram: S partitioned into A1, . . . , An, with B cutting across each Ai.]
There are many laws governing set operations. Here are just a few:

    Distributive Laws:   A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
                         A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

    De Morgan's Laws:    (A ∪ B)c = Ac ∩ Bc
                         (A ∩ B)c = Ac ∪ Bc
These laws can be proved by logical arguments or by sketching the Venn diagrams for the left-hand
and right-hand sides of the identities. Venn diagrams for De Morgan’s Laws are given here:
[Venn diagrams: shaded regions illustrating (A ∪ B)c = Ac ∩ Bc and (A ∩ B)c = Ac ∪ Bc.]
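These identities are easy to test on small examples, for instance in Maple (a sketch with arbitrarily chosen sets):

S := {1, 2, 3, 4, 5, 6}:  A := {1, 2, 3}:  B := {3, 4}:
evalb( S minus (A union B) = (S minus A) intersect (S minus B) );   # true
evalb( S minus (A intersect B) = (S minus A) union (S minus B) );   # true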
Definition 7. If A is a set, then |A| is the number of elements in A.
Note that if A and B are disjoint, then
|A ∪B| = |A|+ |B|
The Inclusion-Exclusion Principle. |A ∪B| = |A|+ |B| − |A ∩B|
This result is clear once a Venn diagram is drawn.
The Inclusion-Exclusion Principle may be extended to any finite number of sets. For instance,
|A ∪B ∪C| = |A|+ |B|+ |C| − |A ∩B| − |A ∩ C| − |B ∩ C|+ |A ∩B ∩ C| .
Note that for any subset A of S, we have S = A∪Ac and so |Ac| = |S|−|A|. Hence, for example,
|(A ∪B)c| = |S| − |A ∪B|. The following example makes use of this idea.
Example 8. Of 20 music students, 7 play guitar, 8 play piano, and 3 play both guitar and piano.
How many play neither guitar nor piano?
Solution. Let S be the set of all the music students; let G be the set of students who play guitar;
and let P be the set of students who play piano. By the information given,
|S| = 20,   |G| = 7,   |P| = 8,   |P ∩ G| = 3.
By the Inclusion-Exclusion Principle, the number of students who play neither piano nor guitar is
|(G ∪ P )c| = |S| − |G ∪ P |
= |S| − (|G| + |P | − |G ∩ P |) = 20− (7 + 8− 3) = 8 .
Alternatively, we can draw a Venn diagram of the problem and deduce the answer by filling in the number of elements in each region: 3 students play both instruments, 4 play guitar only, 5 play piano only, and the remaining 8 play neither. ♦
9.2 Probability
The notion of luck is ancient and has often been seen as an inherent quality that individuals or
objects might possess and whose nature is determined by Fate, whims of the Gods, karmic justice,
mana-like association with other instances of luck, and many other mechanisms. By associating
with lucky individuals or objects, by acting righteously, or by appealing to the Gods, one might
improve one’s luck during one’s present life. Gambling is the competitive realisation of this belief
in influencing one’s luck, and it too is ancient. Good gamblers have appeared throughout history,
and many prominent and talented mathematicians have focused much of their work on gambling
problems and strategies, particularly in the 16-18th centuries. However, most of this work addressed
specific problems and was stunted by incorrect intuitions and by an unfortunate focus on ratios and
odds. This focus is still present in gambling today, where odds are given, rather than percentages.
Apart from a few important exceptions, it was only relatively recently, in the first half of the
20th century, that the notion of luck was treated rigorously and systematically by mathematicians.
Of note, A. Kolmogorov put forth a set of axioms in 1933 that provided a solid framework for
dealing mathematically with the notion of luck, or in mathematical terms: probability.
9.2.1 Sample Space and Probability Axioms
In order to develop a framework for probability, we will first think of any given situation that leads
randomly to a set of outcomes as an experiment. Thus, the roll of a die is seen as an experiment,
as is the Melbourne Cup; countless other such experiments abound, including financial markets,
the weather, election outcomes, or what grade you might get for this course.
Definition 1. A sample space of an experiment is a set of all possible outcomes.
Outcomes are also called sample points.
Example 1. Tossing a coin may be seen as an experiment. An appropriate sample space is the
set S = {H,T} where H (“head”) and T (“tail”) are the two possible outcomes.
Example 2. Tossing a coin 3 times can be seen as another experiment. If the object of the
experiment is to determine the resulting coin-flip sequence, then an appropriate sample space is
S1 = {HHH,HHT,HTH, THH,HTT, THT, TTH, TTT} .
On the other hand, if the object of the experiment were to determine the number of resulting heads,
then an appropriate sample space is
S2 = {0, 1, 2, 3} .
Thus, the experiment and its sample space depend on the type of data that we wish to observe.
It is often useful to consider sets of outcomes, particularly if the number of outcomes is large.
This leads to the next definition.
Definition 2. An event is a subset of a sample space.
Note that the set of all events in a sample space S is exactly the power set P(S).
Note also that the empty set ∅ and the whole space S are events.
Example 3. Toss a coin 3 times and consider the event A that we toss 2 heads. This is the subset
of the sample space S1 of Example 2 given by
A = {HHT,HTH, THH} .
Note that each of the outcomes in A forms an event by itself: {HHT}, {HTH}, {THH}.
In each of the above examples, each possible outcome has equal probability. This is not generally
true, so we must define probability in full generality.
Definition 3. A probability P on a sample space S is any real function on P(S)
that satisfies the following conditions:
(a) 0 ≤ P(A) ≤ 1 for all A ⊆ S;
(b) P (∅) = 0;
(c) P (S) = 1;
(d) If A and B are disjoint, then P (A ∪B) = P (A) + P (B).
Example 4. Toss a coin and observe whether H or T is tossed. The appropriate sample space is
the set S = {H,T}. Define the probability P on S as follows, for each event A ⊆ S:
P(A) = |A|/2 .
Then P({H}) is the probability of tossing H, namely P({H}) = |{H}|/2 = 1/2. Similarly, P({T}) = 1/2.
Note that the probability of tossing neither H nor T is P (∅) = 0, and that the probability of tossing
either H or T is P (S) = 1.
This probability is exactly the probability that one would usually think of when tossing a coin.
However, there are many other possible probabilities. For instance, let p be some real number
between 0 and 1, and define the function Q on S by
Q(∅) = 0 ,  Q({H}) = p ,  Q({T}) = 1 − p ,  Q(S) = 1 .
It is easy to verify that Q is a probability on S. To find a physical interpretation of this probability,
one could think of a coin that is twisted or bent, so that the probability of tossing H is not necessarily
the same as that of tossing T . Bear in mind, however, that a probability is a mathematical object
that need not always model a real-world phenomenon.
Example 5. Toss a coin 3 times and observe the resulting (ordered) sequence of H and T , as
in Example 2 above. Let S be the natural sample space consisting of all 8 such sequences. The
appropriate probability P is then given as follows, for each event A ⊆ S:
P(A) = |A|/8 .
For instance, consider the event A that we toss 2 heads. The probability of this happening is
P(A) = |A|/8 = |{HHT, HTH, THH}|/8 = 3/8 .
Now consider the event B = {HHH} that we toss 3 heads. Since |B| = 1, we see that P(B) = 1/8.
The probability of tossing at least 2 heads is P (A∪B) which, since A and B are disjoint, equals
P(A ∪ B) = P(A) + P(B) = 3/8 + 1/8 = 1/2 .
Example 6. Roll a die and observe the resulting number. An appropriate sample space is then
S = {1, . . . , 6}. The appropriate probability P is given as follows, for each event A ⊆ S:
P(A) = |A|/6 .
For example, consider the event A that we roll an even number. The probability of this occurring is
P(A) = |A|/6 = |{2, 4, 6}|/6 = 3/6 = 1/2 .
Theorem 1. Let P be a probability on a sample space S, and let A be an event in S.
1. If S is finite (or countable), then P(A) = ∑_{a∈A} P({a}) .
2. If S is finite and P({a}) is constant for all outcomes a ∈ S, then P(A) = |A|/|S| .
3. If S is finite (or countable), then ∑_{a∈S} P({a}) = 1 .
Note that if S is finite, then P(A) may be seen as the size, or ratio, of A compared to S. In general,
P (A) may be seen as a measure of how large A is compared to S. Outcomes whose probabilities
are all equal are often referred to as “equally likely”.
Proof. The finite case of statement 1 follows by induction using the additive condition (d) in the
definition of a probability. We will ignore the general case of statement 1 but note that it is often
given as an axiom for probabilities. Statement 3 follows immediately from statement 1 and by
noting that P(S) = 1. Let us now prove statement 2, so suppose that S is finite and that P({a}) is equal to the constant p for all outcomes a ∈ S. By statement 3,
1 = ∑_{a∈S} P({a}) = ∑_{a∈S} p = |S| p ,
so p = 1/|S|. By statement 1,
P(A) = ∑_{a∈A} P({a}) = ∑_{a∈A} 1/|S| = |A|/|S| . ♦
Example 7. The natural probabilities P in Examples 4–6 may each be expressed as
P(A) = |A|/|S|
where A is any event in S. This reflects the fact that each outcome is equally likely.
Example 8. Pick a ball at random from a bag containing 3 red balls and 7 blue balls. If each ball
has the same chance of being picked as any other ball, then the chance of picking a red ball is 3/10.
Let us express this in mathematical terms. Let S be the sample space consisting of all 10 balls.
Next, let A be the event that a red ball is chosen; A is then the set containing the three red balls.
Since the probability of each outcome is the same, the probability of picking a red ball is
P(A) = |A|/|S| = 3/10 ,
as expected. ♦
The definition of probability only states what is required of a probability; it does not help
us decide upon an appropriate probability for a given experiment. This sort of decision is called
“allocating the probabilities” and is generally based on one of the following three methods:
Method 1. Allocate the probabilities on the basis of any inherent symmetry in the situation.
This is what is applied in games of chance, as illustrated by the die-rolling or coin-tossing exam-
ples that we have seen. It is how you calculated probabilities at school, by using counting, permutations, and combinations with equally likely outcomes.
It is also used to allocate probabilities in the following sort of experiment. A "wheel of fortune" is spun. The probability that it points to some region which subtends an angle of θ degrees is θ/360. This is an example of an experiment with a sample space that is not countable, let alone finite.
Method 2. Allocate the probabilities on the basis of experience or large amounts of data.
This is what actuaries regularly do when creating life tables. The probability of a typical Australian male aged 60 living to age 65 is 95.4%, a figure based on historical data from many males aged 60.
Method 3. Guess or use intuition.
This is common in society but should be avoided by non-experts when dealing with serious issues. Indeed, the gambling industry, insurance, supermarket pricing, and so on, and even much of politics and lawmaking, all rely on the common person's inability to properly understand probabilities, particularly when it comes to odds of winning or risk of injury. Actuaries and mathematicians educate themselves to avoid falling for common misconceptions; however, even these experts should not rely heavily on their intuition about probability.
9.2.2 Rules for Probabilities
Theorem 2. Let A and B be events of a sample space S.
1. P(A ∪ B) = P(A) + P(B) − P(A ∩ B) (Addition Rule)
2. P(A^c) = 1 − P(A)
3. If A ⊆ B, then P(A) ≤ P(B).
Proof.
1. This result is connected to the Inclusion-Exclusion Principle. It follows since A ∪ B is partitioned by A and B − A, so P(A ∪ B) = P(A) + P(B − A), while B is partitioned by A ∩ B and B − A, so P(B − A) = P(B) − P(A ∩ B).
2. The sets A and A^c partition S, so P(A) + P(A^c) = P(A ∪ A^c) = P(S) = 1.
3. The sets A and B − A partition B, so P(A) ≤ P(A) + P(B − A) = P(B).
Example 9. What is the probability that at least two of n people share the same birthday?
Solution. Ignoring leap years, let Y be the set of the 365 days of the year. An experiment could
here be to discover the n birthdays, and an associated sample space is
Sn = {(b1, . . . , bn) : b1, . . . , bn ∈ Y } .
We wish to calculate P (An) for the event An ⊆ Sn that at least two of the n people share the same
birthday. Assume that the probability of each person being born on a given date does not depend
on the person or on the date. The outcomes of the sample space then have constant probability, namely 1/|S_n| = 1/365^n. Therefore, P(A_n) = 1 − P(A_n^c) = 1 − |A_n^c|/|S_n|. Now, A_n^c is the event that no two of the n birthdays are the same. Therefore,
P(A_n) = 1 − P(A_n^c) = 1 − |A_n^c|/|S_n| = 1 − (365 × 364 × · · · × (365 − n + 1))/365^n .
It is thus slightly more likely than not that at least two of 23 people have the same birthday (P(A_23) ≈ 50.7%), and it is highly likely that at least two of 57 people share the same birthday (P(A_57) ≈ 99.01%). Of course, there will always be at least two people with the same birthday whenever there are more people than days in the year, and this is expressed by the probability P(A_n) = 1 for n > 365. ♦
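The birthday probabilities P(A_n) above are easy to evaluate numerically. Here is a minimal Python sketch of the formula; the function name p_shared_birthday is our own, and a 365-day year is assumed, as in the example.

```python
from math import prod

def p_shared_birthday(n):
    # P(A_n) = 1 - (365 × 364 × ... × (365 - n + 1)) / 365**n for n <= 365;
    # for n > 365 a shared birthday is certain (more people than days).
    if n > 365:
        return 1.0
    return 1 - prod(365 - i for i in range(n)) / 365**n

print(round(p_shared_birthday(23), 3))   # 0.507
print(round(p_shared_birthday(57), 4))   # 0.9901
```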
Example 10. In some town, 80% of the population has comprehensive car cover, 60% has house cover, and 10% has neither. What percentage has both types of cover?
Solution. Let A be the event “a person has comprehensive car cover” and let B be the event
“a person has house cover”. For any random person,
P(A) = 0.8 ,  P(B) = 0.6 ,  and  P(A^c ∩ B^c) = 0.1 .
Hence, P(A ∪ B) = 1 − P((A ∪ B)^c) = 1 − P(A^c ∩ B^c) = 1 − 0.1 = 0.9. Therefore,
P(A ∩ B) = P(A) + P(B) − P(A ∪ B) = 0.8 + 0.6 − 0.9 = 0.5 .
In other words, 50% of the population has both covers. ♦
9.2.3 Conditional Probabilities
We now consider what happens if we restrict the sample space from S to some event in S.
Example 11. In Example 10 above, 80% of people have comprehensive car cover.
However, of those people who have house cover, the percentage who also have comprehensive car
cover is
50/60 ≈ 0.833, or 83.3% .
Thus when we restrict our sample space to those having house cover, the percentage of those
having comprehensive cover changes. We say that the conditional probability of a person having
comprehensive cover given that they have house cover is 0.833 . ♦
Definition 4. The conditional probability of A given B is denoted and defined
by
P(A|B) = P(A ∩ B)/P(B) , provided that P(B) ≠ 0 .
Lemma 3. For any fixed event B, the function P (A|B) is a probability on S.
Proof. Check that conditions (a)–(d) in the definition of a probability are satisfied by the function A ↦ P(A|B).
Since P (S) = 1, we can write, for each event A of S,
P(A) = P(A ∩ S)/P(S) = P(A|S) .
Just as P(A) can be seen as a measure of A compared to S, P(A|B) = P(A ∩ B)/P(B) can be seen as a measure of A (or the part of A that is contained in B) compared to B. This is illustrated by the following Venn diagrams:
(Venn diagrams: the event A inside the sample space S, and the intersection A ∩ B inside the event B.)
Example 12. We roll a die and let A and B be the events that we roll a six and that we roll an
even number, respectively. Then P(A) = 1/6 and P(B) = 3/6 = 1/2. Since in this case A ∩ B = A,
P(A|B) = P(A ∩ B)/P(B) = P(A)/P(B) = (1/6)/(1/2) = 1/3 .
In other words, given that we rolled an even number, the probability of having rolled a six is 1/3.
This is as one would expect since there are 3 even rolls (2,4,6) of which one is six.
In contrast, the probability of rolling an even number, given that we rolled a six, is
P(B|A) = P(B ∩ A)/P(A) = P(A)/P(A) = 1 . ♦
Example 13. Consider a bag containing 3 red balls and 3 blue balls. First draw one ball from the
bag and then draw another. Let R_i be the event that a red ball is chosen on the ith draw, where i = 1, 2, and define B_1 (= R_1^c) and B_2 (= R_2^c) similarly for the blue balls. The probability of first drawing a red ball or a blue ball is the same, namely P(R_1) = P(B_1) = 3/6 = 1/2.
Now, suppose that we first draw a red ball. The bag then contains 2 red balls and 3 blue balls, so the probability of drawing a red ball on the second draw is P(R_2|R_1) = 2/5. We can also calculate this probability from the definition of conditional probability. Of the (6 × 5)/2 = 15 ways of choosing two of the six balls without order, three of these ways give us two red balls. Therefore, we see that P(R_1 ∩ R_2) = 3/15 = 1/5, so
P(R_2|R_1) = P(R_1 ∩ R_2)/P(R_1) = (1/5)/(1/2) = 2/5 .
In contrast, P(R_2) = P(B_2) = 1/2, since there are equally many red and blue balls to begin with.
Rearranging the terms in the definition of conditional probability yields the following identities.
Multiplication Rule: P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)
Example 14. Consider the bag of red and blue balls in Example 13. We saw that the probability of drawing a red ball on the second draw, given that we first drew a red ball, is P(R_2|R_1) = 2/5. Similarly, it is easy to see that P(R_2|B_1) = 3/5. To calculate P(R_2) without using the symmetry argument in Example 13, first note that R_2 is partitioned by R_2 ∩ R_1 and R_2 ∩ B_1: either a red ball is first drawn, followed by another red ball, or a blue ball and then a red ball are drawn. Hence by the Multiplication Rule,
P(R_2) = P((R_2 ∩ R_1) ∪ (R_2 ∩ B_1)) = P(R_2 ∩ R_1) + P(R_2 ∩ B_1)
= P(R_2|R_1)P(R_1) + P(R_2|B_1)P(B_1) = (2/5) × (1/2) + (3/5) × (1/2) = 1/2 ,
as expected. ♦
Conditional probabilities are implicitly used whenever a tree diagram is drawn. Thus, in a typical two-stage experiment, we have the following tree diagram:
(Tree diagram: from the root, branches lead to A and A^c with probabilities P(A) and P(A^c); from each of these, further branches lead to B and B^c with the conditional probabilities P(B|A), P(B^c|A), P(B|A^c), and P(B^c|A^c).)
Example 15. Consider 3 urns containing red and blue balls:
• Urn 1 contains 10 balls, of which 3 are red and 7 are blue;
• Urn 2 contains 20 balls, of which 4 are red and 16 are blue; and
• Urn 3 contains 10 balls, of which 0 are red and 10 are blue.
First, an urn is chosen at random; then a ball is chosen from it at random.
(a) What is the probability that a red ball is chosen from Urn 2?
(b) What is the probability of choosing a red ball?
(c) If a red ball were chosen, what is then the probability that it came from Urn 2?
Solution. Assume that we are equally likely to choose any urn and, given an urn, are equally
likely to choose any ball in it. Let U1, U2, and U3 denote the event of choosing Urn 1, 2, and 3,
respectively, and let R and B denote the event of then choosing a red or blue ball, respectively.
Then P(U_1) = P(U_2) = P(U_3) = 1/3 and
P(R|U_1) = 3/10 ,  P(R|U_2) = 4/20 = 1/5 ,  P(R|U_3) = 0 ,
P(B|U_1) = 7/10 ,  P(B|U_2) = 16/20 = 4/5 ,  P(B|U_3) = 1 .
(a) Therefore, P(R ∩ U_2) = P(R|U_2)P(U_2) = (1/5) × (1/3) = 1/15 .
(b) Similarly, P(R ∩ U_1) = (3/10) × (1/3) = 1/10 and P(R ∩ U_3) = (0/10) × (1/3) = 0.
Now, R = (R ∩ U1) ∪ (R ∩ U2) ∪ (R ∩ U3) and the three terms are disjoint, so
P(R) = P(R ∩ U_1) + P(R ∩ U_2) + P(R ∩ U_3) = 1/10 + 1/15 + 0 = 1/6 .
(c) Finally, P(U_2|R) = P(U_2 ∩ R)/P(R) = (1/15)/(1/6) = 2/5 . ♦
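Parts (a)–(c) of Example 15 can be reproduced directly from the Multiplication Rule and the definition of conditional probability. The sketch below is illustrative only (the dictionary names P_urn and P_red_given_urn are our own); it uses exact fractions to avoid rounding.

```python
from fractions import Fraction as F

# The assumed equally likely urn choices and the given urn contents.
P_urn = {'U1': F(1, 3), 'U2': F(1, 3), 'U3': F(1, 3)}
P_red_given_urn = {'U1': F(3, 10), 'U2': F(4, 20), 'U3': F(0, 10)}

# (a) Multiplication Rule: P(R ∩ U2) = P(R|U2) P(U2).
p_red_and_u2 = P_red_given_urn['U2'] * P_urn['U2']
print(p_red_and_u2)   # 1/15

# (b) Summing over the partition U1, U2, U3: P(R) = Σ P(R|Ui) P(Ui).
p_red = sum(P_red_given_urn[u] * P_urn[u] for u in P_urn)
print(p_red)          # 1/6

# (c) Definition of conditional probability: P(U2|R) = P(R ∩ U2) / P(R).
print(p_red_and_u2 / p_red)   # 2/5
```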
The above example shows one instance of a multi-stage experiment. The conditional probabilities and the multiplication rule for these sorts of experiments can be illustrated and applied using tree diagrams. Such diagrams illustrate all possible outcomes of the multi-stage experiment as well as the ways in which one is able to arrive at those outcomes via the various stages. The branches carry the conditional probability of the right-hand node given that you are at the left node. The probability of getting from one node to a node to its right is obtained by multiplying the probabilities on the connecting branches. A typical sequence of branches is of the form
(Branch sequence: Start → A → B (really B ∩ A) → C (really C ∩ B ∩ A), with branch probabilities P(A), P(B|A), and P(C|B ∩ A).)
Here,
P (C ∩B ∩A) = P (C|B ∩A)P (B ∩A) = P (C|B ∩A)P (B|A)P (A) .
Example 16. The tree diagram for the multi-stage experiment given in Example 15 is as follows:
(Tree diagram: from the root, branches lead to U_1, U_2, and U_3, each with probability 1/3; from each urn, branches lead to R and B with the conditional probabilities found above. Multiplying along the branches gives:
P(R ∩ U_1) = (1/3) × (3/10) = 1/10 ,  P(B ∩ U_1) = (1/3) × (7/10) = 7/30 ,
P(R ∩ U_2) = (1/3) × (1/5) = 1/15 ,  P(B ∩ U_2) = (1/3) × (4/5) = 4/15 ,
P(R ∩ U_3) = (1/3) × 0 = 0 ,  P(B ∩ U_3) = (1/3) × 1 = 1/3 .)
Since sequences of branches represent disjoint events,
P(R) = P(R ∩ U_1) + P(R ∩ U_2) + P(R ∩ U_3) = 1/10 + 1/15 + 0 = 1/6 . ♦
Tree diagrams are very useful for visualising and calculating problems involving small numbers
of conditional probabilities. However, tree diagrams are infeasible when these numbers are large or
only implicitly given. We now derive a mathematical rule that enables us to deal with such cases.
In particular, suppose that the n events A1, . . . , An partition the sample space S:
(Venn diagram: the sample space S divided into slices A_1, . . . , A_n, with the event B overlapping each slice.)
Since the sets A1 ∩B, . . . , An ∩B partition the event B, we see that
P (B) = P (A1 ∩B) + · · ·+ P (An ∩B) .
Applying the Multiplication Rule to each of these n terms yields the following rule.
Total Probability Rule
If A_1, . . . , A_n partition S and B is an event, then P(B) = ∑_{i=1}^{n} P(B|A_i)P(A_i) .
Note that we have already used this rule implicitly, for instance to calculate P(R_2) in Example 14, and to calculate P(R) in part (b) of Example 15.
Although the Total Probability Rule is a very simple and almost obvious result, it is very useful.
Furthermore, it implies Bayes’ Rule, which is non-trivial and which often offers surprising results.
Bayes’ Rule
If A_1, . . . , A_n partition S and B is an event, then
P(A_j|B) = P(B|A_j)P(A_j) / ∑_{i=1}^{n} P(B|A_i)P(A_i) .
Proof. Apply the Total Probability Rule to the identity P(A_j|B) = P(A_j ∩ B)/P(B) .
Example 17. Consider the urns and balls of Example 15 above, and suppose that we drew a
blue ball from one of the urns. What is then the probability P (U1|B) that we drew it from Urn 1?
Solution. Now, U1, U2, U3 partition S since we must choose exactly one urn. Thus by Bayes’ Rule,
P(U_1|B) = P(B|U_1)P(U_1) / [P(B|U_1)P(U_1) + P(B|U_2)P(U_2) + P(B|U_3)P(U_3)]
= ((7/10) × (1/3)) / ((7/10) × (1/3) + (4/5) × (1/3) + 1 × (1/3)) = 7/25 . ♦
This example illustrates how Bayes' Rule allows reverse inference: reasoning from an observed outcome back to its possible causes. To some, this can seem counter-intuitive, and it has even caused contention and controversy. Nevertheless, Bayes' Rule remains a very useful statistical tool that is widely used in medical trials, court cases, and elsewhere.
Example 18. A certain diagnostic test for a disease X indicates with 99% accuracy that a person
has X when that person actually has it. Similarly, the test indicates with 98% accuracy that
someone does not have X when they do not in fact have it. In medical terms, the test is “positive”
if it indicates that a person has the disease and is “negative” otherwise. Suppose that 2% of the
population has the disease.
Find the probability of a false positive, namely the probability that a person who tests positive does not actually have X.
Solution. One might guess that this probability would be very small since the test seems so
accurate. Let us calculate whether or not this is true. Thus, let D be the event that the person has disease X, let T_p be the event that the test shows positive, and set T_n = T_p^c. The tree diagram is
(Tree diagram: branches lead to D and D^c with probabilities .02 and .98; from D, branches lead to T_p and T_n with probabilities .99 and .01; from D^c, branches lead to T_p and T_n with probabilities .02 and .98.)
so by Bayes’ Rule,
P(D^c|T_p) = P(T_p|D^c)P(D^c) / [P(T_p|D^c)P(D^c) + P(T_p|D)P(D)]
= (.02 × .98)/(.02 × .98 + .99 × .02) ≈ 0.497 .
Thus, almost 50% of positives are false, which might seem outrageously inaccurate. Fortunately, the probability of a false negative, that is, of someone with X testing negative, is almost negligible:
P(D|T_n) = P(T_n|D)P(D) / [P(T_n|D)P(D) + P(T_n|D^c)P(D^c)] = (.01 × .02)/(.01 × .02 + .98 × .98) ≈ .000208 .
Thus, if someone is tested for disease X and the test shows that they do not have X, then they almost certainly do not have it. On the other hand, if the test is positive, then they might possibly have X, and more accurate (and presumably more expensive or time-consuming) tests can then be made to determine whether they do in fact have X. ♦
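The calculations of Example 18 amount to a few lines of arithmetic. The following Python sketch (variable names such as p_pos_given_D are our own) applies the Total Probability Rule and Bayes' Rule under the stated assumptions about the test.

```python
p_D = 0.02                 # P(D): 2% of the population has disease X
p_pos_given_D = 0.99       # P(Tp|D): test is positive given the disease
p_pos_given_not_D = 0.02   # P(Tp|D^c) = 1 - 0.98

# Total Probability Rule: P(Tp) = P(Tp|D)P(D) + P(Tp|D^c)P(D^c).
p_pos = p_pos_given_D * p_D + p_pos_given_not_D * (1 - p_D)

# Bayes' Rule for the false positive and false negative probabilities.
p_false_positive = p_pos_given_not_D * (1 - p_D) / p_pos
p_false_negative = (1 - p_pos_given_D) * p_D / (1 - p_pos)

print(round(p_false_positive, 3))   # 0.497
print(round(p_false_negative, 6))   # 0.000208
```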
Example 19. We modify Example 18 slightly by supposing that a fraction x of the population has disease X and that 5% test positive when a large random sample is tested.
What percentage of the population has disease X?
Solution. Here, the tree diagram is
(Tree diagram: branches lead to D and D^c with probabilities x and 1 − x; from D, branches lead to T_p and T_n with probabilities .99 and .01; from D^c, branches lead to T_p and T_n with probabilities .02 and .98.)
By the Total Probability Rule,
0.05 = P(T_p) = P(T_p|D)P(D) + P(T_p|D^c)P(D^c) = .99x + .02(1 − x) = .97x + .02 .
Therefore, x = 0.03/0.97 ≈ .031, so about 3.1% of the population has the disease. ♦
9.2.4 Statistical Independence
Intuitively, two events A and B are mutually independent if one does not influence the probability of
the other. This can be expressed as P (A|B) = P (A) and P (B|A) = P (B). That is, the probability
of A does not depend on whether or not B is given, and the same is true for the probability of B.
Since P (A∩B) = P (A|B)P (B) = P (B|A)P (A), we can express this independence quite elegantly:
Definition 5. Events A and B are (statistically) independent if and only if
P (A ∩B) = P (A)P (B)
Note that in contrast to conditional probability, this definition allows all probabilities, including 0.
To visualise statistical independence, let A and B be independent events with P(B) ≠ 0. Then, since P(S) = 1,
P(A)/P(S) = P(A ∩ B)/P(B) .
Thus, the probability measure of A is just as great in comparison to the whole sample space S as
is A ∩B, the part of A in B, when compared to B. The following Venn diagram illustrates this:
(Venn diagram: A occupies the same proportion of the sample space S as A ∩ B occupies of the event B.)
Note that independence and disjointness are not the same concept, as might be supposed. Indeed, these two concepts are almost opposite in nature: non-empty events A and A^c are disjoint but are strongly dependent, for if A occurs, then A^c cannot occur, and vice versa.
Example 20. We roll a die and let A and B be the events that we roll a six and that we roll an
even number, respectively. Then P(A) = 1/6 and P(B) = 3/6 = 1/2. Since in this case A ∩ B = A,
P(A ∩ B) = P(A) = 1/6  and  P(A)P(B) = (1/6) × (1/2) = 1/12 ,
so P(A ∩ B) ≠ P(A)P(B). Therefore, A and B are not independent; that is, they are dependent.
This is as one would expect since if A occurs, then B must necessarily occur.
Now, define A′ to be the event that we roll either a five or a six. Then since A′ ∩B = A,
P(A′) = 2/6 = 1/3  and  P(A′ ∩ B) = P(A) = 1/6 ,
so P(A′)P(B) = (1/3) × (1/2) = 1/6 = P(A′ ∩ B). Therefore, A′ and B are independent. In other words, the probability of rolling an even number is the same, namely 1/2, whether or not one of the numbers five and six is rolled, and the converse is equally true. ♦
Example 21. Roll a die twice and, for each i = 1, . . . , 6, let X_i and Y_i denote the events that we get i on the first roll and second roll, respectively. Under usual conditions, we may assume that the two rolls have no influence on each other. Therefore, X_i and Y_j are independent for all i, j, so
P(X_i ∩ Y_j) = P(X_i)P(Y_j) = (1/6) × (1/6) = 1/36 .
This identity allows us to calculate the probabilities of more complicated events, such as the event S_4 that the sum of the two rolls is 4. Since S_4 is partitioned by X_1 ∩ Y_3, X_2 ∩ Y_2, and X_3 ∩ Y_1, we see that
P(S_4) = P((X_1 ∩ Y_3) ∪ (X_2 ∩ Y_2) ∪ (X_3 ∩ Y_1))
= P(X_1 ∩ Y_3) + P(X_2 ∩ Y_2) + P(X_3 ∩ Y_1) = 1/36 + 1/36 + 1/36 = 1/12 .
This result could also have been obtained by viewing this experiment as having a sample space of
the 36 equally likely outcomes
S = {(1, 1), (1, 2), . . . , (6, 6)}
where, for instance, (3, 5) denotes that we first rolled three and then five. Then
P(S_4) = |{(1, 3), (2, 2), (3, 1)}|/|S| = 3/36 = 1/12 ,
as before. ♦
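Both routes to P(S_4) in Example 21 can be verified by listing the 36 outcomes. This is a small illustrative sketch; the event names X3, Y5, and S4 are our own, following the notation of the example.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two die rolls.
S = list(product(range(1, 7), repeat=2))
p = lambda E: Fraction(len(E), len(S))

# Independence of X_3 ("first roll is 3") and Y_5 ("second roll is 5").
X3 = {(i, j) for (i, j) in S if i == 3}
Y5 = {(i, j) for (i, j) in S if j == 5}
print(p(X3 & Y5) == p(X3) * p(Y5))   # True: 1/36 = (1/6) × (1/6)

# The sum of the two rolls is 4.
S4 = {(i, j) for (i, j) in S if i + j == 4}
print(p(S4))                          # 1/12
```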
We now consider the statistical independence of any finite number of events.
Definition 6. Events A_1, . . . , A_n are mutually independent if and only if, for any subcollection A_{i_1}, . . . , A_{i_k} of these,
P(A_{i_1} ∩ · · · ∩ A_{i_k}) = P(A_{i_1}) × · · · × P(A_{i_k}) .
Example 22. Events A, B, C are mutually independent if and only if these four identities all hold:
P(A ∩ B) = P(A)P(B)
P(A ∩ C) = P(A)P(C)
P(B ∩ C) = P(B)P(C)
P(A ∩ B ∩ C) = P(A)P(B)P(C) ♦
In general, for n events to be mutually independent, they must satisfy 2^n − n − 1 non-trivial identities such as the ones above, one for each subset of {1, . . . , n} with at least two elements. In general, none of these many identities implies any of the others, so we cannot make do with a smaller set of identities. This is illustrated by the following example.
Example 23. Draw a ball from a bag containing four balls marked 0 to 3. For i = 0, . . . , 3, let Ai
be the event that ball i is drawn, and let Bi = A0 ∪Ai be the event that ball 0 or ball i is drawn.
Then for all distinct i, j = 1, 2, 3,
P(B_i) = P(A_0 ∪ A_i) = 2/4 = 1/2  and  P(B_i ∩ B_j) = P(A_0) = 1/4 .
Hence, P (Bi ∩Bj) = P (Bi)P (Bj), so Bi and Bj are independent. In contrast,
P(B_1 ∩ B_2 ∩ B_3) = P(A_0) = 1/4 ≠ 1/8 = P(B_1)P(B_2)P(B_3) ,
so B1, B2, B3 are not mutually independent. ♦
If events A and B are independent, then A and B^c are also independent:
P(A ∩ B^c) = P(A) − P(A ∩ B) = P(A) − P(A)P(B) = P(A)(1 − P(B)) = P(A)P(B^c) .
By modifying these calculations slightly and using induction, we can prove the following more general result:
Theorem 4. If events A_1, . . . , A_n are mutually independent and B_i is either A_i or A_i^c for each i = 1, . . . , n, then B_1, . . . , B_n are also mutually independent.
Suppose that events A, B, and C are mutually independent. Then by Theorem 4,
P(A ∩ (B ∪ C)) = P(A ∩ ((B − C) ∪ (B ∩ C) ∪ (C − B)))
= P(A ∩ B ∩ C^c) + P(A ∩ B ∩ C) + P(A ∩ C ∩ B^c)
= P(A)P(B ∩ C^c) + P(A)P(B ∩ C) + P(A)P(C ∩ B^c)
= P(A)(P(B − C) + P(B ∩ C) + P(C − B))
= P(A)P(B ∪ C) .
We see that A and B ∪ C are also independent. By generalising the above calculations, one may
prove the following result.
Theorem 5. If events A_{1,1}, . . . , A_{1,n_1}, A_{2,1}, . . . , A_{m,n_m} are mutually independent and, for each i = 1, . . . , m, the event B_i is obtained from A_{i,1}, . . . , A_{i,n_i} by taking unions, intersections, and complements, then B_1, . . . , B_m are also mutually independent.
Example 24 (A reliability example).
A 3-engine plane has a central engine and two wing engines. The plane will crash if the central
engine and at least one of the wing engines fail. On any given flight, the central engine fails with
probability 0.005 , and each wing engine fails with probability 0.008 . Assuming that the three
engines fail mutually independently, find the probability that the plane will crash during a flight.
Solution. Let A be the event that the port engine fails, let B be the event that the starboard
engine fails, and C be the event that the central engine fails. Then P (A) = P (B) = 0.008 and
P (C) = 0.005. Let D denote the event that the plane crashes and note that D = C∩(A∪B). Since
A, B, and C are mutually independent, C and A ∪B are independent by Theorem 5. Therefore,
P(D) = P(C ∩ (A ∪ B)) = P(C)P(A ∪ B) = P(C)[P(A) + P(B) − P(A ∩ B)]
= P(C)[P(A) + P(B) − P(A)P(B)]
= 0.005[0.008 + 0.008 − 0.008 × 0.008] = 0.00007968 .
(Note: There are other ways to do this problem.)
Under our assumptions, the plane will crash on a given flight with probability slightly less than
eighty in one million. These are dangerous assumptions, however, since it is highly optimistic to
hope that the engines will fail independently of each other. For instance, if engine failure is caused
by volcanic ash, then all three engines will be at risk of failure, and there are countless other
factors, like shared electric wiring, that might similarly introduce dependence. In view of this, the
real probabilities might be considerably higher than stated. ♦
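A short sketch of the crash calculation in Example 24, under the (dangerous) independence assumption discussed above; the variable names are our own.

```python
p_wing = 0.008     # P(A) = P(B): each wing engine fails
p_centre = 0.005   # P(C): the central engine fails

# D = C ∩ (A ∪ B); independence gives P(D) = P(C) P(A ∪ B), and the
# Addition Rule gives P(A ∪ B) = P(A) + P(B) - P(A)P(B).
p_crash = p_centre * (p_wing + p_wing - p_wing * p_wing)
print(p_crash)     # 7.968e-05
```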
9.3 Random Variables
It can often be useful to label the outcomes of an experiment by numbers. This often makes event
notation more flexible, and it allows us to perform arithmetic on the outcomes.
Definition 1. A random variable is a real function defined on a sample space.
Example 1. Toss a coin and let S = {H,T} be the associated sample space.
Two random variables X and Y on S are given as follows, for each outcome s ∈ S:
X(s) = 1 if s = H, and X(s) = 0 if s = T ;
Y(s) = 1 if s = H, and Y(s) = −1 if s = T .
Example 2. Roll a die and let S = {1, . . . , 6} be the associated sample space.
Two random variables X and Y on S are given as follows, for each outcome s ∈ S:
X(s) = s ;   Y(s) = −1 if s is odd, and Y(s) = 1 if s is even.
Definition 2. For a random variable X on some sample space S, define for all
subsets A ⊆ S and real numbers r ∈ R,
• {X ∈ A} = {s ∈ S : X(s) ∈ A}
• {X = r} = {s ∈ S : X(s) = r}
• {X ≤ r} = {s ∈ S : X(s) ≤ r}
• ... and so on.
We suppress the curly brackets when expressing the probability of these events. For instance,
P ({X = r}) is written as P (X = r).
Example 3. Roll a die and let X and Y be the random variables defined in Example 2. Then
P(X > 4) = P({5, 6}) = 1/3 ,   P(Y = 1) = P({2, 4, 6}) = 1/2 ,   P(Y = π) = P(∅) = 0 .
Example 4. Toss a coin three times and let the random variable X count the number of heads
tossed. Then X(S) = {0, 1, 2, 3} and
P(X = 0) = P({TTT}) = 1/8 ,   P(X = 1) = P({HTT, THT, TTH}) = 3/8 ,
P(X = 2) = P({HHT, HTH, THH}) = 3/8 ,   P(X = 3) = P({HHH}) = 1/8 .
Definition 3. The cumulative distribution function of a random variable X
is given by
F_X(x) = P(X ≤ x) for x ∈ R .
We often refer to F_X(x) as just F(x). Note that F is non-decreasing and that if a ≤ b, then P(a < X ≤ b) = F(b) − F(a) and
0 = lim_{x→−∞} F(x) ≤ F(a) ≤ F(b) ≤ lim_{x→∞} F(x) = 1 .
Example 5. Toss a coin three times and let random variable X be the number of heads tossed,
as in Example 4. Then
F(3/2) = P(X ≤ 3/2) = P(X = 0) + P(X = 1) = 1/8 + 3/8 = 1/2 .
9.3.1 Discrete Random Variables
The image of a function is the set of its function values.
Definition 4. A random variable X is discrete if its image is countable.
The random variables in Examples 1–5 are each discrete since their images are finite and thus
countable. We shall for now only consider discrete random variables but will consider certain
non-discrete random variables in Section 9.5, namely those that are continuous.
Definition 5. The probability distribution of a discrete random variable X is
some description of all the probabilities of all events associated with X.
We sometimes write the probabilities as pk = P (X = xk).
Note that for a discrete random variable X, the cumulative distribution function F(x) is
F(x) = ∑_{k : x_k ≤ x} p_k .
Thus, in practice, to show that {p_k} is a probability distribution, we need to show that:
(i) p_k ≥ 0 for all k, and
(ii) ∑_k p_k = 1.
Example 6. Roll a die and define random variables X and Y as in Example 2. The probability
distributions of X and Y can for instance be represented as
P(X = x) = 1/6 if x ∈ {1, . . . , 6}, and P(X = x) = 0 otherwise,
and
y_k                 −1    1
p_k = P(Y = y_k)    1/2   1/2
Clearly, in each case p_k ≥ 0. For the random variable X, we have ∑_k p_k = 6 × (1/6) = 1, while for the random variable Y, ∑_k p_k = 1/2 + 1/2 = 1. Hence, these are probability distributions for the random variables X and Y respectively. ♦
Example 7. Roll a die twice and let S = {(i, j) : i, j = 1, . . . , 6} be the associated sample space.
Let X be the random variable defined by X(i, j) = i + j, that is, X is the sum of the numbers
showing. The probability distribution of X is
x_k   2     3     4     5     6     7     8     9     10    11    12
p_k   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Clearly, p_k ≥ 0. Also,
∑_k p_k = (1/36)(1 + 2 + 3 + 4 + 5 + 6 + 5 + 4 + 3 + 2 + 1) = 1.
Hence pk is a probability distribution. ♦
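The distribution in Example 7 can be generated by enumerating the 36 outcomes rather than tabulating by hand. An illustrative sketch (pmf is our own name):

```python
from fractions import Fraction
from itertools import product

# Probability distribution of X = the sum of two die rolls.
pmf = {}
for i, j in product(range(1, 7), repeat=2):
    pmf[i + j] = pmf.get(i + j, 0) + Fraction(1, 36)

print(pmf[7])                   # 1/6, the most likely sum
print(sum(pmf.values()) == 1)   # True: the p_k sum to 1
```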
Note that, in the above example, we did not actually need to specify the sample space; it would
suffice to define X to be the sum of rolls. Indeed, we often specify the probability distribution
without defining the sample space or even, at times, a random variable with that distribution.
Example 8. The probability distribution of a discrete random variable X is given as follows:
x_k                0     1      2     4     7
p_k = P(X = x_k)   0.2   0.3c   0.2   c²    0.5
(a) Find the value of c.
(b) Find P (X ≥ 4).
Solution.
(a) ∑ p_k = c² + 0.3c + 0.9 = 1, or c² + 0.3c − 0.1 = 0.
Solving this gives c = −0.5 or c = 0.2. Since 0.3c = p_1 ≥ 0, we conclude that c = 0.2.
(b) By (a), the probability distribution is
x_k   0     1      2     4      7
p_k   0.2   0.06   0.2   0.04   0.5
Hence, P (X ≥ 4) = P (X = 4) + P (X = 7) = 0.04 + 0.5 = 0.54 . ♦
9.3.2 The Mean and Variance of a Discrete Random Variable
As we have seen in the above examples, the use of random variables can simplify the description
of events and their probabilities. We will now see how they also enable us to perform arithmetic
on outcomes. In particular, we can calculate the weighted averages of outcome values; this is the
expected value, or mean. We can also measure the average of the squares of the distances from the
mean to the outcome values; this is called the variance.
Definition 6. The expected value (or mean) of a discrete random variable X with probability distribution p_k is
E(X) = ∑_{all k} x_k p_k .
The expected value E(X) is often denoted by µ or µX .
Example 9. Toss a coin three times and let X count the number of heads tossed as in Example 4.
The probability distribution of X is
x_k   0     1     2     3
p_k   1/8   3/8   3/8   1/8
so the expected value of X is
E(X) = ∑_k x_k p_k = 0 × (1/8) + 1 × (3/8) + 2 × (3/8) + 3 × (1/8) = 3/2 .
This agrees with our intuition: on average, half of the throws will be heads.
Example 10. Roll a die twice and let the random variable X be the sum of the rolls. Since a die roll of i is as likely as a roll of 7 − i, we see that X and 14 − X have the same probability distribution, so the expected value of X is E(X) = 14/2 = 7. Let us check this using the definition of E(X) and the probabilities P(X = x_k) given in Example 7:
E(X) = 2 × (1/36) + · · · + 7 × (6/36) + · · · + 12 × (1/36) = 7 . ♦
Theorem 1. Let X be a discrete random variable with probability distribution pk = P (X = xk).
Then for any real function g(x), the expected value of Y = g(X) is
E(Y) = E(g(X)) = ∑_k g(x_k) p_k .
[X] Proof. Let {y_j} = {g(x_k)} be the set of function values of Y, and note that
P(Y = y_j) = P({s ∈ S : g(X(s)) = y_j}) = P( ⋃_{k : g(x_k)=y_j} {s ∈ S : X(s) = x_k} ) = ∑_{k : g(x_k)=y_j} P(X = x_k) .
By changing the order of summation, we therefore see that
E(Y) = ∑_j y_j P(Y = y_j) = ∑_j y_j ∑_{k : g(x_k)=y_j} P(X = x_k) = ∑_j ∑_{k : g(x_k)=y_j} g(x_k) p_k = ∑_k ∑_{j : y_j=g(x_k)} g(x_k) p_k = ∑_k g(x_k) p_k .
The final equality is valid because the second sum only sums over a single element j.
Example 11. Toss a coin three times and let X be the number of heads tossed, as in Examples 4
and 9. By Theorem 1, the expected value of X² is
E(X²) = ∑_k x_k² p_k = 0² × (1/8) + 1² × (3/8) + 2² × (3/8) + 3² × (1/8) = 3 . ♦
The expected value of a random variable X describes where the values of X are centred. We can
also measure how widely the values of X spread, namely by the average distance (squared) between
the values and the mean.
Definition 7. The variance of a discrete random variable X is
Var(X) = E((X − E(X))²) .
The standard deviation of X is SD(X) = √Var(X) .
The standard deviation is often denoted by σ or σ_X, and the variance is often written as σ² or σ_X².
Theorem 2. Var(X) = E(X²) − (E(X))².
This formula is often useful in hand calculations.
Proof. Write µ for E(X). By the definition of the variance, we have
Var(X) = ∑_k (x_k − µ)² p_k = ∑_k (x_k² − 2x_kµ + µ²) p_k = ∑_k x_k² p_k − 2µ ∑_k x_k p_k + µ² ∑_k p_k = E(X²) − 2µ² + µ² = E(X²) − (E(X))² .
Example 12. Toss a coin three times and let X be the number of heads tossed. We saw in
Examples 9 and 11 that E(X) = 3/2 and E(X²) = 3. Thus by Theorem 2, the variance of X is
Var(X) = E(X²) − (E(X))² = 3 − (3/2)² = 3/4 . ♦
Example 13. Consider a random variable X with probability distribution given below:
x_k   0     1      2     4      7
p_k   0.2   0.06   0.2   0.04   0.5
The expected values of X and X² are
E(X) = ∑ x_k p_k = 0 × 0.2 + 1 × 0.06 + 2 × 0.2 + 4 × 0.04 + 7 × 0.5 = 4.12
E(X²) = ∑ x_k² p_k = 0² × 0.2 + 1² × 0.06 + 2² × 0.2 + 4² × 0.04 + 7² × 0.5 = 26.0
and the variance of X is
Var(X) = E(X²) − (E(X))² = 26 − (4.12)² = 9.0256 .
Thus, the root mean square distance of the values of X from the mean E(X) is roughly √9.0256 ≈ 3. ♦
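The mean and variance in Example 13 reduce to two weighted sums, so they are easy to check in code. A minimal sketch using the table above (the list names xs and ps are our own):

```python
# The probability distribution of Example 13.
xs = [0, 1, 2, 4, 7]
ps = [0.2, 0.06, 0.2, 0.04, 0.5]

mean = sum(x * p for x, p in zip(xs, ps))       # E(X)
ex2 = sum(x * x * p for x, p in zip(xs, ps))    # E(X²)
var = ex2 - mean**2                             # Theorem 2

print(round(mean, 2))   # 4.12
print(round(ex2, 1))    # 26.0
print(round(var, 4))    # 9.0256
```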
There is generally no easily-described relationship between Var(Y ) and Var(X) when Y = g(X).
However, if Y = aX + b is a linear function of X, then we have the following simple identities:
Theorem 3. If a and b are constants, then
E(aX + b) = aE(X) + b
Var(aX + b) = a²Var(X)
SD(aX + b) = |a|SD(X) .
Proof. Writing µ for E(X), we have
E(aX + b) = ∑_k (ax_k + b) p_k = a ∑_k x_k p_k + b ∑_k p_k = aE(X) + b .
By this identity and the definition of the variance,
Var(aX + b) = E((aX + b − E(aX + b))²) = E((aX − aE(X))²) = a²E((X − E(X))²) = a²Var(X) .
The third statement follows from the second by taking square roots.
The variance σ2 = Var(X) of a random variable X gives a measure of the average square
distance from the expected value E(X) to the values of X.
9.4 Special Distributions
Often, statistical models incorporate specific classes of probability distributions whose expected
value and variance are known. In this section, we shall consider two such classes, namely the
Binomial distributions and the Geometric distributions. These both involve Bernoulli trials and
Bernoulli processes. A Bernoulli trial is an experiment with two outcomes, often “success” and
“failure”, or Y(es) and N(o), or {1, 0}, where P (Y ) and P (N) are denoted by p and q = 1 − p,
respectively. A Bernoulli process is an experiment composed of a sequence of identical and mutually
independent Bernoulli trials. More particularly, the events Ai, denoting the success of the ith trial,
are mutually independent. We have already seen examples of Bernoulli processes in previous
sections, such as tossing a coin repeatedly and considering head-outcomes (p = 1/2); rolling a die multiple times to obtain sixes (p = 1/6); or asking each of several people whether it is their birthday (p = 1/365).
Example 1. Tossing a coin three times is a Bernoulli process with n = 3 identical trials that each result in either H or T, with probabilities p = q = 1/2. The trials are mutually independent since the coin tosses do not influence each other. Let us formally verify this claim. Let A_1, A_2, and A_3 be the events that H is tossed on the 1st, 2nd, and 3rd toss, respectively. Then
P(A_1) = P({HTT, HTH, HHT, HHH}) = 4/8 = 1/2 .
Similarly, P(A_2) = P(A_3) = 1/2. Therefore,
P(A_1 ∩ A_2) = P({HHT, HHH}) = 2/8 = 1/4 = P(A_1)P(A_2) ,
and, similarly, P (A2 ∩A3) = P (A2)P (A3) and P (A1 ∩A3) = P (A1)P (A3). Finally,
P(A_1 ∩ A_2 ∩ A_3) = P({HHH}) = 1/8 = P(A_1)P(A_2)P(A_3) .
We see that the events A_1, A_2, A_3 are indeed mutually independent. ♦
Throughout the remainder of this section, let p be a real number with 0 < p < 1 and let
q = 1− p.
9.4.1 The Binomial Distribution
Recall that the binomial coefficient C(n, k) denotes the number of ways to select k objects from n distinct objects, with order unimportant, and is given by
C(n, k) = n!/(k!(n − k)!) .
Definition 1. The Binomial distribution B(n, p) for n ∈ N is the function
B(n, p, k) = C(n, k) p^k (1 − p)^{n−k} ,  where k = 0, 1, . . . , n .
Note that B(n, p, k) is a probability distribution. To see this, we can use the Binomial Theorem:
∑_k B(n, p, k) = ∑_{k=0}^{n} B(n, p, k) = ∑_{k=0}^{n} C(n, k) p^k q^{n−k} = (p + q)^n = 1^n = 1 .
Since B(n, p, k) is nonnegative, we conclude that 0 ≤ B(n, p, k) ≤ 1 for all k.
Theorem 1. If X is the random variable that counts the successes of some Bernoulli process with
n trials having success probability p, then X has the binomial distribution B(n, p).
We write X ∼ B(n, p) to denote that X is a random variable with this distribution.
Proof. The variable X can assume the values k = 0, 1, . . . , n, so we must calculate p_k = P(X = k) for these values. Suppose that the first k trials each result in Y(es) and the rest each result in N(o):
Y · · · Y (k times)  N · · · N (n − k times)
The trials are independent, so this outcome has probability p^k (1 − p)^{n−k}. In general, there are
C(n, k) = n!/(k!(n − k)!)
ways for precisely k Y's to occur among the n trials. Therefore,
p_k = P(X = k) = C(n, k) p^k (1 − p)^{n−k} = B(n, p, k) .
Example 2. Toss a coin n = 3 times and let X be the random variable counting the number of resulting heads (H). The tosses are identical and mutually independent with probability p = 1/2 of resulting in H. Thus, Theorem 1 implies that X ∼ B(3, 1/2). This tells us everything about the probabilities of X; for instance, P(X = 2) = C(3, 2)(1/2)²(1/2)^{3−2} = 3/8 . ♦
Probabilities such as P(X > t), P(X ≥ t), and P(|X − E(X)| > t) are each referred to as a tail probability.
Example 3. Roll a die n = 12 times and let X be the number of resulting sixes. The rolls are
identical and mutually independent with probability p = 1/6 of resulting in a six, so by Theorem 1, X ∼ B(12, 1/6). Thus, for instance, we can calculate the following tail probability:
P(X > 9) = P(X = 10) + P(X = 11) + P(X = 12)
= C(12, 10)(1/6)^{10}(5/6)² + C(12, 11)(1/6)^{11}(5/6) + (1/6)^{12} ≈ 7.86 × 10^{−7} .
We see that the likelihood of rolling more than nine sixes is less than one in a million. ♦
Example 4. Ask n = 40 people whether today is their birthday and let X count the Yes-answers.
Assume that these questions form identical and mutually independent trials with probability p = 1/365 of resulting in a Yes-answer. By Theorem 1, X ∼ B(40, 1/365). The likelihood of today being the birthday of at least one of these people is
P(X ≥ 1) = 1 − P(X = 0) = 1 − (364/365)^{40} ≈ 10.4% . ♦
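Binomial probabilities such as those in Examples 3 and 4 are conveniently computed with the binomial coefficient function in Python's standard library. A brief sketch (binom_pmf is our own helper, mirroring B(n, p, k)):

```python
from math import comb

def binom_pmf(n, p, k):
    # B(n, p, k) = C(n, k) p^k (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example 3: P(X > 9) for X ~ B(12, 1/6).
tail = sum(binom_pmf(12, 1/6, k) for k in range(10, 13))
print(f"{tail:.2e}")                            # about 7.86e-07

# Example 4: P(X >= 1) for X ~ B(40, 1/365).
print(round(1 - binom_pmf(40, 1/365, 0), 3))    # about 0.104
```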
Theorem 2. If X is a random variable and X ∼ B(n, p), then
• E(X) = np ;
• Var(X) = npq = np(1− p).
Proof. First note that, for k ≥ 1,
k C(n, k) = k · n!/(k!(n − k)!) = n · (n − 1)!/((k − 1)!(n − 1 − (k − 1))!) = n C(n − 1, k − 1) .
Hence,
E(X) = ∑_{k=0}^{n} k p_k = ∑_{k=0}^{n} k C(n, k) p^k q^{n−k} = ∑_{k=1}^{n} k C(n, k) p^k q^{n−k}
= n ∑_{k=1}^{n} C(n − 1, k − 1) p^k q^{n−k}
= n ∑_{j=0}^{n−1} C(n − 1, j) p^{j+1} q^{n−1−j}
= np ∑_{j=0}^{n−1} C(n − 1, j) p^j q^{n−1−j} = np ∑_{j=0}^{n−1} B(n − 1, p, j) = np .
See Problem 36 for the second half of the proof.
Example 5. Toss a coin n = 3 times and let X count the number of ensuing heads. By Example 1, X ∼ B(3, 1/2), so by Theorem 2, the expected number of resulting heads is E(X) = np = 3/2 and the average square distance between the number of heads and E(X) is Var(X) = npq = 3/4.
If we now toss the coin n = 12 times, then X ∼ B(12, 1/2), so the expected number of resulting heads is E(X) = np = 12/2 = 6, and Var(X) = npq = 3. ♦
Example 6. If we roll a die n = 12 times, then one might intuitively expect to roll 12 × (1/6) = 2 sixes on average. This is also what we find by the following calculations. If X is the random variable counting the number of sixes rolled, then by Example 3, X ∼ B(12, 1/6). Thus by Theorem 2, the expected number of resulting sixes is E(X) = np = 12 × (1/6) = 2, as we expected. ♦
The distributions B(12, 1/2) and B(12, 1/6) appearing in Examples 5 and 6 are illustrated below. Note that the function values B(12, 1/2, k) are centred symmetrically around E(X) = 6 and spread out gradually to the extremities k = 0, 12. In contrast, the function values B(12, 1/6, k) are clustered asymmetrically about the expected value E(X) = 2 and taper off rapidly, so that B(12, 1/6, k) is nearly zero for k ≥ 7. Thus, it is extremely unlikely that we would roll at least seven sixes when rolling a die twelve times.
(Bar charts of p_k = B(12, 1/2, k) and p_k = B(12, 1/6, k) for k = 0, . . . , 12, with vertical axes from 0 to 0.30.)
9.4.2 Geometric Distribution
Definition 2. The Geometric distribution G(p) is the function
G(p, k) = (1 − p)^{k−1} p = q^{k−1} p ,  where k = 1, 2, . . . .
Note that G(p, k) is a probability distribution since 0 ≤ G(p, k) ≤ 1 for all k and since
∑_k G(p, k) = ∑_{k=1}^{∞} G(p, k) = ∑_{k=1}^{∞} q^{k−1} p = p ∑_{k=0}^{∞} q^k = p · 1/(1 − q) = p · 1/p = 1 .
Theorem 3. Consider an infinite Bernoulli process of trials each of which has success probability p.
If the random variable X is the number of trials conducted until success occurs for the first time,
then X has the geometric distribution G(p).
We write X ∼ G(p) to denote that X has this distribution. Note that it is theoretically possible
for a success never to happen (X =∞); however, this has zero probability. We therefore omit the
all-failure outcome from the sample space so that X is a well-defined finite number.
Proof. The variable X can assume values k = 1, 2, . . . , so we must find pk = P (X = k) for these
values. The event {X = k} consists of the outcome in which the first k − 1 trials each result in
N(o) and the kth trial results in Y (es):
N · · · N (k − 1 times), followed by Y.
The trials are independent, so this outcome has probability
p_k = P(X = k) = (1 − p)^{k−1} p = G(p, k) .
Example 7. Toss a coin until H(ead) is tossed and let X count the number of these tosses. The tosses are identical and mutually independent with probability p = 1/2 of resulting in H. Theorem 3 then implies that X ∼ G(1/2). Thus, the likelihood of needing exactly seven tosses to obtain H is
P(X = 7) = (1 − 1/2)^{7−1} × (1/2) = 1/2^7 ≈ 0.8% . ♦
Tail probabilities are very easily expressed for geometrically distributed random variables:
Theorem 4. If X ∼ G(p) and n is a positive integer, then P(X > n) = (1 − p)^n = q^n.
Proof. P(X > n) = ∑_{k=n+1}^{∞} P(X = k) = ∑_{k=n+1}^{∞} q^{k−1} p = q^n ∑_{k=1}^{∞} q^{k−1} p = q^n ∑_{k=1}^{∞} G(p, k) = q^n .
Theorem 4 gives us a simple expression for the cumulative distribution function F (x) of X:
Corollary 5. If X ∼ G(p), then the cumulative distribution function F is given by
F(x) = P(X ≤ x) = 1 − (1 − p)^⌊x⌋ = 1 − q^⌊x⌋ for x ≥ 0 (and F(x) = 0 for x < 0).
Note that ⌊x⌋ denotes the largest integer less than or equal to x.
Example 8. Roll a die until six is rolled, and let X count the number of these rolls. The rolls are identical and mutually independent with probability p = 1/6 of resulting in six. Theorem 3 implies that X ∼ G(1/6). By Theorem 4, the likelihood of rolling a six within at most four rolls is
F(4) = P(X ≤ 4) = 1 − P(X > 4) = 1 − (1 − 1/6)^4 ≈ 52% ,
or close to half. Similarly, F(6) = P(X ≤ 6) ≈ 2/3 and P(X > 7) ≈ 28%. ♦
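Theorem 4 and Corollary 5 make geometric tail and cumulative probabilities one-line computations. A small sketch reproducing Example 8 (geom_cdf is our own helper):

```python
def geom_cdf(p, n):
    # Corollary 5 at integer points: P(X <= n) = 1 - (1 - p)^n for n >= 0.
    return 1 - (1 - p)**n

# Example 8: roll a die until a six appears, so X ~ G(1/6).
print(round(geom_cdf(1/6, 4), 2))   # 0.52
print(round(geom_cdf(1/6, 6), 2))   # 0.67, close to 2/3
print(round((1 - 1/6)**7, 2))       # P(X > 7) = q^7, about 0.28
```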
Theorem 6. If X is a random variable and X ∼ G(p), then
• E(X) = 1/p ;
• Var(X) = (1 − p)/p² .
Proof. First note that, for |x| < 1, using power series results from calculus,
∑_{k=0}^{∞} k x^{k−1} = d/dx ∑_{k=0}^{∞} x^k = d/dx (1/(1 − x)) = 1/(1 − x)² .
Hence,
E(X) = ∑_{k=1}^{∞} k q^{k−1} p = p/(1 − q)² = p/p² = 1/p .
The second part of the proof is left as an exercise.
Example 9. Toss a coin until H(ead) is tossed and let X count the number of these tosses. As seen in Example 7, X ∼ G(1/2). Thus, we must expect on average to have to toss the coin E(X) = 1/p = (1/2)^{−1} = 2 times in order to toss a head; this is presumably what most would expect intuitively. Note that this is a relatively accurate estimate of the average, since the average squared distance from E(X) = 2 to the infinitely many values of X is only Var(X) = (1 − p)/p² = (1 − 1/2)/(1/2)² = 2. Indeed, by Theorem 4, the likelihood that we must toss the coin more than three times to get a head is P(X > 3) = (1/2)³ = 1/8 = 12.5%, which is relatively small. ♦
Example 10. Roll a die until six is rolled, and let X count the number of these rolls, as in Example 8. In that example, we saw that X ∼ G(1/6), so we must expect on average to have to roll the die E(X) = 1/p = (1/6)^{−1} = 6 times in order to roll a six. As in the coin-tossing example above, this expected value is what one might guess intuitively. However, in contrast to that example, the present expected value E(X) = 6 is not a particularly precise estimate of an average rolling count. In particular, the average squared distance from E(X) = 6 to the infinitely many values of X is Var(X) = (1 − p)/p² = (1 − 1/6)/(1/6)² = 30. Indeed, by Example 8, the likelihood of requiring at most four rolls in order to roll a six is (slightly) more than half, and the likelihood of requiring at least eight rolls is more than a quarter. ♦
The distributions G(1/2) and G(1/6) appearing in Examples 9 and 10 are illustrated below. Note that the function values G(1/2, k) are large to begin with but decrease very quickly, which is reflected by the small expected value and variance that both equal E(X) = Var(X) = 2. In contrast, the function values G(1/6, k) are small to begin with and decrease only gradually. This is indicated by the expected value E(X) = 6 and the large variance Var(X) = 30. Thus, it is very likely that we would toss a head after only a few tosses of a coin, whereas it is not possible to form such a good estimate of the number of die rolls required to roll a six.
(Bar charts of p_k = G(1/2, k) and p_k = G(1/6, k) for k = 1, 2, . . . , 12, with vertical axes from 0 to 0.50.)
9.4.3 Sign Tests
Often, we have a sample of data consisting of independent observations of some quantity of interest, and we wish to see whether the observed values differ systematically from some fixed, pre-determined value.
Example 11. Crop research gives the following yields, in bushels per acre, of a new variety of corn grown on 15 plots of land:
138.0 139.1 113.0 132.5 140.7
109.7 118.0 134.8 109.6 127.3
115.6 130.4 130.2 117.7 105.5
A variety of corn currently used yields 110 bushels per acre. We want to know whether the new
variety improves on the existing one – that is, are the above values centred around a true value for
yield of 110, or are they systematically different from the value 110?
To answer this question, one may use a “sign test” approach as follows:
1. Count the number of observations that are strictly greater than the target value (“+”).
2. Count the total number of observations that are either strictly greater (“+”) or strictly smaller
(“–”) than the target value.
3. Calculate the tail probability that measures how often one would expect to observe as many
increases (“+”) as were observed, if there were equal probability of “+” and “–”.
Using this approach, we can now determine whether the new variety of corn has a higher yield
than the current variety.
1. The yield was strictly greater (“+”) than 110 bushels/acre in 12 plots.
2. The yield was either strictly greater (“+”) than or strictly smaller (“-”) than 110 bushels/acre
in all 15 plots.
3. Assuming that the greater-than ("+") yield events for the plots have identical probabilities and are mutually independent, we can model these yields binomially. In particular, let X be the random variable that counts the yields exceeding the target yield of 110 bushels per acre; then X ∼ B(15, 1/2). The probability p = 1/2 is set under the assumption that smaller-than ("−") yields are as likely as greater-than ("+") yields and that no equal-to yields occur; this assumes that the new crop has the same yield as the old one. The tail probability that 12 or more plots have a greater-than ("+") yield is then
P(X ≥ 12) = ∑_{k=12}^{15} C(15, k) (1/2)^k (1/2)^{15−k} ≈ 1.76% .
Under the assumption that the average yield for the new variety has not improved, it is quite
unlikely (less than 2%) that we would have observed 12 of the 15 yields above the old average yield.
We therefore conclude that the new variety has improved yield.
In this course, we will regard a tail probability of less than 5% as significant.
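The whole sign test of Example 11 can be written as a few lines of code. The sketch below (sign_test_tail is our own helper) recounts the "+" signs in the data and computes the binomial tail probability under the equal-probability assumption.

```python
from math import comb

def sign_test_tail(n_plus, n_total):
    # P(X >= n_plus) for X ~ B(n_total, 1/2): how often at least this
    # many "+" signs would occur if "+" and "-" were equally likely.
    return sum(comb(n_total, k) for k in range(n_plus, n_total + 1)) / 2**n_total

yields = [138.0, 139.1, 113.0, 132.5, 140.7,
          109.7, 118.0, 134.8, 109.6, 127.3,
          115.6, 130.4, 130.2, 117.7, 105.5]
target = 110

plus = sum(1 for y in yields if y > target)      # strictly greater: "+"
nonzero = sum(1 for y in yields if y != target)  # "+" or "-"
print(plus, nonzero)                             # 12 15
print(round(sign_test_tail(plus, nonzero), 4))   # 0.0176, below 5%
```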
9.5 Continuous random variables
In the previous sections, we considered discrete random variables. These can assume countably
many values assigned to the outcomes of an experiment. Although there may be infinitely many of
these values, this is not sufficient to model many real-life experiments in which outcomes may be
assigned any real values from some interval, such as the height or weight of individuals, or the half-
life of a radioactive isotope. We will therefore now consider a type of non-discrete random variable,
called a continuous random variable X. In contrast to discrete random variables, these cannot be defined by the probabilities P(X = x) of single values x, since these probabilities each equal 0. Instead, we will define continuous random variables in terms of the cumulative distribution function
F(x) = F_X(x) = P(X ≤ x) for x ∈ R .
Definition 1. A random variable X is continuous if and only if F_X(x) is continuous.
Strictly speaking, FX(x) must actually be piecewise differentiable, which means that FX(x) is
differentiable except for at most countably many points. However, the above definition is good
enough for our present purposes.
Example 1. At a random point during the day, we take note of the time, ignoring the date and the number of hours. This gives us a real number (of minutes) that we can represent by a random variable X with function values lying in the interval [0, 60). If x is one of these values, then, assuming that any time is as likely as another, our intuition tells us that F(x) = P(X ≤ x) = P(0 ≤ X ≤ x) is the size of the interval [0, x] compared to the size of [0, 60); measured in lengths, this is x/60.
We therefore find that F(x) is the function
F(x) = P(X ≤ x) = 0 if x ≤ 0 ,  x/60 if 0 < x ≤ 60 ,  1 if x > 60 .
(Graph: F(x) rises linearly from 0 at x = 0 to 1 at x = 60 and is constant outside this interval.)
This is a continuous function, so X is a continuous random variable. ♦
For discrete random variables X, the cumulative distribution function F (x) is a sum over prob-
ability distribution values pk = P (X = xk). For continuous random variables, F (x) is an integral
over continuous function analogues of the discrete probability distributions. These analogues are
given in the following definition.
Definition 2. The probability density function f(x) of a continuous random
variable X is defined by
f(x) = f_X(x) = d/dx F(x) ,  x ∈ R ,
if F(x) is differentiable at x, and f(a) = lim_{x→a⁻} d/dx F(x) if F(x) is not differentiable at x = a.
Since F(x) is non-decreasing and lim_{x→∞} F(x) = 1, the probability density function satisfies
f(x) ≥ 0 for all x  and  ∫_{−∞}^{∞} f(x) dx = 1 .
Theorem 1. F(x) = ∫_{−∞}^{x} f(t) dt .
Proof. This follows from the Fundamental Theorem of Calculus since lim_{x→−∞} F(x) = 0.
Note that if a ≤ b, then
P(a ≤ X ≤ b) = P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(x) dx .
Example 2. At a random point during the day, we take note of the time as in Example 1, and let X again be the continuous random variable that tells us how far past the hour the time is. The density function f(x) is calculated by differentiating the cumulative distribution function F(x) found in Example 1:
f(x) = d/dx F(x) = 1/60 if 0 < x ≤ 60 , and 0 otherwise,
(Graph: f(x) equals the constant 1/60 on the interval (0, 60] and 0 elsewhere.)
or, more compactly written,
f(x) = 1/60 for x ∈ (0, 60] .
Here, we differentiated F (x) from the left at the points x = 0, 60.
The probability that we noted the time between a quarter past and half past the hour is
P(15 ≤ X ≤ 30) = ∫_{15}^{30} f(x) dx = ∫_{15}^{30} (1/60) dx = 1/4 ,
as we would intuitively expect. ♦
9.5.1 The mean and variance of a continuous random variable
The mean of a continuous random variable is obtained (using the notion of Riemann sums) by replacing sums by integrals, and probability distributions by probability density functions:
Definition 3. The expected value (or mean) of a continuous random variable X with probability density function f(x) is defined to be
µ = E(X) = ∫_{−∞}^{∞} x f(x) dx .
Here, and in the following, we assume that all improper integrals converge.
The following theorem is the continuous analogue of Theorem 1 in Section 9.3.2.
Theorem 2. If X is a continuous random variable with density function f(x), and g(x) is a real
function, then the expected value of Y = g(X) is
E(Y) = E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx .
The variance of a continuous random variable is defined exactly as for discrete random variables:
Definition 4. The variance of a continuous random variable X is
Var(X) = E((X − E(X))²) = E(X²) − (E(X))² .
The standard deviation of X is σ = SD(X) = √Var(X) .
Note that by Theorem 2,
E(X²) = ∫_{−∞}^{∞} x² f(x) dx .
Example 3. At a random point during the day, we take note of the time as in Examples 1 and 2, and let X again be the continuous random variable giving the number of minutes past the hour. In Example 2, we found the density function to be f(x) = 1/60 for x ∈ (0, 60], so
E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_{−∞}^{0} 0 dx + ∫_{0}^{60} (x/60) dx + ∫_{60}^{∞} 0 dx = 0 + (1/60)[x²/2]_{0}^{60} = 30 ,
as we would expect. Similarly, by Theorem 2,
E(X²) = ∫_{−∞}^{∞} x² f(x) dx = ∫_{0}^{60} x² (1/60) dx = (1/60)[x³/3]_{0}^{60} = 1200 ,
so Var(X) = 1200 − 30² = 300 and SD(X) = √300 ≈ 17.3. ♦
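The integrals in Example 3 can be checked numerically with a simple midpoint Riemann sum, in the spirit of Definition 3. This is only an illustrative approximation (the step count N is an arbitrary choice of ours); the exact values are E(X) = 30 and Var(X) = 300.

```python
# X is uniform on (0, 60] with density f(x) = 1/60.
N = 100_000          # number of subintervals in the Riemann sum
dx = 60 / N
f = 1 / 60

# Midpoint sums for E(X) = ∫ x f(x) dx and E(X²) = ∫ x² f(x) dx.
mean = sum((i + 0.5) * dx * f * dx for i in range(N))
ex2 = sum(((i + 0.5) * dx)**2 * f * dx for i in range(N))

print(round(mean, 3))            # 30.0
print(round(ex2 - mean**2, 3))   # 300.0, so SD(X) ≈ 17.3
```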
The mean and variance have the same properties under linear scaling as in the discrete case.
Theorem 3. If a and b are constants, then
E(aX + b) = aE(X) + b
Var(aX + b) = a²Var(X)
SD(aX + b) = |a|SD(X) .
An immediate consequence of these properties is
Theorem 4. If E(X) = µ and Var(X) = σ², and Z = (X − µ)/σ, then E(Z) = 0 and Var(Z) = 1.
The random variable Z = (X − µ)/σ is referred to as the standardised random variable obtained
from X. Note that this theorem holds for discrete and continuous random variables alike.
Proof. By Theorem 3,
E(Z) = E((X − µ)/σ) = (1/σ)E(X) − µ/σ = µ/σ − µ/σ = 0 ;
Var(Z) = Var((X − µ)/σ) = (1/σ²)Var(X) = σ²/σ² = 1 .
9.6 Special Continuous Distributions
In this section, we consider two well-known continuous probability distributions, namely the normal
and exponential distributions. It turns out that these are limiting cases of the discrete probability
distributions that we have already seen, namely the binomial and geometric distributions, respec-
tively, but we will not prove this here.
9.6.1 The Normal Distribution
A widely used probability distribution in statistics is the normal or Gaussian distribution.
Definition 1. A continuous random variable X has normal distribution N(µ, σ²)
if it has probability density

φ(x) = (1/√(2πσ²)) e^{−(1/2)((x−µ)/σ)²} , where −∞ < x < ∞ .
We write X ∼ N(µ, σ2) to denote that X has the normal distribution N(µ, σ2). The normal
probability density is bell-shaped, symmetric about the value x = µ, and narrower for smaller σ.
The probability densities for N(0, 1), N(3, 1), and N(0, 4) are illustrated below.
[Figure: the probability densities of N(0, 1), N(3, 1) and N(0, 4).]
The distribution N(0, 1) is called the standard normal distribution.
The mean and variance of a random variable X ∼ N(µ, σ2) are simply µ and σ2:
Theorem 1. If X is a continuous random variable and X ∼ N(µ, σ2), then
• E(X) = µ
• Var(X) = σ2.
Proof. These are left as an exercise.
Theorem 2. If X ∼ N(µ, σ²), then (X − µ)/σ ∼ N(0, 1).
Proof. This (almost) follows from Theorem 1 above and Theorem 4 from the previous section, but
a proof is required that the new random variable is actually normal.
Note that when we standardise a normal random variable, the resulting distribution is also
normal.
Finding a probability involves evaluating an integral of the density function, which is difficult
since this function has no elementary antiderivative. Thus, if X is normally distributed with
mean µ and standard deviation σ,

P(X ≤ x) = F_X(x) = ∫_{−∞}^{x} (1/√(2πσ²)) e^{−(1/2)((t−µ)/σ)²} dt .
To evaluate this integral, we convert to the standard normal distribution Z ∼ N(0, 1), using the
change of variable Z = (X − µ)/σ outlined above. This gives

P(Z ≤ z) = F_Z(z) = ∫_{−∞}^{z} (1/√(2π)) e^{−t²/2} dt .
The value of this integral for various z has been tabulated numerically and is available either via a
calculator or the table given on the following page. This table gives the values of this integral for
z in the range −3 to 3. For z less than −3, the value is essentially zero, while for z greater than 3,
the value is essentially 1.
Example 1. Suppose X is normally distributed with mean µ = 20 and standard deviation σ = 3.
Find P(X ≤ 24).

Solution. We change to the standard normal distribution, so

P(X ≤ 24) = P((X − µ)/σ ≤ (24 − 20)/3) ≈ P(Z ≤ 1.33) ≈ 0.9082 ,

from the tables. ♦
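For readers who prefer Maple to the tables, the standard normal cumulative probability can be computed from the error function. This sketch assumes the standard identity P(Z ≤ z) = (1 + erf(z/√2))/2; Phi is an illustrative name, not part of the original notes:

Phi := z -> (1 + erf(z/sqrt(2)))/2;   # standard normal CDF via erf
evalf(Phi((24 - 20)/3));              # about 0.9088; the two-decimal table value z = 1.33 gives 0.9082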
Example 2. Suppose that the weekly wages of secretaries are normally distributed with mean
$800 and standard deviation $50. What is the probability of a secretary having a weekly wage
higher than $900 and how many secretaries out of a group of 2000 randomly selected secretaries
would you expect to have a weekly wage greater than $900?
Solution. Let X denote the weekly wage of a secretary. Then the mean of X is µ = 800 and the
standard deviation of X is σ = 50. Since X ∼ N(µ = 800, σ² = 50²),

Z = (X − 800)/50 ∼ N(0, 1) and X = 900 when Z = (900 − 800)/50 = 2 .

Thus,

P(X > 900) = P(Z > 2) = 1 − P(Z ≤ 2) = 1 − 0.9772 = 0.0228 .
Therefore, in a group of 2000 secretaries we would expect 0.0228 × 2000 = 45.6 (i.e., about 46) of
them to have a weekly wage in excess of $900. ♦

In the above example, we used the fact that P(Z > a) = 1 − P(Z ≤ a); this is true because
P(Z = a) = 0 since Z is continuous. Note also that for a ≤ b, P(a ≤ Z ≤ b) = P(Z ≤ b) − P(Z ≤ a).
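The same Phi function defined in the earlier sketch reproduces Example 2; again an illustrative sketch rather than part of the original notes:

p900 := 1 - Phi((900 - 800)/50);   # P(X > 900) = 1 - P(Z <= 2), about 0.0228
evalf(2000*p900);                  # expected number of secretaries, about 45.6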
Standard normal probabilities P(Z ≤ z)
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
−2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
−2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
−2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
−2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
−2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
−2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
−2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
−2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
−2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
−2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
−1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
−1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
−1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
−1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
−1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
−1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
−1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
−1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
−1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
−1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
−0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
−0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
−0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
−0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
−0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
−0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
−0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
−0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
−0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
−0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
c©2020 School of Mathematics and Statistics, UNSW Sydney
202 CHAPTER 9. INTRODUCTION TO PROBABILITY AND STATISTICS
The normal distribution is used, among other things, to approximate the binomial distribution
B(n, p) when n grows large. Before the advent of powerful computers, calculations involving many
B(n, p) terms were very laborious or even impossible, so it was easier to approximate such calcula-
tions by ones involving integration of the probability density of the normal distribution N(µ, σ2)
with the same mean (µ = np) and variance (σ2 = np(1 − p)) as the binomial distribution. These
integrals can be evaluated by transforming to the standard normal distribution as we did above
and using tables.
These days, computers can calculate most probabilities involving binomial sums; however, the
normal distribution is occasionally still used instead.
Example 3. Toss a coin n = 10 times and let X be the random variable counting the number of
resulting heads. As we have seen previously, X ∼ B(10, 1/2). The probability of tossing at most
four heads is thus

P(X ≤ 4) = F_X(4) = Σ_{k=0}^{4} C(10, k) (1/2)^k (1/2)^{10−k} = 193/512 ≈ 37.7% .
We could also have approximated this probability by calculating F_Y(4) of a continuous random
variable Y with Y ∼ N(µ, σ²) where µ = E(X) = 5 and σ² = Var(X) = 5/2. To get an even better
approximation, we can calculate F_Y(4.5) since P(X ≤ 4) = P(X < 5) and 4.5 lies in the middle of
the interval from 4 to 5:

P(X ≤ 4) = F_X(4) ≈ P(Y ≤ 4.5) ≈ P(Z ≤ −0.32) ≈ 37.45% ,

which differs from the true value by only 0.25%.
Now toss a coin n = 50 times and let X again be the random variable counting the number of
resulting heads. Then X ∼ B(50, 1/2). The probability of tossing at most 23 heads is thus

P(X ≤ 23) = F_X(23) = Σ_{k=0}^{23} C(50, k) (1/2)^k (1/2)^{50−k} ≈ 33.59% .
The above calculation is obtained at once when using Maple but would be very time-consuming to
calculate by hand (however, can you find a simple short-cut to perform this particular calculation
with far less effort?). We could also have approximated this probability by calculating F_Y(23.5) of
a continuous random variable Y with Y ∼ N(µ, σ²) where µ = E(X) = 25 and σ² = Var(X) = 25/2:

P(X ≤ 23) ≈ P(Y ≤ 23.5) ≈ P(Z ≤ −0.42) ≈ 33.72% ,

which only differs from the true value by 0.13%. ♦
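The comparison in this example is easy to reproduce in Maple, reusing the Phi function defined earlier; exact and approx are illustrative names in this sketch:

exact := evalf(sum(binomial(50, k)*(1/2)^50, k = 0..23));   # about 0.3359
approx := evalf(Phi((23.5 - 25)/sqrt(25/2)));               # about 0.3357 with the exact z-value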
The three figures below show how binomial distributions may be approximated by normal dis-
tributions. The first and third figures illustrate B(10, 1/2) and N(5, 5/2), and B(50, 1/2) and
N(25, 25/2), respectively, that appeared in the above example. Note that the first and second
coordinates of the three figures are differently scaled and truncated.
[Figures: B(10, 1/2) with N(5, 5/2); B(20, 1/2) with N(10, 5); B(50, 1/2) with N(25, 25/2).]
More generally, normal distributions are used to model experiments involving large numbers
of identical and independent trials that have several possible outcomes. Typical examples include
the final-grade distributions of the high-school graduates in a particular country in a given year; or
distributions of height, or of weight, or of IQ-test results, of the citizens of a country, and so on.
Example 4. A six-sided die, which is believed to be biased, is rolled 720 times and shows a ‘6’
100 times.

a. Write down the formula for the tail probability of getting 100 or fewer 6’s in 720 rolls of a fair
die.

b. Using the normal approximation to the binomial distribution, calculate the probability in
part (a), giving your answer to 3 decimal places.
Solution: a. Let X be the number of sixes in 720 rolls of a fair die.
Then X is binomial with n = 720, p = 1/6. Hence

P(X ≤ 100) = Σ_{k=0}^{100} C(720, k) (1/6)^k (5/6)^{720−k} .
b. Now X can be approximated by the continuous random variable Y ∼ N(µ, σ²), with

µ = E(X) = np = 720 × (1/6) = 120,
σ² = Var(X) = np(1 − p) = 720 × (1/6) × (5/6) = 100.

Then

P(X ≤ 100) ≈ P(Y ≤ 100.5) = P(Z ≤ (100.5 − 120)/10) = P(Z ≤ −1.95) = 0.026,

where Z = (Y − µ)/σ ∼ N(0, 1).
Since 0.026 = 2.6% < 5%, the tail probability is significantly low and so there is good evidence
that the die is biased.
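A sketch of the same normal-approximation calculation in Maple, with Phi as in the earlier sketches:

mu := 720*(1/6);                  # 120
sigma := sqrt(720*(1/6)*(5/6));   # 10
evalf(Phi((100.5 - mu)/sigma));   # about 0.026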
9.6.2 [X] The Exponential Distribution
In the previous subsection, we saw that the binomial distribution can be approximated by the
normal distribution. Similarly, the geometric distribution has an analogous continuous probability
distribution, namely the exponential distribution.
Definition 2. A continuous random variable T has exponential distribution
Exp(λ) if it has probability density

f(t) = λe^{−λt} if t > 0, and f(t) = 0 if t < 0 .

We write T ∼ Exp(λ) to denote that T has the exponential distribution Exp(λ). The probability
densities for Exp(1/2), Exp(1), and Exp(2) are illustrated below.
[Figure: the probability densities of Exp(1/2), Exp(1) and Exp(2).]
The mean and variance associated with the exponential distribution are given as follows:

Theorem 3. If T is a continuous random variable and T ∼ Exp(λ), then

• E(T) = 1/λ
• Var(T) = 1/λ² .
Proof. Using integration by parts,

E(T) = ∫_{−∞}^{∞} t f(t) dt = ∫_{−∞}^{0} 0 dt + ∫_{0}^{∞} tλe^{−λt} dt
     = 0 + [−te^{−λt}]_{0}^{∞} + ∫_{0}^{∞} e^{−λt} dt = 0 − 0 + (1/λ)[−e^{−λt}]_{0}^{∞} = (1/λ)(0 − (−e⁰)) = 1/λ .
The variance Var(T ) is calculated similarly.
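Maple can carry out both integrations symbolically; a minimal sketch verifying Theorem 3 under the stated assumption λ > 0, with ET and ET2 as illustrative names:

ET := int(t*lambda*exp(-lambda*t), t = 0..infinity) assuming lambda > 0;     # returns 1/lambda
ET2 := int(t^2*lambda*exp(-lambda*t), t = 0..infinity) assuming lambda > 0;  # returns 2/lambda^2
simplify(ET2 - ET^2);                                                        # Var(T) = 1/lambda^2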
The cumulative distribution function F_T(t) of an exponentially distributed random variable
T ∼ Exp(λ) is easily expressed:

F_T(t) = P(T ≤ t) = ∫_{−∞}^{t} f(x) dx = 1 − e^{−λt} for t > 0, and 0 for t < 0 .

If we set p = 1 − e^{−λ} and let n ∈ Z, then

F_T(n) = 1 − (1 − p)^n for n > 0, and 0 for n < 0 .
By Corollary 5 of Theorem 9.4.2, this is the value F_X(n) of the cumulative distribution function
of a discrete random variable X that is geometrically distributed with parameter p = 1 − e^{−λ}. In
other words, the exponential distribution Exp(λ) is approximated by the geometric distribution
G(p):
[Figures: Exp(1/2) with G(1 − e^{−1/2}); Exp(1) with G(1 − e^{−1}); Exp(2) with G(1 − e^{−2}).]
Conversely, the geometric distribution G(p) is interpolated by the exponential distribution Exp(λ)
where λ = ln(1/(1 − p)).
Example 5. An insurance company has collected data on one of its insurance policies, and it turns
out that, on average, a proportion p = 0.0502 of these policies is claimed each year. For one of these
policies, find the
(a) probability that the first claim occurs within the first six years;
(b) probability that the first claim occurs within the first 6.5 years;
(c) probability that the first claim occurs during the first half of the sixth year;
(d) expected number of years until the first claim occurs.
Solution. We assume that claims occur independently of each other and with equal probability
p = 0.0502. If claims only occurred at the end of each year, then we could model the behaviour of
the first occurring claim by a discrete random variable X ∼ G(p) that counted the number of years
until that first claim occurred. However, claims might occur at any positive time from the policy’s
inception, so the discrete model does not suffice; instead, let T be a continuous random variable
that gives the time until the first claim. Then T ∼ Exp(λ) where λ = ln(1/(1 − 0.0502)) ≈ 0.0515.
(a) The probability that the first claim occurs within the first six years is

P(X ≤ 6) = F_X(6) = 1 − (1 − p)⁶ = 1 − (1 − 0.0502)⁶ ≈ 26.58% .

We could also have calculated this probability as follows:

P(T ≤ 6) = F_T(6) = 1 − e^{−λ×6} = 1 − e^{−0.0515×6} ≈ 26.58% .

(b) The probability that the first claim occurs within the first 6.5 years is

P(T ≤ 6.5) = F_T(6.5) = 1 − e^{−λ×6.5} = 1 − e^{−0.0515×6.5} ≈ 28.45% .
(c) The probability that the first claim occurs during the first half of the sixth year is

P(5 < T ≤ 5.5) = P(T ≤ 5.5) − P(T ≤ 5) = F_T(5.5) − F_T(5)
             = (1 − e^{−λ×5.5}) − (1 − e^{−λ×5})
             = e^{−0.0515×5} − e^{−0.0515×5.5}
             ≈ 1.965% .

A second way to calculate this probability is as follows:

P(5 < T ≤ 5.5) = ∫_{5}^{5.5} λe^{−λt} dt = [−e^{−λt}]_{5}^{5.5} = e^{−0.0515×5} − e^{−0.0515×5.5} ≈ 1.965% .
(d) The expected number of years until the first claim occurs is E(T) = 1/λ = 1/0.0515 ≈ 19.42.
This is roughly approximated by E(X) = 1/p = 1/0.0502 ≈ 19.53.
These values are what we might intuitively estimate: “just under 1/0.05 = 20”. ♦
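All four parts of this example reduce to evaluating the exponential cumulative distribution function, which takes a few lines of Maple; FT is an illustrative name in this sketch:

p := 0.0502;
lambda := ln(1/(1 - p));         # about 0.0515
FT := t -> 1 - exp(-lambda*t);   # cumulative distribution function of Exp(lambda)
FT(6);                           # (a) about 0.2658
FT(6.5);                         # (b) about 0.2845
FT(5.5) - FT(5);                 # (c) about 0.0197
1/lambda;                        # (d) about 19.42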
9.6.3 Useful Web Applets to Illustrate Probability Reasoning
Web applets for long-run frequency illustrations:
http://www.shodor.org/interactivate/activities/AdjustableSpinner/
http://socr.stat.ucla.edu/Applets.dir/DiceApplet.html
An applet to illustrate conditional probability and independence:
http://www.stat.berkeley.edu/users/stark/Java/Html/Venn.htm
An applet for Bayes’ Rule:
http://www.stat.berkeley.edu/users/stark/Java/Html/Venn.htm
An applet for the Birthday Problem:
http://www.mste.uiuc.edu/reese/birthday/
9.7 Probability and MAPLE
Most of the problems in this chapter reduce to sums of various probability expressions. Maple’s
ability to do summation symbolically is especially useful here, as is its ability to sum infinite
series. Even for finite sums, Maple can simplify the calculations significantly. For example, we can
implement the binomial distribution function B(n, p, k) = C(n, k) p^k (1 − p)^{n−k} as follows
B := proc(n,p,k) binomial(n,k)*p^k*(1-p)^(n-k) end proc;
Maple allows us to check that the function values of B(n, p, k) sum to 1 (even though we have not
specified what n and p are!):
simplify(sum(B(n,p,k), k = 0..n));
Indeed, Maple allows us to perform many exact calculations on discrete distributions. For example,
it is somewhat time-consuming and awkward to calculate by hand and calculator the probability
that 1000 coin tosses result in between 510 and 530 heads. However, this calculation is very quick
and easy to conduct with Maple:
n := 1000;
p := 1/2;
sum(B(n,p,k), k = 510..530);
evalf(%);
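Continuing the session above, the same probability can be compared with its normal approximation (with continuity correction). This continuation is a sketch, not part of the original notes; it reuses the n and p just assigned:

mu := n*p; sigma := sqrt(n*p*(1-p));                       # 500 and sqrt(250)
Phi := z -> (1 + erf(z/sqrt(2)))/2;                        # standard normal CDF
evalf(Phi((530.5 - mu)/sigma) - Phi((509.5 - mu)/sigma));  # close to the exact sum, about 0.247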
Problems for Chapter 9
Problems 9.1 : Some Preliminary Set Theory
1. [R] Let A = {a, c, d, e} and B = {d, e, f}.
Suppose that the universal set is S = {a, b, c, d, e, f}.
Write down the following sets.
a) A−B, b) B −A, c) A ∪Ac, d) B ∩Bc,
e) A ∪Bc, f) Ac ∩B, g) (A ∪B)c, h) Ac ∩Bc.
2. [R] A survey was carried out in a new development area to gain data on home-delivered
newspapers. 110 homes were selected at random and the occupants asked whether they
had the daily paper or the weekend paper home delivered. 74 received the daily paper, 58
received the weekend paper and 10 received no paper at all. How many homes visited in
this survey received both the daily and weekend papers?
3. [R] Of the students taking PHYS1121 and MATH1131 in a hypothetical year, 90% passed
MATH1131, 85% passed PHYS1121 and 6% passed neither. What percentage passed
both? What percentage of those who passed PHYS1121 also passed MATH1131?
4. [R] A brewery brews one type of beer which is marketed under three different brands. In a
survey of 150 first year students, 58 drink at least brand A, 49 drink at least brand B and
57 drink at least brand C. 14 drink brand A and brand C, 13 drink brand A and brand
B and 17 drink both brand B and brand C. 4 students drink all three brands. How many
students drink none of these three brands?
5. [R] Suppose A, B and C represent three events. Using unions, intersections and comple-
ments, find expressions representing the events
a) only A occurs,
b) at least one event occurs,
c) at least two events occur,
d) exactly one event occurs,
e) exactly two events occur.
Problems 9.2 : Probability
6. [R] Two fair dice are thrown.
a) What is the probability that the sum of the two numbers obtained is 6?
b) What is the probability that both dice show the same number?
c) What is the probability that at least one of the dice shows an even number?
7. [R] Suppose that 30% of computer users use a Macintosh, 50% use a Microsoft Windows PC
and that 20% use Linux. Also suppose that 60% of the Macintosh users have succumbed to
a computer virus, 80% of the Windows PC users get the virus and 10% of the Linux users
get the virus. A computer user is selected at random and it is found that her computer
was infected with the virus. What is the probability that she is a Windows PC user?
8. [R] Employment data at a large company reveal that 72% of the workers are married, that
44% are university graduates and that half of the university graduates are married. What
is the probability that a randomly chosen worker
a) is neither married nor a university graduate?
b) is married but not a university graduate?
c) is married or is a university graduate?
9. [R] On the basis of the health records of a particular group of people, an insurance company
accepted 60% of the group for a 10 year life policy. Ten years later it examined the survival
rates for the whole group and found that 80% of those accepted for the policy had survived
the 10 years, while 50% of those rejected had survived the 10 years. What percentage of
the group did not survive 10 years? If a person did survive 10 years, what is the probability
that they had been refused cover?
10. [R] Urn 1 contains 2 red balls and 3 black balls. Urn 2 contains 4 red balls and 5 black
balls.
a) If an urn is randomly selected and a ball drawn at random from it, what is the
probability that the ball is red?
b) Suppose a ball is drawn at random from Urn 1 and placed into Urn 2. If a ball is
then drawn at random from the 10 balls in Urn 2, what is the probability that it is
red?
c) In the previous part, given that the ball drawn from Urn 2 is red, what is the proba-
bility that the ball transferred from Urn 1 was black?
11. [R] Down’s syndrome is a disorder that affects 1 in 270 babies born to mothers aged 35
or over. A new blood test for the condition has a sensitivity (i.e. the probability of a
positive test result given that Down’s syndrome is present) of 89%. The specificity (i.e.
the probability of a negative test result given that Down’s syndrome is absent) of the new
test is 75%.
a) What proportion of women over age 35 would test positive on this new blood test?
b) A mother over age 35 receives a positive test result. What is the chance that Down’s
syndrome is actually present?
c) A mother over age 35 receives a negative test result. What is the chance that Down’s
syndrome is actually present?
12. [R] The following is a table of the annual promotion probabilities at a particular workplace,
broken down by gender.
Promoted Not promoted Total
Male 0.17 0.68 0.85
Female 0.03 0.12 0.15
Is there gender bias in promotion?
13. [R] A system has n independent components and each fails with probability p. Calculate
the probability that the system will fail when

a) the components are in parallel, so the system fails only when all of the components
fail;

b) the components are in series, so the system fails if any one of the components fails.
14. [X] Tom and Bob play a game by each tossing a fair coin. The game consists of tossing
the two coins together, until for the first time either two heads appear when Tom wins, or
two tails appear when Bob wins.
a) Show that the probability that Tom wins at or before the nth toss is 1/2 − 1/2^{n+1}.

b) Show that the probability that the game is decided at or before the nth toss is 1 − 1/2^n.
15. [X] Extend the Multiplication Rule of section 9.2.3 to 3 events A1, A2, A3 and show that
P (A1 ∩A2 ∩A3) = P (A3|A1 ∩A2)P (A2|A1)P (A1).
The same pattern applies to higher numbers of events. Write this down.
This law is particularly useful when we have a sequence of dependent trials. To gain
entry to a selective high school students must pass 3 tests. 20% fail the first test and are
excluded. Of the 80% who pass the first, 30% fail the second and are excluded. Of those
who pass the second, 60% pass the third.
What proportion of students pass the first two tests? Use the multiplicative law to answer
this question.
What proportion of students gain entry to the selective high school?
What proportion pass the first two tests, but fail the third?
16. [X] Use the additive law of probability to establish, using mathematical induction, Boole’s
Law:
P (A1 ∪A2 ∪ · · · ∪An) ≤ P (A1) + P (A2) + · · ·+ P (An)
17. [X] Establish, using mathematical induction, Bonferroni’s inequality:

P(A_1 ∩ A_2 ∩ · · · ∩ A_n) ≥ 1 − [P(A_1^c) + P(A_2^c) + · · · + P(A_n^c)]
Problems 9.3 : Random Variables
18. [R] Show that each of the following sequences p_k satisfies p_k ≥ 0 and Σ_{k=0}^{∞} p_k = 1. Note
that in the distributions below p_k = P(X = k) where X is the random variable under
consideration.

a) Uniform Distribution.
p_k = 1/n for 1 ≤ k ≤ n,
and 0 otherwise. Here, n is a fixed positive integer.

b) Binomial Distribution.
p_k = B(n, p, k) = C(n, k) p^k (1 − p)^{n−k} for 0 ≤ k ≤ n,
and 0 otherwise. Here, p is a constant with 0 < p < 1.

c) Geometric Distribution.
p_k = G(p, k) = (1 − p)^{k−1} p for 1 ≤ k < ∞,
where p is a constant with 0 < p < 1.

d) Poisson Distribution.
p_k = e^{−λ} λ^k / k! for 0 ≤ k < ∞,
where λ > 0 is a constant. To solve this question you will need the Maclaurin series
for e^λ.
19. [R] A box contains four red and two black balls. Two balls are drawn from the box. Let
X be the number of red balls obtained. Find the probability distribution for X.
20. [R] A busy switchboard receives 150 calls an hour on average. Assume that the probability,
p_k, of getting k calls in a given minute is

p_k = e^{−λ} λ^k / k!

where λ is the average number of calls per minute. (This is called a Poisson distribution.)

a) Find the probability of getting exactly 3 calls in a given minute.

b) Find the probability of getting at least 2 calls in a given minute.
21. [R] In a biased lottery with tickets numbered 1 to 50, the probability that ticket number
n wins is

p_n = n/1275 for n = 1, 2, 3, . . . , 50.

What is the probability that the winning ticket bears a number less than or equal to 25?
22. [H] Let X be a random variable with probability distribution
P(X = k) = c/k! , for k = 0, 1, 2, . . ..
a) Determine the value of c.
b) Calculate P (X = 2).
c) Calculate P (X < 2).
d) Calculate P (X > 4).
23. [X] In a biochemical experiment, n organisms are placed in a nutrient medium, and the
number of organisms X which survive for a given period is recorded. The probability
distribution of X is assumed to be given by

P(X = k) = 2(k + 1)/((n + 1)(n + 2)) for 0 ≤ k ≤ n,

and 0 otherwise.

a) Check that Σ_{k=0}^{n} P(X = k) = 1.

b) Calculate the probability that at most a proportion α of the organisms survive, and
deduce that for large n this is approximately α².
c) Find the smallest value of n for which the probability of there being at least one
survivor among the n organisms is at least 0.95.
24. [H] A genetic experiment on cell division can give rise to at most 2n cells. The probability
distribution of the number of cells X recorded is
P(X = k) = θ^k(1 − θ)/(1 − θ^{2n+1}) for 0 ≤ k ≤ 2n,
where θ is a constant with 0 < θ < 1.
What are the probabilities that
a) an odd number of cells is recorded,
b) at most n cells are recorded?
25. [R] Let X be a discrete random variable with the following probability distribution
k : 0 1 2 3 4
P (X = k) = pk : 0.1 2c 0.2 0.1 4c
a) Find the value of c.
b) Calculate E(X) and Var(X).
c) Let Y = 1− 4X. Calculate E(Y ) and Var(Y ).
26. [R] Find the mean and variance for the uniform distribution in Question 18(a).
27. [X] Let X be a random variable with the Poisson probability distribution given in Question
18(d). Find E((1 + X)^{−1}).
28. [X] Assuming that one can differentiate a power series term by term, one obtains from
the formula

Σ_{k=0}^{∞} x^k = 1/(1 − x), |x| < 1

the formulas

Σ_{k=1}^{∞} k x^{k−1} = 1/(1 − x)² ;  Σ_{k=2}^{∞} k(k − 1) x^{k−2} = 2/(1 − x)³ , |x| < 1.

(You will see that this is justified in your MATH1231/41 Calculus lectures.)

From these formulas, show that

Σ_{k=0}^{∞} k x^k = x/(1 − x)² ;  Σ_{k=0}^{∞} k² x^k = x(x + 1)/(1 − x)³ , |x| < 1,

and hence calculate the mean and the variance for the geometric distribution in Question
18(c).
Problems 9.4 : Special Distributions
29. [R] A coin is tossed 50 times. What is the probability of it coming down heads exactly 25
times?
30. [R] A test paper contains 8 multiple choice questions, each with 4 potential answers to
choose from. A correct answer gains 1 mark, a wrong answer 0 marks and 4 is the pass
mark. If a student simply guesses, what is the probability that she will pass?
31. [R] The probability of dying from a particular disease is 0.3. 10 people in a hospital are
suffering from the disease. Find the probability that at least 8 survive.
32. [R] How many times must a coin be tossed until the probability of getting 2 or more heads
exceeds 0.99? (You need to try different n values after an initial guess.)
33. [X] For the B(n, p) distribution, by considering p_k/p_{k−1}, show that p_k is largest when
k = ⌊(n + 1)p⌋. This k is called the “mode” of the distribution.
34. [R] Consider the game of “rock, scissors, paper” in which two players instantaneously
choose one of “rock”, “scissors” or “paper”. If both players pick the same item, they
play again; if the two players make different choices one of them wins (rock beats scissors,
scissors beats paper and paper beats rock).
Let X be the number of times the game is played until someone wins. Find the probability
distribution for X when each player chooses randomly from rock, scissors or paper.
35. [H] A die is rolled repeatedly.
a) Find the probability that the third time a ‘6’ shows is on the 20th roll.
b) [X] Generalise to the kth time on the nth roll. This is an example of the “negative
binomial” distribution, which generalises the geometric distribution.
36. [X] a) Show that for non-negative integers k, m, n,

C(k, m) C(n, k) = C(n, m) C(n − m, k − m).

b) Show that

Σ_{k=m}^{n} C(k, m) C(n, k) p^k (1 − p)^{n−k} = C(n, m) p^m.
c) By considering the cases m = 1 and m = 2 in the preceding formula, prove the
variance formula for the Binomial distribution, as stated in Theorem 2 of Section 9.4.
37. [R] A certain type of car is known to sustain damage 25% of the time in 15 km/hr crash
tests. A modified bumper design has been proposed in an effort to decrease this percentage.
In a trial, cars with the new type of bumper were damaged in one of 15 test crashes.

a) What distribution could be used to model the number of cars damaged in trials, if the
new bumpers perform no better than the old ones?

b) Calculate a tail probability that measures how unusual it would be to observe as few
as one damaged car, under the assumptions of part (a).

c) Does the trial indicate that the new bumpers protect cars from damage better than
the old bumpers?
38. [R] An extensive study was conducted in the northern hemisphere of butterfly distribu-
tions, comparing where butterfly species are presently found with where they were found
a century ago (Parmesan et al 1999, Nature 399, 579-583). It was hypothesised that due
to climate change, more species would have shifted northwards in distribution than shifted
southwards.
It was found that of the 23 butterfly species whose distribution had shifted, 22 had shifted
northwards in distribution, and 1 had shifted southwards in distribution.
a) What distribution could be used to model the number of butterfly species moving
northward, if they are just as likely to move north as south (that is, if there is no
influence of climate change on butterfly species)?
b) Calculate a tail probability that measures how unusual it would be to observe as
many as 22 butterfly species moving northwards, under the assumptions of part (a).
c) Do these data provide evidence that climate change has affected the distribution of
butterfly species?
39. [R] Sydney’s dam levels recently reached historic lows, which may in part be due to lower
than average rainfall over recent years. The following data are total annual rainfall in
Sydney for eight recent calendar years.
Year 2000 2001 2002 2003 2004 2005 2006 2007
Annual rainfall 812.6 1359.0 860.0 1207.6 995.2 816.0 994.0 1499.2
The historic rainfall level prior to the year 2000 was 1302.2mm per year. We will use a sign
test approach to see how much evidence there is that the recent Sydney rainfall is less
than historic levels (that is, less than 1302.2mm).
It is reasonable to assume that annual rainfall is independent across years.
a) In how many years was the annual rainfall less than 1302.2mm?
b) What distribution could be used to model the number of years in which rainfall was
less than 1302.2mm rather than being greater than this value, if both outcomes were
equally likely?
c) Calculate a tail probability that measures how unusual it would be to observe as many
years with a total rainfall less than 1302.2mm as was observed, if annual rainfall was
just as likely to be greater than 1302.2 as less than 1302.2.
d) Do you think there is evidence that Sydney rainfall has decreased in recent years?
(That is, that the values are systematically smaller than 1302.2mm?)
40. [R] Do ravens intentionally fly towards gunshot sounds (to scavenge on the carcass they
expect to find)? Crow White addressed this question by counting raven numbers at a loca-
tion, firing a gun, then counting raven numbers 10 minutes later (Ecology 2005, 86:1057-
1060). He did this in 12 locations. Results:
location 1 2 3 4 5 6 7 8 9 10 11 12
before 1 0 1 0 0 0 0 5 1 1 2 0
after 2 3 2 2 1 1 2 2 4 2 0 3
We would like to find out if there is evidence that ravens fly towards the location of
gunshots.
a) In how many locations was there an increase in number of ravens, after the gunshot?
b) In how many locations was there a change in number of ravens after the gunshot?
c) What distribution could be used to model the number of locations in which there was
an increase in the number of ravens rather than a decrease, if both outcomes were
equally likely?
d) Calculate a tail probability that measures how unusual it would be to observe as
many locations with an increase in number of ravens as was observed, if increases
and decreases were equally likely.
e) Do you think there is evidence that the ravens fly towards gunshot sounds? (That
is, was there a systematic increase in the number of ravens present after the gunshot
sound?)
Problems 9.5
41. [R] Verify that the following functions f are probability densities, that is, that

f(x) > 0 and ∫_{−∞}^{∞} f(x) dx = 1.

Also sketch the graph of each function.

a) Uniform Distribution (Assume a < b).
f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise.

b) Pareto Distribution. For k > 0,
f(x) = k/x^{k+1} for x > 1, and 0 otherwise.

c) Gamma Distribution f_n. For n > 0,
f_n(x) = (1/n!) x^n e^{−x} for x > 0, and 0 otherwise.
Note. The formula ∫_{0}^{∞} x^n e^{−x} dx = n! will be useful. You can prove it using
integration by parts.

d) Laplace Distribution.
f(x) = (1/2) e^{−|x|} for −∞ < x < ∞.
42. [R] Calculate the means and variances for the distributions in the preceding question.
(Note that for the Pareto distribution the mean is only defined if k > 1 and the variance
is only defined if k > 2.)
43. [R] The probability density of the Cauchy Distribution is given by

f(x) = α/(1 + x²) for −∞ < x < ∞.

a) Find the value of α.
b) If X has a Cauchy distribution, find a number c such that P(X ≤ c) = 0.25.

c) [H] What can be said about E(X) and Var(X) for the Cauchy distribution?
44. [R] X has the probability density function

f(x) = (1/8)x for 3 ≤ x ≤ 5, and 0 otherwise.

Calculate

a) P(X > 4), b) P(X ≤ 4), c) P(3.2 ≤ X ≤ 4.1), d) E(X).
45. [R] Y has the probability density function

f(y) = c/y for 10 ≤ y ≤ 100, and 0 otherwise.

a) Determine the value of the constant c.

b) Obtain the cumulative distribution function of Y.

c) Find b such that P(Y ≤ b) = 0.50.

d) Find the expected value E(Y).
46. [R] Let F be the function defined by

F(x) = 0 for x < 2,
F(x) = (1/4)(x − 2) for 2 ≤ x < 3,
F(x) = 1/4 + (3/8)(x − 3) for 3 ≤ x < 5,
F(x) = 1 for x ≥ 5.
a) Sketch the graph of F .
b) Find a probability density function f which would have F as its cumulative distribu-
tion function. Sketch the graph of f .
c) Find E(X) for this probability density function.
47. [H] Suppose X has a probability density given by

f(x) = αe^{−αx} for x > 0, and 0 otherwise.
Find the mean and variance of Y = 2X + 3.
Problems 9.6
48. [R] Suppose Z is the standard normal random variable. Use the standard normal tables
to write down

a) P(Z ≤ 1.23)
b) P(Z ≤ −2.3)
c) P(Z > 0.36)
d) P(Z > −1.24)
e) P(1 ≤ Z ≤ 2)
f) P(−0.5 ≤ Z ≤ 0.5)
49. [R] Suppose X is normally distributed with mean µ and standard deviation σ. Use the
standard normal tables to find:

a) P(X ≤ 12) with µ = 10, σ = 2
b) P(X > 53) with µ = 50, σ = 4.5
c) P(X ≤ 47) with µ = 50, σ = 4.5
d) P(X > 32) with µ = 36, σ = 8
e) P(21 ≤ X ≤ 24) with µ = 20, σ = 3
f) P(15 ≤ X ≤ 20) with µ = 18, σ = 1.5
50. [R] Suppose X is normally distributed with mean µ and standard deviation σ. Use the
standard normal tables to find the value of c such that:

a) P(X ≤ c) = 0.8238 with µ = 0, σ = 1
b) P(X > c) = 0.0495 with µ = 0, σ = 1
c) P(X ≤ c) = 0.2514 with µ = 50, σ = 4.5
d) P(X > c) = 0.6915 with µ = 36, σ = 8
51. [R] The mean heights of men in a certain country is estimated to be 1.69m with a standard
deviation of 0.06m. We assume that the heights are normally distributed.
a) Find the approximate probability that a man chosen at random from this country
has a height between 1.60m and 1.74m.
b) If 400 men are chosen from this country, how many would you expect on average to
have heights greater than 1.74m?
52. [R] The length of life of a particular make of T.V. is approximately normally distributed,
with a mean of 3.1 years and a standard deviation of 1.2 years. If this type of T.V. is
guaranteed for one year, what fraction of the T.V.s sold will require replacement under
the guarantee?
53. [R] Experience has shown that the I.Q. scores of University students are normally dis-
tributed with a mean of 112 and a standard deviation of 8. Calculate the percentage of
students who will have an I.Q. score
a) higher than 130 b) lower than 100 c) between 105 and 125.
54. [R] The lengths of studs turned out by a certain automatic machine are normally dis-
tributed with a mean of 3.220 cm and a standard deviation of 0.003cm. If the acceptable
length of a stud is between 3.226 and 3.212 cm, determine to one decimal place the per-
centage rejected as under size and over size respectively.
55. [R] An unbiased die is tossed 600 times. Use the normal approximation to the binomial
to find the approximate probability that a 6 appears more than 120 times.
56. [R] Olof Jonsson was a controversial psychic whose psychic abilities were tested in an
experiment. In the experiment a computer showed 4 cards to the subject and (randomly)
picked one of them, and the subject was to guess which card they thought the computer
had picked. This process was repeated 288 times, and Olof managed to pick the right card
88 times.
a) What distribution could be used to model the number of cards Olof correctly selects,
if he is not psychic (and so can do no better than random guessing)?
b) Write down an expression for a tail probability that measures how unusual it is to
correctly pick as many cards as Olof did.
c) Use the normal approximation to the binomial to find this probability, under the
assumption that Olof is not psychic.
d) Is this evidence that Olof was psychic?
57. [X] If X is a normal random variable with mean µ and variance σ², show that

E(|X − µ|) = √(2/π) σ.

58. [X] Find E(X) and Var(X) for the random variable X with probability density function
proportional to e^{−x²+x}.
59. [X] Let T be a continuous random variable and T ∼ Exp(λ); that is, T has an exponential
distribution with parameter λ. Prove that Var(T) = 1/λ².

60. [X] Suppose a continuous random variable T has an exponential distribution with param-
eter λ and

P(T ≤ t) = p, 0 < p < 1.
a) Find t in terms of p.
b) Hence find the median m. In other words, find m such that P(T ≤ m) = 0.5.

61. [X] (Memoryless property) A continuous random variable T has an exponential distribu-
tion and T ∼ Exp(λ). Prove that P(T > t + t0 | T > t0) is independent of t0 for positive
t and t0.
62. [X] Suppose the time, in minutes, required by any particular student to complete a certain
two-hour examination has the exponential distribution for which the mean is 90 minutes.
The examination starts at 10:00 am.
a) What is the probability that a student completes the examination before 11:00 am?
b) What is the probability that a student completes the examination between 11:00 am
and 11:30 am?
c) What is the probability that a student could not complete the examination within 2
hours?
d) What is the median time for completing the examination?
63. [X] During a whale watch season in Sydney, the time, measured in hours from the moment
that a cruise enters the whale watch area, to spot the first whale can be modelled by the
exponential distribution with parameter λ = 0.4. A person joins a Sydney whale watch
tour during the season. After entering the area,
a) what is the probability that no whale can be spotted in the first hour?
b) what is the probability that the time to spot the first whale exceeds the mean by one
standard deviation?
c) The organiser claims a 90 % success rate of finding whales in a trip. How long should
the cruise stay in the whale watching area to achieve that?
64. [X] A system consists of 3 independent components connected in series. Hence the system
fails when at least one of the components fails. Suppose the lengths of life of the com-
ponents, measured in hours, have the exponential distribution Exp(0.001), Exp(0.003),
Exp(0.004). Find the probability that the system can last at least 100 hours.
65. [X] A system consists of n independent components connected in series. The lengths of
life of the components, in hours, have the exponential distributions Exp(λ_i), 1 ≤ i ≤ n.
Let T be the random variable that gives the time until the system fails.
a) Find P(T ≤ t) and hence write down the cumulative distribution function of the
random variable T .
b) Find the probability density function of T .
c) Name the probability distribution of T , and find the expected value and variance of
T .
ANSWERS TO SELECTED PROBLEMS
Chapter 6
3. a) (1, 2, 0)^T, (1, 1, 4)^T.
7. Axioms 1, 4, 5, 6, 10 are satisfied, others are not. It is not a vector space.
8. a) 2v = (1 + 1)v = 1v + 1v = v + v. Identify the axioms that have been used.
b) Use induction.
9. For part 5: If λv = µv then λv − µv = 0, so (λ − µ)v = 0. But v ≠ 0 so, by part 4 of the
proposition, λ − µ = 0. So λ = µ.
11. The position vectors of the points on a plane which does not pass through the origin do not
form a vector subspace.
12. a) For example, (0, 0, 0)^T, (2, 0, 1)^T, (1, −2, −1)^T are in S.
c) The position vectors of the points on a plane which passes through the origin form a
vector subspace.
16. Column 1 belongs to S as A(1, 0, 0)^T = (2, 4)^T. Similar arguments apply for columns 2 and 3.
17. a) No. b) Yes. c) Yes.
20. W = {(0, 0, α, β, γ)^T : α, β, γ ∈ R}, a copy of R^3.
23. No, S is not a subspace because the zero polynomial is not in it.
24. b) x³ + 3x² + 3x + 1.
25. HINT: Suppose that S1 = {(x, 0)^T : x ∈ R} and S2 = {(0, y)^T : y ∈ R}. Show that S1 and S2
are subspaces of R² but S1 ∪ S2 is not.
28. a) Yes, a = 2v1 − 3v2 + v3. b) Yes.
29. a) No. b) No, 3b2 − 2b3 = 0. span(v1,v2,v3) is a plane in R^3.
30. a) Yes, a = v1 − v2 − 2v3
b) No, b ∈ span(v1,v2,v3) if and only if b1 − 2b2 + b4 = 0.
The span is a 3-dimensional subspace in R4.
31. Yes, b = −10v1 + 12v2 + 16v3.
32. No, span(v1,v2,v3) is a plane in R^3 given by 5b1 − 4b2 + b3 = 0.
33. Yes. v = 3v1 − v2 − 2v3, where v1,v2,v3 are the columns of A.
34. No.
35. Yes.
38. Linearly dependent, coplanar.
39. Linearly independent, not coplanar.
40. Set containing 0 is linearly dependent as coefficient of zero vector can be varied to make linear
combination non-unique.
41. b) v3 = v1 + v2.
e) span(S) is plane through origin parallel to v1 and v2.
42. b) v4 = −2v1 + 2v2 + v3 e) span (S) = R3.
43. No. −2− x+ 5x2 = (1− x+ 2x2) + 3(−1 + x2).
44. No, it is impossible to return to origin.
45. Yes, it is possible to return to origin.
47. Yes, it is a basis.
48. A basis for W is {(1, 2, 3)^T, (1, 1, −1)^T}. Dim(W) = 2.
50. a) True. b) False. c) False. d) True. e) False, True, False, False.
f) False. g) True. h) False. i) True. j) True.
51. a) n ≤ ℓ b) No relation. c) n > ℓ. d) n = ℓ.
53. b) {(−1, 1, 0)^T, (−1, 0, 1)^T}.
54. A basis for col(A) is {(1, 0, 0, 0)^T, (−1, 1, 0, 0)^T, (−2, 4, 2, 0)^T}. Dim(col(A)) = 3.
55. A basis for col(A) is {(1, 0, −1, 1)^T, (0, −1, 1, 0)^T, (2, −2, 4, 4)^T}. Dim(col(A)) = 3.
56. A basis is {(1, 2, 3, 4)^T, (3, 5, 5, 5)^T, (−7, −8, −3, 2)^T, (0, 1, 0, 0)^T}.
57. a) B = {v1,v3} . b) x = −4v1 + 5v3, c) dim(col(A)) = 2.
59. Bases are {p1, p2, p4} or {p1, p3, p4} or {p2, p3, p4}. {p1, p2, p3} is not a basis; why not?
62. Coordinate vector is (−2, 4, 12, 4)^T.

63. v = (18, 7, 11, 19)^T.

64. v = (1, 1, 10)^T.
65. a) (3, 2, 2)^T. b) (−a1 − a2 + 2a3, a2, −a1 + a3)^T.

66. a) (12, −8, 21)^T; b) (−2, 1, −3)^T.
67. c) Coordinates are λ1 = (−1, 3, 4)^T · v1 = −2√2, λ2 = (−1, 3, 4)^T · v2 = 2√3,
λ3 = (−1, 3, 4)^T · v3 = √6. Coordinate vector is (−2√2, 2√3, √6)^T.
69. a) For example, the matrices [1 0; 0 4], [1 2; 3 4], [−1 0; 0 6] are in S.
b) S is not a subspace because the 2 × 2 zero matrix is not in it.

70. a) For example, the matrices [1 0; 0 −1], [1 2; 3 −1], [0 0; 0 0] are in S.
b) Use the Subspace Theorem to prove that S is a subspace.
74. a) (a11, a12, a21, a22)^T. b) ((1/2)(a11 + a22), (1/2)(a12 + a21), (1/2i)(−a12 + a21), (1/2)(a11 − a22))^T.

75. a) [−4 2; −1 −3] = −4[1 0; −2 0] + 2[0 1; 3 0] − 3[0 0; 5 1]. b) No.
79. Not a subspace.
85. Not a subspace.
87. No.
88. Let p ∈ P2 be p(z) = a0 + a1z + a2z². Then the condition is −(3/2)a1 + a2 = 0. (An equivalent
condition is p′(−1/3) = 0.)
89. No for question 87. No for question 88.
90. No. p3 = −p1 + 3p2.
91. A basis is {p1, p2, p3, p4, 1, z}.
92. (2, −1, 0)^T.

93. (9, 1, 17)^T.

94. (a0 − a1 + a2, a0, a0 + a1 + a2)^T.
Chapter 7
1. S is not a linear map as the domain [−1, 1] is not a vector space.
2. a) Linear. b) Linear. c) Not Linear. d) Linear. e) Not Linear.
3. a) Domain C, codomain R, linear. b) Domain C, codomain R, linear.
c) Domain C, codomain R+ = {x ∈ R : x > 0}, not linear.
d) Domain C− {0}, codomain (−π, π], not linear. e) Domain C, codomain C, linear.
6. No.
7. T(2, −1, 4)^T = (21, 4, −15, 28)^T and
T(x1, x2, x3)^T = (x1 − 3x2 + 4x3, 2x1, 3x1 + x2 − 5x3, 4x1 + 4x2 + 6x3)^T.
9. (0, 2, 1)^T = (1, 2, 3)^T + (−2, 1, −4)^T + (1, −1, 2)^T, and so T(0, 2, 1)^T = (−1, 1)^T.
11. a) [3 −1; 2 4; −3 −3; 0 1]. b) [−2 0 5 0; 6 −8 0 2; −2 4 −3 0]. d) [1 0 −2 −4; 3 −4 −3 1; −2 2 4 0].
12. Same as for 11.
13. a) Ae1 = 2e1, Ae2 = 0.7e2, Ab = 4e1 + 2.1e2. (e1 is stretched to twice its length, e2 is
compressed to 0.7 of its length and b is stretched and rotated.)
d) Notice that Ab = 3b. This means b is stretched to three times its length.
c©2020 School of Mathematics and Statistics, UNSW Sydney
226 CHAPTER 7
e) Notice that Ab = −2b. This means the direction of b is reversed and it is stretched to
twice its length.
15. (1/2)[1 −√3; √3 1].
16. x′ = T(x) = (−x1, x2)^T, A = [−1 0; 0 1].

17. x′ = T(x) = (x1, x2, −x3)^T, A = [1 0 0; 0 1 0; 0 0 −1].
18. q = Ap, where the diagonal entries of A are aii = −1 + 2di²/|d|² and the off-diagonal entries
of A are aij = 2didj/|d|².
19. T is linear. For T(x) = Ax, the matrix is A = [0 −b3 b2; b3 0 −b1; −b2 b1 0].
20. T(a) = Aa, where A = (1/5)[1 0 2; 0 0 0; 2 0 4].
21. S is not linear.
23. (a′1, a′2, a′3)^T = R_α (a1, a2, a3)^T = (a1 cos α + a3 sin α, a2, −a1 sin α + a3 cos α)^T.
25. a) ker(A) = {0}, nullity(A) = 0, no basis.
b) Kernel: {λ(−1, −3, 1)^T : λ ∈ R}, nullity(B) = 1.
c) Kernel: {λ(2, −2, −1, 1)^T : λ ∈ R}, nullity(C) = 1.
26. a) {(1, −3, 1, 0)^T}, nullity(D) = 1. b) {0}, nullity(E) = 0.
27. For example A = [1 1 1 1; 1 2 3 4].
28. a) ker(A) = {0}, nullity(A) = 0.
b) ker(A) = {λ(5, 4, 2, 1)^T : λ ∈ R}, nullity(A) = 1.
d) ker(A) = {λ(6, 4, 1, 1)^T : λ ∈ R}, nullity(A) = 1.
29. For questions 16, 17 and 18, ker(T) = {0} and nullity(T) = 0.
For question 19, ker(T) = {x ∈ R^3 : x = λb for λ ∈ R}. Nullity(T) = 1. Kernel is the set of all
vectors parallel to b.
For question 20, ker(T) = {x ∈ R^n : b · x = 0}. Nullity(T) = n − 1 (Why?). Kernel is the set of
all vectors orthogonal to b.
30. b) 1
31. a) b ∈ im(A) as A(2, −3, 1)^T = (11, 10, 4)^T.
b) b is not in im(A) as Ax = (9, −2, −4)^T has no solution.
c) b ∈ im(A), since, for example, x = (−10, 12, 16, 0)^T is a solution of Ax = b.
32. a) No conditions, im(A) = R^3. b) 3b2 − 2b3 = 0. c) No conditions, im(A) = R^3.
33. rank(A) = 3. Columns 1,2,3 of A form a basis for im(A).
rank(B) = 2. Columns 1,2 of B form a basis for im(B).
rank(C) = 3. Columns 1,2,3 of C form a basis for im(C).
rank(D) = 3. Columns 1,2,4 of D form a basis for im(D).
rank(E) = 3. Columns 1,2,3 of E form a basis for im(E).
35. rank(A) = 3. Columns 1,3,4 of A form a basis for im(A).
rank(B) = 3. Columns 1,3,4 of B form a basis for im(B).
36. One possible answer is {(1, 2, −1)^T, (2, −4, 2)^T, (0, 1, 0)^T}.
37. One possible answer is {(1, 2, 3, 4)^T, (3, 3, 3, 0)^T, (−1, 1, 4, 8)^T, (1, 0, 0, 0)^T}.
38. a) [3 4 −1 0; 0 0 3 −3; 0 0 0 0; 0 0 0 1]. b) {(3, 0, 0, 0)^T, (−1, 3, 0, 0)^T, (0, −3, 0, 1)^T}, 3.
c) 1. d) No.
46. T is linear.
50. T is a linear function.
53. S is not linear as Z is not a vector space.
T is not linear as, for example, T (1.5) + T (1.5) = 2 + 2 = 4 while T (1.5 + 1.5) = T (3) = 3.
54. yL(s) = (s² + 9s + 19)/((s + 3)²(s + 1)).
56. a) 7 − 2x c) (2, 1, 1, −1)^T.
57. T(i, 2, −1)^T = (2 + i) + (7 − 4i)z + 2z² − 3iz³,
T(x1, x2, x3)^T = (x1 − 2x3) + [(2 + i)x1 + (4 − 3i)x2]z + x2z² − 3x1z³.
58. b) [0 0 0 0; 1 0 0 0; 0 1/2 0 0; 0 0 1/3 0; 0 0 0 1/4]. c) {x, x², x³, x⁴}. d) The empty set.
59. Let the input vector be b = (b1, b2, b3, b4, b5)^T, where b1, b2, b3, b4 and b5 are the amounts
of steel, plastics, rubber, glass and labour used respectively. Let the output vector be
x = (x1, x2, x3, x4)^T, where x1, x2, x3, x4 are the numbers of station wagons, 4-wheel drives,
hatchbacks and sedans made. Then the factory is represented by the linear map
T_A : R^4 → R^5, where T_A(x) = b = Ax with

A = [1 1.5 0.8 0.9; 0.5 0.6 0.7 0.6; 0.1 0.2 0.2 0.25; 0.2 0.15 0.3 0.3; 1 1.5 1.1 0.9].
60. The matrix is [1 0; 0 1].

61. The matrix is [−1 −3; 2 4].
62. −6 + 4x
63. For 48, [1 −3 0 0; 0 0 2 −3; 0 0 0 0; 0 1 0 0; 3 −1 2 4]. For 49, [3 4 0 0; 0 3 8 0; 0 0 3 12; 0 0 0 3].
For 50, [−8 0 0 0; 0 −4 0 0; 0 0 0 0; 0 0 0 4]; For 51, [0 0 0 0; 1 0 0 0; 0 1/2 0 0; 0 0 1/3 0; 0 0 0 1/4].
64. 48: im(T) = {p ∈ P4(R) : p(z) = λ0 + λ1z + λ2z³ + λ3z⁴ for λ0, λ1, λ2, λ3 ∈ R},
(note z² is not in im(T)) rank = 4, ker(T) = {0}, nullity(T) = 0.
49: im(T) = P3(R), rank(T) = 4, ker(T) = {0}, nullity(T) = 0.
50: im(T) = {p ∈ P3(R) : p(x) = λ0 + λ1x + λ2x³ for λ0, λ1, λ2 ∈ R}, rank(T) = 3,
ker(T) = {p ∈ P3(R) : p(x) = λx² for λ ∈ R}, nullity(T) = 1.
51: im(T) = {p ∈ P4(R) : p(x) = λ1x + λ2x² + λ3x³ + λ4x⁴ for λ1, λ2, λ3, λ4 ∈ R},
rank(T) = 4, ker(T) = {0}, nullity(T) = 0.
65. The matrix A is diagonal with diagonal elements akk = (k − 1)(k − 2) − 3(k − 1) + 3 for
1 ≤ k ≤ n + 1. The kernel is α1x + α2x³ and the nullity is 2. Note that the kernel is the
solution of the homogeneous differential equation.
66. The matrix is [1/√2 0 1/√2; −1/√2 0 1/√2; 0 −1 0].
69. b) i) [1 1 −1; 0 1 2; 0 0 1]. ii) [1 2 0; 0 1 4; 0 0 1].
70. b) λ[1 −3; −3 0], λ ∈ F; 1. c) 3. d) No. e) [0 0 0 1; 0 1 −1 0; 3 1 0 0].
71. a) {(2, 4, −2, 4)^T, (1, 2, −2, 1)^T, (1, −3, 2, −2)^T}. b) 3. c) a4 − a1 − a3 − a2 = 0.
d) {(3, 2, 1, 0, 0, 0)^T, (−1, 2, 0, −3, −3, 1)^T}. e) 3, 2.
Chapter 8
1. a) In each case, the eigenvalues are the diagonal entries and the respective eigenvectors are
te1 and te2 (t ≠ 0).
For interpretations of (b), (c) and (d), see part (e) of question.
2. Eigenvalue is 2.
3. λ = (det A)^{1/3}.
4. b) [0 1; 1 0].
5. a) λ = 2 with eigenvectors {t(1, 2)^T : t ≠ 0} and λ = 3 with eigenvectors {t(2, 3)^T : t ≠ 0},
b) λ = −3 with eigenvectors {t(1, 1)^T : t ≠ 0} and λ = 1 with eigenvectors {t(1, 3)^T : t ≠ 0}.
7. a) λ = 3 with eigenvectors {t(1, 1)^T : t ≠ 0} and λ = −1 with eigenvectors {t(−1, 1)^T : t ≠ 0}.
b) Only one eigenvalue λ = 2 with multiplicity 2 and eigenvectors {t(1, 0)^T : t ≠ 0}.
c) λ = 3 with eigenvectors {t(1, 0)^T : t ≠ 0} and λ = −6 with eigenvectors {t(−5, 9)^T : t ≠ 0}.
d) λ = 1 ± i with eigenvectors {t(−1 ± i, 1)^T : t ≠ 0}.
e) λ = 5 ± i√3 with eigenvectors {t(±√3/2 + i/2, 1)^T : t ≠ 0}.
f) λ = 5 ± √5 with eigenvectors {t((1 ∓ √5)i/2, 1)^T : t ≠ 0}.
9. The eigenvalues are the diagonal entries, 2, −2, 3, 5. Corresponding eigenvectors are
v1 = (1, 0, 0, 0)^T, v2 = (1, 1, 0, 0)^T, v3 = (1, 1, 5, 0)^T, v4 = (25, −3, 21, 14)^T.
10. a) −1, 4, 6; (−3, 2, 0)^T, (1, 1, 0)^T, (0, 0, 1)^T. b) 2, −3, 3; (0, −1, 6)^T, (0, −1, 1)^T, (1, 0, 0)^T.
11. In each of the following answers, the diagonal entries in D and the columns in M may be
rearranged in the same way and the answer is still correct. Also, any column in M may be
multiplied by a scalar and the new M is still correct.

For Question 7:
a) D = [3 0; 0 −1], M = [1 −1; 1 1].
b) The matrix is not diagonalisable.
c) D = [3 0; 0 −6], M = [1 −5; 0 9].
d) D = [1+i 0; 0 1−i], M = [1+i 1−i; 1 1].
e) D = [5+i√3 0; 0 5−i√3], M = [√3/2 + i/2  −√3/2 + i/2; 1 1].
f) D = [5+√5 0; 0 5−√5], M = [(1−√5)i/2  (1+√5)i/2; 1 1].
For Question 9:
D = [2 0 0 0; 0 −2 0 0; 0 0 3 0; 0 0 0 5], M = [1 1 1 25; 0 1 1 −3; 0 0 5 21; 0 0 0 14].
For Question 10:
a) D = [−1 0 0; 0 4 0; 0 0 6], M = [−3 1 0; 2 1 0; 0 0 1].
b) D = [2 0 0; 0 −3 0; 0 0 3], M = [0 0 1; −1 −1 0; 6 1 0].
15. If v is an eigenvector of T then the coordinate vector [v]B of v with respect to the basis B
is the corresponding eigenvector for the matrix A.
16. A⁵ = [−78 330; 55 −133].
17. a) 6, t(1, 2)^T, t ≠ 0, t ∈ R; −4, t(−3, 4)^T, t ≠ 0, t ∈ R.
b) P = [1 −3; 2 4], D = [6 0; 0 −4]. c) [6^n  (−3)(−4)^n; 2·6^n  4(−4)^n].
.
18. When A is diagonalisable, Ak = MDkM−1 and xk = Akx0. As a check, if you put k = 0 in
the answers below you should get A0 = I, whereas if you put k = 1 you should get A1 = A.
For Question 7:
a) Ak =
1
2
(
3k + (−1)k 3k + (−1)k+1
3k + (−1)k+1 3k + (−1)k
)
.
c) Ak =
1
9
(
3k+2 5(3k − (−6)k)
0 9(−6)k
)
.
19. α1, α2, α3 and α4 are arbitrary real numbers.

For Question 5:
a) y(t) = α1 e^{3t}(2, 3)^T + α2 e^{2t}(1, 2)^T.
b) y(t) = α1 e^{t}(1, 3)^T + α2 e^{−3t}(1, 1)^T.

For Question 9:
y(t) = α1 e^{2t}(1, 0, 0, 0)^T + α2 e^{−2t}(1, 1, 0, 0)^T + α3 e^{3t}(1, 1, 5, 0)^T + α4 e^{5t}(25, −3, 21, 14)^T.
For Question 10:
a) \(\mathbf{y}(t) = \alpha_1 e^{-t}\begin{pmatrix} -\frac{3}{2} \\ 1 \\ 0 \end{pmatrix} + \alpha_2 e^{4t}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + \alpha_3 e^{6t}\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\).
b) \(\mathbf{y}(t) = \alpha_1 e^{2t}\begin{pmatrix} 0 \\ -\frac{1}{6} \\ 1 \end{pmatrix} + \alpha_2 e^{-3t}\begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} + \alpha_3 e^{3t}\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}\).
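Each of these general solutions can be checked symbolically by differentiating and comparing with \(A\mathbf{y}\). A minimal SymPy sketch (not part of the original answers) for Question 5 a); the matrix \(A = \begin{pmatrix} 6 & -2 \\ 6 & -1 \end{pmatrix}\) is reconstructed from the quoted eigenpairs, an assumption, since the original Question 5 matrix is not reprinted in these answers.

import sympy as sp

t, a1, a2 = sp.symbols('t alpha1 alpha2')

# A reconstructed from the Question 5 a) eigenpairs:
# eigenvalue 3 with eigenvector (2, 3), eigenvalue 2 with eigenvector (1, 2).
A = sp.Matrix([[6, -2],
               [6, -1]])

# The general solution quoted above.
y = a1 * sp.exp(3*t) * sp.Matrix([2, 3]) + a2 * sp.exp(2*t) * sp.Matrix([1, 2])

# y'(t) - A y(t) should be identically zero.
print(sp.simplify(sp.diff(y, t) - A * y))  # Matrix([[0], [0]])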
20. a) 1, \(\lambda\begin{pmatrix} 3 \\ -1 \end{pmatrix}\), \(\lambda \neq 0\); 5, \(\lambda\begin{pmatrix} 1 \\ 1 \end{pmatrix}\), \(\lambda \neq 0\). b) \(\begin{cases} x_1 = 3\alpha e^{t} + \beta e^{5t}, \\ x_2 = -\alpha e^{t} + \beta e^{5t}. \end{cases}\)
21. a) \(x = 300e^{t} - 200e^{3t}\), \(y = 150e^{t} - 50e^{3t}\). b) \(x = -500 + 600e^{-2t}\), \(y = -100 + 200e^{-2t}\).
22. The solutions by the two methods are:
a) \(\mathbf{y}(t) = \begin{pmatrix} y(t) \\ \dot{y}(t) \end{pmatrix} = \alpha_1 e^{t}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + \alpha_2 e^{t/5}\begin{pmatrix} 1 \\ \frac{1}{5} \end{pmatrix}\), (matrix method)
\(y(t) = \alpha_1 e^{t} + \alpha_2 e^{t/5}\). (calculus method)
b) \(\mathbf{y}(t) = \begin{pmatrix} y(t) \\ \dot{y}(t) \end{pmatrix} = \alpha_1 e^{4t}\begin{pmatrix} 1 \\ 4 \end{pmatrix} + \alpha_2 e^{-4t}\begin{pmatrix} 1 \\ -4 \end{pmatrix}\), (matrix method)
\(y(t) = \alpha_1 e^{4t} + \alpha_2 e^{-4t}\). (calculus method)
23. The matrix method given in the notes is not applicable as the matrix is not diagonalisable.
24. a) \(A = \begin{pmatrix} 0 & 1 \\ -\frac{c}{a} & -\frac{b}{a} \end{pmatrix}\).
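To see where this matrix comes from (assuming the question's equation is \(a\ddot{y} + b\dot{y} + cy = 0\), which is consistent with the answer above), set \(\mathbf{y} = \begin{pmatrix} y \\ \dot{y} \end{pmatrix}\); then
\[
\dot{\mathbf{y}} = \begin{pmatrix} \dot{y} \\ \ddot{y} \end{pmatrix}
= \begin{pmatrix} \dot{y} \\ -\frac{c}{a}y - \frac{b}{a}\dot{y} \end{pmatrix}
= \begin{pmatrix} 0 & 1 \\ -\frac{c}{a} & -\frac{b}{a} \end{pmatrix}\mathbf{y} = A\mathbf{y}.
\]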
25. a) 500 years — 0.9048 : 0.0928 : 0.0024;
1000 years — 0.8187 : 0.1722 : 0.0091;
1000000 years — \(1.4 \times 10^{-87} : 7 \times 10^{-44} : 1\).
b) The associated matrix is not diagonalisable.
26. In the 12th:
\[
\begin{pmatrix} 0.98 & 0.02 & 0.03 \\ 0.01 & 0.96 & 0.03 \\ 0.01 & 0.02 & 0.94 \end{pmatrix}^{11}
\begin{pmatrix} 300 \\ 300 \\ 300 \end{pmatrix}
\approx \begin{pmatrix} 378 \\ 293 \\ 229 \end{pmatrix},
\]
In the 24th:
\[
\begin{pmatrix} 0.98 & 0.02 & 0.03 \\ 0.01 & 0.96 & 0.03 \\ 0.01 & 0.02 & 0.94 \end{pmatrix}^{23}
\begin{pmatrix} 300 \\ 300 \\ 300 \end{pmatrix}
\approx \begin{pmatrix} 426 \\ 280 \\ 194 \end{pmatrix}.
\]
27. In the 12th: \(\begin{pmatrix} 339 \\ 262 \\ 205 \end{pmatrix}\), total = 806; In the 24th: \(\begin{pmatrix} 339 \\ 222 \\ 154 \end{pmatrix}\), total = 715.
28. The population settles to the proportions 1.156 : 1.124 : 1.116 : 1.086 : 1 but eventually dies
out.
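The matrix powers in Questions 26 to 28 are tedious by hand but immediate in NumPy. A minimal sketch (not part of the original answers) for Question 26; note that the columns of the quoted matrix sum to 1, so the total population of 900 is preserved.

import numpy as np

# Transition matrix and initial populations as quoted in Question 26.
P = np.array([[0.98, 0.02, 0.03],
              [0.01, 0.96, 0.03],
              [0.01, 0.02, 0.94]])
x0 = np.array([300.0, 300.0, 300.0])

print(np.linalg.matrix_power(P, 11) @ x0)  # approx. [378, 293, 229]
print(np.linalg.matrix_power(P, 23) @ x0)  # approx. [426, 280, 194]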
31. a) 6, \(t\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}\), \(t \neq 0\); 7, \(t\begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}\), \(t \neq 0\); 8, \(t\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}\), \(t \neq 0\).
b) \(A = \begin{pmatrix} 1 & 2 & 2 \\ -1 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}\), \(D = \begin{pmatrix} 6 & 0 & 0 \\ 0 & 7 & 0 \\ 0 & 0 & 8 \end{pmatrix}\).
c) \(A^{-1} = \begin{pmatrix} -1 & 0 & 2 \\ 2 & -1 & -3 \\ -1 & 1 & 2 \end{pmatrix}\),
\[
M^k = \begin{pmatrix}
-6^k + 4 \times 7^k - 2 \times 8^k & -2 \times 7^k + 2 \times 8^k & 2 \times 6^k - 6 \times 7^k + 4 \times 8^k \\
6^k - 8^k & 8^k & -2 \times 6^k + 2 \times 8^k \\
-6^k + 2 \times 7^k - 8^k & -7^k + 8^k & 2 \times 6^k - 3 \times 7^k + 2 \times 8^k
\end{pmatrix}.
\]
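A numerical check of this closed form (not part of the original answers), assuming, as the structure of the answer suggests, that the matrix in question satisfies \(M = ADA^{-1}\):

import numpy as np

# A and D as quoted in part b); M is assumed to be A D A^(-1).
A = np.array([[ 1.0, 2.0, 2.0],
              [-1.0, 0.0, 1.0],
              [ 1.0, 1.0, 1.0]])
D = np.diag([6.0, 7.0, 8.0])
M = A @ D @ np.linalg.inv(A)

def M_power_formula(k):
    # The closed form quoted in part c), written in powers of the eigenvalues.
    s, u, v = 6.0**k, 7.0**k, 8.0**k
    return np.array([[-s + 4*u - 2*v, -2*u + 2*v, 2*s - 6*u + 4*v],
                     [ s - v,          v,         -2*s + 2*v],
                     [-s + 2*u - v,   -u + v,      2*s - 3*u + 2*v]])

for k in range(6):
    assert np.allclose(M_power_formula(k), np.linalg.matrix_power(M, k))
print("closed form agrees with M^k for k = 0, ..., 5")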
Chapter 9
1. a) {a, c}, b) {f}, c) S, d) ∅,
e) {a, b, c, d, e}, f) {f}, g) {b}, h) {b}.
2. 32.
3. 81%, 95.3%.
4. 26.
5. a) \(A \cap B^c \cap C^c\), b) \(A \cup B \cup C\), c) \((A \cap B) \cup (A \cap C) \cup (B \cap C)\),
d) \((A \cap B^c \cap C^c) \cup (A^c \cap B \cap C^c) \cup (A^c \cap B^c \cap C)\),
e) \((A^c \cap B \cap C) \cup (A \cap B^c \cap C) \cup (A \cap B \cap C^c)\).
6. a) \(\frac{5}{36}\). b) \(\frac{1}{6}\). c) \(\frac{3}{4}\).
7. \(\frac{2}{3}\).
8. a) \(\frac{3}{50}\), b) \(\frac{1}{2}\), c) \(\frac{47}{50}\).
9. 32%, \(\frac{5}{17}\).
10. a) \(\frac{19}{45}\), b) \(\frac{11}{25}\), c) \(\frac{6}{11}\).
11. a) 25.24%, b) 0.0131, c) 0.000545.
12. No.
13. a) \(p^n\); b) \(1 - (1-p)^n\).
15. \(P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_n \mid A_1 \cap \cdots \cap A_{n-1})\,P(A_{n-1} \mid A_1 \cap \cdots \cap A_{n-2}) \cdots P(A_2 \mid A_1)\,P(A_1)\);
56%, 33.6%, 22.4%.
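This multiplication rule follows by induction from the definition of conditional probability: \(P(A \cap B) = P(B \mid A)\,P(A)\), applied with \(A = A_1 \cap \cdots \cap A_{n-1}\) and \(B = A_n\), and then repeated on the remaining intersection.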
19.
\[
\begin{array}{c|ccc}
x & 0 & 1 & 2 \\ \hline
P(X = x) & \frac{1}{15} & \frac{8}{15} & \frac{2}{5}
\end{array}
\]
20. a) 0.214, b) 0.713.
21. \(\frac{13}{51}\).
22. a) \(c = \frac{1}{e}\). b) \(P(X = 2) = \frac{1}{2e}\). c) \(P(X < 2) = \frac{2}{e}\). d) \(P(X \geq 4) = 1 - \frac{8}{3e}\).
23. b) \(\dfrac{(\lfloor \alpha n \rfloor)^2 + 3\lfloor \alpha n \rfloor + 2}{n^2 + 3n + 2}\), c) \(n = 5\).
24. a) \(\dfrac{\theta(1 - \theta^{2n})}{(1 - \theta^{2n+1})(1 + \theta)}\), b) \(\dfrac{1 - \theta^{n+1}}{1 - \theta^{2n+1}}\).
25. a) \(c = 0.1\), b) \(E(X) = 2.5\), \(\operatorname{Var}(X) = 2.05\), c) \(E(Y) = -9\), \(\operatorname{Var}(Y) = 32.8\).
26. \(\mu = \dfrac{n+1}{2}\); \(\sigma^2 = \dfrac{n^2 - 1}{12}\).
28. \(\mu = \dfrac{\alpha}{1 - \alpha}\); \(\sigma^2 = \dfrac{\alpha}{(1 - \alpha)^2}\).
29. 0.1123.
30. \(70p^4q^4 + 56p^5q^3 + 28p^6q^2 + 8p^7q + p^8\) where \(p = \frac{1}{4}\), \(q = \frac{3}{4}\). This evaluates to \(\frac{7459}{65536} \simeq 0.1138\).
31. 0.383.
32. 11.
34. \(P(X = k) = \dfrac{2}{3^{k+1}}\) for \(k = 0, 1, 2, 3, \ldots\).
35. a) \(\dbinom{19}{2}\left(\dfrac{1}{6}\right)^3 \left(\dfrac{5}{6}\right)^{17}\). b) \(\dbinom{n-1}{k-1}\left(\dfrac{1}{6}\right)^k \left(\dfrac{5}{6}\right)^{n-k}\).
37. a) B(15, 0.25). b) 0.08018. c) No.
38. a) B(23, 0.5). b) \(2.86 \times 10^{-6}\). c) Yes.
39. a) 6. b) B(8, 0.5). c) 0.1445. d) No.
40. a) 10. b) 12. c) B(12, 0.5). d) 0.01929. e) Yes.
42. a) \(\mu = \frac{1}{2}(a+b)\); \(\sigma^2 = \frac{1}{12}(a-b)^2\).
b) \(\mu = \dfrac{k}{k-1}\) for \(k > 1\); \(\sigma^2 = \dfrac{k}{(k-1)^2(k-2)}\) for \(k > 2\).
c) \(\mu = n + 1\), \(\sigma^2 = n + 1\).
d) \(\mu = 0\), \(\sigma^2 = 2\).
43. a) \(\alpha = \dfrac{1}{\pi}\). b) \(c = -1\). c) Neither exists.
44. a) \(\frac{9}{16}\), b) \(\frac{7}{16}\), c) 0.4106, d) \(\frac{49}{12}\).
45. a) \(\dfrac{1}{\log 10}\). b) \(F(y) = \begin{cases} 0 & y < 10 \\ \dfrac{\log(y/10)}{\log 10} & 10 \leq y \leq 100 \\ 1 & y > 100 \end{cases}\).
c) \(10^{3/2} \approx 31.62\). d) \(\dfrac{90}{\log 10} \approx 39.09\).
46. a) [Graph of F(x): horizontal axis x from 0 to 7, vertical axis F(x) marked in eighths from 1/8 to 1; the graph passes through (3, 1/4) and reaches (5, 1).]
b) The function \(f(x) = \begin{cases} 0 & x \leq 2 \\ \frac{1}{4} & 2 < x \leq 3 \\ \frac{3}{8} & 3 < x \leq 5 \\ 0 & x > 5 \end{cases}\) will do.
[Graph of f(x): horizontal axis x from 0 to 7, vertical axis marked in eighths up to 1/2; f is the step function just given.]
c) \(E(X) = \frac{29}{8}\).
47. a) \(E(Y) = \dfrac{2}{\alpha} + 3\); \(\operatorname{Var}(Y) = \dfrac{4}{\alpha^2}\).
48. a) 0.8907. b) 0.0107. c) 0.3594. d) 0.8925. e) 0.1359. f) 0.3830.
49. a) 0.8413. b) 0.2514. c) 0.2514. d) 0.6915. e) 0.2789. f) 0.8854.
50. a) 0.93. b) −1.65. c) 47. d) 32.
51. a) 0.7299. b) 81.
52. 0.0401.
53. a) 1.2%, b) 6.7%, c) 75.9%.
54. 2.3% over, 0.4% under.
55. 0.0122.
56. a) Binomial(288, 0.25). b) \(\displaystyle\sum_{k=88}^{288} \binom{288}{k} \left(\frac{1}{4}\right)^k \left(\frac{3}{4}\right)^{288-k}\). c) 0.017. d) Yes.
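The sum in part b) can be evaluated directly; a minimal Python sketch (not part of the original answers) using math.comb:

from math import comb

# P(X >= 88) for X ~ Binomial(288, 1/4), i.e. the sum in part b).
n, p = 288, 0.25
tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(88, n + 1))
print(round(tail, 3))  # 0.017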
58. \(E(X) = \operatorname{Var}(X) = \frac{1}{2}\).
60. a) \(-\frac{1}{\lambda}\log(1-p)\). b) \(\frac{1}{\lambda}\log 2\).
62. a) 0.487. b) 0.146. c) 0.264. d) 62.4 min.
63. a) 0.6703. b) 0.1353. c) more than 5.76 hours.
64. 0.4493.
65. Let \(\lambda = \lambda_1 + \cdots + \lambda_n\).
a) \(P(T \leq t) = \begin{cases} 1 - e^{-\lambda t}, & t \geq 0 \\ 0, & t < 0. \end{cases}\)
b) \(\begin{cases} \lambda e^{-\lambda t}, & t \geq 0 \\ 0, & t < 0. \end{cases}\)
c) The exponential distribution \(\operatorname{Exp}(\lambda)\), mean \(= \frac{1}{\lambda}\), variance \(= \frac{1}{\lambda^2}\).
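A quick simulation makes part c) plausible: the minimum of independent exponentials with rates \(\lambda_i\) behaves like a single exponential with rate \(\lambda_1 + \cdots + \lambda_n\). A minimal NumPy sketch (not part of the original answers); the rates below are made-up examples.

import numpy as np

# T_i ~ Exp(lambda_i) independently; T = min(T_1, ..., T_n) should be Exp(sum of rates).
rng = np.random.default_rng(0)
rates = np.array([0.5, 1.0, 2.0])            # hypothetical rates
T = rng.exponential(1 / rates, size=(200_000, 3)).min(axis=1)

lam = rates.sum()
print(T.mean(), 1 / lam)        # both approx. 0.2857
print(T.var(), 1 / lam**2)      # both approx. 0.0816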
Index
axiom, 3
basis, 35
ordered, 46
orthonormal, 38
basis by extension, 44
basis by reduction, 42
characteristic polynomial, 140
column space, 21
coordinate vector, 46
counterexample, 8
cumulative distribution function, 184
defective matrix, 143
diagonalisable matrix, 146
dimension, 40
Distribution
Exponential, 203
eigenvalue
linear map, 138
matrix, 138
eigenvector
linear map, 138
matrix, 138
exponential distribution
standard, 204
function, 63
addition, 63
codomain, 63
composition, 63
domain, 63
equality, 63
injective, 120
inverse, 121
multiplication, 63
multiplication by a scalar, 63
one-to-one, 120
onto, 120
range, 120
surjective, 120
image, 94
function, 120
linear map, 98
matrix, 98
kernel, 94
linear map, 95
matrix, 96
Laplace transform, 107
linear combination, 17
linear independence, 24
linear map, 80
addition condition, 80
and linear combination, 83, 84
inverse, 116
matrix representation, 86, 110
one-to-one, 116
onto, 116
scalar multiplication condition, 80
linear transformation, 80
linearly dependent set, 25
linearly independent set, 25
Markov chains, 154
normal distribution
standard, 199
null space, 94
nullity, 97
Phantasmagoria, 1
polynomial function, 56
polynomial, characteristic, 140
powers of matrix, 147
probability density function, 196
projection, 93
random variable
continuous, 195
rank, 100
Rank-Nullity Theorem, 119
linear map, 101
matrix, 101
rotation, 91
sets, 62
equality, 62
intersection, 63
union, 63
span, 17
spanning set, 18
standard deviation, 198
subset, 62
proper, 62
subspace, 11, 13
proper, 13
subspace theorem, 13
alternative, 49
variance, 198
vector
coordinate, 46
vector space
M_{mn}, 7, 50
C^n, 6
P, P_n, 7, 57, 58
R^n, 3
R[X], 8, 52
associative law, 3
cancellation property, 9
commutative law, 3
definition, 3
distributive law, 3
finite dimensional, 40