Lecture Notes for

MTH3320

Computational Linear Algebra

Tiangang Cui & Hans De Sterck

May 13, 2019


Contents

Preface vii

I Linear Systems of Equations 1

1 Introduction and Model Problems 3

1.1 A Simple 1D Example from Structural Mechanics . . . . . . 3

1.1.1 Discretising the ODE . . . . . . . . . . . . . . . 4

1.1.2 Formulation as a Linear System . . . . . . . . . 5

1.1.3 Solving the Linear System . . . . . . . . . . . . . 6

1.2 A 2D Example: Poisson’s Equation for Heat Conduction . . 7

1.2.1 Discretising the PDE . . . . . . . . . . . . . . . 8

1.2.2 Formulation as a Linear System . . . . . . . . . 8

1.2.3 Solving the Linear System . . . . . . . . . . . . . 10

1.3 An Example from Data Analytics: Netflix Movie Recommendation . . . 12

1.3.1 Movie Recommendation using Linear Algebra and Optimisation . . . 12

1.3.2 An Alternating Least Squares Approach to Solving the Optimisation Problem . . . 14

2 LU Decomposition for Linear Systems 17

2.1 Gaussian Elimination and LU Decomposition . . . . . . . . 17

2.1.1 Gaussian Elimination . . . . . . . . . . . . . . . 17

2.1.2 LU Decomposition . . . . . . . . . . . . . . . . . 18

2.1.3 Implementation of LU Decomposition and Computational Cost . . . 20

2.2 Banded LU Decomposition . . . . . . . . . . . . . . . . . . . 23

2.3 Matrix Norms . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.3.1 Definition of Matrix Norms . . . . . . . . . . . . 25

2.3.2 Matrix Norm Formulas . . . . . . . . . . . . . . 26

2.3.3 Spectral Radius . . . . . . . . . . . . . . . . . . 28

2.4 Floating Point Number System . . . . . . . . . . . . . . . . 29

2.4.1 Floating Point Numbers . . . . . . . . . . . . . . 29

2.4.2 Rounding and Unit Roundoff . . . . . . . . . . . 29

2.4.3 IEEE Double Precision Numbers . . . . . . . . . 30

2.4.4 Rounding and Basic Arithmetic Operations . . . 32

2.5 Conditioning of a Mathematical Problem . . . . . . . . . . . 32


2.5.1 Conditioning of a Mathematical Problem . . . . 32

2.5.2 Conditioning of Elementary Operations . . . . . 33

2.5.3 Conditioning of Solving a Linear System . . . . . 38

2.6 Stability of a Numerical Algorithm . . . . . . . . . . . . . . 40

2.6.1 A Simple Example of a Stable and an Unstable Algorithm . . . 40

2.6.2 Stability of LU Decomposition . . . . . . . . . . 42

3 Least-Squares Problems and QR Factorisation 45

3.1 Gram-Schmidt Orthogonalisation and QR Factorisation . . . 45

3.1.1 Gram-Schmidt Orthogonalisation . . . . . . . . . 45

3.1.2 QR Factorisation . . . . . . . . . . . . . . . . . . 47

3.1.3 Modified Gram-Schmidt Orthogonalisation . . . 47

3.2 QR Factorisation using Householder Transformations . . . . 48

3.2.1 Householder Reflections . . . . . . . . . . . . . . 49

3.2.2 Using Householder Reflections to Compute the QR Factorisation . . . 51

3.2.3 Computing Q . . . . . . . . . . . . . . . . . . . . 52

3.2.4 Computational Work . . . . . . . . . . . . . . . 53

3.3 Overdetermined Systems and Least-Squares Problems . . . . 54

3.3.1 The Normal Equations – A Geometric View . . . 55

3.3.2 The Normal Equations . . . . . . . . . . . . . . 55

3.3.3 Computational Work for Forming and Solving the Normal Equations . . . 57

3.3.4 Numerical Stability of Using the Normal Equations 57

3.4 Solving Least-Squares Problems using QR Factorisation . . . 57

3.4.1 Geometric Interpretation in Terms of Projection Matrices . . . 58

3.5 Alternating Least-Squares Algorithm for Movie Recommendation . . . 59

3.5.1 Least-Squares Subproblems for Movie Recommendation . . . 60

4 The Conjugate Gradient Method for Sparse SPD Systems 63

4.1 An Optimisation Problem Equivalent to SPD Linear Systems 64

4.2 The Steepest Descent Method . . . . . . . . . . . . . . . . . 64

4.3 The Conjugate Gradient Method . . . . . . . . . . . . . . . 68

4.4 Properties of the Conjugate Gradient Method . . . . . . . . 70

4.4.1 Orthogonality Properties of Residuals and Step Directions . . . 70

4.4.2 Optimal Error Reduction in the A-Norm . . . . 73

4.4.3 Convergence Speed . . . . . . . . . . . . . . . . . 75

4.5 Preconditioning for the Conjugate Gradient Method . . . . . 77

4.5.1 Preconditioning for Solving Linear Systems . . . 77

4.5.2 Left Preconditioning for CG . . . . . . . . . . . 78

4.5.3 Preconditioned CG (PCG) Algorithm . . . . . . 79

4.5.4 Preconditioners for PCG . . . . . . . . . . . . . 82

4.5.5 Using Preconditioners as Stand-Alone Iterative Methods . . . 83


5 The GMRES Method for Sparse Nonsymmetric Systems 87

5.1 Minimising the Residual . . . . . . . . . . . . . . . . . . . . 87

5.2 Arnoldi Orthogonalisation Procedure . . . . . . . . . . . . . 88

5.3 GMRES Algorithm . . . . . . . . . . . . . . . . . . . . . . . 90

5.4 Convergence Properties of GMRES . . . . . . . . . . . . . . 92

5.5 Preconditioned GMRES . . . . . . . . . . . . . . . . . . . . 93

5.6 Lanczos Orthogonalisation Procedure for Symmetric Matrices 93

II Eigenvalues and Singular Values 97

6 Basic Algorithms for Eigenvalues 99

6.1 Example: Page Rank and Stochastic Matrix . . . . . . . . . 99

6.2 Fundamentals of Eigenvalue Problems . . . . . . . . . . . . . 104

6.2.1 Notations . . . . . . . . . . . . . . . . . . . . . . 104

6.2.2 Eigenvalue and Eigenvector . . . . . . . . . . . . 105

6.2.3 Similarity Transformation . . . . . . . . . . . . . 106

6.2.4 Eigendecomposition, Diagonalisation, and Schur Factorisation . . . 107

6.2.5 Extending Orthogonal Vectors to a Unitary Matrix . . . 110

6.3 Power Iteration and Inverse Iteration . . . . . . . . . . . . . 112

6.3.1 Power Iteration . . . . . . . . . . . . . . . . . . . 112

6.3.2 Convergence of Power Iteration . . . . . . . . . . 113

6.3.3 Shifted Power Method . . . . . . . . . . . . . . . 115

6.3.4 Inverse Iteration . . . . . . . . . . . . . . . . . . 115

6.3.5 Convergence of Inverse Iteration . . . . . . . . . 116

6.4 Symmetric Matrices and Rayleigh Quotient Iteration . . . . 119

6.4.1 Rate of Convergence . . . . . . . . . . . . . . . . 119

6.4.2 Power Iteration and Inverse Iteration for Symmetric Matrices . . . 119

6.4.3 Rayleigh Quotient Iteration . . . . . . . . . . . . 121

6.4.4 Summary of Power, Inverse, and Rayleigh Quotient Iterations . . . 123

7 QR Algorithm for Eigenvalues 125

7.1 Two Phases of Eigenvalue Computation . . . . . . . . . . . . 125

7.2 Hessenberg Form and Tridiagonal Form . . . . . . . . . . . . 127

7.2.1 Householder Reduction to Hessenberg Form . . . 129

7.2.2 Implementation and Computational Cost . . . . 131

7.2.3 The Symmetric Case: Reduction to Tridiagonal Form . . . 132

7.2.4 QR Factorisation of Hessenberg Matrices . . . . 134

7.3 QR algorithm without shifts . . . . . . . . . . . . . . . . . . 136

7.3.1 Connection with Simultaneous Iteration . . . . . 136

7.3.2 Convergence to Schur Form . . . . . . . . . . . . 139

7.3.3 The Role of Hessenberg Form . . . . . . . . . . . 140

7.4 Shifted QR algorithm . . . . . . . . . . . . . . . . . . . . . . 145

7.4.1 Connection with Inverse Iteration . . . . . . . . 145

7.4.2 Connection with Shifted Inverse Iteration . . . . 147

7.4.3 Connection with Rayleigh Quotient Iteration . . 147


7.4.4 Wilkinson Shift . . . . . . . . . . . . . . . . . . . 148

7.4.5 Deflation . . . . . . . . . . . . . . . . . . . . . . 148

8 Singular Value Decomposition 151

8.1 Singular Value Decomposition . . . . . . . . . . . . . . . . . 151

8.1.1 Understanding SVD . . . . . . . . . . . . . . . . 151

8.1.2 Full SVD and Reduced SVD . . . . . . . . . . . 153

8.1.3 Properties of SVD . . . . . . . . . . . . . . . . . 155

8.1.4 Compare SVD to Eigendecomposition . . . . . . 156

8.2 Computing SVD . . . . . . . . . . . . . . . . . . . . . . . . . 158

8.2.1 Connection with Eigenvalue Solvers . . . . . . . 158

8.2.2 A Different Connection with Eigenvalue Solvers . 159

8.2.3 Bidiagonalisation . . . . . . . . . . . . . . . . . . 160

8.3 Low Rank Matrix Approximation using SVD . . . . . . . . . 164

8.4 Pseudo Inverse and Least Square Problems using SVD . . . 166

8.5 X-Ray Imaging using SVD . . . . . . . . . . . . . . . . . . . 170

8.5.1 Mathematical Model . . . . . . . . . . . . . . . . 170

8.5.2 Computational Model . . . . . . . . . . . . . . . 171

8.5.3 Image Reconstruction . . . . . . . . . . . . . . . 172

9 Krylov Subspace Methods for Eigenvalues 177

9.1 The Arnoldi Method for Eigenvalue Problems . . . . . . . . 177

9.2 Lanczos Method for Eigenvalue Problems . . . . . . . . . . . 183

9.3 How Arnoldi/Lanczos Locates Eigenvalues . . . . . . . . . . 185

10 Other Eigenvalue Solvers 191

10.1 Jacobi Method . . . . . . . . . . . . . . . . . . . . . . . . . . 191

10.2 Divide-and-Conquer . . . . . . . . . . . . . . . . . . . . . . . 194

A Appendices 197

A.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198

A.1.1 Vectors and Matrices . . . . . . . . . . . . . . . 198

A.1.2 Inner Products . . . . . . . . . . . . . . . . . . . 198

A.1.3 Block Matrices . . . . . . . . . . . . . . . . . . . 198

A.2 Vector Norms . . . . . . . . . . . . . . . . . . . . . . . . . . 200

A.2.1 Vector Norms . . . . . . . . . . . . . . . . . . . . 200

A.2.2 A-Norm . . . . . . . . . . . . . . . . . . . . . . . 200

A.3 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . 202

A.4 Matrix Rank and Fundamental Subspaces . . . . . . . . . . 203

A.5 Matrix Determinants . . . . . . . . . . . . . . . . . . . . . . 204

A.6 Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

A.6.1 Eigenvalues and Eigenvectors . . . . . . . . . . . 205

A.6.2 Similarity Transformations . . . . . . . . . . . . 206

A.6.3 Diagonalisation . . . . . . . . . . . . . . . . . . . 206

A.6.4 Singular Values of a Square Matrix . . . . . . . . 207

A.7 Symmetric Matrices . . . . . . . . . . . . . . . . . . . . . . . 208

A.8 Matrices with Special Structure or Properties . . . . . . . . 210

A.8.1 Diagonal Matrices . . . . . . . . . . . . . . . . . 210

A.8.2 Triangular Matrices . . . . . . . . . . . . . . . . 210

A.8.3 Permutation Matrices . . . . . . . . . . . . . . . 210


A.8.4 Projectors . . . . . . . . . . . . . . . . . . . . . . 210

A.9 Big O Notation . . . . . . . . . . . . . . . . . . . . . . . . . 212

A.9.1 Big O as h→ 0 . . . . . . . . . . . . . . . . . . . 212

A.9.2 Big O as n→∞ . . . . . . . . . . . . . . . . . . 212

A.10 Sparse Matrix Formats . . . . . . . . . . . . . . . . . . . . . 213

A.10.1 Simple List Storage . . . . . . . . . . . . . . . . 213

A.10.2 Compressed Sparse Column Format . . . . . . . 213

Bibliography 215


Preface

This document contains lecture notes for MTH3320 – Computational

Linear Algebra. Since MTH3320 is offered for the first time in 2017 S2, the

notes will be built up as the term progresses.

• Part I of the unit covers numerical methods for solving linear systems

A~x = ~b (weeks 1-6).

• Part II of the unit covers numerical methods for computing eigenvalues

and singular values (weeks 7-12).

• The Appendix of the notes covers a brief and condensed review of back-

ground material in linear algebra (which may be reviewed in the lectures

with some more detail, as needed).

These notes are intended to be used in conjunction with the lectures. In their

first incarnation, these notes will be quite dense, and, depending on the topic,

more details and explanations may be provided in the lectures.

Useful reference books on numerical linear algebra include [Saad, 2003],

[Trefethen and Bau III, 1997], [Björck, 2015], [Linge and Langtangen, 2016],

[Gander et al., 2014], [Demmel, 1997], [Saad, 2011], [Quarteroni et al., 2010],

[Ascher and Greif, 2011].


Synopsis of MTH3320

The overall aim of this unit is to study the numerical methods for matrix com-

putations that lie at the core of a wide variety of large-scale computations and

innovations in the sciences, engineering, technology and data science. Students

will receive an introduction to the mathematical theory of numerical methods

for linear algebra (with derivations of the methods and some proofs). This will

broadly include methods for solving linear systems of equations, least-squares

problems, eigenvalue problems, and other matrix decompositions. Special at-

tention will be paid to conditioning and stability, dense versus sparse problems,

and direct versus iterative solution techniques. Students will learn to imple-

ment the computational methods efficiently, and will learn how to thoroughly

test their implementations for accuracy and performance. Students will work on

realistic matrix models for applications in a variety of fields. Applications may

include, for example: computation of electrostatic potentials and heat conduc-

tion problems; eigenvalue problems for electronic structure calculation; ranking

algorithms for webpages; algorithms for movie recommendation, classification of

handwritten digits, and document clustering; and principal component analysis

in data science.

Part I

Linear Systems of Equations

Chapter 1

Introduction and Model Problems

Objectives of this chapter

1. Motivation: In Part I of the unit we will develop, analyse, and implement

numerical methods to solve large linear systems A~x = ~b.

2. This introductory chapter gives some examples of linear systems in real-

life applications.

3. These examples will be used as model problems throughout Part I of the

unit.

1.1 A Simple 1D Example from Structural Mechanics

Consider a string of unit length under tension T, which is subjected to a trans-

verse distributed load of magnitude p(x) per unit length (see figure). Let u(x)

denote the vertical displacement at point x. We choose signs such that both p(x)

and u(x) are positive in the upward direction.

For small displacements, the vertical displacement u(x) is governed by the

ordinary differential equation (ODE)

d²u(x)/dx² = −p(x)/T.


Since the string is fixed on the left and right, we can use boundary conditions

u(0) = 0 and u(1) = 0.

The problem of finding the displacement u(x) is fully specified by the follow-

ing ODE boundary value problem (BVP):

BVP

d²u(x)/dx² = −p(x)/T,   x ∈ [0, 1]

u(0) = 0

u(1) = 0

We can approximate the solution to this problem numerically by discretising

the ODE and solving the resulting linear system A~v = ~b.

1.1.1 Discretising the ODE

We discretise the ODE by deriving a finite difference approximation for the second

derivative in the ODE, using Taylor series expansions:

u(x+ h) = u(x) + u′(x)h+ u′′(x)h2/2 + u′′′(x)h3/6 +O(h4),

u(x− h) = u(x)− u′(x)h+ u′′(x)h2/2− u′′′(x)h3/6 +O(h4).

Summing these up gives

u(x+ h) + u(x− h) = 2u(x) + u′′(x)h2 +O(h4),

from which we obtain

u′′(x) = [u(x+h) − 2u(x) + u(x−h)] / h² + O(h²).   (1.1)
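As a quick numerical check of Eq. (1.1), one can apply the centred second difference to a smooth test function and watch the error shrink by a factor of about 4 each time h is halved. The following is a small Python sketch (the unit itself works in MATLAB; the test function u(x) = exp(x) and the evaluation point are illustrative choices):

```python
import math

def second_difference(u, x, h):
    """Centred finite difference approximation of u''(x), as in Eq. (1.1)."""
    return (u(x + h) - 2.0 * u(x) + u(x - h)) / h**2

u = math.exp          # test function with u''(x) = exp(x)
x = 0.5
for h in [0.1, 0.05, 0.025]:
    err = abs(second_difference(u, x, h) - math.exp(x))
    print(f"h = {h:6.3f}   error = {err:.3e}")
# The error decreases by roughly a factor of 4 each time h is halved,
# consistent with the O(h^2) truncation error derived above.
```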

We consider a grid that divides the problem domain [0, 1] into N + 1 intervals

of equal length

∆x = h = 1/(N + 1)

with N + 2 equally spaced grid points xi given by

xi = ih i = 0, . . . , N + 1

(i.e., there are two boundary points x0 and xN+1 at x = 0 and x = 1, and there

are N interior points). We then approximate the unknown function u(x) (the

exact solution to the BVP) at the grid points by discrete approximations vi:

vi ≈ u(xi),

using the finite difference formula. That is, we solve the following discretised

BVP for the unknown numerical approximation values vi:

discretised BVP

(vi+1 − 2vi + vi−1) / h² = −p(xi)/T   (i = 1, . . . , N)

xi = ih

v0 = 0

vN+1 = 0.


1.1.2 Formulation as a Linear System

This discretised BVP can be written as a linear system

A~v = ~b

with N equations for the N unknowns vi (i = 1, . . . , N) at the interior points

of the problem domain. We normally consider square matrices of size n× n, so

for this problem, the total number of unknowns n equals the number of interior

grid points, i.e., we have n = N .

We write the discretised BVP as A~v = ~b with the matrix A ∈ Rn×n given by

the so-called 1-dimensional (1D) Laplacian matrix:

Definition 1.1: 1D Laplacian Matrix (Model Problem 1)

A =

[ −2   1                  ]
[  1  −2   1              ]
[      1  −2   1          ]
[          ⋱    ⋱    ⋱    ]
[              1  −2   1  ]
[                  1  −2  ]   (1.2)

(all entries not shown are zero).

The vectors ~v and ~b in A~v = ~b are given by

~v = [v1, v2, . . . , vn−1, vn]^T,   ~b = −(h²/T) [p(x1), p(x2), . . . , p(xn−1), p(xn)]^T,

with h = 1/(n + 1).

Note that the matrix A is tridiagonal, and it is very sparse: it has very few

nonzero elements (close to 3 per row, on average).

Definition 1.2: Sparse Matrix

Let A ∈ Rm×n.

1. nnz(A) is the number of nonzero elements of A

2. A is called a sparse matrix if

nnz(A) ≪ mn.

Otherwise, A is called a dense matrix.

Efficient numerical methods for this problem should exploit this sparsity,

and the study of efficient numerical methods for sparse matrix problems is an

important focus of this unit.


1.1.3 Solving the Linear System

Suppose the transverse load in the above problem is given specifically by

p(x) = −(3x+ x2) exp(x),

and T = 100.

The figure below shows the numerical approximation ~v obtained from solving

A~v = ~b, for n = N = 2, 4, 8, 16.

[Figure: four plots of the numerical approximation ~v on [0, 1] for n = 2, 4, 8, 16; the computed displacements range from about −5 × 10⁻³ to 0.]

As it happens, the exact solution to this problem can also be obtained in

closed form:

u(x) = x (x− 1) exp(x)/100,

(it is shown in blue in the figure). This allows us to verify the accuracy of

the numerical approximation, and it can be shown theoretically and verified

numerically that the error u(xi)− vi = O(h2).
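The whole pipeline for Model Problem 1 — assemble the tridiagonal system, solve A~v = ~b, compare against the closed-form solution — fits in a few lines. Below is a Python sketch (an illustration; the unit's own implementations use MATLAB). It solves the tridiagonal system with the classic three-term forward-elimination/back-substitution recurrence (the Thomas algorithm, a special case of the banded LU decomposition of Chapter 2) rather than dense Gaussian elimination:

```python
import math

def solve_string(N, T=100.0):
    """Solve the discretised BVP A v = b for the loaded string (Model Problem 1)."""
    h = 1.0 / (N + 1)
    p = lambda x: -(3.0 * x + x**2) * math.exp(x)   # transverse load p(x)
    # Tridiagonal system: sub-, main- and super-diagonals plus right-hand side.
    a = [1.0] * N            # subdiagonal (a[0] unused)
    d = [-2.0] * N           # main diagonal
    c = [1.0] * N            # superdiagonal
    b = [-h**2 * p((i + 1) * h) / T for i in range(N)]
    # Thomas algorithm: forward elimination ...
    for i in range(1, N):
        mfac = a[i] / d[i - 1]
        d[i] -= mfac * c[i - 1]
        b[i] -= mfac * b[i - 1]
    # ... then back substitution.
    v = [0.0] * N
    v[-1] = b[-1] / d[-1]
    for i in range(N - 2, -1, -1):
        v[i] = (b[i] - c[i] * v[i + 1]) / d[i]
    return v, h

def exact(x):
    """Closed-form solution u(x) = x (x - 1) exp(x) / 100."""
    return x * (x - 1.0) * math.exp(x) / 100.0

for N in (8, 16, 32):
    v, h = solve_string(N)
    err = max(abs(exact((i + 1) * h) - vi) for i, vi in enumerate(v))
    print(f"N = {N:3d}   max error = {err:.3e}")
# The maximum error shrinks by roughly a factor of 4 as h is halved,
# verifying the O(h^2) accuracy claimed above.
```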

Problem 1.3: Vertical Displacement in a String (not examinable)

Can you show that the vertical displacement u(x) is governed by the ODE

uxx = −p(x)/T?

(Hint: Assume that displacements are small, so that the tension T can be

taken as constant over the whole string, and so that the angle θ can be considered

small (θ is measured from the horizontal in counter-clockwise direction).

Consider vertical force equilibrium.)


1.2 A 2D Example: Poisson’s Equation for Heat Conduction

We first consider models for heat flow in a metal plate.

The flow of heat in a metal plate can be modeled by the heat equation,

which is a partial differential equation (PDE) that describes the evolution of the

temperature in the plate, u(x, y, t), in space and time:

∂u/∂t = κ (∂²u/∂x² + ∂²u/∂y²) + g(x, y).   (1.3)

Here, κ is the heat conduction coefficient, and g(x, y) is a heat source or sink.

We consider the specific problem of determining the stationary temperature

distribution in a square domain Ω of length 1 m, (x, y) ∈ Ω = [0, 1] × [0, 1],

with the temperature on the four boundaries fixed at u = u0 where u0 = 600

Kelvin, and with a heat source g(x, y) with Gaussian profile centered at (x, y) =

(3/4, 3/4) :

g(x, y) = 10,000 exp( −[(x − 3/4)² + (y − 3/4)²] / 0.01 ).

For simplicity we set the heat conduction coefficient to κ = 1. Since we seek a

stationary solution, we can set the time derivative in Eq. (1.3) equal to zero, and

solve

∂²u/∂x² + ∂²u/∂y² = f(x, y),   (1.4)

with f(x, y) = −g(x, y)/κ.

The problem of finding the stationary temperature profile u(x, y) is then fully

specified by the following PDE boundary value problem (BVP):

BVP

∂²u/∂x² + ∂²u/∂y² = −g(x, y),   (x, y) ∈ Ω = [0, 1] × [0, 1]

u(x, y) = u0 on ∂Ω,

where ∂Ω denotes the boundary of the spatial domain Ω.

We can approximate the solution to this problem numerically by discretising

the PDE and solving the resulting linear system A~v = ~b.

Eq. (1.4) is called Poisson’s equation, and it arises in many areas of applica-

tion, including Newtonian gravity, electrostatics, or elasticity. When g(x, y) = 0,

the equation is called Laplace’s equation. The symbol ∆ is often used as a short-

hand notation for the differential operator in Eq. (1.4), and

∆u = ∂²u/∂x² + ∂²u/∂y²

is called the Laplacian of u. Note that the 1D string problem described in the

previous section features the 1D version of the Laplacian operator. The Laplacian

operator can clearly also be extended to dimension 3 and higher.


1.2.1 Discretising the PDE

We discretise the PDE by using finite difference approximations for the second-

order partial derivatives that are similar to Eq. (1.1):

∂²u(x, y)/∂x² = [u(x+h, y) − 2u(x, y) + u(x−h, y)] / h² + O(h²),

∂²u(x, y)/∂y² = [u(x, y+h) − 2u(x, y) + u(x, y−h)] / h² + O(h²).

We consider a regular Cartesian grid that partitions the problem domain into

squares of equal size by dividing both the x-range and the y-range into N + 1

intervals of equal length

∆x = ∆y = h = 1/(N + 1).

The grid points xi and yj are given by

xi = ih i = 0, . . . , N + 1,

yj = jh j = 0, . . . , N + 1,

(i.e., there are layers of boundary points at x0, xN+1, y0, and yN+1, and there

are N2 interior points). We then approximate the unknown function u(x, y) (the

exact solution to the BVP) at the grid points by discrete approximations wi,j :

wi,j ≈ u(xi, yj),

using the finite difference formula. That is, we solve the following discretised

BVP for the unknown numerical approximation values wi,j :

discretised BVP

(wi+1,j + wi,j+1 − 4wi,j + wi−1,j + wi,j−1) / h² = −g(xi, yj)   (i, j = 1, . . . , N)

xi = ih,   yj = jh

w0,j = wN+1,j = wi,0 = wi,N+1 = u0.   (1.5)

1.2.2 Formulation as a Linear System

Similar to the 1D model problem, the 2D discretised BVP can be written as a

linear system

A~v = ~b,

with now N2 equations for the N2 unknowns wi,j (i, j = 1, . . . , N) at the interior

points of the problem domain. Here, A ∈ Rn×n with total number of unknowns

n = N2.

We first have to assemble the N2 unknowns wi,j (i, j = 1, . . . , N) into a

single vector ~v. We can do this using lexicographic ordering by rows, in which

we assemble rows of wi,j in the spatial domain into ~v, from top to bottom


starting from row j = 1, and from left to right within each row. For example,

when N = 3, the vector ~v is given by

~v = [w1,1, w2,1, w3,1, w1,2, w2,2, w3,2, w1,3, w2,3, w3,3]^T.

Next, if we want to write the BVP as a linear system A~v = ~b of the N2

interior unknowns, the values of wi,j at boundary points of the domain need to

be moved to the right-hand side (RHS) of the discretised PDE in Eq. (1.5). If

we do this, the system matrix in A~v = ~b is given by the so-called 2-dimensional

(2D) Laplacian matrix:

Definition 1.4: 2D Laplacian Matrix (Model Problem 2)

A =

[ T  I              ]
[ I  T  I           ]
[    I  T  I        ]
[       ⋱  ⋱  ⋱     ]
[          I  T  I  ]
[             I  T  ]  ∈ Rn×n,   (1.6)

where n = N² and T and I are block matrices ∈ RN×N:

T =

[ −4   1                  ]
[  1  −4   1              ]
[      1  −4   1          ]
[          ⋱    ⋱    ⋱    ]
[              1  −4   1  ]
[                  1  −4  ]  ∈ RN×N,   (1.7)

(all entries not shown are zero), and I is the N × N identity matrix.

The vector ~b in A~v = ~b is given by −h²g(x, y) evaluated in xi and yj, plus a

contribution of −u0 for every neighbour of wi,j that lies on the boundary. For

the simple example with N = 3 (where only the midpoint of the grid does not

have neighbour points on the boundary), ~b is given by

~b = [ −h²g1,1 − 2u0,  −h²g2,1 − u0,  −h²g3,1 − 2u0,
       −h²g1,2 − u0,   −h²g2,2,       −h²g3,2 − u0,
       −h²g1,3 − 2u0,  −h²g2,3 − u0,  −h²g3,3 − 2u0 ]^T,

where gi,j = g(xi, yj) and h = 1/(N + 1).

Note that the matrix A is block tridiagonal, and it is very sparse: it has very

few nonzero elements (close to 5 per row, on average). Again, it is essential that

efficient numerical methods for this problem exploit this sparsity, and the study

of efficient numerical methods for sparse matrices like the 2D Laplacian matrix

is an important focus of this unit.
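To make the block structure concrete, the sketch below (plain Python; an illustration only, since the unit itself works in MATLAB) assembles the n × n 2D Laplacian matrix directly from its 5-point stencil, using the same lexicographic ordering as above, and counts its nonzeros to confirm the "close to 5 per row" estimate:

```python
def laplacian_2d(N):
    """Assemble the 2D Laplacian (Model Problem 2) as a dense n x n list of
    lists, n = N^2, using lexicographic ordering by rows of the grid."""
    n = N * N
    A = [[0.0] * n for _ in range(n)]
    for j in range(N):          # grid row index (y-direction)
        for i in range(N):      # grid column index (x-direction)
            k = j * N + i       # lexicographic index of unknown w_{i+1,j+1}
            A[k][k] = -4.0
            if i > 0:     A[k][k - 1] = 1.0   # west neighbour
            if i < N - 1: A[k][k + 1] = 1.0   # east neighbour
            if j > 0:     A[k][k - N] = 1.0   # south neighbour
            if j < N - 1: A[k][k + N] = 1.0   # north neighbour
    return A

A = laplacian_2d(3)
nnz = sum(1 for row in A for a in row if a != 0.0)
print(nnz)   # nnz = 5*N^2 - 4*N, i.e. just under 5 nonzeros per row
```

Storing all n² entries like this is exactly what one should not do for large N; sparse storage formats that keep only the nnz(A) nonzeros are discussed in Appendix A.10.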

For example, a 2D resolution of 1000 × 1000 grid points is quite modest for

scientific applications on current-day computers. In this case, A ∈ Rn×n with

n = N² = 10⁶. Using Gaussian elimination (or, equivalently, LU decomposition)

in a naive fashion (without taking advantage of the zeros in the sparse matrix),

the number of floating point operations required, W, would scale like W =

O(n³) = O(10¹⁸), which would take a very large amount of time. In this unit we

will pursue methods for sparse matrices with work complexity approaching W =

O(n). Such methods power many of today’s advances in science, engineering and

technology.

1.2.3 Solving the Linear System

When considering the linear system in Matlab using N = 64, we obtain the

following plots for the source term and for the approximation of the temperature

profile (surface and contour plots, using Matlab’s mesh and contour):

[Figures: surface and contour plots of the source term g(x, y) (peaking near 10,000 at (3/4, 3/4)) and of the approximate temperature profile (ranging from u0 = 600 up to about 680 Kelvin near the source).]


1.3 An Example from Data Analytics: Netflix Movie Recommendation

In 2006, the online DVD-rental and video streaming company Netflix launched a

competition for the best collaborative filtering algorithm to predict user ratings

for films, based on a training data set of previous ratings.

Netflix provided a training data set of 100,480,507 ratings that 480,189 users

gave to 17,770 movies, with ratings from 1 to 5 (integer) stars. Let the number

of users be given by m = 480, 189, and the number of movies by n = 17, 770.

Each rating consists of a triplet (i, j, v), where i is the user ID, j is the movie ID,

and v is the rating value in the range 1–5. The training ratings can be stored in

a sparse ratings matrix R ∈ Rm×n. The set of matrix indices with known values

is indicated by index set R = {(i, j)}. For example, a simple ratings matrix R

with m = 7 users and n = 4 movies could be given by

R =

[ ·  2  ·  · ]
[ ·  3  ·  · ]
[ 5  ·  ·  · ]
[ ·  1  ·  · ]
[ ·  ·  1  5 ]
[ 1  ·  ·  5 ]
[ ·  ·  2  · ]

(a dot marks an unknown entry), with index set R = {(1, 2), (2, 2), (3, 1), (4, 2), (5, 3), (5, 4), (6, 1), (6, 4), (7, 3)}.

(To be precise, the ratings matrix is actually not a usual sparse matrix, in which

values that are not stored are assumed to be zero, but rather an incomplete

matrix, with values that are not stored considered unknown.)

The goal of a collaborative filtering algorithm is to predict the unknown

ratings in R based on the training data in R. These predicted ratings can then

be used to recommend movies to users. In linear algebra, this type of problem

is known as a matrix completion problem.

The recommendation problem (for movies, music, books, . . . ) can be seen as a

problem in the field of machine learning, which studies algorithms that can learn

from and make predictions on data. In the sub-category of supervised learning,

the computer is presented with example inputs and their desired outputs (the

training data set), and the goal is to learn a general rule that maps inputs to

outputs.

1.3.1 Movie Recommendation using Linear Algebra and Optimisation

A powerful approach to attack the matrix completion problem is to seek matrices

U ∈ Rf×m and M ∈ Rf×n, with f a small integer ≪ m, n, such that UTM

approximates the ratings matrix R on the set of known ratings, R. Pictorially,


we seek U and M such that

R ≈ UTM.   (1.8)

In practice, we will seek U and M that are dense, and we will allow their elements

to assume any real value. Each row in these matrices represents a latent feature

or factor of the data. The UTM decomposition of R effectively seeks to provide

a model that with a small number of features, f (typically chosen ≤ 50), is able

to provide good predictions for the unknown values in R.

The user and movie matrices U and M have shape

U = [ ~u1  ~u2  · · ·  ~um ],   (1.9)

M = [ ~m1  ~m2  · · ·  ~mn ].   (1.10)

The column vectors of U , ~ui ∈ Rf , are called the user feature vectors, and the

column vectors of M , ~mj ∈ Rf , are called the movie feature vectors. With

f ≪ m, n, the interpretation of the approximation UTM is that, for each user i

and movie j, their affinity for each of the f latent ‘feature categories’ is encoded

in the vectors ~ui and ~mj . (For instance, if feature k were to represent the

‘comedy’ category, uk,i would express to which degree user i is into comedies,

and mk,j would express to which degree movie j is a comedy.)

The approximation UTM ofR with small f is called a low-rank approximation

of R, since

UTM = ∑_{k=1}^{f} (UT)∗k (M)k∗,   (1.11)

where (UT )∗k is the kth column of UT and (M)k∗ is the kth row of M , and the

terms (UT )∗k(M)k∗ are m× n matrices of matrix rank 1.

We can seek user and movie matrices U and M that optimally approximate

the rating matrix R, if we choose a specific sense in which UTM should approx-

imate R. We define the Frobenius norm of a matrix by


Definition 1.5: Frobenius Norm of a Matrix

Let A ∈ Rm×n. Then the Frobenius norm of A is given by

‖A‖F = ( ∑_{i=1}^{m} ∑_{j=1}^{n} aij² )^{1/2}.

It is natural, then, to seek U and M such that the following measure of the

difference between UTM and R is minimised:

g(U,M) = ‖R − UTM‖²F,R ,   (1.12)

where the ‖·‖F,R norm is a partial Frobenius norm, summed only over the known

entries of R, as given by the index set R.
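The partial Frobenius norm ‖·‖F,R is straightforward to evaluate when the known ratings are stored as a dictionary keyed by (i, j). The following Python sketch is an illustration (the unit's own code uses MATLAB), using the 7 × 4 example ratings from above:

```python
import math

# Known ratings of the 7 x 4 example, stored as {(user, movie): rating}
# with 1-based indices matching the index set R in the text.
R = {(1, 2): 2, (2, 2): 3, (3, 1): 5, (4, 2): 1,
     (5, 3): 1, (5, 4): 5, (6, 1): 1, (6, 4): 5, (7, 3): 2}

def partial_frobenius(R, predict):
    """Partial Frobenius norm: the root of the sum of squared differences
    between ratings and predictions, taken only over the known entries."""
    return math.sqrt(sum((r - predict(i, j))**2 for (i, j), r in R.items()))

# Example: the (useless) all-zero prediction gives the root of the sum of
# the squared known ratings.
print(partial_frobenius(R, lambda i, j: 0.0))
```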

In practice, it is necessary to add a regularisation term to g(U,M), to ensure

the optimisation problem is well-posed and gives useful results. So the final

optimization problem we seek to solve for the recommendation task is

min_{U,M} g(U,M) = ∑_{(i,j)∈R} ( rij − ~uTi ~mj )²

+ λ ( ∑_{i=1}^{m} nnz((R)i∗) ‖~ui‖₂² + ∑_{j=1}^{n} nnz((R)∗j) ‖~mj‖₂² ),   (1.13)

where nnz((R)i∗) is the number of movies ranked by user i, and nnz((R)∗j) is

the number of users that ranked movie j. The regularisation parameter λ is

a fixed number that can be chosen by trial-and-error or by techniques such as

cross-validation.

1.3.2 An Alternating Least Squares Approach to Solving the Optimisation Problem

We seek U ∈ Rf×m and M ∈ Rf×n that minimise g(U,M). A popular way of

solving the optimisation problem is to determine U and M in an alternating

fashion: starting from an initial guess for M , determine the optimal U with M

fixed, then determine the optimal M with U fixed, and so forth. As it turns

out, each subproblem of determining U with fixed M (and vice versa) in this

alternating algorithm boils down to a (regularized) linear least-squares problem,

and the resulting procedure is called Alternating Least Squares (ALS). Also,

with fixed M , each column of U can be determined independent of the other

columns (and vice versa for M , with fixed U). This means that ALS can be

executed efficiently in parallel, which makes it suitable for big data sets.
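For intuition, here is a minimal Python sketch of ALS in the simplest case f = 1, where each feature vector is a single scalar and each least-squares subproblem has a one-line closed-form solution (set the derivative of Eq. (1.13) with respect to ui, respectively mj, to zero). The tiny data set and the value of λ are illustrative assumptions; the general-f formulas are derived later in the unit, where MATLAB is used:

```python
# Known ratings {(user, movie): rating} for a tiny m = 7, n = 4 example.
R = {(1, 2): 2, (2, 2): 3, (3, 1): 5, (4, 2): 1,
     (5, 3): 1, (5, 4): 5, (6, 1): 1, (6, 4): 5, (7, 3): 2}
m, n, lam = 7, 4, 0.1

def objective(u, mv):
    """Regularised objective g(U, M) of Eq. (1.13) for f = 1."""
    fit = sum((r - u[i] * mv[j])**2 for (i, j), r in R.items())
    reg = lam * (sum(sum(1 for (a, b) in R if a == i) * u[i]**2 for i in u)
               + sum(sum(1 for (a, b) in R if b == j) * mv[j]**2 for j in mv))
    return fit + reg

u = {i: 1.0 for i in range(1, m + 1)}      # initial guess for user features
mv = {j: 1.0 for j in range(1, n + 1)}     # initial guess for movie features

for sweep in range(5):
    # Update each u_i with M fixed: u_i = (sum r_ij m_j) / (sum m_j^2 + lam*nnz_i)
    for i in u:
        num = sum(r * mv[j] for (a, j), r in R.items() if a == i)
        den = (sum(mv[j]**2 for (a, j) in R if a == i)
               + lam * sum(1 for (a, j) in R if a == i))
        u[i] = num / den
    # ... then each m_j with U fixed, by the symmetric formula.
    for j in mv:
        num = sum(r * u[i] for (i, b), r in R.items() if b == j)
        den = (sum(u[i]**2 for (i, b) in R if b == j)
               + lam * sum(1 for (i, b) in R if b == j))
        mv[j] = num / den
    print(f"sweep {sweep + 1}   g = {objective(u, mv):8.4f}")
# Each half-sweep exactly minimises g over one factor with the other fixed,
# so the printed objective values are non-increasing.
```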

The figure below (from [Winlaw et al., 2015]) shows the performance of ALS

on a small ratings matrix of size 400 × 80. Typically ALS requires quite a few

iterations to reach high accuracy, and it is possible to improve its convergence

behaviour, for example, as shown for the ALS-NCG method.


One of the focus areas of Part I of this unit is to solve least-squares problems

in accurate and efficient ways. We will return to the movie recommendation

problem in that context. In particular, we will learn how to derive the formulas

for determining U with fixed M , and vice versa, and will use them to solve movie

recommendation problems.

PS: On September 21, 2009, the grand prize of US$1,000,000 for the Netflix prize competition was awarded to the BellKor's Pragmatic Chaos team, which bested Netflix's own algorithm for predicting ratings by 10.06% (using a blend of approaches, including multiple variations of the matrix factorization approach).

PPS: Although the Netflix prize data sets were constructed to preserve customer privacy, in 2007, two researchers showed it was possible to identify individual users by matching the data sets with film ratings on the Internet Movie Database. On December 17, 2009, four Netflix users filed a class action lawsuit against Netflix, alleging that Netflix had violated U.S. fair trade laws and the Video Privacy Protection Act by releasing the data sets. The sequel to the Netflix prize was canceled. We are living in a crazy world.


Chapter 2

LU Decomposition for Linear Systems

2.1 Gaussian Elimination and LU Decomposition

We consider nonsingular linear systems A~x = ~b where A ∈ R^{n×n}. We recall the following theorem about solvability of linear systems.

Theorem 2.1

Let A ∈ R^{n×n} be nonsingular (i.e., det(A) ≠ 0), and let ~b ∈ R^n. Then the linear system A~x = ~b has a unique solution, given by ~x = A^{−1}~b.

If A is singular, A~x = ~b either has infinitely many solutions (if ~b ∈ range(A)), or no solution (if ~b ∉ range(A)).

2.1.1 Gaussian Elimination

We first consider standard Gaussian elimination (GE) and assume that no zero

pivot elements are encountered, so no pivoting (switching of rows) is required.

Example 2.2: One Step of Gaussian Elimination

Let

    A = [ 2  3  4
          6  8  4
          8  9  0 ].

In the first step of GE, 2 is the pivot element, and we add −6/2 times row 1 to row 2, and add −8/2 times row 1 to row 3, resulting in the transformed matrix

    [ 2   3    4
      0  −1   −8
      0  −3  −16 ].



For the case of a general system A~x = ~b, we can write the result of one step of GE for

    [a11, ~r1^T; ~c1, A^(2)] [x1; ~x^(2)] = [b1; ~b^(2)],

as

    [a11, ~r1^T; 0, A^(2) − (~c1/a11)~r1^T] [x1; ~x^(2)] = [b1; ~b^(2) − (b1/a11)~c1].

Here the blocks are written in Matlab-like [row1; row2] notation: ~r1^T is the first row of A with a11 removed, and ~c1 is the first column of A with a11 removed.

2.1.2 LU Decomposition

The following theorem and its proof show us that Gaussian elimination on A ∈ R^{n×n} (when no zero pivots are encountered) is equivalent to decomposing A as the product LU of two triangular matrices, and tell us how to construct the L and U factors.

Theorem 2.3: LU Decomposition

Let A ∈ R^{n×n} be a nonsingular matrix. Assume no zero pivots arise when applying standard Gaussian elimination (without pivoting) to A. Then A can be decomposed as A = LU, where L ∈ R^{n×n} is unit lower triangular, and U ∈ R^{n×n} is upper triangular and nonsingular.

Proof. The proof proceeds by mathematical induction on n.

Base case: the statement holds for n = 1, since for any a ∈ R^1 = R with a ≠ 0 (i.e., 1/a exists, so a is nonsingular), the LU decomposition a = l·u exists, with l = 1 and u = a nonsingular.

Induction step: we show that, if the statement of the theorem holds for n − 1, then it holds for n.

We perform one step of Gaussian elimination on the n × n matrix

    A = [a11, ~r1^T; ~c1, A^(2)],

which is assumed nonsingular and such that no zero pivots arise when applying GE to it.


Since a11 ≠ 0 we can define the Gauss transformation matrix

    M^(1) = [1, 0; ~m1, I^(2)],

with ~m1 = −~c1/a11 and I^(2) the identity matrix of size (n−1) × (n−1). Then the first step of Gaussian elimination can be written as

    M^(1)A = [a11, ~r1^T; 0, A^(2) + ~m1~r1^T] = [a11, ~r1^T; 0, Ã],

where Ã is an (n−1) × (n−1) matrix for which no zero pivots arise when applying GE to it (since the same holds for A); this also implies Ã is nonsingular.

By the induction hypothesis, Ã can be decomposed as L̃Ũ, which leads to

    M^(1)A = [a11, ~r1^T; 0, L̃Ũ] = [1, 0; 0, L̃] [a11, ~r1^T; 0, Ũ],

or

    A = (M^(1))^{−1} [1, 0; 0, L̃] [a11, ~r1^T; 0, Ũ].

The inverse of M^(1) is easily obtained from observing that

    [1, 0; −~m1, I^(2)] [1, 0; ~m1, I^(2)] = I = (M^(1))^{−1} M^(1).

Then

    A = [1, 0; −~m1, I^(2)] [1, 0; 0, L̃] [a11, ~r1^T; 0, Ũ] = LU,


with

    L = [1, 0; −~m1, L̃]  and  U = [a11, ~r1^T; 0, Ũ].    (2.1)

Matrix U is nonsingular: det(U) = a11 det(Ũ) ≠ 0, since Ũ is nonsingular and a11 ≠ 0.

This proves the induction step and completes the proof.

Eq. (2.1) shows that L can be obtained by inserting the multiplier elements

−~m1 = ~c1/a11 in its columns, for every step of Gaussian elimination.

Note also that, by the construction in the proof, the LU decomposition is unique

(when no pivoting is performed).

If pivoting is employed during Gaussian elimination (e.g., when zero pivots

arise), a similar theorem holds:

Theorem 2.4: LU Decomposition

For any A ∈ R^{n×n}, a decomposition PA = LU exists where P ∈ R^{n×n} is a permutation matrix, L ∈ R^{n×n} is unit lower triangular, and U ∈ R^{n×n} is upper triangular.

Here, P encodes the row permutations of the pivoting operations. The PA =

LU decomposition is unique when P is fixed, and this theorem also holds for

singular A.

Remark 2.5: Solving a Linear System using LU Decomposition

We can solve A~x = ~b in three steps:

1. compute L and U in the decomposition A = LU , leading to LU~x = ~b

2. solve L~y = ~b using forward substitution

3. solve U~x = ~y using backward substitution
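The three steps can be sketched in a few lines of Python (the notes use Matlab-style pseudocode; this translation, with the test matrix from Example 2.2 and an arbitrary right-hand side, is only for illustration):

```python
def lu(A):
    """LU decomposition (Doolittle, no pivoting): returns unit lower
    triangular L and upper triangular U with A = L*U."""
    n = len(A)
    U = [row[:] for row in A]
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U

def forward_sub(L, b):
    """Solve L*y = b; L is unit lower triangular, so no divisions needed."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    return y

def backward_sub(U, y):
    """Solve U*x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[2.0, 3.0, 4.0], [6.0, 8.0, 4.0], [8.0, 9.0, 0.0]]   # from Example 2.2
b = [1.0, 2.0, 3.0]
L, U = lu(A)
x = backward_sub(U, forward_sub(L, b))
```

Once L and U are computed (O(n³) work), each additional right-hand side costs only the two O(n²) substitutions.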

2.1.3 Implementation of LU Decomposition and Computational Cost

Implementation of LU Decomposition

A basic implementation of LU decomposition in Matlab-like pseudo-code is given

by


Algorithm 2.6: LU decomposition, kij version

Input: Matrix A
Output: L and U

U = A;
L = I;
for k = 1:n-1            % pivot k
    for i = k+1:n        % row i
        m = U(i,k)/U(k,k);
        U(i,k) = 0;
        for j = k+1:n    % column j
            U(i,j) = U(i,j) - m*U(k,j);
        end
        L(i,k) = m;
    end
end

However, we can also implement LU decomposition in-place:

Algorithm 2.7: LU decomposition, kij version, in-place

Input: Matrix A
Output: L and U stored in A

for k = 1:n-1            % pivot k
    for i = k+1:n        % row i
        a(i,k) = a(i,k)/a(k,k);
        for j = k+1:n    % column j
            a(i,j) = a(i,j) - a(i,k)*a(k,j);
        end
    end
end
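A direct Python transcription of the in-place kij loop may help; L and U are recovered afterwards from the single array (an illustrative sketch with 0-based indexing, using the matrix of Example 2.2):

```python
def lu_inplace(a):
    """kij LU decomposition; the multipliers (L below the diagonal) and U
    are stored in a itself, overwriting the original entries."""
    n = len(a)
    for k in range(n - 1):              # pivot k
        for i in range(k + 1, n):       # row i
            a[i][k] = a[i][k] / a[k][k]          # multiplier, kept in place
            for j in range(k + 1, n):   # column j
                a[i][j] -= a[i][k] * a[k][j]
    return a

A = [[2.0, 3.0, 4.0], [6.0, 8.0, 4.0], [8.0, 9.0, 0.0]]   # Example 2.2
lu_inplace(A)
n = len(A)
L = [[1.0 if i == j else (A[i][j] if j < i else 0.0) for j in range(n)]
     for i in range(n)]
U = [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
```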

Also, we can depart from the standard order of operations in Gaussian elimination and consider doing all the operations for row i of A at once, in the so-called ikj version of the algorithm:

Algorithm 2.8: LU decomposition, ikj version, in-place

Input: Matrix A
Output: L and U stored in A

for i = 2:n              % row i
    for k = 1:i-1        % pivot k
        a(i,k) = a(i,k)/a(k,k);
        for j = k+1:n    % column j
            a(i,j) = a(i,j) - a(i,k)*a(k,j);
        end
    end
end


Computational Work for LU Decomposition

We now consider the amount of computational work that is spent by the LU

decomposition algorithm, in terms of the number of floating point operations

(flops) performed to decompose an n × n matrix A. We count the number

of additions and subtractions (which we indicate by A) and the number of

multiplications, divisions, and square roots (indicated by M). We assume that

these operations take the same amount of work, which is a reasonable assumption

for modern computer processors.

The following summation identities are useful when determining computational work:

    Σ_{p=1}^{n−1} 1 = n − 1,
    Σ_{p=1}^{n−1} p = n(n − 1)/2,
    Σ_{p=1}^{n−1} p² = n(n − 1)(2n − 1)/6.

We consider the kij version of the algorithm and sum over the three nested loops to determine the work W of LU decomposition:

    W = Σ_{k=1}^{n−1} Σ_{i=k+1}^{n} ( 1M + Σ_{j=k+1}^{n} (1M + 1A) )
      = Σ_{k=1}^{n−1} Σ_{i=k+1}^{n} ( 1 + 2(n − k) )
      = Σ_{k=1}^{n−1} (1 + 2n − 2k)(n − k)
      = Σ_{k=1}^{n−1} ( (n + 2n²) − k(4n + 1) + 2k² )
      = (n − 1)(n + 2n²) − (4n + 1)n(n − 1)/2 + 2n(n − 1)(2n − 1)/6
      = (2 − 2 + 2/3)n³ + O(n²)
      = (2/3)n³ + O(n²) flops.

As expected, the dominant term in the expression for the computational work is proportional to n³, since LU decomposition entails three nested loops that are of (average) length proportional to n, roughly speaking. We say that the computational complexity of LU decomposition is cubic in the number of unknowns, n. For example, for the 2D model problem of Eq. (1.6), with n = N², we have W = O(n³) = O(N⁶). For large problems, cubic complexity is often prohibitive, and we will seek to exploit structural properties like sparsity to obtain methods with lower computational complexity.
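The closed form can be checked against the loops themselves; a small Python sketch, counting one flop per division and two per inner update, exactly as in the derivation:

```python
def lu_flops(n):
    """Count flops by walking the kij LU loops directly."""
    w = 0
    for k in range(1, n):                # pivot k = 1..n-1
        for i in range(k + 1, n + 1):    # row i = k+1..n
            w += 1                       # one division (1M)
            for j in range(k + 1, n + 1):
                w += 2                   # one multiplication + one subtraction
    return w

def lu_flops_formula(n):
    """Closed form derived above (all three terms are integers)."""
    return ((n - 1) * (n + 2 * n * n)
            - (4 * n + 1) * n * (n - 1) // 2
            + n * (n - 1) * (2 * n - 1) // 3)
```

For n = 3 both give 13 flops, and the ratio lu_flops(n)/n³ approaches 2/3 as n grows.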

A similar computation shows that forward substitution (L~y = ~b) and backward substitution (U~x = ~y) each have computational work

    W = n² + O(n) flops.


LU Decomposition for Symmetric Positive Definite Matrices

Finally, we note that if the matrix A is symmetric positive definite (SPD), pivoting is never required in the LU decomposition, and the symmetry can be exploited to save about half the work.

Theorem 2.9

If A ∈ R^{n×n} is SPD, the decomposition A = LU, where L is unit lower triangular and U is upper triangular, exists and is unique.

The above theorem implies that no zero pivot elements can occur in the LU

decomposition algorithm for SPD matrices (in exact arithmetic).

Theorem 2.10: Cholesky decomposition

If A ∈ R^{n×n} is SPD, the decomposition A = L̂L̂^T, where L̂ is a lower triangular matrix with strictly positive diagonal elements, exists and is unique.

In fact, it can be shown that L̂ = L√D and L̂^T = √D^{−1} U, where D is the diagonal matrix containing the diagonal elements of U. These are strictly positive for an SPD matrix, so their square roots can be taken to form the diagonal matrix √D. The work to compute the Cholesky decomposition is W = (1/3)n³ + O(n²).

2.2 Banded LU Decomposition

In this section, we consider special versions of the LU algorithm that save work

for sparse matrices that are zero outside a band around the diagonal.

Definition 2.11

A banded matrix A ∈ R^{n×n} is a sparse matrix whose nonzero entries are confined to a band around the main diagonal, i.e.,

    ∃ K < n such that a_{ij} = 0 for all i, j with |i − j| > K.

The smallest such K is called the bandwidth of A.

For example, for a diagonal matrix we have K = 0. For a tridiagonal matrix,

we have K = 1. For our 2D model problem, we have K = N − 1.

It turns out that, if A has bandwidth B, then we need to compute the U and L factors only within the band. This can be proved formally, but it can also be seen intuitively by considering, e.g., the kij version as in Algorithm 2.7. First, the statement a(i,k)=a(i,k)/a(k,k) cannot create new nonzeros. Second, the statement a(i,j)=a(i,j)-a(i,k)*a(k,j) can only create new nonzeros if both the multiplier a(i,k) and the element a(k,j) in the pivot row are nonzero. But a(i,k)=0 when column k lies outside the band of row i, and a(k,j)=0 when column j lies outside the band of row k. So the banded structure maintains additional zero elements in L and U according to

    l_{ij} = 0 if i − j > B,
    u_{ij} = 0 if j − i > B,

so nonzeros in row i of L don't occur before column j = i − B, and nonzeros in row i of U don't occur after column j = i + B.


This means that, for banded matrices with bandwidth B, we can safely modify the ranges of the loops in the ikj version of the LU algorithm as follows:

Algorithm 2.12: Banded LU decomposition, ikj version, in-place

Input: Matrix A with bandwidth B
Output: L and U stored in A

for i = 2:n                      % row i
    for k = max(1,i-B):i-1       % pivot k
        a(i,k) = a(i,k)/a(k,k);
        for j = k+1:min(i+B,n)   % column j
            a(i,j) = a(i,j) - a(i,k)*a(k,j);
        end
    end
end
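For the tridiagonal 1D model matrix (B = 1) the banded algorithm is easy to exercise; a Python sketch with 0-based indexing, checking that L·U reproduces A and that no fill-in occurs outside the band:

```python
def banded_lu_inplace(a, B):
    """ikj banded LU (Algorithm 2.12): loop ranges restricted to the band."""
    n = len(a)
    for i in range(1, n):                      # row i
        for k in range(max(0, i - B), i):      # pivots within the band
            a[i][k] = a[i][k] / a[k][k]
            for j in range(k + 1, min(i + B + 1, n)):
                a[i][j] -= a[i][k] * a[k][j]
    return a

# Tridiagonal matrix of the 1D model problem type (bandwidth B = 1).
n, B = 6, 1
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
orig = [row[:] for row in A]
banded_lu_inplace(A, B)
L = [[1.0 if i == j else (A[i][j] if j < i else 0.0) for j in range(n)]
     for i in range(n)]
U = [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
```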

Computational Work for Banded LU Decomposition

The amount of computational work for banded LU can be estimated as follows. We sum over the three nested loops and obtain an upper bound for the work:

    W = Σ_{i=2}^{n} Σ_{k=max(1,i−B)}^{i−1} ( 1 + Σ_{j=k+1}^{min(i+B,n)} 2 ) flops
      ≤ Σ_{i=2}^{n} Σ_{k=max(1,i−B)}^{i−1} ( 1 + Σ_{j=k+1}^{i+B} 2 )
      = Σ_{i=2}^{n} Σ_{k=max(1,i−B)}^{i−1} ( 1 + 2(i + B − k) )
      ≤ Σ_{i=2}^{n} Σ_{k=i−B}^{i−1} ( 1 + 2(i + B) − 2k )
      = Σ_{i=2}^{n} ( B(1 + 2(i + B)) − 2 Σ_{k=i−B}^{i−1} k )
      = Σ_{i=2}^{n} ( B(1 + 2(i + B)) − 2( Σ_{k=1}^{i−1} k − Σ_{k=1}^{i−B−1} k ) )
      = Σ_{i=2}^{n} ( B(1 + 2(i + B)) − ( i(i − 1) − (i − B)(i − B − 1) ) )
      = Σ_{i=2}^{n} ( B(1 + 2B + 2i) − (2Bi − B − B²) )
      = Σ_{i=2}^{n} ( 3B² + 2B )
      ≤ n(3B² + 2B),

so

    W = O(B²n).

Notes:

• For the 1D model problem, B = 1, so we get W ≤ n(3 + 2) = 5n, i.e., W = O(n). (This boils down to the so-called Thomas algorithm.)

• For the 2D model problem, with n = N², we have B = N, so

    W = O(B²n) = O(N²n) = O(n²) = O(N⁴),

which is much better than the W = O(n³) = O(N⁶) cost of the regular LU algorithm. (E.g., compare for N = 10³: you save a factor of 10⁶ in work.)

• Further improvements in cost for the 2D model problem can be obtained using more advanced techniques, which reorder the variables and equations to minimize the bandwidth, or, more generally, to minimize the fill-in (i.e., the creation of new non-zeros) in the L and U factors. For example, the so-called nested dissection algorithm obtains W = O(n^{3/2}) for the 2D model problem.

Still, it is possible to do better (up to W = O(n)) using iterative methods. Rather than solving A~x = ~b exactly (in exact arithmetic) after n steps, as direct methods like Gaussian elimination do, iterative methods start from an initial guess ~x0 that is improved over a number of steps until a desired accuracy is reached, typically in far fewer than n steps. These iterative methods are the subject of the last three chapters of Part I of these notes.

2.3 Matrix Norms

In order to discuss accuracy and stability of algorithms for solving linear systems,

we need to define ways to measure the size of a matrix. For this reason, we

consider the following matrix norms.

2.3.1 Definition of Matrix Norms

Definition 2.13: Natural or Vector-Induced Matrix Norm

Let ‖·‖_p be a vector p-norm. Then for A ∈ R^{n×n}, the matrix norm induced by the vector norm is given by

    ‖A‖_p = max_{~x ≠ 0} ‖A~x‖_p / ‖~x‖_p.


Note: alternatively, we may also write

    ‖A‖_p = max_{‖~x‖_p = 1} ‖A~x‖_p.

Theorem 2.14

Let A ∈ R^{n×n}. The vector-induced matrix norm ‖A‖_p is a norm on the vector space of real n × n matrices over R. That is, ∀A,B ∈ R^{n×n} and ∀a ∈ R, the following hold:

1. ‖A‖ ≥ 0, and ‖A‖ = 0 iff A = 0
2. ‖aA‖ = |a| ‖A‖
3. ‖A + B‖ ≤ ‖A‖ + ‖B‖.

In addition, the following properties also hold:

Theorem 2.15

1. ‖A~x‖_p ≤ ‖A‖_p ‖~x‖_p
2. ‖AB‖_p ≤ ‖A‖_p ‖B‖_p.

Here we only prove part 1.

Proof. If ~x = 0, the inequality holds trivially. For any ~x ≠ 0, we have

    ‖A~x‖_p / ‖~x‖_p ≤ max_{~x ≠ 0} ‖A~x‖_p / ‖~x‖_p = ‖A‖_p,

by the definition of the matrix norm. Hence ‖A~x‖_p ≤ ‖A‖_p ‖~x‖_p.

Note: the Frobenius norm introduced in Def. 1.5 is an example of a matrix

norm that is not induced by a vector norm.

2.3.2 Matrix Norm Formulas

We can derive the following specific expressions for some commonly used matrix

p-norms.


Theorem 2.16

Let A ∈ R^{n×n}.

1. ‖A‖_∞ = max_{1≤i≤n} Σ_{j=1}^{n} |a_{ij}|    ("maximum absolute row sum")

2. ‖A‖_1 = max_{1≤j≤n} Σ_{i=1}^{n} |a_{ij}|    ("maximum absolute column sum")

3. ‖A‖_2 = max_{1≤i≤n} √λ_i(A^T A) = max_{1≤i≤n} √λ_i(AA^T) = max_{1≤i≤n} σ_i,

where λ_i(A^T A) are the eigenvalues of A^T A and σ_i are the singular values of A.

Here we only prove part 1.

Proof. We will derive the formula for the matrix infinity norm using the second variant of the definition,

    ‖A‖_∞ = max_{‖~x‖_∞ = 1} ‖A~x‖_∞.

Also, observe that ‖~x‖_∞ = 1 iff max_{1≤i≤n} |x_i| = 1.

Let

    r = max_{1≤i≤n} Σ_{j=1}^{n} |a_{ij}|

(the maximum absolute row sum).

We first show that ‖A‖_∞ ≤ r. This follows from ‖A~x‖_∞ ≤ r if ‖~x‖_∞ = 1, since then

    |(A~x)_i| = | Σ_{j=1}^{n} a_{ij} x_j | ≤ Σ_{j=1}^{n} |a_{ij}| |x_j| ≤ Σ_{j=1}^{n} |a_{ij}| ≤ r    for any i.

Now, to show that ‖A‖_∞ = r, it is sufficient to find a specific ~y s.t. ‖~y‖_∞ = 1 and ‖A~y‖_∞ = r. Let ν be the index of a row of A with maximum absolute row sum, meaning that

    Σ_{j=1}^{n} |a_{νj}| = r.


Define ~y as follows:

    y_j := sign(a_{νj}) = {  1 if a_{νj} > 0,
                             0 if a_{νj} = 0,
                            −1 if a_{νj} < 0.

This ~y converts each a_{νj} y_j into |a_{νj}| in the formula for the νth component of the product A~y, so we have:

    |(A~y)_ν| = | Σ_{j=1}^{n} a_{νj} y_j | = Σ_{j=1}^{n} |a_{νj}| = r.

Therefore ‖A~y‖_∞ = r with ‖~y‖_∞ = 1, and so ‖A‖_∞ = r.
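The construction in the proof can be replayed numerically; a small Python sketch with an arbitrary illustrative matrix:

```python
def inf_norm(A):
    """Maximum absolute row sum (Theorem 2.16, part 1)."""
    return max(sum(abs(a) for a in row) for row in A)

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

A = [[1.0, -2.0, 3.0],
     [0.0,  4.0, -1.0],
     [2.0,  2.0, 2.0]]
r = inf_norm(A)                                   # row 1 attains it: 1+2+3 = 6
nu = max(range(len(A)), key=lambda i: sum(abs(a) for a in A[i]))
y = [1.0 if a > 0 else -1.0 if a < 0 else 0.0 for a in A[nu]]   # y_j = sign(a_nu_j)
# ||y||_inf = 1 and ||A y||_inf = r, so the maximum in the definition is attained.
```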

2.3.3 Spectral Radius

Definition 2.17

Let A ∈ R^{n×n} with eigenvalues λ_i, i = 1, ..., n. The spectral radius ρ(A) of A is given by

    ρ(A) = max_{1≤i≤n} |λ_i|.

Theorem 2.18

Let A ∈ R^{n×n}. For any matrix p-norm, it holds that

    ρ(A) ≤ ‖A‖_p.

Remark 2.19

The matrix 2-norm formula simplifies as follows when A is symmetric:

    ‖A‖_2 = max_{1≤i≤n} √λ_i(A^T A) = max_{1≤i≤n} √λ_i(A²) = max_{1≤i≤n} √(λ_i(A)²)
          = max_{1≤i≤n} |λ_i(A)| = ρ(A).


2.4 Floating Point Number System

2.4.1 Floating Point Numbers

Definition 2.20

The floating point number system F(β, t, L, U) consists of the set of floating point numbers x of the format

    x = ±d₁.d₂d₃···d_t × β^e = m β^e,

where m = ±d₁.d₂d₃···d_t is called the mantissa, β the base, e the exponent, and t the number of digits in the mantissa. The digits d_i satisfy

    d_i ∈ {0, 1, ..., β − 1}  (i = 2, ..., t),
    d₁ ∈ {1, ..., β − 1},

and the exponent satisfies

    L ≤ e ≤ U.

Note: The mantissa is normalised, by requiring d1 to be nonzero.

2.4.2 Rounding and Unit Roundoff

Definition 2.21

Let x ∈ R. The rounded representation of x in F (β, t, L, U) is indicated by

fl(x).

Most computer systems use the rounding rule round to nearest, tie to even,

as in the following example. (The tie-to-even part serves to avoid bias up or

down.)

Example 2.22

Consider the floating point number system F(β = 10, t = 4, L = −10, U = 10). Some examples illustrating the round to nearest, tie to even rule:

    x = 123.749    fl(x) = 1.237 × 10²
    x = 123.751    fl(x) = 1.238 × 10²
    x = 123.750    fl(x) = 1.238 × 10²   (tie!)
    x = 123.850    fl(x) = 1.238 × 10²   (tie!)
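Python's decimal module implements exactly this rounding rule, so the example can be replayed; a sketch mimicking F(10, 4, −10, 10) (the exponent bounds are ignored, since they play no role here):

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

# 4 significant digits, round to nearest, ties to even.
ctx = Context(prec=4, rounding=ROUND_HALF_EVEN)

def fl(s):
    """Rounded representation of the decimal string s in the toy system."""
    return ctx.plus(Decimal(s))   # unary plus applies the context rounding

print(fl("123.749"))  # 123.7
print(fl("123.751"))  # 123.8
print(fl("123.750"))  # 123.8  (tie: round up to the even digit 8)
print(fl("123.850"))  # 123.8  (tie: last kept digit 8 is already even)
```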


Theorem 2.23

Consider a floating point number system F(β, t, L, U) with a round-to-nearest rule. Let fl(x) be the rounded representation of x ∈ R, x ≠ 0. Then the relative error in the representation of x is bounded by

    |x − fl(x)| / |x| ≤ µ = (1/2) β^{−t+1}.

Here, µ is called the unit roundoff (also, sometimes, machine precision or machine epsilon).

Proof. Let x = m β^e and fl(x) = m̄ β^e. Since

    m = ±d₁.d₂d₃···d_t d_{t+1}...
      = ±( d₁ + d₂β^{−1} + d₃β^{−2} + ··· + d_tβ^{−t+1} + d_{t+1}β^{−t} + ... ),

and rounding to nearest with t digits is used, we have

    |m − m̄| ≤ (1/2) β^{−t+1},

so

    |x − fl(x)| ≤ (1/2) β^{−t+1} β^e,

or

    |x − fl(x)| / |x| ≤ (1/2) β^{−t+1} β^e / ( |m| β^e ) ≤ (1/2) β^{−t+1} = µ,

because |m| ≥ 1.

Note: We can also write

    fl(x) = x(1 + ν)  with |ν| ≤ µ,

because ν = (fl(x) − x)/x, so |ν| ≤ µ.

2.4.3 IEEE Double Precision Numbers

The IEEE double precision standard is used on most computers for representing floating point numbers in hardware and carrying out computations with them. For instance, Matlab normally uses double precision numbers. Higher precision numbers can be represented in software, but are much slower to work with than the native hardware representations.


Example 2.24: IEEE Double Precision Numbers

The IEEE double precision floating point number system is based on F(β = 2, t = 53, L = −1022, U = 1023). It is a binary system with 53 digits in the mantissa, and exponent range from −1022 to 1023. It represents numbers in the format

    x = 1.01001···001 × 2^e = m β^e.

Here, the first digit of the mantissa m = ±1.f does not need to be stored because it is always 1 (due to the normalisation). The fraction f has 52 digits. The sign of the mantissa is stored in a sign bit s. A shifted form of the exponent is stored:

    E = e + 1023,

such that E is an integer between 1 and 2046, which can be represented by 11 bits (2¹¹ = 2048). In total, storing an IEEE double precision number in computer memory requires 64 bits (i.e., 8 bytes):

    s (1 bit) | f (52 bits) | E (11 bits)

Numbers with E in the range 1 ≤ E ≤ 2046 represent the standard normalised numbers. The values E = 0 and E = 2047 are used to represent special numbers:

    1 ≤ E ≤ 2046 : x = (−1)^s (1.f) 2^{E−1023}   (normalised numbers)
    E = 2047     : f ≠ 0 ⟹ x = NaN              (not a number, e.g. 0/0)
                   f = 0 ⟹ x = (−1)^s Inf       (infinity, e.g. 1/0)
    E = 0        : f = 0 ⟹ x = 0
                   f ≠ 0 ⟹ denormalised numbers (mantissa is not normalised),
                            e.g. x = 0.0001011010...0 × 2^{−1022}

With β = 2 and t = 53, the unit roundoff is

    µ = (1/2) β^{−t+1} = (1/2) 2^{−53+1} = 2^{−53} ≈ 1.1 × 10^{−16},

which is roughly equivalent to β = 10, t = 16 (then µ = 0.5 × 10^{−16+1} = 5 × 10^{−16}). We say that double precision binary numbers have about 16 decimal digits of (relative) accuracy.

The smallest positive nonzero (normalised) number (realmin in Matlab) is

    1.0...0 × 2^{−1022} ≈ 2.2 × 10^{−308},

and the largest positive number (realmax in Matlab) is

    1.1...1 × 2^{1023} = (2 − 2^{−52}) × 2^{1023} ≈ 2^{1024} ≈ 1.8 × 10^{308}.

Note: in Matlab, eps is the distance from 1 to the next larger floating point number. We have eps = 2µ = 2^{−52}.
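These constants can be inspected directly in Python, whose floats are IEEE doubles:

```python
import sys

# Python floats are IEEE double precision numbers (beta = 2, t = 53).
mu = 2.0 ** -53                      # unit roundoff
eps = sys.float_info.epsilon         # gap from 1.0 to the next float: 2^-52 = 2*mu

# realmax and realmin (normalised), as quoted above:
assert 1.7e308 < sys.float_info.max < 1.8e308
assert 2.2e-308 < sys.float_info.min < 2.3e-308
# 1 + mu rounds back to 1 (tie to even), while 1 + 2*mu is the next float:
assert 1.0 + mu == 1.0
assert 1.0 + 2 * mu > 1.0
assert eps == 2 * mu
```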


2.4.4 Rounding and Basic Arithmetic Operations

Basic arithmetic operations such as addition, subtraction, multiplication, division, and square root are implemented in computer hardware such that the rounded representation of the exact result is obtained. (This is achieved by using additional digits of precision when computing intermediate results.)

More precisely, assume x and y are floating point numbers stored in computer memory, after rounding (i.e., x = fl(x) and y = fl(y)). Let x ⊕ y denote the result of the addition computed and stored by the computer (after rounding). Then the IEEE standard requires that the + operation be implemented in computer hardware such that

    x ⊕ y = fl(x + y),

i.e., the result of x + y evaluated on the computer is the exact x + y, rounded to its floating point representation. This is a stringent requirement! It also implies

    x ⊕ y = (x + y)(1 + ν)  with |ν| ≤ µ.

Similarly, for the other basic operations the computed results satisfy

    x ⊖ y = fl(x − y),  x ⊗ y = fl(x · y),  x ⊘ y = fl(x/y),

and the computed √x equals fl(√x).

Other standard functions like sin(x) and exp(x) are typically implemented in software, and don't have the same accuracy guarantees. When they are evaluated, we can normally assume that the computed results satisfy relative error bounds like

    computed sin(x) = sin(x)(1 + c₁ν),   computed exp(x) = exp(x)(1 + c₂ν),

with |ν| ≤ µ and c₁, c₂ constants not much larger than 1; see, e.g., https://blogs.mathworks.com/cleve/2017/01/23/ulps-plots-reveal-math-function-accurary.
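A short Python illustration of the exactly-rounded guarantee, and of the fact that the inputs themselves are already rounded:

```python
# Each basic operation returns the exactly rounded result: the computed x + y
# equals fl(x + y). Exactness is therefore only lost through the roundings.
assert 0.5 + 0.25 == 0.75            # all three values are machine numbers: exact

# 0.1 and 0.2 are already rounded inputs (fl(0.1), fl(0.2)), so the computed
# sum fl(fl(0.1) + fl(0.2)) need not equal fl(0.3):
s = 0.1 + 0.2
assert s != 0.3
assert abs(s - 0.3) < 1e-15          # the discrepancy is a few units of roundoff
```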

2.5 Conditioning of a Mathematical Problem

2.5.1 Conditioning of a Mathematical Problem

Consider the mathematical problem P of finding output ~z from input ~x, with the relation between ~z and ~x given by the function f:

Problem 2.25: Mathematical Problem P

    P: ~z = f(~x)

The concept of "conditioning" of problem P relates to the sensitivity of ~z to changes in ~x. We perturb ~x by ∆~x and investigate the effect of this perturbation on ~z:

    ~z + ∆~z = f(~x + ∆~x).


Definition 2.26

Consider mathematical problem P: ~z = f(~x) with perturbed input: ~z + ∆~z = f(~x + ∆~x).

1. Problem P is called ill-conditioned with respect to absolute errors if the absolute condition number

       κ_A = ‖∆~z‖ / ‖∆~x‖    (∆~x ≠ 0)

   satisfies κ_A ≫ 1. P is called well-conditioned otherwise.

2. Problem P is called ill-conditioned with respect to relative errors if the relative condition number

       κ_R = ( ‖∆~z‖ / ‖~z‖ ) / ( ‖∆~x‖ / ‖~x‖ )    (∆~x ≠ 0, ~z ≠ 0, ~x ≠ 0)

   satisfies κ_R ≫ 1. P is called well-conditioned otherwise.

Note: Ill-conditioning is often considered relative to the precision of the computer and number system being used. For example, for double precision numbers, the unit roundoff is µ ≈ 1.1 × 10^{−16}, indicating that number representation and elementary computations have a relative accuracy of about 16 decimal digits. If the problem is ill-conditioned with κ_R ≈ 1/µ ≈ 10^{16}, you cannot expect any correct digits in your computation. If κ_R ≈ √(1/µ) ≈ 10⁸, you can expect about half of the digits in the computed result to be correct (if you use an algorithm that is numerically stable, see the next section). If κ_R ≈ 1, you can expect almost all digits to be correct when using a stable algorithm.

Note: We did not specify in which norm to evaluate the condition numbers.

Depending on the problem, some norms may be easier to work with than others.

2.5.2 Conditioning of Elementary Operations

Example 2.27: Conditioning of the Sum Operation

We investigate the conditioning of mathematical problem

P: z = x+ y.

We have

z + ∆z = x+ ∆x+ y + ∆y,

leading to

∆z = ∆x+ ∆y.


Using the 1-norm, we find for the absolute condition number

    κ_A = |∆z| / ‖(∆x, ∆y)‖₁ = |∆z| / ( |∆x| + |∆y| ) = |∆x + ∆y| / ( |∆x| + |∆y| ) ≤ 1,

so addition is well-conditioned w.r.t. the absolute error: the absolute error in

z is never much larger than the absolute errors in x or y.

However, again using the 1-norm, we find for the relative condition number

    κ_R = ( |∆z| / |z| ) / ( ‖(∆x, ∆y)‖₁ / ‖(x, y)‖₁ )
        = ( |∆x + ∆y| / |x + y| ) · ( ‖(x, y)‖₁ / ‖(∆x, ∆y)‖₁ )
        = ( (|x| + |y|) / |x + y| ) · ( |∆x + ∆y| / (|∆x| + |∆y|) )
        ≤ (|x| + |y|) / |x + y|.

The upper bound for κ_R shows that the problem is well-conditioned as long as x + y ≉ 0. However, the relative condition number can be arbitrarily large when x + y ≈ 0, i.e., when one subtracts two numbers of almost equal size,

x ≈ −y. In this case, the relative error in z can be much greater than the

relative error in x and y. When x ≈ −y, addition is ill-conditioned w.r.t.

the relative error. This blow-up of the relative error, and the loss of relative

accuracy that goes along with it, is referred to as catastrophic cancellation.

Example 2.28: An Example of Catastrophic Cancellation

Compute z = x + y with

    x = 1.000002,   ∆x = 10^{−6},       x + ∆x = 1.000003,   |∆x|/|x| ≈ 10^{−6},
    y = −1.000013,  ∆y = −2 × 10^{−6},  y + ∆y = −1.000015,  |∆y|/|y| ≈ 2 × 10^{−6},

where ∆x and ∆y may be due, for example, to floating point rounding on a computer.

2.5. Conditioning of a Mathematical Problem 35

We have

    z = x + y = −0.000011,
    ∆z = ∆x + ∆y = −10^{−6},
    z + ∆z = −0.000012,

so |∆z|/|z| ≈ 0.09, i.e., we have a 9% relative error in z, whereas the relative error in x and y was only of the order of 0.0001%. This blow-up in relative error is due to catastrophic cancellation.
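Exact rational arithmetic makes the blow-up visible without any extra rounding noise; a Python replay of the numbers above:

```python
from fractions import Fraction

# Example 2.28 in exact rational arithmetic.
x, dx = Fraction("1.000002"), Fraction(1, 10 ** 6)
y, dy = Fraction("-1.000013"), Fraction(-2, 10 ** 6)

z = x + y            # -0.000011
dz = dx + dy         # -0.000001
rel_in = max(abs(dx / x), abs(dy / y))   # about 2e-6
rel_out = abs(dz / z)                    # exactly 1/11, about 9%
```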

Example 2.29: A Second Example of Catastrophic Cancellation

In the context of perturbations due to rounding in a floating point system, we

can consider the following example of catastrophic cancellation.

Consider the floating point system

    F(β = 10, t = 5, L = −10, U = 10),

with t = 5 digits in the mantissa and unit roundoff

    µ = (1/2) β^{−t+1} = 0.00005.

We compute z = x − y for the following numbers x ≈ y with rounded floating point representations fl(x) and fl(y):

    x = 1.23456789,  fl(x) = 1.2346,
    y = 1.23111111,  fl(y) = 1.2311.

The absolute and relative errors in x and y due to rounding are

    ∆x = fl(x) − x ≈ 3.2 × 10^{−5},    |∆x|/|x| ≈ 2.6 × 10^{−5} ≤ µ,
    ∆y = fl(y) − y ≈ −1.1 × 10^{−5},   |∆y|/|y| ≈ 9.0 × 10^{−6} ≤ µ.

Computing the difference of x and y, we have for the exact z and the computed floating point result z̄:

    z = x − y = 0.00345678,
    z̄ = fl( fl(x) − fl(y) ) = fl(0.0035) = 0.0035,

so

    ∆z = z̄ − z ≈ 4.3 × 10^{−5},    |∆z|/|z| ≈ 0.013,

i.e., we obtain a result with a 1.3% relative error in z, whereas the relative error in x and y was only of the order of 0.005%. This blow-up of the relative error is due to catastrophic cancellation. Equivalently, we can see that we only have two correct digits in z̄, while we had 5 correct digits in fl(x) and fl(y), and the computer used can represent 5 correct digits. So when computing z, 3 of the 5 digits of relative accuracy were lost due to catastrophic cancellation.

Something to remember . . .

When devising numerical algorithms, avoid steps where two almost equal numbers are subtracted, if you can. (This ill-conditioned step in the algorithm may cause the algorithm to be numerically unstable, due to blow-up of the relative error, as explained in Section 2.6.)

Example 2.30: Conditioning of the Division Operation

We investigate the conditioning of mathematical problem

    P: z = x/y    (y ≠ 0).

We have

    z + ∆z = (x + ∆x) / (y + ∆y),

or

    ∆z = −z + x(1 + ∆x/x) / ( y(1 + ∆y/y) ),

which leads to

    ∆z/z = −1 + (1 + ∆x/x)/(1 + ∆y/y) = ( −1 − ∆y/y + 1 + ∆x/x ) / ( 1 + ∆y/y ),

or

    ∆z/z = ( ∆x/x − ∆y/y ) / ( 1 + ∆y/y ).    (2.2)

In terms of relative conditioning, Eq. (2.2) shows immediately that ∆z/z can only be large if ∆x/x or ∆y/y is large, which means that the relative error does not blow up in a division operation and the problem is well-conditioned. (Note that the relative condition number

    κ_R = ( |∆z| / |z| ) / ( ‖(∆x, ∆y)‖ / ‖(x, y)‖ )

does not easily lead to a useful bound in this case.)

In terms of absolute conditioning, however, we have

    κ_A = |∆z| / ‖(∆x, ∆y)‖ = |x/y| · |∆x/x − ∆y/y| / ( |1 + ∆y/y| (|∆x| + |∆y|) ).

Assuming that ∆x/x and ∆y/y are small, κ_A can be arbitrarily large if y approaches 0. This means that the absolute error may blow up if y ≈ 0 (as can also be seen directly from Eq. (2.2)). Note that large |x| may also lead to large κ_A, but if x is large, |∆x| can often also be expected to be large proportional to x (in particular, if ∆x is due to rounding in a floating point number system), which would make κ_A small again.

In summary, division is ill-conditioned with respect to absolute error when y ≈ 0; in that case the absolute error of the result blows up.

(Note that this can be seen very easily by considering division by a small y without error: in that case

    z + ∆z = (x + ∆x)/y,

so

    ∆z = ∆x/y,

and ∆z clearly blows up when y ≈ 0.)

Example 2.31: Ill-Conditioning when Dividing by a Small Number

Compute z = x/y with

    x = 1,        ∆x = 10^{−3},   x + ∆x = 1.001,    |∆x|/|x| = 10^{−3},
    y = 10^{−6},  ∆y = 10^{−12},  y + ∆y ≈ 10^{−6},  |∆y|/|y| = 10^{−6}.

Then

    z = x/y = 10⁶,
    z + ∆z = (x + ∆x)/(y + ∆y) ≈ 10⁶ + 10³,

so

    ∆z ≈ 10³ ≈ ∆x/y,

while ∆x = 10^{−3}, i.e., the absolute error in x/y is 10⁶ times greater than the absolute error in x.


Something to remember . . .

When devising numerical algorithms, avoid steps where you divide by a number

that is small in absolute value, if you can. (This ill-conditioned step in the

algorithm may cause the algorithm to be numerically unstable, due to blow-up

of the absolute error, see also Section 2.6.)

2.5.3 Conditioning of Solving a Linear System

We investigate the conditioning of solving linear system A~x = ~b for ~x, given A

and ~b.

Example 2.32: Conditioning of A~x = ~b, case ∆A = 0, ∆~b ≠ 0

We consider mathematical problem

    P: ~x = A^{−1}~b = f(A, ~b).

We perturb A and ~b in

    (A + ∆A)(~x + ∆~x) = ~b + ∆~b.

For simplicity, we first consider the case ∆A = 0, ∆~b ≠ 0. In this case, we have

    A(~x + ∆~x) = ~b + ∆~b,

or

    A∆~x = ∆~b.

We want to find an upper bound for

    κ_R = ( ‖∆~x‖ / ‖~x‖ ) / ( ‖∆~b‖ / ‖~b‖ ).    (2.3)

From ∆~x = A−1∆~b we have

‖∆~x‖ = ‖A−1∆~b‖ ≤ ‖A−1‖ ‖∆~b‖, (2.4)

and from A~x = ~b we have

‖~b‖ = ‖A~x‖ ≤ ‖A‖ ‖~x‖,

or

1/‖~x‖ ≤ ‖A‖/‖~b‖. (2.5)

Plugging Eqs. (2.4) and (2.5) into Eq. (2.3), we obtain the upper bound

κR ≤ ( ‖A^−1‖ ‖∆~b‖ · ‖A‖/‖~b‖ ) / ( ‖∆~b‖/‖~b‖ ) = ‖A‖ ‖A^−1‖. (2.6)


Definition 2.33: Matrix Condition Number

Let A ∈ Rn×n be a nonsingular matrix. Then

κ(A) = ‖A‖ ‖A^−1‖

is called the condition number of A.

Theorem 2.34

Let A ∈ Rn×n be a nonsingular matrix. The following property holds:

κ(A) = ‖A‖ ‖A^−1‖ ≥ 1.

Proof. This simply follows from

1 = ‖I‖ = ‖AA^−1‖ ≤ ‖A‖ ‖A^−1‖ = κ(A),

for any vector-induced matrix norm.

We see that the relative condition number of problem P : ~x = A^−1~b is
bounded above by the matrix condition number, ‖A‖ ‖A^−1‖ (if we assume ∆A =

0). The matrix condition number also appears in a bound for κR for the general

problem when both A and ~b are perturbed:

Example 2.35: Conditioning of A~x = ~b, case ∆A ≠ 0, ∆~b ≠ 0

We consider mathematical problem

P: ~x = A^−1~b = f(A,~b),

perturbing A and ~b as in

(A+ ∆A)(~x+ ∆~x) = ~b+ ∆~b.

It can be shown that

κR = ( ‖∆~x‖/‖~x‖ ) / ( ‖∆A‖/‖A‖ + ‖∆~b‖/‖~b‖ ) ≤ κ(A) · 1/(1 − τ), (2.7)

if

τ = ‖A^−1‖ ‖∆A‖ < 1.

We say that matrix A is ill-conditioned when κ(A) ≫ 1, and

well-conditioned otherwise. Linear systems with a well-conditioned matrix

can be solved accurately on computers (because rounding errors in the input

do not disproportionally affect the computed result). Linear systems with ill-

conditioned matrices, however, are prone to inaccurate numerical solutions on

computers.
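As an illustration (an addition here, not part of the notes), the following Python sketch contrasts solving a linear system with an ill-conditioned matrix (the Hilbert matrix, a standard test case chosen for this demo) and a well-conditioned one:

```python
import numpy as np

# Solving A x = b with an ill-conditioned matrix (Hilbert) vs a
# well-conditioned one; both right-hand sides are built from x_true = ones.
n = 10
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])  # Hilbert matrix
x_true = np.ones(n)

x_hilbert = np.linalg.solve(H, H @ x_true)
err_ill = np.linalg.norm(x_hilbert - x_true) / np.linalg.norm(x_true)

A = np.eye(n) + 0.1 * np.ones((n, n))   # a well-conditioned matrix
x_well = np.linalg.solve(A, A @ x_true)
err_well = np.linalg.norm(x_well - x_true) / np.linalg.norm(x_true)

print(np.linalg.cond(H), err_ill)    # huge condition number, noticeable error
print(np.linalg.cond(A), err_well)   # small condition number, error near machine precision
```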

For the 2-norm matrix condition number we have the following explicit for-

mulas:

40 Chapter 2. LU Decomposition for Linear Systems

Theorem 2.36

Let A ∈ Rn×n be a nonsingular matrix. Then

κ2(A) = ‖A‖2 ‖A^−1‖2 = √λmax(AA^T) / √λmin(AA^T) = σmax(A)/σmin(A).

If A is symmetric, then

κ2(A) = |λ|max(A) / |λ|min(A).

2.6 Stability of a Numerical Algorithm

If a mathematical problem is well-conditioned, it should be possible in principle

to obtain its solution accurately on a computer using finite-precision calculations.

For ill-conditioned problems, by contrast, this is precarious: rounding
errors in the input data, or introduced while the steps of the computation are performed,
may easily lead to large inaccuracies in the computed approximate solution.

But even for problems that are well-conditioned and that are in principle

accurately computable using finite precision, it still depends on our choice of

algorithm whether an accurate result is indeed obtained.

Some algorithms use steps that are by themselves ill-conditioned, causing er-

rors in those steps that may be magnified by error propagation and/or may ac-

cumulate, leading to inaccurate results for an otherwise well-conditioned mathe-

matical problem. When the problem itself is ill-conditioned such ill-conditioned

steps tend to be unavoidable, but when the problem is well-conditioned, it is

often possible to devise alternative algorithms that avoid these ill-conditioned

steps and lead to an accurate result. We call algorithms that obtain accurate

results for well-conditioned problems numerically stable algorithms. On the

contrary, algorithms that lead to unnecessary accuracy loss for well-conditioned

problems, e.g., because they employ avoidable ill-conditioned steps, are called

numerically unstable algorithms.

2.6.1 A Simple Example of a Stable and an Unstable Algorithm

Example 2.37: Stable Algorithm for the Roots of a Quadratic Polynomial

Consider the following mathematical problem:

P : compute the roots of p(x) = x2 − 400x+ 2.

The solution of this problem is given with high accuracy by

x1 ≈ 399.9950, x2 ≈ 0.005000063.

We assume the problem is well-conditioned (this can be shown).

We illustrate the stability of two possible algorithms for computing the roots


in the floating point number system F (β = 10, t = 4, L = −10, U = 10), with

unit roundoff

µ = (1/2) β^(−t+1) = 0.5 · 10^−3 = 0.0005.

Algorithm 1: We use the standard formulas for computing the roots of a

quadratic polynomial

ax2 + bx+ c = 0,

i.e.,

x1,2 = ( −b ± √(b^2 − 4ac) ) / (2a),

or, in our case, for

x^2 + 2fx + c = 0,

we have

x1,2 = −f ± √d,

with

f = −200, c = 2, d = f^2 − c.

In our floating point system we have

fl(200) = 200, fl(200^2) = 200^2 = 40 000, fl(2) = 2,

or, using symbols,

fl(f) = f, fl(f^2) = f^2, fl(c) = c,

so we will not explicitly write the fl(·) operation for f, f^2 and c in what follows.

For the discriminant d, we get

fl(f^2 − c) = fl(40 000 − 2) = 40 000 = fl(f^2),

and we note that the contribution of c = 2 is lost in this operation due to

rounding. So we get for the computed approximate roots x̄1 and x̄2

x̄1 = fl[−f + fl(√fl(f^2 − c))] = fl[200 + fl(√40 000)] = fl[200 + 200] = 400,

x̄2 = fl[−f − fl(√fl(f^2 − c))] = fl[200 − fl(√40 000)] = fl[200 − 200] = 0,

with relative errors

|x̄1 − x1| / |x1| ≈ 1.25 · 10^−5 ≈ µ,   |x̄2 − x2| / |x2| = 1 ≫ µ.

We see that the result for x2 is highly inaccurate: we obtain a relative error of

100%. We note that catastrophic cancellation has occurred in computing x2:

all accuracy was lost in computing the difference between two almost equal
numbers in the expression −f − √(f^2 − c). (The contribution of c, which is

essential for the relative accuracy of the solution, was entirely lost.) We say

that Algorithm 1 is numerically unstable, in this case because it clearly

contains an ill-conditioned step in which accuracy is lost.


Algorithm 2: A more stable algorithm can be obtained as follows. We compute

x1, the largest root in absolute value, as above, but we compute x2 using an

alternative formula. Observe that

x1x2 = c,

because p(x) can be factored as

p(x) = ax2 + bx+ c = a(x− x1)(x− x2).

So we compute x2 from

x2 = c/x1.

This step is well-conditioned unless x1 is small. So we have

x̄2 = fl(c/x̄1) = fl(2/400) = fl(0.005) = 0.005,

with relative error

|x̄2 − x2| / |x2| ≈ 1.2 · 10^−5 ≈ µ.

Algorithm 2 is numerically stable (it avoids the ill-conditioned step).
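In IEEE double precision the polynomial of Example 2.37 is computed accurately, so the sketch below (a hypothetical illustration added here, not from the notes) uses a more extreme polynomial, x^2 − 10^8 x + 1, to make the cancellation in Algorithm 1 visible:

```python
import math

# For p(x) = x^2 - b*x + c with b = 1e8, c = 1, Algorithm 1 suffers
# catastrophic cancellation in double precision; Algorithm 2 (x2 = c/x1) does not.
b, c = 1e8, 1.0
sq = math.sqrt(b * b - 4.0 * c)

x1 = (b + sq) / 2.0          # larger root: no cancellation here
x2_naive = (b - sq) / 2.0    # Algorithm 1: difference of nearly equal numbers
x2_stable = c / x1           # Algorithm 2: uses x1 * x2 = c

rel_diff = abs(x2_naive - x2_stable) / x2_stable
print(x2_naive, x2_stable, rel_diff)   # the naive root has lost most of its digits
```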

2.6.2 Stability of LU Decomposition

It can be shown that the standard LU decomposition algorithm is somewhat un-

stable: the algorithm contains steps in which divisions occur by a small number,

when pivot elements are used that are close to zero in absolute value. This may

lead to large elements in the L and U factors, and may lead to inaccuracies, also

for well-conditioned problems.

For this reason, the following partial pivoting variant of LU decomposition is

often used: in every stage (indexed by k) of Gaussian elimination, determine the

pivot element in position (k, k) as follows. In column k (starting from position

(k, k) and below) one determines the largest element in absolute value, and

switches the row with the largest element with the current row k. As such, one

chooses, in each stage, the pivot element with the largest absolute value. This

extra operation is easy to implement and is computationally inexpensive (it

does not change the asymptotic cost of the algorithm). The resulting algorithm

tempers the growth of elements in L and U and is numerically more stable.


Algorithm 2.38: LU decomposition with partial pivoting, kij version, in-place

Input: A matrix A ∈ Rn×n

Output: L and U stored in A, and a vector ~p storing the pivoting rows

1: ~p = (1, 2, . . . , n)^T . Initialise the permutation vector

2: for k = 1, . . . , n− 1 do

3: Determine µ with k ≤ µ ≤ n such that |A(µ, k)| = ‖A(k : n, k)‖∞

4: if |A(µ, k)| < τ then . τ is a small pivot tolerance

5: Break . Stop the loop if near zero pivot found

6: end if

7: Swap elements of the permutation vector ~p(k) and ~p(µ)

8: Swap the rows A(k, :) and A(µ, :)

9: rows = k + 1 : n . Update all the rows below k

10: A(rows, k) = A(rows, k)/A(k, k)

11: A(rows, rows) = A(rows, rows)−A(rows, k)A(k, rows)

12: end for
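A Python sketch of Algorithm 2.38 might look as follows; the NumPy translation and the default tolerance tau are assumptions of this illustration, not part of the pseudocode above:

```python
import numpy as np

# LU decomposition with partial pivoting, stored in-place: the strict lower
# triangle holds L (unit diagonal implied), the upper triangle holds U.
def lu_partial_pivoting(A, tau=1e-14):
    A = A.astype(float)
    n = A.shape[0]
    p = np.arange(n)                           # permutation vector
    for k in range(n - 1):
        mu = k + np.argmax(np.abs(A[k:, k]))   # row of largest pivot candidate
        if abs(A[mu, k]) < tau:
            break                              # near-zero pivot: stop
        p[[k, mu]] = p[[mu, k]]                # swap permutation entries
        A[[k, mu], :] = A[[mu, k], :]          # swap the rows
        A[k+1:, k] /= A[k, k]                  # multipliers (column of L)
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])  # Schur complement update
    return A, p

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
LU, p = lu_partial_pivoting(M)
L = np.tril(LU, -1) + np.eye(5)
U = np.triu(LU)
print(np.allclose(L @ U, M[p]))   # PA = LU
```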

It can be seen as follows that this type of stability problem cannot occur when

applying Cholesky to SPD matrices A. The Cholesky algorithm decomposes an SPD
matrix A as A = L̂L̂^T, after which A~x = ~b is solved by forward and backward
substitutions L̂~y = ~b and L̂^T~x = ~y.

We consider the matrix 2-norm and recall that, for SPD matrices A,

‖A‖2 = √λmax(AA^T) = √λmax(A^2) = λmax(A).

For the Cholesky factor L̂ we obtain

‖L̂‖2 = √λmax(L̂L̂^T) = √λmax(A) = √‖A‖2,

and, similarly,

‖L̂^T‖2 = √‖A‖2.

This indicates that the matrix elements in A = L̂L̂T cannot grow strongly.

Cholesky decomposition is numerically stable (without need for pivoting).


Chapter 3

Least-Squares Problems and QR Factorisation

3.1 Gram-Schmidt Orthogonalisation and QR Factorisation

In this chapter, we generally consider real rectangular matrices A with more

rows than columns:

A ∈ Rm×n with m ≥ n.

As we will see in Section 3.3, such matrices arise in overdetermined linear systems

(with more equations than unknowns), which may be solved in the least-squares

(LS) sense.

When solving a LS problem, it will be useful to construct an orthonormal

basis for range(A) = span{~a1, . . . ,~an}, the vector space spanned by the columns

of

A = [~a1 · · · ~an].

In this section we will consider the Gram-Schmidt algorithm to orthogonalise

the columns of A, which will lead to the so-called QR factorisation of A.

3.1.1 Gram-Schmidt Orthogonalisation

For now, we will assume that A ∈ Rm×n, with m ≥ n, has full rank, i.e., its

columns are linearly independent. We seek to construct an orthonormal basis

for range(A).

We first recall the concept of expansion of a vector in an orthonormal basis.

Example 3.1

Let {~e1, ~e2} be a standard orthonormal basis for R2, i.e., ~eTi ~ej = δij for all i, j.

Then any ~x ∈ R2 can be expanded in the basis as

~x = (~eT1 ~x)~e1 + (~e

T

2 ~x)~e2.

In the Gram-Schmidt procedure, we begin by constructing an orthogonal set

of vectors {~v1, . . . , ~vn} that spans range(A) = span{~a1, . . . ,~an}, by taking the


vectors ~ai and subtracting their components in the directions of the previous ~vj .

For example, for the case where A has 3 columns (n = 3):

~v1 = ~a1,

~v2 = ~a2 − ( ~v1^T ~a2 / ‖~v1‖^2 ) ~v1,

~v3 = ~a3 − ( ~v1^T ~a3 / ‖~v1‖^2 ) ~v1 − ( ~v2^T ~a3 / ‖~v2‖^2 ) ~v2.

In this chapter, all vector norms denote 2-norms. We then obtain the set of

orthonormal vectors {~q1, . . . , ~qn} such that span{~q1, . . . , ~qn} = span{~a1, . . . ,~an},

by normalising the vectors ~vi to unit length:

~qi = ~vi / ‖~vi‖.

For the n = 3 case, this results in

~q1 ‖~v1‖ = ~v1 = ~a1

~q2 ‖~v2‖ = ~v2 = ~a2 − (~q1^T ~a2) ~q1

~q3 ‖~v3‖ = ~v3 = ~a3 − (~q1^T ~a3) ~q1 − (~q2^T ~a3) ~q2. (3.1)

We rewrite this as

~q1 r11 = ~a1

~q2 r22 = ~a2 − r12 ~q1

~q3 r33 = ~a3 − r13 ~q1 − r23 ~q2,

which leads to the factorisation of matrix A as

A = [~a1 ~a2 ~a3] = [~q1 ~q2 ~q3] [ r11 r12 r13 ; 0 r22 r23 ; 0 0 r33 ] = Q̂R̂.

This factorisation A = Q̂R̂ of A is known as the reduced QR factorisation, see

below.

This leads to the following algorithm for Gram-Schmidt orthogonalisation:

Algorithm 3.2: Gram-Schmidt Orthogonalisation

Input: matrix A ∈ Rm×n

Output: the factor matrices Q̂ and R̂ in the reduced QR factorisation A = Q̂R̂

Q̂ = 0
R̂ = 0
for j = 1:n do
    ~vj = ~aj
    for i = 1:j−1 do
        r̂ij = ~̂qi^T ~aj
        ~vj = ~vj − r̂ij ~̂qi
    end for
    r̂jj = ‖~vj‖
    ~̂qj = ~vj / r̂jj
end for
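Algorithm 3.2 translates almost line by line into Python; the NumPy version below is a sketch under the assumption that A has full column rank (so that no r_jj vanishes):

```python
import numpy as np

# Classical Gram-Schmidt orthogonalisation: Q has orthonormal columns,
# R is upper triangular, and A = Q R.
def gram_schmidt(A):
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # project the ORIGINAL column a_j
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)       # assumes full column rank: R[j, j] > 0
        Q[:, j] = v / R[j, j]
    return Q, R

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
Q, R = gram_schmidt(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(4)))
```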


3.1.2 QR Factorisation

We now consider the general case of A ∈ Rm×n, with m ≥ n, but where the

columns of A are not necessarily linearly independent.

Definition 3.3

Let A ∈ Rm×n. The reduced QR factorisation of A is given by

A = Q̂R̂, (3.2)

where Q̂ ∈ Rm×n has orthonormal columns:

Q̂^T Q̂ = In,

and R̂ ∈ Rn×n is upper triangular.

The n columns of Q̂ form an orthonormal basis for an n-dimensional subspace

of Rm. It is possible to expand this to a basis of the entire Rm by expanding Q̂ on

the right with m−n additional columns that contain m−n further orthonormal

vectors in Rm, leading to the (full) QR factorisation:

Definition 3.4

Let A ∈ Rm×n. The (full) QR factorisation of A is given by

A = QR = Q [ R̂ ; 0 ], (3.3)

where Q ∈ Rm×m is an orthogonal matrix:

Q^T Q = Im = QQ^T,

and R̂ ∈ Rn×n is upper triangular.

Theorem 3.5

Every A ∈ Rm×n has a full QR factorisation A = QR, and hence also a

reduced QR factorisation A = Q̂R̂.

This can be shown, for the reduced QR factorisation, using the observation

that in the Gram-Schmidt algorithm, if a zero ~vj is obtained and ~qj cannot

be computed, one can instead choose any vector ~qj that is orthonormal with

respect to the previous ~qi (for example, by repeating the orthogonalisation step

for determining ~vj starting from a random vector ~aj ∈ Rm, instead of the original

jth column ~aj of A). For the full QR factorisation, the additional orthogonal

columns of Q can be determined in a similar manner.

3.1.3 Modified Gram-Schmidt Orthogonalisation

It can be observed in example computations, and shown theoretically, that the

Gram-Schmidt algorithm is numerically unstable. If the orthonormal basis is

computed as in Eq. (3.1), the resulting vectors ~qi may suffer from loss of orthog-

onality due to rounding errors.

The stability can be improved substantially by the following small modifi-

cation to the algorithm. For example, for ~v3 in Eq. (3.1), one subtracts the


component in the direction of ~q2 by projecting the original column vector ~a3

onto ~q2. Even though in exact arithmetic ~q1 is orthogonal to ~q2 and the com-

ponent of ~a3 in the direction of ~q2 is equal to the component of ~a3 − (~qT1 ~a3) ~q1

in the direction of ~q2, it turns out that ~a3 − (~qT1 ~a3) ~q1 may have a slightly dif-

ferent component in the direction of ~q2 due to rounding, and it is better for

stability to subtract the component of ~a3 − (~qT1 ~a3) ~q1 in the direction of ~q2. In

a similar manner we repeatedly subtract, as terms are added to determine each

~vj , the components in direction ~qi of the intermediate result for ~vj , instead of

the components of ~aj in direction ~qi. This results in the following modified

Gram-Schmidt algorithm:

Algorithm 3.6: Modified Gram-Schmidt Orthogonalisation

Input: matrix A ∈ Rm×n

Output: the factor matrices Q̂ and R̂ in the reduced QR factorisation A = Q̂R̂

Q̂ = 0
R̂ = 0
for j = 1:n do
    ~vj = ~aj
    for i = 1:j−1 do
        r̂ij = ~̂qi^T ~vj
        ~vj = ~vj − r̂ij ~̂qi
    end for
    r̂jj = ‖~vj‖
    ~̂qj = ~vj / r̂jj
end for

It can be shown that this modified version is substantially more stable than

the original Gram-Schmidt procedure, but for ill-conditioned problems loss of

orthogonality can still occur and a more stable approach is desired. In the next

section we consider a procedure using orthogonal transformations of Householder

reflection type that will accomplish this goal.
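The difference between the two variants can be observed numerically. The sketch below (an illustration added here, not from the notes) runs both versions on a Hilbert matrix, a standard ill-conditioned test case, and measures the loss of orthogonality ‖Q^T Q − I‖:

```python
import numpy as np

# Classical vs modified Gram-Schmidt: the only difference is whether the
# projection coefficient uses the original column a_j or the running v.
def gs(A, modified):
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ (v if modified else A[:, j])
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

n = 10
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])  # Hilbert
Qc, _ = gs(H, modified=False)
Qm, _ = gs(H, modified=True)
err_c = np.linalg.norm(Qc.T @ Qc - np.eye(n))
err_m = np.linalg.norm(Qm.T @ Qm - np.eye(n))
print(err_c, err_m)   # modified GS loses much less orthogonality
```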

3.2 QR Factorisation using Householder Transformations

Since the Gram-Schmidt orthogonalisation and its modified version have deficient
numerical stability properties, we seek a more stable approach

to compute the QR decomposition.

It turns out that an approach based on applying orthogonal transformations

to A results in a method with more favourable stability properties. One reason

why such methods have good stability properties is that multiplying A with an

orthogonal matrix Q preserves the Euclidean length of the columns of A:

Theorem 3.7

Orthogonal matrices preserve Euclidean length.


Proof. Let Q ∈ Rn×n with Q^TQ = I. Suppose ~y = Q~x. Then

‖~y‖ = ‖Q~x‖ = √((Q~x)^T Q~x) = √(~x^T Q^T Q ~x) = √(~x^T ~x) = ‖~x‖,

where, as in the rest of this chapter, ‖ · ‖ indicates the vector 2-norm.

This means that the matrix element sizes of QA cannot be much larger than

those of A.

A useful further property of orthogonal matrices is the following:

Theorem 3.8

The product of orthogonal matrices is orthogonal.

Proof. Let Q = Q1Q2, where Q1, Q2 ∈ Rn×n are orthogonal. Then

Q^TQ = (Q1Q2)^T Q1Q2 = Q2^T Q1^T Q1 Q2 = I.

3.2.1 Householder Reflections

We want to transform A ∈ Rm×n (with m ≥ n) into an upper-triangular matrix

R ∈ Rm×n by applying orthogonal transformations to A.

Our approach will be to multiply A by a sequence of orthogonal transfor-

mation matrices Qj that create zeros in column j below the element in position

(j, j). This aim is similar to LU decomposition, but we know that each or-

thogonal transformation preserves the Euclidean length of the matrix columns

it operates on.

Let’s consider the first orthogonal transformation, Q1 ∈ Rm×m, which is

applied to

A = [~a1 · · · ~an],

such that Q1A has zeros in its first column below the first element. Since the

length of column ~a1 is preserved, we know that the first element in the trans-

formed column has to be ±‖~a1‖:

Q1A = [ ±‖~a1‖ ~r1^T ; 0 Ã2 ].


We choose for now a transformation Q1 that results in a transformed first column

with a negative value as its first element:

Q1~a1 = ( −‖~a1‖, 0, . . . , 0 )^T.

The specific type of transformation we choose for Q1 (and all subsequent Qjs) is

a reflection in Rm about a hyperplane that is orthogonal to the line from ~a1 to

Q1~a1 and intersects the line in the middle between ~a1 and Q1~a1. This reflection

operation is called a Householder reflection. Let ~v1 be the vector pointing from

Q1~a1 to ~a1:

~v1 = ~a1 −Q1~a1,

and let ~u1 be the unit vector in that direction:

~u1 = ~v1 / ‖~v1‖.

The vector ~u1 is called a Householder vector. The operation of the Householder

reflection Q1 onto a vector ~x ∈ Rm can then be expressed as

Q1~x = ~x − 2(~u1^T ~x)~u1,

and, since (~u1^T ~x)~u1 = (~u1~u1^T)~x, the matrix form of the Householder orthogonal
transformation is given by

Q1 = Im − 2~u1~u1^T.

Theorem 3.9

Let ~u ∈ Rm with ‖~u‖ = 1. Then the Householder reflection matrix

Q~u = Im − 2~u~u^T

is a symmetric and orthogonal matrix.

Proof. Clearly, Q~u^T = Q~u. Then

Q~u^T Q~u = Q~u^2 = (Im − 2~u~u^T)(Im − 2~u~u^T) = Im − 4~u~u^T + 4~u(~u^T~u)~u^T = Im.

Finally, we note that the sign in

Q1~a1 = ( ±‖~a1‖, 0, . . . , 0 )^T


is chosen in practical implementations based on numerical stability concerns.

For numerical stability reasons, we choose the sign of ±‖~a1‖ opposite to the

sign of the first component of the original column ~a1 of A, (~a1)1: this avoids

catastrophic cancellation in computing ~v1 = ~a1−Q1~a1 that may otherwise arise

when |(~a1)1| ≈ ‖~a1‖. In other words, we choose the sign of ±‖~a1‖ such that the

size of ~v1 is as large as possible.
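A single Householder reflection is easy to check numerically; the following Python sketch (with an arbitrarily chosen vector a1) builds u1 as above and verifies that Q1 a1 has zeros below its first entry:

```python
import numpy as np

# One Householder reflection: Q1 maps a1 to (-sign * ||a1||, 0, ..., 0)^T,
# where the sign is chosen opposite to (a1)_1 to avoid cancellation in v.
a1 = np.array([3.0, 1.0, 2.0])
sign = 1.0 if a1[0] >= 0 else -1.0
y = np.zeros_like(a1)
y[0] = -sign * np.linalg.norm(a1)   # target vector Q1 a1
v = a1 - y                          # v = a1 - Q1 a1
u = v / np.linalg.norm(v)           # Householder vector

Q1 = np.eye(3) - 2.0 * np.outer(u, u)
print(Q1 @ a1)                      # (-||a1||, 0, 0) up to rounding
```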

3.2.2 Using Householder Reflections to Compute the QR

Factorisation

We now use a sequence of n Householder reflections to compute the QR decom-

position of A ∈ Rm×n. The first transformation creates the desired zeros in the

first column of A:

Q1A = [ r11 ~r1^T ; 0 Ã2 ] ∈ Rm×n,

and is followed by a second orthogonal transformation of Householder type that

creates zeros in the first column of A˜2 ∈ R(m−1)×(n−1):

Q̃2Ã2 = [ r22 ~r2^T ; 0 Ã3 ] ∈ R(m−1)×(n−1).

Defining

Q2 = [ 1 0 ; 0 Q̃2 ] ∈ Rm×m,

these steps can be combined as

Q2Q1A = [ r11 ~r1^T ; 0 Q̃2Ã2 ] = [ r11 ~r1^T ; 0 [ r22 ~r2^T ; 0 Ã3 ] ] ∈ Rm×n,

and so on, with, in the next step,

Q̃3Ã3 = [ r33 ~r3^T ; 0 Ã4 ] ∈ R(m−2)×(n−2),

and

Q3 = [ I2 0 ; 0 Q̃3 ] ∈ Rm×m,


etc.

After n transformations this results in

Qn Qn−1 · · · Q2 Q1 A = [ R̂ ; 0 ],

where R̂ ∈ Rn×n is upper triangular, and

Q^T = Qn Qn−1 · · · Q2 Q1

is an orthogonal matrix. Finally, the QR factorisation of A results as

A = Q1 Q2 · · · Qn−1 Qn [ R̂ ; 0 ] = QR.

3.2.3 Computing Q

In many cases, forming the m×m matrix Q is not needed explicitly. For example,

if only matrix-vector products Q~x are required, one can save the Householder

vectors ~ui (i = 1, . . . , n) and evaluate Q~x as

Q~x = Q1Q2 . . . Qn−1Qn ~x.

If Q is desired explicitly, there are several options for constructing it:

• The transpose of Q can be formed as the loop over the columns of A

progresses:

Q^T = Qn Qn−1 · · · Q2 Q1 Im,

starting with the Q1 multiplication, and then Q2, etc., and Q can be ob-

tained by taking the transpose at the end. This is the approach used in the

pseudocode for the QR decomposition by Householder reflections below.

However, this approach is more costly than necessary because Q1 is typi-

cally dense and does not have leading rows that are zero below the diagonal.

Therefore, all subsequent Householder reflections with Q˜2, . . . , Q˜n−1 need

to be carried out on all n columns of the relevant rows of the intermediate

result (in forming R, in contrast, the transformations do not need to be

carried out on the leading zero columns). The reverse order used in the

next option avoids these extra flops.

• One can store the Householder vectors ~ui (i = 1, . . . , n) and form Q at the

end as

Q = Q1Q2 . . . Qn−1QnIm,

starting with the Qn multiplication, and then Qn−1, etc. This is more

efficient since the Q˜k don’t need to be applied to the leading columns of

the intermediate results that are zero below the diagonal.

This is pseudocode for computing the QR decomposition by Householder

reflections:


Algorithm 3.10: QR Factorisation using Householder Transformations

Input: matrix A ∈ Rm×n

Output: the factor matrices Q and R in the (full) QR factorisation A = QR

1: R = A

2: Qt = Im . Qt will be the transpose of Q

3: for k=1:n do

. first determine the Householder vector ~uk

4: ~x = R(k : m, k)

5: ~y = zeros(m− k + 1, 1)

6: if x1 < 0 then

7: y1 = ‖~x‖

8: else

9: y1 = −‖~x‖

10: end if

11: ~v = ~x− ~y

12: ~uk = ~v/‖~v‖

. apply the Householder transformation to the relevant part of R

13: R(k : m, k : n) = R(k : m, k : n) − 2~uk (~uk^T R(k : m, k : n))

. finally, update Qt (note: we need *all* columns here!)

14: Qt(k : m, 1 : m) = Qt(k : m, 1 : m) − 2~uk (~uk^T Qt(k : m, 1 : m))

15: end for

16: Q = Qt^T

The following more compact version of the pseudocode computes the QR factorisation
using Householder transformations without forming Q, performing the operations in place.

Algorithm 3.11: QR Factorisation using Householder Transformations

(without forming Q)

Input: A matrix A ∈ Rm×n, m > n

Output: The factor matrix R ∈ Rn×n and a sequence of vectors ~uk, k =
1, . . . , n, that defines the sequence of Householder transformations.

1: for k = 1, . . . , n do

2: ~b = A(k:m, k)

3: ~v = ~b+ sign(b1) ‖~b‖~e1

4: ~uk = ~v/‖~v‖

5: A(k, k) = −sign(b1)‖~b‖

6: A(k+1:m, k) = 0

7: A(k:m, k+1:n) = A(k:m, k+1:n) − (2~uk) (~uk^T A(k:m, k+1:n))

8: end for
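A Python sketch of Algorithm 3.11 is given below; storing the Householder vectors in a list and the helper apply_qt for forming Q^T x are implementation assumptions of this illustration:

```python
import numpy as np

# Householder QR without forming Q: R is accumulated in A, and the
# Householder vectors u_k are kept so that Q^T x can be applied later.
def householder_qr(A):
    A = A.astype(float)
    m, n = A.shape
    us = []
    for k in range(n):
        b = A[k:, k].copy()
        sign = 1.0 if b[0] >= 0 else -1.0
        v = b.copy()
        v[0] += sign * np.linalg.norm(b)     # v = b + sign(b1) ||b|| e1
        u = v / np.linalg.norm(v)
        us.append(u)
        A[k, k] = -sign * np.linalg.norm(b)  # new diagonal entry of R
        A[k+1:, k] = 0.0                     # zeros below the diagonal
        A[k:, k+1:] -= 2.0 * np.outer(u, u @ A[k:, k+1:])
    return np.triu(A[:n, :]), us

def apply_qt(us, x):
    # compute Q^T x = Q_n ... Q_1 x using the stored Householder vectors
    x = x.astype(float).copy()
    for k, u in enumerate(us):
        x[k:] -= 2.0 * u * (u @ x[k:])
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
R, us = householder_qr(A)
QtA = np.column_stack([apply_qt(us, A[:, j]) for j in range(4)])
print(np.allclose(QtA[:4], R), np.allclose(QtA[4:], 0.0))   # Q^T A = [R; 0]
```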

3.2.4 Computational Work

When implementing the Householder algorithm, it is essential to implement the

reflection by first computing the row vector ~zk^T = ~uk^T R(k : m, k : n) in

R(k : m, k : n) = R(k : m, k : n) − 2~uk (~uk^T R(k : m, k : n)),


rather than first constructing the rank-1 matrix ~uk~uk^T and multiplying it with
R(k : m, k : n), which is much more expensive.

When implemented in this order, it can be shown that the dominant terms

in the computational work are given by

W ≈ 2mn^2 − (2/3) n^3 flops.

Notes:

• For the case of square matrices, m = n, we have

W ≈ 2n^3 − (2/3) n^3 = (4/3) n^3 flops,

which is twice the work of LU decomposition.

• The QR decomposition can be used to solve linear systems as follows:

1. Compute Q and R in A = QR, e.g., using Householder transforma-

tions.

2. The system A~x = ~b can be solved by backward substitution as can be

seen from the following equivalences:

A~x = ~b

QR~x = ~b

Q^TQR~x = Q^T~b

R~x = Q^T~b.

Solving the system in this way is more stable than using the LU

decomposition, but comes at twice the cost.

• The QR decomposition using Householder transformations is really useful

for solving least-squares problems in a numerically stable way, as will be

explained in the next sections.
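The second note above (solving A~x = ~b via the QR decomposition followed by backward substitution) can be sketched in Python as follows; numpy.linalg.qr is used for brevity in place of a hand-written Householder routine:

```python
import numpy as np

# Solve A x = b via QR: form R x = Q^T b and back-substitute on R.
def solve_via_qr(A, b):
    Q, R = np.linalg.qr(A)          # for a square A, Q is n x n orthogonal
    y = Q.T @ b
    n = A.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):  # backward substitution
        x[i] = (y[i] - R[i, i+1:] @ x[i+1:]) / R[i, i]
    return x

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
x_true = rng.standard_normal(5)
x = solve_via_qr(A, A @ x_true)
print(np.allclose(x, x_true))
```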

3.3 Overdetermined Systems and Least-Squares Problems

Let A ∈ Rm×n, where m > n. Such overdetermined linear systems, where there

are more equations (m) than unknowns (n), are common in applications.

Example 3.12

Consider the linear regression problem of finding the “best” linear relation

y(t) = c t + d between m observations {(ti, yi)} for i = 1, . . . ,m. We aim to

solve the following linear system

[ t1 1 ; t2 1 ; · · · ; tm 1 ] (c, d)^T = (y1, y2, . . . , ym)^T. (3.4)

However, the above linear system is overdetermined for m > 2 and in general
does not have an exact solution.


Exact solutions do not generally exist for overdetermined systems

A~x = ~b, A ∈ Rm×n, m > n.

Instead, one can seek the “optimal” ~x that minimizes the residual vector

~r = ~b−A~x ∈ Rm

in some norm. A popular choice for the norm, which can be justified, e.g., in

statistical applications, is the 2-norm. This leads to the following definition of

an overdetermined linear least-squares (LS) problem:

Definition 3.13: Least-Squares Problem

Let A ∈ Rm×n with m > n. Find ~x that minimises f(~x) = ‖~b − A~x‖2^2.

Note that

f(~x) = ‖~b − A~x‖2^2 = ‖~r‖2^2 = ∑_{k=1}^{m} rk^2,

which explains that the solution is indeed sought that provides the least sum of

squares of the residual components.

3.3.1 The Normal Equations – A Geometric View

Let A ∈ Rm×n with m > n. The columns of A span a subspace of Rm. The

solution of the LS problem is the vector ~x ∈ Rn such that the vector A~x ∈

range(A) is the best approximation of ~b in range(A), in the sense that ~x minimises
the residual, ~r = ~b − A~x. The residual ~r = ~b − A~x is minimal if it is orthogonal to
range(A) (or, equivalently, if A~x is the orthogonal projection of ~b onto range(A)).

If we specify this geometric condition, we find a linear system of equations that

specifies the solution of the LS problem:

~r ⊥ A~z ∀~z ⇐⇒ (A~z)^T~r = 0 ∀~z

⇐⇒ ~z^T A^T (~b − A~x) = 0 ∀~z

⇐⇒ A^T~b − A^T A~x = 0

⇐⇒ A^T A~x = A^T~b,

where A^T A ∈ Rn×n. The equations

A^T A~x = A^T~b

are called the normal equations, the first way to compute the LS solution. One

problem in this approach is that A^T A can be ill-conditioned, more so than A

(see below).

3.3.2 The Normal Equations

The following theorem shows that linear least-squares problems can be solved

by finding the solution of a square linear system with matrix A^T A.


Theorem 3.14

Let A ∈ Rm×n with m > n.

1. Any minimiser of f(~x) = ‖~b − A~x‖2^2 satisfies

A^T A~x = A^T~b.

2. Any solution of the normal equations is a minimiser of f(~x).

3. If A has linearly independent columns, then A^T A~x = A^T~b (and the
least-squares problem) has a unique solution.

Proof.

1. Consider

f(~x) = ∑_{k=1}^{m} rk^2 = ∑_{k=1}^{m} ( bk − ∑_{j=1}^{n} akj xj )^2.

If ~x is a minimiser of f(~x), then ~x satisfies the optimality equations

∂f/∂xi = 0 (i = 1, . . . , n).

This gives

∂f/∂xi = −∑_{k=1}^{m} 2 aki ( bk − ∑_{j=1}^{n} akj xj ) = 0 (i = 1, . . . , n),

or

∑_{k=1}^{m} ∑_{j=1}^{n} aki akj xj = ∑_{k=1}^{m} aki bk (i = 1, . . . , n).

It is easy to see that this corresponds to A^T A~x = A^T~b. (Check!) Note

that solutions of this equation could also be maximisers of f(~x), which

we exclude in the next part.

2. Let ~x satisfy A^T A~x = A^T~b, and ~r = ~b − A~x. Then f(~x + ~u) ≥ f(~x) ∀~u ∈
Rn, since

f(~x + ~u) = (~b − A(~x + ~u))^T (~b − A(~x + ~u))

= (~r − A~u)^T (~r − A~u)

= ~r^T~r − ~r^T A~u − ~u^T A^T~r + ~u^T A^T A~u

= ~r^T~r − 2~u^T A^T~r + ~u^T A^T A~u

= f(~x) + ‖A~u‖2^2,

where the last step uses A^T~r = A^T~b − A^T A~x = 0.

3. If A has linearly independent columns, then A~x ≠ 0 for all ~x ≠ 0.
Therefore, ~x^T A^T A~x = ‖A~x‖2^2 > 0 for all ~x ≠ 0 and A^T A is SPD. This
implies that A^T A is nonsingular, so A^T A~x = A^T~b has a unique solution.


Note: If A has linearly dependent columns, then A^T A is singular and the

normal equations have infinitely many solutions.

3.3.3 Computational Work for Forming and Solving the Normal Equations

If A ∈ Rm×n with m > n has linearly independent columns, then the LS solution

of A~x = ~b can be computed efficiently using Cholesky decomposition applied to
the normal equations, since A^T A ∈ Rn×n is SPD. The dominant terms in the
computational work, including the cost of forming A^T A, are W ≈ n^3/3 + n^2(2m − 1)
flops, where the cost of forming A^T A dominates strongly for m ≫ n.
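A sketch of this approach in Python (using NumPy's Cholesky factorisation and dense triangular solves for brevity) might look as follows; the random test problem is an assumption of the illustration:

```python
import numpy as np

# Least squares via the normal equations: A^T A x = A^T b, solved with
# a Cholesky factorisation A^T A = L L^T and two triangular solves.
rng = np.random.default_rng(4)
m, n = 20, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

G = A.T @ A                       # SPD when A has full column rank
L = np.linalg.cholesky(G)         # G = L L^T
y = np.linalg.solve(L, A.T @ b)   # forward substitution (dense solve for brevity)
x = np.linalg.solve(L.T, y)       # backward substitution

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x, x_ref))
```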

3.3.4 Numerical Stability of Using the Normal Equations

Regarding conditioning, we have for general non-symmetric square A ∈ Rn×n

κ2(A) = σmax(A)/σmin(A) = √λmax(A^T A) / √λmin(A^T A).

This can be extended to rectangular matrices A ∈ Rm×n with m > n and linearly
independent columns, using the same expressions for κ2 (since λi(A^T A) > 0 for
all i = 1, . . . , n in this case).

The condition number of the matrix A^T A arising in the normal equations is
given by

κ2(A^T A) = σmax(A^T A)/σmin(A^T A) = √λmax((A^T A)^T A^T A) / √λmin((A^T A)^T A^T A).

Since

σmax(A^T A) = √λmax((A^T A)^T A^T A) = √λmax((A^T A)^2) = λmax(A^T A),

σmin(A^T A) = √λmin((A^T A)^T A^T A) = √λmin((A^T A)^2) = λmin(A^T A),

we obtain

κ2(A^T A) = κ2(A)^2.

This indicates that solving the normal equations squares the condition number

of the original matrix A, and may thus be ill-conditioned. In the next section we

will see how the QR decomposition of A, e.g. using Householder transformations,

can be used to solve the LS problem. This avoids the squaring of the condition

number of A, and is more numerically stable than solving the normal equations.
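The identity κ2(A^T A) = κ2(A)^2 is easy to confirm numerically; the sketch below (an illustration added here) uses an arbitrary random rectangular matrix:

```python
import numpy as np

# Numerical check that forming the normal equations squares the condition number.
rng = np.random.default_rng(5)
A = rng.standard_normal((8, 3))
kA = np.linalg.cond(A)            # sigma_max(A) / sigma_min(A)
kAtA = np.linalg.cond(A.T @ A)
print(kA**2, kAtA)                # the two values agree up to rounding
```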

3.4 Solving Least-Squares Problems using QR Factorisation

We consider overdetermined system

A~x = ~b with A ∈ Rm×n (m ≥ n),~b ∈ Rm, ~x ∈ Rn.

We seek to solve the system in the least-squares sense, i.e., we minimize

‖~r‖2 = ‖~b−A~x‖2. (3.5)


We use the (full) QR decomposition of A, with

A = Q [ R̂ ; 0 ], where Q = [ Q̂ | Q̄ ] ∈ Rm×m, Q̂ ∈ Rm×n, and R̂ ∈ Rn×n.

The factors Q̂ and R̂ can be obtained using the Householder algorithm.

Then we observe that

‖~r‖2^2 = ‖Q^T~r‖2^2

= ‖Q^T(~b − A~x)‖2^2

= ‖Q^T(~b − Q [ R̂ ; 0 ] ~x)‖2^2

= ‖ [ Q̂^T~b ; Q̄^T~b ] − [ R̂~x ; 0 ] ‖2^2

= ‖Q̂^T~b − R̂~x‖2^2 + ‖Q̄^T~b‖2^2,

where the second term is independent of ~x.

Thus ‖~r‖2^2 is minimal when Q̂^T~b − R̂~x = 0, or

R̂~x = Q̂^T~b. (3.6)

We solve this n×n system by backward substitution to find the optimal ~x. This

is numerically more stable than solving the normal equations.
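A Python sketch of this procedure (using numpy.linalg.qr for the reduced QR factorisation, rather than the Householder code of Section 3.2) might look as follows:

```python
import numpy as np

# Least squares via Eq. (3.6): R_hat x = Q_hat^T b, solved by backward substitution.
rng = np.random.default_rng(6)
m, n = 10, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

Qhat, Rhat = np.linalg.qr(A)      # reduced QR: Qhat is m x n, Rhat is n x n
y = Qhat.T @ b
x = np.zeros(n)
for i in range(n - 1, -1, -1):    # backward substitution on R_hat x = Q_hat^T b
    x[i] = (y[i] - Rhat[i, i+1:] @ x[i+1:]) / Rhat[i, i]

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))
```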

3.4.1 Geometric Interpretation in Terms of Projection Matrices

Equation (3.6) can be interpreted geometrically as follows. We know that the

vector ~x minimising Eq. (3.5) satisfies

A~x = the orthogonal projection of ~b onto range(A).

The columns of

Q̂ = [ ~q1 · · · ~qn ]

form an orthonormal basis of range(A). The product

Q̂^T~b = ( ~q1^T~b, . . . , ~qn^T~b )^T

contains the projection coefficients of ~b onto the basis vectors ~qi. Then

Q̂Q̂^T~b = (~q1^T~b)~q1 + . . . + (~qn^T~b)~qn = (~q1~q1^T)~b + . . . + (~qn~qn^T)~b

is the orthogonal projection of ~b onto range(A). So we conclude that the LS
solution ~x satisfies

A~x = Q̂Q̂^T~b

Q̂R̂~x = Q̂Q̂^T~b

Q̂^TQ̂R̂~x = Q̂^TQ̂Q̂^T~b


or

R̂~x = Q̂^T~b,

since Q̂^TQ̂ = In. This is a geometric way to derive result (3.6).

Note that the matrix

P = Q̂Q̂^T ∈ Rm×m

is an orthogonal projection matrix, since it satisfies P^2 = P and P^T = P.
The matrix-vector product Q̂Q̂^T~z projects any vector ~z ∈ Rm orthogonally onto
range(A). The orthogonal projector

Q̂Q̂^T = ~q1~q1^T + . . . + ~qn~qn^T

is composed of the sum of n rank-one orthogonal projection matrices

Pi = ~qi~qi^T ∈ Rm×m,

with each Pi satisfying Pi^2 = Pi and Pi^T = Pi.
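These properties of P = Q̂Q̂^T are easy to verify numerically; the following sketch (an illustration added here) uses an arbitrary random A:

```python
import numpy as np

# The projector P = Qhat Qhat^T satisfies P^2 = P and P^T = P, and
# the residual b - P b is orthogonal to range(A).
rng = np.random.default_rng(7)
A = rng.standard_normal((6, 2))
Qhat, _ = np.linalg.qr(A)          # orthonormal basis for range(A)
P = Qhat @ Qhat.T

b = rng.standard_normal(6)
r = b - P @ b                      # residual after projecting onto range(A)
print(np.allclose(P @ P, P), np.allclose(P.T, P), np.allclose(A.T @ r, 0.0))
```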

3.5 Alternating Least-Squares Algorithm for Movie Recommendation

Continuing the discussion on algorithms for movie recommendation from Section 1.3, we now proceed with formulating a least-squares-based optimisation algorithm to compute matrices U ∈ R^{f×m} and M ∈ R^{f×n}, with f a small integer, f ≪ m, n, such that U^T M approximates the ratings matrix R on the set of known ratings, R:

  R ≈ U^T M.  (3.7)

In particular, we seek U and M that minimise

  g(U, M) = ‖R − U^T M‖_{F,R}^2 + λ N(U, M),  (3.8)

where the ‖·‖_{F,R} norm is a partial Frobenius norm, summed only over the known entries of R, as given by the index set R, and N(U, M) is a regularisation term.

We will now explain the details of the Alternating Least Squares (ALS) algorithm for solving minimisation problem (3.8). The algorithm determines U and M in an alternating fashion: starting from an initial guess for U, determine the optimal M with U fixed, then determine the optimal U with M fixed, and so forth. Each subproblem of determining M with fixed U (and vice versa) in this alternating algorithm boils down to a (regularised) linear least-squares problem, as we will now describe.


3.5.1 Least-Squares Subproblems for Movie Recommendation

For each user i, let Ji = {j1, j2, j3, . . .} be the set of the indices j of the movies

ranked by user i, and for each movie j, let Ij = {i1, i2, i3, . . .} be the set of the

indices i of the users who have ranked movie j. Let |Ji| be the number of movies

ranked by user i, and let |Ij | be the number of users who have ranked movie j.

Then the function (3.8) we want to minimise is given specifically by

  min_{U,M} g(U, M) = Σ_{(i,j)∈R} ( r_{ij} − ~u_i^T ~m_j )^2
                      + λ ( Σ_{i=1}^m |J_i| ‖~u_i‖_2^2 + Σ_{j=1}^n |I_j| ‖~m_j‖_2^2 ),  (3.9)

with λ a fixed regularisation parameter.

We first rewrite the first part of the objective function g(U, M) as a sum over all movies:

  min_{U,M} g(U, M) = Σ_{j=1}^n ‖~r_j − U^T ~m_j‖_{2,I_j}^2
                      + λ ( Σ_{i=1}^m |J_i| ‖~u_i‖_2^2 + Σ_{j=1}^n |I_j| ‖~m_j‖_2^2 ),  (3.10)

where ‖·‖_{2,I_j} is a partial 2-norm, summed only over the vector entries that correspond to users who have ranked movie j, as given by the index set I_j. That is,

  ‖~r_j − U^T ~m_j‖_{2,I_j}^2 = ‖~r_{I_j} − U_{I_j}^T ~m_j‖_2^2,

where ~r_{I_j} is the vector containing all the known ratings for movie j (the elements of column ~r_j of R that contain ratings, by the users in the index set I_j), and U_{I_j}^T is a submatrix of the user matrix U^T that contains only the rows of the users that have ranked movie j. We rewrite Eq. (3.10) as

  min_{U,M} g(U, M) = Σ_{j=1}^n ‖~r_{I_j} − U_{I_j}^T ~m_j‖_2^2
                      + λ ( Σ_{i=1}^m |J_i| ‖~u_i‖_2^2 + Σ_{j=1}^n |I_j| ‖~m_j‖_2^2 ).  (3.11)

In the first half of an ALS iteration, we fix U , and find the optimal M given

that fixed U . To this end, we set the gradient of g(U,M) with respect to the

elements of M equal to zero. It is convenient to express this for each of the

columns ~mj of M :

  ∇_{~m_j} g(U, M) = ∇_{~m_j} ‖~r_{I_j} − U_{I_j}^T ~m_j‖_2^2 + λ|I_j| ∇_{~m_j} ‖~m_j‖_2^2 = 0   (j = 1, …, n).  (3.12)

These are n independent (regularised) linear least-squares problems for the n

columns ~mj of movie matrix M (with fixed user matrix U).

To compute the gradients in these expressions, the proof of Theorem 3.14 shows that

  ∇_{~x} ‖~b − A~x‖_2^2 = −2A^T(~b − A~x) = 2(A^T A ~x − A^T ~b),


and we also have (e.g., as a special case of the above) that

  ∇_{~x} ‖~x‖_2^2 = 2~x.

Applying these to Eq. (3.12) gives the n (regularised) normal equation conditions

  2(U_{I_j} U_{I_j}^T ~m_j − U_{I_j} ~r_{I_j}) + 2λ|I_j| ~m_j = 0,

  (U_{I_j} U_{I_j}^T + λ|I_j| I) ~m_j = U_{I_j} ~r_{I_j}   (j = 1, …, n).  (3.13)

Solving these small f × f linear systems for the columns ~mj of M (which can be

done in parallel) updates M in the first half of an ALS iteration.

The second half of the ALS iteration fixes M and updates U in a manner

completely analogous to the first half of the iteration. Specifically, we define the

transpose, Q, of the ratings matrix R,

  Q = R^T

and write

  Q ≈ M^T U.  (3.14)

We rewrite Eq. (3.9) as

  min_{U,M} g(U, M) = Σ_{i=1}^m ‖~q_{J_i} − M_{J_i}^T ~u_i‖_2^2
                      + λ ( Σ_{i=1}^m |J_i| ‖~u_i‖_2^2 + Σ_{j=1}^n |I_j| ‖~m_j‖_2^2 ),  (3.15)

where ~q_{J_i} is the vector containing all the known ratings given by user i (the elements of column ~q_i of Q that contain ratings, for the movies in the index set J_i), and M_{J_i}^T is a submatrix of the movie matrix M^T that contains only the rows of the movies that are ranked by user i.

Setting the gradient with respect to the elements of U equal to zero, column-by-column, gives

  ∇_{~u_i} g(U, M) = ∇_{~u_i} ‖~q_{J_i} − M_{J_i}^T ~u_i‖_2^2 + λ|J_i| ∇_{~u_i} ‖~u_i‖_2^2 = 0   (i = 1, …, m).  (3.16)

This gives the m (regularised) normal equations

  (M_{J_i} M_{J_i}^T + λ|J_i| I) ~u_i = M_{J_i} ~q_{J_i}   (i = 1, …, m).  (3.17)

Solving these small f × f linear systems for the columns ~ui of U (which, again,

can be done in parallel) updates U in the second half of an ALS iteration.
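One full ALS sweep, solving the f×f systems (3.13) and then (3.17), can be sketched in plain Python. The tiny 3×3 ratings example, the factor size f = 2, the λ value, and the initial guesses below are all made up for illustration:

```python
def solve(A, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            fac = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= fac * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# Known ratings: R[(i, j)] = rating of movie j by user i (m = n = 3 here)
R = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 2): 1.0, (2, 1): 2.0, (2, 2): 5.0}
m, n, f, lam = 3, 3, 2, 0.1
U = [[1.0, 0.5, 0.2], [0.3, 1.0, 0.4]]      # f x m user matrix
M_ = [[0.7, 0.1, 1.0], [0.2, 0.9, 0.3]]     # f x n movie matrix

def g(U, M_):
    """Objective (3.9): data misfit plus weighted regularisation."""
    Ji = [[j for (i, j) in R if i == u] for u in range(m)]
    Ij = [[i for (i, j) in R if j == v] for v in range(n)]
    val = sum((R[(i, j)] - sum(U[k][i] * M_[k][j] for k in range(f))) ** 2
              for (i, j) in R)
    val += lam * sum(len(Ji[i]) * sum(U[k][i] ** 2 for k in range(f)) for i in range(m))
    val += lam * sum(len(Ij[j]) * sum(M_[k][j] ** 2 for k in range(f)) for j in range(n))
    return val

def als_sweep(U, M_):
    # First half: fix U, update each movie column m_j via (3.13)
    for j in range(n):
        Ij = [i for (i, jj) in R if jj == j]
        G = [[lam * len(Ij) * (a == b) + sum(U[a][i] * U[b][i] for i in Ij)
              for b in range(f)] for a in range(f)]
        rhs = [sum(U[a][i] * R[(i, j)] for i in Ij) for a in range(f)]
        col = solve(G, rhs)
        for a in range(f):
            M_[a][j] = col[a]
    # Second half: fix M, update each user column u_i via (3.17)
    for i in range(m):
        Ji = [j for (ii, j) in R if ii == i]
        G = [[lam * len(Ji) * (a == b) + sum(M_[a][j] * M_[b][j] for j in Ji)
              for b in range(f)] for a in range(f)]
        rhs = [sum(M_[a][j] * R[(i, j)] for j in Ji) for a in range(f)]
        col = solve(G, rhs)
        for a in range(f):
            U[a][i] = col[a]

g0 = g(U, M_)
als_sweep(U, M_)
g1 = g(U, M_)   # each half-step minimises g over its block, so g cannot increase
```

Since each half-step solves its block subproblem exactly, the objective g decreases monotonically over ALS sweeps.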


Chapter 4

The Conjugate Gradient Method for Sparse SPD Systems

In this chapter we will consider iterative methods for solving linear systems

  A~x = ~b,

for the specific case where A ∈ R^{n×n} is a symmetric positive definite (SPD) matrix.

When using direct solvers for linear systems A~x = ~b, such as Gaussian elimination / LU decomposition and Cholesky decomposition, the algorithm is executed until completion, at which time one obtains the exact solution (in exact arithmetic), and the algorithm does not generate approximate solutions along the way.

In contrast, iterative methods start from an initial guess ~x0 for the solution

that one seeks to improve in a sequence of approximations

~x0, ~x1, ~x2, ~x3, . . .

until some convergence criterion is attained that typically prescribes a desired

accuracy in the approximation.

Iterative methods can be advantageous in terms of computational cost, in

particular for large-scale problems that involve highly sparse matrices. For ex-

ample, the matrix A ∈ Rn×n in our 2D model problem has about 5 nonzeros per

row. The cost for a matrix-vector product is therefore O(n) flops (in particular,

about 9n flops). The cost per iteration of iterative solvers is often proportional

to the cost of a matrix-vector product. So if an iterative solver can solve A~x = ~b

up to a desired accuracy in a number of iterations that does not grow strongly

with n, then it can often beat direct solvers. For example, for the 2D model

problem, iterative solvers exist, with O(n) cost per iteration, that converge to

the accuracy with which the PDE was discretised in a number of iterations that

does not grow with problem size. Those iterative solvers can obtain an accurate

answer in O(n) work, which, for large problems, is much faster than the O(n3)

cost of LU decomposition, or the O(n2) cost of banded LU decomposition. In

this chapter we will start exploring such iterative methods for solving A~x = ~b, for the particular case that A is SPD, which arises in many applications.


4.1 An Optimisation Problem Equivalent to SPD Linear Systems

Theorem 4.1

Let A ∈ R^{n×n} be an SPD matrix. Then

  φ(~x) = (1/2) ~x^T A ~x − ~b^T ~x + c,  (4.1)

with c an arbitrary constant, has a unique global minimum, ~x^*, which is the unique solution of A~x = ~b.

Proof. Since A is SPD, it is nonsingular and A~x = ~b has a unique solution,

which we call ~x∗.

Given an approximation ~x of ~x∗, we define the error ~e by

~e = ~x∗ − ~x.

Considering the squared A-norm of the error, ~e = ~x^* − ~x, we find

  ‖~x^* − ~x‖_A^2 = ‖~e‖_A^2
                  = ~e^T A ~e
                  = (~x^* − ~x)^T A (~x^* − ~x)
                  = ~x^{*T} A ~x^* − ~x^{*T} A ~x − ~x^T A ~x^* + ~x^T A ~x
                  = ~x^{*T} A ~x^* − 2 ~x^T A ~x^*+ ~x^T A ~x   (since A = A^T)
                  = ~x^{*T} ~b − 2 ~x^T ~b + ~x^T A ~x
                  = ~x^{*T} ~b − 2c + 2φ(~x).

Since taking ~x = ~x^* uniquely minimises the LHS of this equality, ~x^* is also the unique minimiser of the RHS, and hence of φ(~x), because ~x^{*T} ~b − 2c is independent of ~x. This shows that φ(~x) has a unique global minimiser, which is the solution of A~x = ~b.
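A small numeric sanity check of Theorem 4.1 (pure Python; the 2×2 SPD matrix and the perturbations below are arbitrary choices for illustration):

```python
# SPD matrix A and right-hand side b (arbitrary small example)
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

# Exact solution of A x = b, using A^{-1} = (1/5) [[3, -1], [-1, 2]]
xstar = [(3 * b[0] - b[1]) / 5, (-b[0] + 2 * b[1]) / 5]   # = (1/5, 3/5)

def phi(x, c=0.0):
    """phi(x) = 1/2 x^T A x - b^T x + c."""
    quad = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
    return 0.5 * quad - sum(b[i] * x[i] for i in range(2)) + c

# phi is strictly larger at any perturbed point than at x* = A^{-1} b
perturbations = [(0.1, 0.0), (0.0, -0.2), (0.05, 0.05), (-1.0, 2.0)]
larger = all(phi([xstar[0] + dx, xstar[1] + dy]) > phi(xstar)
             for (dx, dy) in perturbations)
```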

4.2 The Steepest Descent Method

The first iterative method we consider here for solving A~x = ~b is based on a

basic optimisation method for solving the optimisation problem

min

~x

φ(~x).

Recall that the gradient of φ(~x), ∇φ(~x), points in the direction of steepest

ascent of φ(~x), and is orthogonal to the level surfaces of φ(~x). The direction of

steepest descent is given by −∇φ(~x). In the case of φ(~x) corresponding to the


SPD linear system A~x = ~b (Eq. (4.1)), we find

  −∇φ(~x) = −(A~x − ~b) = ~r,

where the residual ~r is defined as

  ~r = ~b − A~x.

Here, we have used that

  ∇(~x^T A ~x) = A~x + A^T ~x,

or

  ∇(~x^T A ~x) = 2A~x

when A is symmetric.

The steepest descent optimisation method proceeds as follows. Suppose we

are given an initial approximation ~x0. We seek a new, improved approxima-

tion ~x1 by considering φ(~x) along a line in the direction of steepest descent,

−∇φ(~x0) = ~r0, where we define, for approximation ~xi,

~ri = ~b−A~xi.

That is, we determine the next approximation ~x1 of the form

~x1 = ~x0 + α1~r0,

where ~r0 = −∇φ(~x0) is called the search direction.

Considering ~x1(α1) as a function of α1, we determine the optimal step length

α1 from ~x0 along the search direction from the condition

d

dα1

φ(~x1(α1)) = 0,

which leads to

  0 = (d/dα_1) φ(~x_1(α_1))
    = ∇φ(~x_1)^T (d~x_1/dα_1)
    = −~r_1^T ~r_0.

This has the natural interpretation that the optimal step length is obtained at the point ~x_1 where the line on which we seek the new approximation is tangent to a level surface, i.e., at the new point ~x_1 the new gradient, −~r_1, is orthogonal to the search direction ~r_0.

This condition leads to an expression for the optimal step length as follows:

  0 = −~r_1^T ~r_0
    = −(~b − A~x_1)^T ~r_0
    = −(~b − A(~x_0 + α_1 ~r_0))^T ~r_0
    = −(~r_0 − α_1 A ~r_0)^T ~r_0,

or

  α_1 = (~r_0^T ~r_0) / (~r_0^T A ~r_0).

This process is repeated to determine ~x2, ~x3, . . ., until a stopping criterion is

satisfied.

Figure 4.2.1: Steepest descent convergence pattern for matrix (4.2) with λ = 2 and κ_2(A) = 2, from initial guess ~x_0 = (−1, 0.5)^T. [Plot not reproduced; panel title: kappa = 2, 13 steps.]

Algorithm 4.2: Steepest Descent Method for A~x = ~b, A SPD

Input: matrix A ∈ R^{n×n}, SPD; initial guess ~x_0
Output: sequence of approximations ~x_1, ~x_2, …

~r_0 = ~b − A~x_0
k = 0
repeat
    k = k + 1
    α_k = (~r_{k−1}^T ~r_{k−1}) / (~r_{k−1}^T A ~r_{k−1})
    ~x_k = ~x_{k−1} + α_k ~r_{k−1}
    ~r_k = ~r_{k−1} − α_k A ~r_{k−1}
until convergence criterion is satisfied
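Algorithm 4.2 can be sketched in a few lines of plain Python (the 2×2 test problem matches the λ = 2 example of this section; the tolerance and iteration cap are illustrative):

```python
def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def steepest_descent(A, b, x0, tol=1e-6, maxit=10000):
    """Algorithm 4.2: repeated optimal steps along the residual direction."""
    x = x0[:]
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    r0norm = dot(r, r) ** 0.5
    k = 0
    while dot(r, r) ** 0.5 > tol * r0norm and k < maxit:
        k += 1
        Ar = matvec(A, r)
        alpha = dot(r, r) / dot(r, Ar)
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
        r = [ri - alpha * ari for ri, ari in zip(r, Ar)]
    return x, k

# The lambda = 2 example of Section 4.2: b = 0, so the solution is (0, 0)
A = [[1.0, 0.0], [0.0, 2.0]]
b = [0.0, 0.0]
x, k = steepest_descent(A, b, [-1.0, 0.5])
```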

The cost per iteration of the steepest descent algorithm consists of one matrix-

vector product, two scalar products of vectors in Rn, and two so-called axpy

operations, denoting operations of type a~x+ ~y with vectors in Rn. If A is sparse

with nnz(A) = O(n), then the cost of one steepest descent iteration is O(n).

It can be shown that, if A is SPD, convergence to ~x∗ is guaranteed from any

initial guess. However, convergence can take many iterations, as illustrated in

the following example.


Figure 4.2.2: Steepest descent convergence patterns for matrices (4.2) with λ = 20 and λ = 200, from initial guess ~x_0 = (−1, 0.05)^T and ~x_0 = (−1, 0.005)^T, respectively. [Plots not reproduced; panel title for the first: kappa = 20, 139 steps.]

Example 4.3

We consider solving A~x = ~b with SPD matrix

  A = [ 1  0
        0  λ ],  (4.2)

where λ > 1, and with ~b = ~0, i.e., the solution is ~x^* = (0, 0)^T. Since A is SPD, we have

  κ_2(A) = λ_max(A) / λ_min(A) = λ.

We first consider the case λ = 2, i.e., κ_2(A) = 2. Fig. 4.2.1 shows level curves of φ(~x), which are ellipses aligned with the coordinate axes. The figure shows the steepest descent convergence pattern starting from initial guess ~x_0 = (−1, 0.5)^T. The convergence criterion

  ‖~r_i‖ / ‖~r_0‖ ≤ 10^{−6}

is satisfied after 13 steps.

However, Fig. 4.2.2 shows that, when increasing λ and κ_2(A) to 20 and 200, the number of iterations grows strongly: as κ_2(A) increases, the level curves become more elongated and, depending on the choice of the initial condition, this may result in extreme zig-zag patterns. For example, when κ_2(A) = 200, the method requires more than 1,300 iterations. (Note, on the contrary, that the exact solution is obtained in one step if κ_2(A) = 1, in which case the level curves are circles and the normal from any point is directed exactly towards the origin, which is the solution of the problem.)

This example shows that, for the steepest descent method, the number of

iterations required for convergence may increase proportionally to the matrix

condition number, κ. Since in many examples the condition number grows as

a function of problem size, this behaviour is clearly undesirable for the large-

scale problems we target. Therefore, we seek iterative methods with improved

convergence behaviour. The conjugate gradient method of the next section offers

such an improvement.

4.3 The Conjugate Gradient Method

Let A ∈ Rn×n be SPD. The Conjugate Gradient (CG) method for

A~x = ~b

is given by:


Algorithm 4.4: Conjugate Gradient Method for A~x = ~b, A SPD

Input: matrix A ∈ R^{n×n}, SPD; initial guess ~x_0
Output: sequence of approximations ~x_1, ~x_2, …

1: ~r_0 = ~b − A~x_0
2: ~p_0 = ~r_0
3: k = 0
4: repeat
5:    k = k + 1
6:    α_k = (~r_{k−1}^T ~r_{k−1}) / (~p_{k−1}^T A ~p_{k−1})
7:    ~x_k = ~x_{k−1} + α_k ~p_{k−1}
8:    ~r_k = ~r_{k−1} − α_k A ~p_{k−1}
9:    β_k = (~r_k^T ~r_k) / (~r_{k−1}^T ~r_{k−1})
10:   ~p_k = ~r_k + β_k ~p_{k−1}
11: until convergence criterion is satisfied
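The same algorithm in plain Python (the small SPD tridiagonal test matrix is an arbitrary example; line numbers in the comments refer to Algorithm 4.4):

```python
def matvec(A, x):
    n = len(x)
    return [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg(A, b, x0, tol=1e-10):
    """Algorithm 4.4; in exact arithmetic it terminates in at most n steps."""
    n = len(b)
    x = x0[:]
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]          # (l1)
    p = r[:]                                                   # (l2)
    rho = dot(r, r)
    for k in range(n):                                         # at most n steps
        if rho ** 0.5 <= tol:
            break
        Ap = matvec(A, p)
        alpha = rho / dot(p, Ap)                               # (l6)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]          # (l7)
        r = [ri - alpha * api for ri, api in zip(r, Ap)]       # (l8)
        rho_new = dot(r, r)
        beta = rho_new / rho                                   # (l9)
        p = [ri + beta * pi for ri, pi in zip(r, p)]           # (l10)
        rho = rho_new
    return x

# SPD 1D Laplacian-type matrix (tridiagonal 2, -1) of size n = 8
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x = cg(A, b, [0.0] * n)
r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
```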

We first define residual and error, before explaining the context in which the

CG algorithm was derived.

Definition 4.5

Consider iterate ~x_k for solving A~x = ~b with exact solution ~x^*.

The residual of iterate ~x_k is given by

  ~r_k = ~b − A~x_k.

The error of iterate ~x_k is given by

  ~e_k = ~x^* − ~x_k.

Note that

  A~e_k = A~x^* − A~x_k = ~b − A~x_k = ~r_k.

Recall the iteration formula of the steepest descent method,

  ~x_k = ~x_{k−1} + α_k ~r_{k−1},

where

  ~r_{k−1} = −∇φ(~x_{k−1}) = ~b − A~x_{k−1}

is the direction of steepest descent. We have seen in an example that the steepest descent direction may not be a suitable direction when the linear system is ill-conditioned. The CG algorithm aims at making a step in a better direction. It considers the iteration formula

  ~x_k = ~x_{k−1} + α_k ~p_{k−1},  (4.3)

or

  ~x_k = ~x_{k−1} + ~q_k,  (4.4)


with ~q_k = α_k ~p_{k−1}, where ~p_{k−1} is the step direction and α_k is the step length, which are chosen optimally in the following sense.

Definition 4.6

Let A ∈ R^{n×n} and ~r_0 ∈ R^n. The Krylov space K_k(~r_0, A) generated by ~r_0 and A is the subspace of R^n defined by

  K_k(~r_0, A) = span{~r_0, A~r_0, A^2~r_0, …, A^{k−1}~r_0}.

Considering Eq. (4.4), the CG method determines the vector ~q_k in the Krylov space K_k(~r_0, A) such that the error ~e_k is minimised in the A-norm. From Eq. (4.4) we have

  ~x^* − ~x_k = ~x^* − ~x_{k−1} − ~q_k,  or  ~e_k = ~e_{k−1} − ~q_k,

so CG chooses ~q_k in K_k(~r_0, A) such that

  ‖~e_k‖_A = ‖~e_{k−1} − ~q_k‖_A

is minimal over all vectors ~q_k in K_k(~r_0, A). In the next section we show that Algorithm 4.4 achieves this goal. This optimality leads to desirable convergence properties for broad classes of problems, significantly improving over steepest descent.

Note also that the cost per iteration of the CG algorithm is not much larger

than the cost of steepest descent: CG requires one matrix-vector product, two

scalar products, and three axpy operations per iteration.

4.4 Properties of the Conjugate Gradient Method

In this section we will show and discuss some properties of the CG algorithm.

To make the proofs somewhat easier, we consider, without loss of generality,

the case where we solve

A~x = ~b

with initial guess

~x0 = 0.

This is no restriction, because when ~x_0 ≠ 0 it is equivalent to applying CG to

  A(~x − ~x_0) = ~b − A~x_0,  or  A~y = ~c.

Note that the residuals for A~y = ~c are the same as for A~x = ~b,

  ~r_k = ~c − A~y_k = (~b − A~x_0) − A(~x_k − ~x_0) = ~b − A~x_k,

which implies that the step directions and the α and β parameters in the CG algorithm also don't change.

4.4.1 Orthogonality Properties of Residuals and Step Directions

An important property of CG is that the step directions ~pi are mutually A-

orthogonal or A-conjugate, from which the method derives its name.


Definition 4.7

Let A ∈ R^{n×n} be an SPD matrix. Then vectors ~p_i and ~p_j ∈ R^n are called A-orthogonal or A-conjugate if

  ~p_i^T A ~p_j = 0.

A-orthogonality of the step directions in Algorithm 4.4 is proven as part of

the following theorem.

Theorem 4.8

Let ~x0 = 0 in the CG algorithm (Algorithm 4.4). As long as convergence has

not been reached before iteration k (~rk−1 6= 0), there are no divisions by 0, and

the following hold:

(A) Let

  X_k = span{~x_1, …, ~x_k},
  P_k = span{~p_0, …, ~p_{k−1}},
  R_k = span{~r_0, …, ~r_{k−1}},
  K_k = span{~r_0, A~r_0, A^2~r_0, …, A^{k−1}~r_0} = K_k(~r_0, A).

Then

  X_k = P_k = R_k = K_k.

(B) The residuals are mutually orthogonal:

  ~r_k^T ~r_j = 0   (j < k).

(C) The step directions are mutually A-orthogonal:

  ~p_k^T A ~p_j = 0   (j < k).

Proof. The proof is by induction on k. The details are quite involved; we

provide a sketch of the proof.

(A) Assume X_{k−1} = P_{k−1} = R_{k−1} = K_{k−1}.

Line 7 in Algorithm 4.4 (l7), ~x_k = ~x_{k−1} + α_k ~p_{k−1}, shows that X_k = P_k. And (l10), ~p_k = ~r_k + β_k ~p_{k−1}, shows that P_k = R_k. Finally, (l8), ~r_k = ~r_{k−1} − α_k A~p_{k−1}, shows that R_k = K_k.

(B) Multiplying (l8), ~r_k = ~r_{k−1} − α_k A~p_{k−1}, with ~r_j on the right, we get

  ~r_k^T ~r_j = ~r_{k−1}^T ~r_j − α_k ~p_{k−1}^T A ~r_j.

Case j < k − 1:

  ~r_k^T ~r_j = 0,

since, by the induction hypothesis, ~r_{k−1}^T ~r_j = 0, and ~p_{k−1}^T A ~r_j = 0 since ~r_j ∈ P_{k−1}.


Case j = k − 1:

  ~r_k^T ~r_{k−1} = 0,

if

  α_k = (~r_{k−1}^T ~r_{k−1}) / (~p_{k−1}^T A ~r_{k−1}).

However, this is equivalent to (l6):

  α_k = (~r_{k−1}^T ~r_{k−1}) / (~p_{k−1}^T A ~r_{k−1})
      = (~r_{k−1}^T ~r_{k−1}) / (~p_{k−1}^T A (~p_{k−1} − β_{k−1} ~p_{k−2}))   (by (l10))
      = (~r_{k−1}^T ~r_{k−1}) / (~p_{k−1}^T A ~p_{k−1})   (by A-orthogonality).

(C) Multiplying (l10), ~p_k = ~r_k + β_k ~p_{k−1}, with A~p_j on the right, we get

  ~p_k^T A ~p_j = ~r_k^T A ~p_j + β_k ~p_{k−1}^T A ~p_j.

Case j < k − 1:

  ~p_k^T A ~p_j = 0,

since, by the induction hypothesis, ~r_k^T A ~p_j = ~r_k^T (~r_j − ~r_{j+1}) / α_{j+1} = 0 (using (l8)), and ~p_{k−1}^T A ~p_j = 0.

Case j = k − 1:

  ~p_k^T A ~p_{k−1} = 0,

if

  β_k = −(~r_k^T A ~p_{k−1}) / (~p_{k−1}^T A ~p_{k−1}).

However, this is equivalent to (l9):

  β_k = −(~r_k^T A ~p_{k−1}) / (~p_{k−1}^T A ~p_{k−1}) · (α_k / α_k)
      = ~r_k^T (−α_k A ~p_{k−1}) / (~r_{k−1}^T ~r_{k−1})   (by (l6))
      = ~r_k^T (~r_k − ~r_{k−1}) / (~r_{k−1}^T ~r_{k−1})   (by (l8))
      = (~r_k^T ~r_k) / (~r_{k−1}^T ~r_{k−1})   (by residual orthogonality).


Some additional comments can be made about the residual orthogonality,

  ~r_k^T ~r_j = 0   (j < k).  (4.5)

• Condition (4.5) implies that, for consecutive residuals,

  ~r_k^T ~r_{k−1} = 0,

as in the steepest descent method. However, condition (4.5) also implies that ~r_k is orthogonal to all previous residuals, which is clearly a much stronger property than for steepest descent. In fact, this implies finite termination in at most n steps: since the ~r_i are mutually orthogonal, and there can be at most n nonzero mutually orthogonal vectors in R^n, we have ~r_n = 0. So we have proved the following theorem:

Theorem 4.9

The CG algorithm converges to the exact solution in at most n steps (in

exact arithmetic).

This property may suggest that we can consider CG as a direct method, but in practice it is used as an iterative method, because in many practical cases it attains an accurate approximation in far fewer than n steps. Figure 4.4.1 compares the performance of the CG and steepest descent methods for the 2D Laplacian matrix.

• It can be shown that, in the update

  ~x_k = ~x_{k−1} + α_k ~p_{k−1},

CG chooses the optimal step length along direction ~p_{k−1}, as in steepest descent:

  (d/dα_k) φ(~x_k(α_k)) = 0.

It is easy to show that this requires step length

  α_k = (~r_{k−1}^T ~p_{k−1}) / (~p_{k−1}^T A ~p_{k−1}),

which can be shown to be equivalent to (l6) in Algorithm 4.4.

4.4.2 Optimal Error Reduction in the A-Norm

Theorem 4.10

Let ~x0 = 0 in the CG algorithm (Algorithm 4.4). As long as convergence has

not been reached before iteration k (~rk−1 6= 0), the iterate ~xk minimises

‖~ek‖A = ‖~x∗ − ~xk‖A

over the Krylov space Kk(~r0, A).


Figure 4.4.1: Comparison of steepest descent and CG convergence histories (log10 of the residual versus iterations) for the 2D Laplacian with N = 32 and n = 1024, with RHS a vector of all-ones, and zero initial guess. The condition number κ_2(A) ≈ 440. [Plot not reproduced.]

Proof. We know that ~x_k ∈ K_k(~r_0, A). Consider any vector ~y ∈ K_k(~r_0, A) that is different from ~x_k, i.e.,

  ~y = ~x_k + ~z

for some ~z ∈ K_k(~r_0, A), ~z ≠ 0. Then

  ~e_~y = ~x^* − ~y = ~x^* − ~x_k − ~z = ~e_k − ~z.

We have

  ‖~e_~y‖_A^2 = (~e_k − ~z)^T A (~e_k − ~z)
             = ~e_k^T A ~e_k − ~e_k^T A ~z − ~z^T A ~e_k + ~z^T A ~z
             = ~e_k^T A ~e_k − 2~z^T A ~e_k + ~z^T A ~z   (since A = A^T)
             = ~e_k^T A ~e_k − 2~z^T ~r_k + ~z^T A ~z   (since A~e_k = ~r_k)
             = ‖~e_k‖_A^2 + ~z^T A ~z   (since ~z ∈ K_k(~r_0, A) = span{~r_0, …, ~r_{k−1}}, which is orthogonal to ~r_k),

so

  ‖~e_~y‖_A^2 > ‖~e_k‖_A^2,

since A is SPD and ~z ≠ 0.


Note: this theorem implies that

‖~ek‖A ≤ ‖~ek−1‖A,

since Kk−1 ⊂ Kk. We say that convergence in the A-norm is monotone.

4.4.3 Convergence Speed

The following theorems can be proved about the convergence speed of the steep-

est descent and CG methods.

Theorem 4.11

Let A ∈ Rn×n be SPD. Let κ be the 2-norm condition number of A, κ = κ2(A).

Then the errors of the iterates in the steepest descent method satisfy

  ‖~e_k‖_A / ‖~e_0‖_A ≤ ( (κ − 1) / (κ + 1) )^k.

Theorem 4.12

Let A ∈ Rn×n be SPD. Let κ be the 2-norm condition number of A, κ = κ2(A).

Then the errors of the iterates in the CG method satisfy

  ‖~e_k‖_A / ‖~e_0‖_A ≤ 2 ( (√κ − 1) / (√κ + 1) )^k.

It can be shown that, for large κ, this leads to the following estimates for the number of iterations k required to converge to

  ‖~e_k‖_A / ‖~e_0‖_A ≈ ε,

with a fixed small ε:

• steepest descent: k = O(κ).

• CG: k = O(√κ).

Example 4.13: Condition Number of 1D Laplacian

Consider the 1D Laplacian matrix

  A = [ −2   1
         1  −2   1
              ⋱   ⋱   ⋱
                   1  −2   1
                        1  −2 ] ∈ R^{n×n}.


It can be shown that the n eigenvalues of this matrix are given by

  λ_k = 2 (cos(kπh) − 1)   (k = 1, …, n)
      = −4 sin^2(kπh/2)
      = −4 sin^2( kπ / (2(n+1)) ),

where h = 1/(n+1). Since A is symmetric and all eigenvalues are strictly negative, this matrix is symmetric negative definite. (I.e., −A is SPD.)

Since A is symmetric,

  κ_2(A) = |λ|_max(A) / |λ|_min(A).

It is easy to show that

  κ_2(A) = O(1/h^2) = O(n^2).

This means that linear systems with A become increasingly harder to solve for

iterative methods when n grows:

• Steepest descent takes O(κ) = O(n^2) iterations, and since the cost per iteration is O(n), the total cost is O(n^3).

• CG takes O(√κ) = O(n) iterations, so the total cost is O(n^2).
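The O(n^2) growth of κ_2(A) can be checked directly from the eigenvalue formula (pure Python; the sizes n = 100 and n = 200 are arbitrary choices):

```python
import math

def cond_1d_laplacian(n):
    """kappa_2(A) from |lambda_k| = 4 sin^2(k pi / (2(n+1))), k = 1..n."""
    eig = [4 * math.sin(k * math.pi / (2 * (n + 1))) ** 2 for k in range(1, n + 1)]
    return max(eig) / min(eig)

k100 = cond_1d_laplacian(100)
k200 = cond_1d_laplacian(200)
ratio = k200 / k100   # doubling n should roughly quadruple kappa
```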

Example 4.14: Condition Number of 2D Laplacian

Consider the 2D Laplacian matrix

  A = [ T  I
        I  T  I
            ⋱  ⋱  ⋱
               I  T  I
                   I  T ] ∈ R^{n×n},

where n = N^2 and T and I are block matrices ∈ R^{N×N} (T is tridiagonal with elements 1, −4, 1 on the three diagonals). (Here, N is the number of interior points in the x and y directions, and h = 1/(N+1). Note that A does not include the 1/h^2 factor.)

It can be shown that the n = N^2 eigenvalues of this matrix are given by the N^2 possible sums of the N eigenvalues of the 1D Laplacian matrices in the x and y directions:

  λ_{k,l} = 2 (cos(kπh) − 1 + cos(lπh) − 1)   (k = 1, …, N; l = 1, …, N)
          = −4 (sin^2(kπh/2) + sin^2(lπh/2))
          = −4 ( sin^2( kπ / (2(N+1)) ) + sin^2( lπ / (2(N+1)) ) ),

where h = 1/(N+1). Since A is symmetric and all eigenvalues are strictly negative, this matrix is symmetric negative definite. (I.e., −A is SPD.)

It is easy to show that

  κ_2(A) = O(1/h^2) = O(N^2) = O(n).

This means that linear systems with A become increasingly harder to solve for

iterative methods when n grows:

• Steepest descent takes O(κ) = O(n) iterations, and since the cost per iteration is O(n), the total cost is O(n^2).

• CG takes O(√κ) = O(√n) iterations, so the total cost is O(n^{3/2}).

4.5 Preconditioning for the Conjugate Gradient Method

4.5.1 Preconditioning for Solving Linear Systems

We saw in the previous section that the number of iterations, k, required to reach a specific tolerance when solving a linear system

  A~x = ~b

using CG satisfies

  k = O(√κ(A)).

If A is ill-conditioned, this may lead to large numbers of iterations, which can be

especially undesirable when the condition number grows as a function of problem

size, like for our 2D model problem. The idea of preconditioning the linear

system aims at reducing the number of iterations an iterative method requires

for convergence by reformulating the linear system as an equivalent problem that

has the same solution, but features a matrix with a smaller condition number.

The first approach is the idea of left preconditioning: multiply A~x = ~b on the left with a nonsingular matrix P ∈ R^{n×n} to obtain the equivalent linear system

  PA~x = P~b,

where the preconditioning matrix (or preconditioner) P is chosen such that

  κ(PA) ≪ κ(A),

perhaps by choosing P such that

  P ≈ A^{−1}.

Such a choice may reduce the condition number and the number of iterations substantially, and will not increase the cost per iteration too much if P is a cheaply computable approximation of A^{−1}. More broadly, the convergence speed of iterative methods for general linear systems A~x = ~b, where A is not necessarily SPD, usually depends on the eigenvalue distribution of the matrix – e.g., the clustering of eigenvalues – and its condition number, and the goal of preconditioning is to improve the eigenvalue distribution of PA such that the iterative method converges faster than for A.


An alternative approach to left preconditioning is the idea of right preconditioning with a nonsingular matrix P ∈ R^{n×n}, which reformulates the system as

  A P P^{−1} ~x = ~b,

and solves

  A P ~y = ~b

using the iterative method, where ~x is obtained at the end from

  ~x = P~y.

4.5.2 Left Preconditioning for CG

The general matrix preconditioning strategies described above are, however, not

directly applicable to CG, because CG requires the system matrix to be SPD,

and, with the original A being SPD, PA or AP are generally not symmetric.

However, preconditioning can be applied to CG as follows.

When applying preconditioning to linear system

A~x = ~b

with A ∈ R^{n×n} an SPD matrix, we choose a preconditioning matrix P that is SPD. The matrix P can always be written as the product XX^T, where X is a nonsingular matrix in R^{n×n}:

  P = V Λ V^T
    = V √Λ √Λ V^T
    = (V √Λ)(V √Λ)^T
    = X X^T,

where V ∈ R^{n×n} contains n orthonormal eigenvectors of P, and Λ ∈ R^{n×n} is a diagonal matrix containing the corresponding eigenvalues, which are strictly positive.

Then we can, using a change of variables, reformulate the left-preconditioned linear system as an equivalent system with an SPD matrix as follows:

  PA~x = P~b
  XX^T A~x = XX^T ~b
  X^T A X X^{−1} ~x = X^T ~b
  (X^T A X) ~y = X^T ~b,
  B~y = ~c,

where

  B = X^T A X,  ~c = X^T ~b,  ~y = X^{−1} ~x.

The following result shows that B is SPD, so we can apply CG to B~y = ~c.


Theorem 4.15

Let A ∈ Rn×n be SPD and X ∈ Rn×n be nonsingular. Then B = XTAX is

SPD.

Proof. B is symmetric since A is symmetric. Moreover, for any ~x ≠ 0,

  ~x^T (X^T A X) ~x = (X~x)^T A (X~x) > 0,

since A is SPD and X~x ≠ 0 because X is nonsingular.

Also, B has the same eigenvalues as PA, so the eigenvalues of PA determine

the 2-condition number of B, and hence, the speed of convergence of CG applied

to B~y = ~c.

Theorem 4.16

Let A ∈ Rn×n be SPD and X ∈ Rn×n be nonsingular, with P = XXT . Then

B = XTAX has the same eigenvalues as PA.

Proof. This follows because B is similar to PA:

  B = X^T A X = (X^{−1} X) X^T A X = X^{−1} (PA) X,

which implies that B and PA have the same eigenvalues.

4.5.3 Preconditioned CG (PCG) Algorithm

Applying CG to

  B~y = ~c

results in the following algorithm, where we use notation with a hat for the residuals ~̂r_k and search directions ~̂p_k associated with formulating the CG algorithm for computing ~y, rather than ~x.


Algorithm 4.17: Preconditioned CG (PCG) Method – Version 1

Input: SPD matrix A, RHS ~b; initial guess ~x_0; SPD preconditioner P = XX^T
Output: approximation ~x_k after stopping criterion is satisfied

1: B = X^T A X
2: ~c = X^T ~b       (we will apply CG to B~y = ~c)
3: ~y_0 = X^{−1} ~x_0
4: ~̂r_0 = ~c − B~y_0
5: ~̂p_0 = ~̂r_0
6: k = 0
7: repeat
8:    k = k + 1
9:    α_k = (~̂r_{k−1}^T ~̂r_{k−1}) / (~̂p_{k−1}^T B ~̂p_{k−1})
10:   ~y_k = ~y_{k−1} + α_k ~̂p_{k−1}
11:   ~̂r_k = ~̂r_{k−1} − α_k B ~̂p_{k−1}
12:   β_k = (~̂r_k^T ~̂r_k) / (~̂r_{k−1}^T ~̂r_{k−1})
13:   ~̂p_k = ~̂r_k + β_k ~̂p_{k−1}
14: until stopping criterion is satisfied
15: ~x_k = X ~y_k

It turns out, however, that the PCG algorithm can be reformulated in terms

of the original ~x variable, in a way that involves only the P and A matrices,

without explicit need for the X and XT factors.

This proceeds as follows. We first multiply (l10) in Algorithm 4.17 by X from the left to convert from ~y to ~x = X~y:

  X~y_k = X~y_{k−1} + α_k X~̂p_{k−1}
  ~x_k = ~x_{k−1} + α_k ~p_{k−1},

where we have defined the search direction for ~x_k, ~p_{k−1}, by

  ~p_{k−1} = X ~̂p_{k−1}.

Next we observe that the residuals for ~y and ~x are related by

  ~̂r = ~c − B~y = X^T ~b − X^T A X ~y = X^T (~b − A~x) = X^T ~r,

which we use to transform (l11) to

  ~̂r_k = ~̂r_{k−1} − α_k B ~̂p_{k−1}
  X^T ~r_k = X^T ~r_{k−1} − α_k X^T A X ~̂p_{k−1}
  ~r_k = ~r_{k−1} − α_k A ~p_{k−1}.


Then we multiply (l13) by X from the left to convert from ~̂p to ~p:

  X~̂p_k = X~̂r_k + β_k X~̂p_{k−1}
  ~p_k = XX^T ~r_k + β_k ~p_{k−1}
  ~p_k = P~r_k + β_k ~p_{k−1},

where we have used that P = XX^T.

Finally, to convert the scalar products in α_k and β_k to use ~r_k and ~p_k, we write

  ~̂r_k^T ~̂r_k = (X^T ~r_k)^T (X^T ~r_k) = ~r_k^T X X^T ~r_k = ~r_k^T P ~r_k,

and

  ~̂p_{k−1}^T B ~̂p_{k−1} = (X^{−1} ~p_{k−1})^T X^T A X (X^{−1} ~p_{k−1})
                          = ~p_{k−1}^T X^{−T} X^T A X X^{−1} ~p_{k−1}
                          = ~p_{k−1}^T A ~p_{k−1},

resulting in

  α_k = (~r_{k−1}^T P ~r_{k−1}) / (~p_{k−1}^T A ~p_{k−1}),
  β_k = (~r_k^T P ~r_k) / (~r_{k−1}^T P ~r_{k−1}).

This gives the second version of the PCG algorithm:

Algorithm 4.18: PCG Method – Version 2

Input: SPD matrix A, RHS ~b; initial guess ~x_0; SPD preconditioner P
Output: sequence of approximations ~x_1, ~x_2, …

1: ~r_0 = ~b − A~x_0
2: ~p_0 = P~r_0
3: k = 0
4: repeat
5:    k = k + 1
6:    α_k = (~r_{k−1}^T P ~r_{k−1}) / (~p_{k−1}^T A ~p_{k−1})
7:    ~x_k = ~x_{k−1} + α_k ~p_{k−1}
8:    ~r_k = ~r_{k−1} − α_k A ~p_{k−1}
9:    β_k = (~r_k^T P ~r_k) / (~r_{k−1}^T P ~r_{k−1})
10:   ~p_k = P~r_k + β_k ~p_{k−1}
11: until stopping criterion is satisfied

In practice, multiplication of a residual ~r by P to obtain a preconditioned residual ~q = P~r usually involves solving a linear system: since P ≈ A^{−1}, we normally know the sparse matrix P^{−1} ≈ A, and we solve

  P^{−1} ~q = ~r

for ~q. This step needs to be performed only once per iteration, and it is worthwhile to rewrite the algorithm once more to indicate this explicitly:

Algorithm 4.19: PCG Method – Version 3

Input: SPD matrix A, RHS ~b; initial guess ~x0; SPD preconditioner P

Output: sequence of approximations ~x1, ~x2, . . .

1: ~r0 = ~b−A~x0

2: solve P−1~q0 = ~r0 for ~q0 (the preconditioned residual)

3: ~p0 = ~q0

4: k = 0

5: repeat

6: k = k + 1

7: αk = (~rk−1^T ~qk−1)/(~pk−1^T A~pk−1)

8: ~xk = ~xk−1 + αk~pk−1

9: ~rk = ~rk−1 − αkA~pk−1

10: solve P−1~qk = ~rk for ~qk (the preconditioned residual)

11: βk = (~rk^T ~qk)/(~rk−1^T ~qk−1)

12: ~pk = ~qk + βk~pk−1

13: until stopping criterion is satisfied
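Algorithm 4.19 translates almost line by line into code. The following NumPy sketch is only an illustration (not code from these notes): the preconditioner solve is passed in as a function argument `apply_Pinv_solve`, and the 1D Laplacian test problem and the Jacobi choice P = AD^−1 (so that the solve is a division by the diagonal) are assumptions made for the example.

```python
import numpy as np

def pcg(A, b, x0, apply_Pinv_solve, tol=1e-10, maxit=200):
    """PCG, following Algorithm 4.19: apply_Pinv_solve(r) returns the
    preconditioned residual q, i.e. it solves P^{-1} q = r for q."""
    x = x0.copy()
    r = b - A @ x
    q = apply_Pinv_solve(r)
    p = q.copy()
    rq = r @ q                       # ~r^T ~q
    for _ in range(maxit):
        Ap = A @ p
        alpha = rq / (p @ Ap)        # step 7
        x = x + alpha * p            # step 8
        r = r - alpha * Ap           # step 9
        if np.linalg.norm(r) < tol:  # stopping criterion (step 13)
            break
        q = apply_Pinv_solve(r)      # step 10 (preconditioner solve)
        rq_new = r @ q
        beta = rq_new / rq           # step 11
        rq = rq_new
        p = q + beta * p             # step 12
    return x

# Model problem (assumed for this example): 1D Laplacian, SPD
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
d = np.diag(A)                       # Jacobi: P = A_D^{-1}, so q = r / d
x = pcg(A, b, np.zeros(n), lambda r: r / d)
```

Only the preconditioner solve changes between preconditioners; the rest of the iteration is untouched.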

4.5.4 Preconditioners for PCG

We now briefly describe some standard preconditioners that are often used when

solving linear systems

A~x = ~b.

We begin by writing A as a sum of its diagonal part and its strictly lower and

upper triangular part,

A = AD −AL −AU ,

where the convention of using negative signs for the triangular parts stems from

SPD matrices with positive diagonal elements and negative off-diagonal elements

that arise in the context of certain PDE problems (e.g., −A for our 2D Lapla-

cian).

Example 4.20

The following standard preconditioning matrices are often used as cheaply

computable approximations of A−1, where we assume that the matrix inverses

in the expressions exist:

1. Jacobi:
   P = AD^−1

2. Gauss-Seidel (GS):
   P = (AD − AL)^−1

3. Symmetric Gauss-Seidel (SGS):
   P = (AD − AU)^−1 AD (AD − AL)^−1

4. Successive Over-Relaxation (SOR):
   P = ω(AD − ωAL)^−1    (ω ∈ (0, 2))

5. Symmetric Successive Over-Relaxation (SSOR):
   P = ω(2 − ω)(AD − ωAU)^−1 AD (AD − ωAL)^−1    (ω ∈ (0, 2))

A few notes:

• 1., 3. and 5. give symmetric preconditioners P when A is symmetric (i.e.,
AL^T = AU), and they are the only ones that can be used with CG.

• Preconditioners 2.-5. contain (a sequence of) triangular matrices, which

can be inverted inexpensively by forward or backward substitution. If A is

sparse with nnz(A) = O(n), the cost of applying these preconditioners is

O(n), so preconditioning does not increase the computational complexity

per iteration beyond O(n). It may substantially reduce the number of

iterations required for convergence, and hence, may lead to faster overall

solve times and better scalability for large problems.
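To make the triangular-solve remark concrete, here is a small NumPy sketch (hypothetical, not from the notes) that applies the SGS preconditioner P = (AD − AU)^−1 AD (AD − AL)^−1 to a residual. The two dense calls to `np.linalg.solve` on triangular matrices stand in for the O(n) forward and backward substitutions one would use for a sparse A; the small test matrix is an assumption.

```python
import numpy as np

def sgs_apply(A, r):
    """q = P r for the SGS preconditioner P = (A_D - A_U)^{-1} A_D (A_D - A_L)^{-1},
    with the splitting convention A = A_D - A_L - A_U."""
    D = np.diag(np.diag(A))
    L = -np.tril(A, -1)                   # strictly lower part, negated
    U = -np.triu(A, 1)                    # strictly upper part, negated
    t = np.linalg.solve(D - L, r)         # forward (lower triangular) sweep
    return np.linalg.solve(D - U, D @ t)  # backward (upper triangular) sweep

# Small symmetric test matrix (assumed for illustration)
n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
r = np.arange(1.0, n + 1)
q = sgs_apply(A, r)
```

For symmetric A the resulting P is symmetric, so this choice is admissible for PCG.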

4.5.5 Using Preconditioners as Stand-Alone Iterative Methods

The preconditioning matrices presented in the previous section can also be used

as iterative methods by themselves, as we now explain. When solving

A~x = ~b

with exact solution ~x∗ and residual and error

~r = ~b−A~x,

~e = ~x∗ − ~x,

satisfying

A~e = ~r,

we start from the identity

~x∗ = ~x+ ~e = ~x+A−1~r.

We obtain a stationary iterative method by considering an easily computable

approximate inverse P of A,

P ≈ A−1,

and writing

~xk+1 = ~xk + P~rk, (4.6)

where

~rk = ~b−A~xk.

We easily derive the error propagation equation

~x∗ − ~xk+1 = ~x∗ − ~xk − P~rk,

~ek+1 = ~ek − PA~ek,

~ek+1 = (I − PA)~ek.

It can be shown that the iteration converges for any initial guess ~x0 when

‖I − PA‖p < 1

in some p-norm.


Example 4.21

The Gauss-Seidel (GS) iterative method for A~x = ~b with A ∈ Rn×n computes

a new approximation ~xnew from a previous iterate ~xold by (considering a simple

3× 3 example)

a11 x1^new + a12 x2^old + a13 x3^old = b1
a21 x1^new + a22 x2^new + a23 x3^old = b2
a31 x1^new + a32 x2^new + a33 x3^new = b3.    (4.7)

Rearranging (for the general n × n case) yields the defining equation for the

Gauss-Seidel method:

xi^new = (1/aii) ( bi − Σ_{j=1}^{i−1} aij xj^new − Σ_{j=i+1}^{n} aij xj^old ).    (4.8)

Using

A = AD −AL −AU ,

we can derive a matrix expression for this method by

A~x = ~b

(AD −AL −AU )~x = ~b

(AD −AL)~xk+1 = AU~xk +~b

~xk+1 = (AD −AL)−1((AD −AL −A)~xk +~b)

= ~xk + (AD −AL)−1(~b−A~xk)

= ~xk + (AD −AL)−1~rk.

Comparing with the general update formula (4.6), we identify the preconditioning

matrix P for GS as

P = (AD −AL)−1.

A few notes:

• The Jacobi iteration is defined by

xi^new = (1/aii) ( bi − Σ_{j=1}^{i−1} aij xj^old − Σ_{j=i+1}^{n} aij xj^old ).    (4.9)

• The preconditioning matrix for Symmetric Gauss-Seidel (SGS) can be de-

rived by concatenating a forward and a backward Gauss-Seidel sweep:

~xk+1/2 = ~xk + (AD −AL)−1~rk,

~xk+1 = ~xk+1/2 + (AD −AU )−1~rk+1/2.

• The Successive Over-Relaxation (SOR) method for A~x = ~b is an iterative

method in which, for every component, a linear combination is taken of a
Gauss-Seidel-like update and the old value:

xi^new = (1 − ω) xi^old + (ω/aii) ( bi − Σ_{j=1}^{i−1} aij xj^new − Σ_{j=i+1}^{n} aij xj^old ),

with ω a fixed weight. Symmetric Successive Over-Relaxation (SSOR) is

obtained from combining a forward and a backward SOR sweep.
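As an illustration of these stand-alone methods, here is a minimal NumPy sketch (an assumption-laden example, not the authors' code) of Gauss-Seidel sweeps implementing Eq. (4.8); the 1D Laplacian test matrix is assumed.

```python
import numpy as np

def gauss_seidel(A, b, x0, sweeps=100):
    """Gauss-Seidel iteration, Eq. (4.8): each component is overwritten
    in place, so the newest values x_j^new are used for j < i."""
    x = x0.copy()
    n = len(b)
    for _ in range(sweeps):
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (b[i] - s) / A[i, i]
    return x

# 1D Laplacian model problem (assumed for illustration)
n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = gauss_seidel(A, b, np.zeros(n), sweeps=2000)
```

Replacing each component update by the weighted combination in the SOR formula above turns this sweep into SOR.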


Chapter 5

The GMRES Method for

Sparse Nonsymmetric

Systems

5.1 Minimising the Residual

In this chapter, we consider the generalised minimal residual (GMRES) iterative

method for solving linear systems

A~x = ~b

with A ∈ Rn×n a nonsingular matrix.

We recall that the CG method for linear systems with A SPD seeks the

optimal update in the Krylov space generated by A and the first residual, ~r0:

Ki+1(~r0, A) = span{~r0, A~r0, A2~r0, . . . , Ai~r0}.

CG considers the update formula

~xi+1 = ~xi + αi+1~pi, (5.1)

or

~xi+1 = ~xi + ~zi, (5.2)

and the update ~zi is determined in the Krylov space Ki+1(~r0, A) such that the

error ~ei+1 is minimised in the A-norm, i.e., with

~ei+1 = ~ei − ~zi,

each step of CG minimises the A-norm of the error:

min_{~zi ∈ Ki+1(~r0,A)} ‖~ei+1‖A.

Note that error minimisation in the A-norm is only possible when A is SPD.

GMRES, by contrast, is intended for linear systems with generic nonsingular
matrices A, not necessarily symmetric, and also considers optimal updates
in the same Krylov space as CG, Ki+1(~r0, A) = span{~r0, A~r0, A2~r0, . . . , Ai~r0}.

It seeks the ~zi in Ki+1(~r0, A) for which the residual

~ri+1 = ~r0 − A~zi

of ~xi+1 = ~x0 + ~zi is minimal in the 2-norm:

min_{~zi ∈ Ki+1(~r0,A)} ‖~ri+1‖.

This minimisation of the residual in the 2-norm is more general than minimi-

sation of the error in the A-norm, because it can be done for any matrix A.

The resulting formulas are somewhat less economical than CG for SPD A, but

GMRES is a very powerful approach for general linear systems.

The GMRES method proceeds as follows: GMRES computes an orthonormal

basis {~q0, . . . , ~qi} for Ki+1(~r0, A),

Qi+1 = [ ~q0 ~q1 . . . ~qi ],

and it does so in an incremental way, computing an additional orthonormal vector

~qi for every iteration. The matrix Qi+1, with the orthonormal basis vectors as

its columns, satisfies

QTi+1Qi+1 = Ii+1.

GMRES chooses the update ~zi ∈ Ki+1(~r0, A), which can be represented as

~zi = Qi+1~y

for some ~y ∈ Ri+1. GMRES finds the optimal ~y ∈ Ri+1 in the expression

~xi+1 = ~x0 +Qi+1~y,

that minimises ‖~ri+1‖ in the 2-norm. Note that all vector norms in this chapter
denote the vector 2-norm.

5.2 Arnoldi Orthogonalisation Procedure

GMRES generates an orthonormal basis for the Krylov space

Ki+1(~r0, A) = span{~r0, A~r0, A2~r0, . . . , Ai~r0}

by setting

~q0 = ~r0/‖~r0‖

and applying modified Gram-Schmidt to orthogonalise the vectors

{~q0, A~q0, A~q1, . . . , A~qi−1}.

Gram-Schmidt generates a new vector ~vm+1 orthogonal to the previous {~q0, ~q1, . . . , ~qm}

by subtracting from A~qm the components in the directions of the previous ~qj :

~vm+1 = A~qm − h0,m~q0 − h1,m~q1 − . . .− hm,m~qm,

where the projection coefficients hj,m are determined in the standard way. The

new orthonormal vector ~qm+1 is then determined by normalising ~vm+1:

~qm+1 = ~vm+1/hm+1,m


where hm+1,m = ‖~vm+1‖.

So the basis vectors {~q0, ~q1, . . . , ~qm, ~qm+1} satisfy

hm+1,m~qm+1 = A~qm − h0,m~q0 − h1,m~q1 − . . .− hm,m~qm.

This procedure to generate an orthonormal basis of the Krylov space is called

the Arnoldi procedure. It can easily be shown that the set of Arnoldi vectors

generated by the procedure is a basis for span{~r0, A~r0, A2~r0, . . . , Ai~r0}:

Theorem 5.1

Let {~q0, . . . , ~qi} be the vectors generated by the Arnoldi procedure. Then

span{~q0, . . . , ~qi} = span{~r0, A~r0, A2~r0, . . . , Ai~r0}.

Proof. (sketch) This follows from a simple induction argument based on

hm+1,m~qm+1 = A~qm − h0,m~q0 − h1,m~q1 − . . .− hm,m~qm.

The Arnoldi procedure is given by:

Algorithm 5.2: Arnoldi Procedure for an Orthonormal Basis of Ki+1(~r0, A)

Input: matrix A ∈ Rn×n; vector ~r0

Output: vectors ~q0, . . . , ~qi that form an orthonormal basis of Ki+1(~r0, A)

ρ = ‖~r0‖

~q0 = ~r0/ρ

for m = 0 : i− 1 do

~v = A~qm

for j = 0 : m do

hj,m = ~qj^T ~v

~v = ~v − hj,m~qj

end for

hm+1,m = ‖~v‖

~qm+1 = ~v/hm+1,m

end for
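A direct NumPy transcription of Algorithm 5.2 can serve as a sanity check (a sketch for illustration; the random test matrix and vector are assumptions). It returns the orthonormal basis vectors as the columns of Q, together with the rectangular Hessenberg matrix H of projection coefficients.

```python
import numpy as np

def arnoldi(A, r0, steps):
    """Arnoldi procedure (Algorithm 5.2) with modified Gram-Schmidt.
    Returns Q (steps+1 orthonormal columns) and the rectangular
    Hessenberg matrix H, satisfying A @ Q[:, :steps] == Q @ H."""
    n = len(r0)
    Q = np.zeros((n, steps + 1))
    H = np.zeros((steps + 1, steps))
    Q[:, 0] = r0 / np.linalg.norm(r0)
    for m in range(steps):
        v = A @ Q[:, m]
        for j in range(m + 1):           # orthogonalise against q_0, ..., q_m
            H[j, m] = Q[:, j] @ v
            v = v - H[j, m] * Q[:, j]
        H[m + 1, m] = np.linalg.norm(v)  # breakdown if this is zero
        Q[:, m + 1] = v / H[m + 1, m]
    return Q, H

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 30))        # assumed random test matrix
r0 = rng.standard_normal(30)
Q, H = arnoldi(A, r0, 8)
```

The two checks below verify orthonormality of the basis and the matrix relation derived in the next display.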

The vectors and coefficients computed during the Arnoldi procedure can


be written in matrix form as:

A [ ~q0 ~q1 . . . ~qi ] = [ ~q0 ~q1 . . . ~qi ~qi+1 ] H˜i+1,

where

H˜i+1 =
[ h0,0   h0,1   h0,2   . . .   h0,i
  h1,0   h1,1   h1,2   . . .   h1,i
         h2,1   h2,2   . . .   h2,i
                h3,2   . . .    :
   0                 hi,i−1   hi,i
                             hi+1,i ],

or

AQi+1 = Qi+2H˜i+1.

Note that Qi+1 ∈ Rn×(i+1), Qi+2 ∈ Rn×(i+2), and H˜i+1 ∈ R(i+2)×(i+1). GMRES

uses this relation to minimise ‖~ri+1‖ over the Krylov space in an efficient manner,

as explained in the next section.

Note also that, when i+ 1 = n, the process terminates with hn,n−1 = ‖~vn‖ =

0, because there cannot be more than n orthogonal vectors in Rn. At this point,

we obtain

AQ = QH,

where Q ∈ Rn×n is orthogonal and H ∈ Rn×n is a square matrix with zeros
below the first subdiagonal:

H =
[ h0,0   h0,1   h0,2   . . .    h0,n−1
  h1,0   h1,1   h1,2   . . .    h1,n−1
         h2,1   h2,2   . . .      :
                h3,2   . . .
   0                 hn−1,n−2   hn−1,n−1 ].

This type of matrix is called an (upper) Hessenberg matrix:

Definition 5.3

Let H ∈ Rn×n. Then H is called an (upper) Hessenberg matrix if

hij = 0 for j ≤ i− 2.

This provides an orthogonal decomposition of A into Hessenberg form:

QTAQ = H.

5.3 GMRES Algorithm

GMRES uses the relation

AQi+1 = Qi+2H˜i+1, (5.3)


as obtained from the Arnoldi procedure, to minimise ‖~ri+1‖ over the Krylov

space in an efficient manner.

Since the columns of Qi+1 form an orthonormal basis for Ki+1(~r0, A), GM-

RES chooses the optimal ~y ∈ Ri+1 in

~xi+1 = ~x0 +Qi+1~y,

that minimises ~ri+1 in the 2-norm.

Note that, in Eq. (5.3), Qi+1 ∈ Rn×(i+1) and Qi+2 ∈ Rn×(i+2). Since n is

typically large (millions or billions) and i is small (perhaps 20-30 or so), these

matrices have many rows, so we will seek to avoid computing with them directly.

By contrast, H˜i+1 ∈ R(i+2)×(i+1) is a small matrix, and we will exploit this

as follows.

Using Eq. (5.3), we write

‖~ri+1‖ = ‖~r0 −AQi+1~y‖

= ‖~r0 −Qi+2H˜i+1~y‖.

We know that ~q0 = ~r0/‖~r0‖ forms the first column of Qi+2, so we can write

~r0 = ‖~r0‖Qi+2~e1,

where ~e1 ∈ Ri+2 is the first canonical basis vector, ~e1 = (1, 0, . . . , 0)T . Therefore,

‖~ri+1‖2 = ‖Qi+2 (‖~r0‖~e1 − H˜i+1~y)‖2
         = (Qi+2 (‖~r0‖~e1 − H˜i+1~y))^T (Qi+2 (‖~r0‖~e1 − H˜i+1~y))
         = (‖~r0‖~e1 − H˜i+1~y)^T Qi+2^T Qi+2 (‖~r0‖~e1 − H˜i+1~y)
         = (‖~r0‖~e1 − H˜i+1~y)^T (‖~r0‖~e1 − H˜i+1~y)
         = ‖ ‖~r0‖~e1 − H˜i+1~y ‖2.

Minimising ‖~ri+1‖ over ~y ∈ Ri+1 then boils down to solving a small least-squares

problem with an overdetermined matrix H˜i+1 ∈ R(i+2)×(i+1). For example, the

normal equations for this problem are given by

H˜i+1^T H˜i+1 ~y = ‖~r0‖ H˜i+1^T ~e1.

We find ~xi+1 from

~xi+1 = ~x0 +Qi+1~y.

In practical implementations, the least-squares problem is solved using a QR
decomposition. As i grows, the QR decomposition does not need to be recomputed
for every new i, but can be updated cheaply, as explained in [Saad, 2003].

Also, ~xi+1, or even ~y, does not need to be computed in every iteration. Since

the least-squares problem grows, it is common to restart the algorithm every 20

or so iterations.

The GMRES method for

A~x = ~b


is given by:

Algorithm 5.4: GMRES Method for A~x = ~b

Input: matrix A ∈ Rn×n; initial guess ~x0

Output: sequence of approximations ~x1, ~x2, . . .

1: ~r0 = ~b−A~x0

2: ρ = ‖~r0‖

3: ~q0 = ~r0/ρ

4: m = 0

5: repeat

6: ~v = A~qm

7: for j = 0 : m do

8: hj,m = ~qj^T ~v

9: ~v = ~v − hj,m~qj

10: end for

11: hm+1,m = ‖~v‖

12: ~qm+1 = ~v/hm+1,m

13: find ~y that minimises ‖ρ~e1 − H˜m+1~y‖

14: ~xm+1 = ~x0 +Qm+1~y

15: ‖~rm+1‖ = ‖ρ~e1 − H˜m+1~y‖

16: m = m+ 1

17: until convergence criterion is satisfied
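Algorithm 5.4 can be prototyped directly in NumPy. In the sketch below (an illustration, not the notes' implementation) the small least-squares problem of step 13 is re-solved from scratch with `np.linalg.lstsq` rather than with the cheap QR update mentioned above, and the well-conditioned nonsymmetric test matrix is an assumption.

```python
import numpy as np

def gmres(A, b, x0, maxit=30, tol=1e-10):
    """GMRES (Algorithm 5.4): each iteration extends the Arnoldi basis and
    solves the small least-squares problem min || rho e1 - H y ||."""
    n = len(b)
    r0 = b - A @ x0
    rho = np.linalg.norm(r0)
    Q = np.zeros((n, maxit + 1))
    H = np.zeros((maxit + 1, maxit))
    Q[:, 0] = r0 / rho
    for m in range(maxit):
        v = A @ Q[:, m]                        # Arnoldi step (steps 6-12)
        for j in range(m + 1):
            H[j, m] = Q[:, j] @ v
            v = v - H[j, m] * Q[:, j]
        H[m + 1, m] = np.linalg.norm(v)
        Q[:, m + 1] = v / H[m + 1, m]
        e1 = np.zeros(m + 2)
        e1[0] = rho
        y, *_ = np.linalg.lstsq(H[:m + 2, :m + 1], e1, rcond=None)  # step 13
        if np.linalg.norm(e1 - H[:m + 2, :m + 1] @ y) < tol:        # step 15
            break
    return x0 + Q[:, :m + 1] @ y                                    # step 14

rng = np.random.default_rng(1)
A = np.diag(np.linspace(1.0, 2.0, 40)) + 0.01 * rng.standard_normal((40, 40))
b = rng.standard_normal(40)
x = gmres(A, b, np.zeros(40))
```

Because the projected residual ‖ρ~e1 − H˜~y‖ equals the true residual norm, the stopping test needs no extra matrix-vector product.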

5.4 Convergence Properties of GMRES

The following convergence result can be proved for the case that A is diagonal-

isable,

A = V ΛV −1.

(Note that eigenvalues of A ∈ Rn×n may be complex.)

Theorem 5.5

Let A ∈ Rn×n, nonsingular, be diagonalisable, A = V ΛV −1. Then the residu-

als generated in the GMRES method satisfy

‖~ri‖ / ‖~r0‖ ≤ κ2(V) min_{pi ∈ Pi} max_{λ ∈ Σ(A)} |pi(λ)|.

Here, Pi denotes the set of polynomials pi of degree at most i which satisfy
pi(0) = 1. Σ(A) is the eigenvalue spectrum of

A, i.e., the set of eigenvalues of A.

This theorem indicates that the convergence behaviour depends on the con-

dition number of the matrix of eigenvectors of A, and on the distribution of the

eigenvalues of A in the complex plane. E.g., clustered spectra tend to lead to

fast convergence, since a low-degree polynomial can then typically be found that

is small on the whole spectrum. Since GMRES updates can be written in terms

of polynomials of A multiplying ~r0, GMRES can be interpreted as seeking the

optimal polynomial in Pi, which is used in the proof of this theorem.


5.5 Preconditioned GMRES

Left preconditioning for GMRES proceeds by considering

PA~x = P~b

with, e.g.,

P ≈ A−1.

Alternatively, right preconditioning for GMRES proceeds by considering

APP−1~x = ~b

or

AP~z = ~b,

P−1~x = ~z.

The two variants perform similarly, but right preconditioning is sometimes pre-

ferred because it works with the original residual:

~r0 = ~b−AP~z0 = ~b−APP−1~x0 = ~b−A~x0.

This is right-preconditioned GMRES:

Algorithm 5.6: Right-Preconditioned GMRES Method for A~x = ~b

Input: matrix A ∈ Rn×n; initial guess ~x0; preconditioner P ≈ A−1

Output: sequence of approximations ~x1, ~x2, . . .

1: ~r0 = ~b−A~x0

2: ρ = ‖~r0‖

3: ~q0 = ~r0/ρ

4: m = 0

5: repeat

6: ~v = AP~qm

7: for j = 0 : m do

8: hj,m = ~qj^T ~v

9: ~v = ~v − hj,m~qj

10: end for

11: hm+1,m = ‖~v‖

12: ~qm+1 = ~v/hm+1,m

13: find ~y that minimises ‖ρ~e1 − H˜m+1~y‖

14: ~xm+1 = ~x0 + PQm+1~y

15: ‖~rm+1‖ = ‖ρ~e1 − H˜m+1~y‖

16: m = m+ 1

17: until convergence criterion is satisfied

5.6 Lanczos Orthogonalisation Procedure for Symmetric

Matrices

If A = AT , then the Hessenberg matrix obtained by the Arnoldi process satisfies

HT = (QTAQ)T = QTATQ = QTAQ = H,


so H is symmetric, which implies that it is tridiagonal.

Therefore, the Arnoldi update formula simplifies from

hm+1,m~qm+1 = A~qm − h0,m~q0 − h1,m~q1 − . . . − hm,m~qm

to a three-term recursion relation

hm+1,m~qm+1 = A~qm − hm−1,m~qm−1 − hm,m~qm,

with

A [ ~q0 ~q1 . . . ~qi ] = [ ~q0 ~q1 . . . ~qi ~qi+1 ]
[ h0,0   h0,1
  h1,0   h1,1   h1,2                 0
         h2,1   h2,2   h2,3
                . . .  . . .  . . .
   0                 hi,i−1   hi,i
                             hi+1,i ],

or, taking into account the symmetry further,

H˜i+1 =
[ α0   β0
  β0   α1   β1               0
       β1   α2   β2
            . . .  . . .  . . .
   0             βi−1   αi
                        βi ].

The simplification of the Arnoldi procedure to compute the orthonormal basis

{~q0, . . . , ~qi} of the Krylov space based on

A~qi = βi−1~qi−1 + αi~qi + βi~qi+1

is called the Lanczos procedure. It can be shown that the Lanczos procedure

is related to the CG algorithm (just like Arnoldi is used by GMRES).

The Lanczos procedure is given by:


Algorithm 5.7: Lanczos Procedure for an Orthonormal Basis of Ki+1(~r0, A)

Input: matrix A ∈ Rn×n, symmetric; vector ~r0

Output: vectors ~q0, . . . , ~qi that form an orthonormal basis of Ki+1(~r0, A)

ρ = ‖~r0‖

~q0 = ~r0/ρ

β−1 = 0

~q−1 = 0

for m = 0 : i− 1 do

~v = A~qm

αm = ~qm^T ~v

~v = ~v − αm~qm − βm−1~qm−1

βm = ‖~v‖

~qm+1 = ~v/βm

end for
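The three-term recursion makes Lanczos much cheaper per step than Arnoldi. The NumPy sketch below (illustrative only; the random symmetric test matrix is an assumption) follows Algorithm 5.7 and can be checked against the tridiagonal structure of Q^T AQ.

```python
import numpy as np

def lanczos(A, r0, steps):
    """Lanczos procedure (Algorithm 5.7) for symmetric A: only the two
    most recent basis vectors enter each orthogonalisation step."""
    n = len(r0)
    Q = np.zeros((n, steps + 1))
    alpha = np.zeros(steps)
    beta = np.zeros(steps)
    Q[:, 0] = r0 / np.linalg.norm(r0)
    q_prev, beta_prev = np.zeros(n), 0.0
    for m in range(steps):
        v = A @ Q[:, m]
        alpha[m] = Q[:, m] @ v
        v = v - alpha[m] * Q[:, m] - beta_prev * q_prev
        beta[m] = np.linalg.norm(v)
        Q[:, m + 1] = v / beta[m]
        q_prev, beta_prev = Q[:, m], beta[m]
    return Q, alpha, beta

rng = np.random.default_rng(2)
M = rng.standard_normal((25, 25))
A = M + M.T                         # symmetric test matrix (assumed)
Q, alpha, beta = lanczos(A, rng.standard_normal(25), 6)
```

In floating point the computed basis slowly loses orthogonality as the number of steps grows, which is why practical Lanczos codes often reorthogonalise.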

Finally, it can be shown that the eigenvalues of the (i + 1) × (i + 1) matrix

Ĥi+1 that is formed by the first i + 1 rows of H˜i+1 ∈ R(i+2)×(i+1), obtained by

Arnoldi or Lanczos, provide approximations for eigenvalues of A. Indeed, when

i+ 1 = n, we can consider the eigenvalue decomposition

H = V ΛV −1

of H, and then

AQ = QH = QV ΛV −1

implies

AQV = QV Λ,

i.e., the columns of QV are the eigenvectors of A, with associated eigenvalues

in Λ. When i + 1 = n this relation is exact. When i ≪ n, the eigenvalues of
Ĥi+1 = V ΛV^−1 approximate some of the eigenvalues of A, and the columns

of Qi+1V approximate the associated eigenvectors. Eigenvalue and eigenvector

computation is an important topic of the second part of this unit.


Part II

Eigenvalues and Singular

Values

Chapter 6

Basic Algorithms for

Eigenvalues

Eigenvalue problems and the singular value decomposition are particularly inter-
esting because they serve as the driving force behind many important practical
problems, ranging from structural dynamics and quantum chemistry to data
science, Markov chain techniques, control theory, and beyond. Numerically stable and

computationally fast algorithms for identifying eigenvalues and eigenvectors are

powerful and yet far from obvious to construct.

6.1 Example: Page Rank and Stochastic Matrix

Before diving into the details of these jewels of computational science, we will

first introduce the stochastic matrix of a Markov chain. This will later be used
as an example in tutorial and assignment questions.

Markov chains are widely used for studying cruise control systems in motor
vehicles, queues of customers arriving at an airport, exchange rates of



currencies, or even modelling internet search. Here we will use page rank as the

motivating example1.

Step 1: A directed graph consists of a non-empty set of nodes and a set of
directed edges. Nodes are indexed by natural numbers, 1, 2, · · · . If there is an
edge from node i to node j, then i is called the tail and j the head. Each
directed edge represents a possible transition from its tail to its head.
Given a collection of web sites, it is reasonable to view each web site i as a node;
a hyperlink from site i to another site j then forms a directed edge from i to j.
This creates a directed graph.

[Figure: a directed graph on the four web sites S1, S2, S3, S4; the edge weights
0.5, 0.5, 0.4, 0.6, 0.3, 0.3, 0.3, 0.1, 0.4, 0.5, 0.1 are the transition probabilities
collected in the stochastic matrix M of Example 6.1.]

Remark

Since the importance of a web site is measured by its popularity (how many

incoming links it has), we can view the importance of a site i as the probability
that a random surfer on the Internet enters that website by following
hyperlinks.

Step 2: We can weight the edges (hyperlinks) of the graph in a probabilistic
way: a web site i is linked to other web sites (including itself) by hyperlinks, so
we can count the number of hyperlinks that point to a web site j and normalise
these counts by the total number of hyperlinks contained in site i. This way,
the directed edges from i are weighted, and the weights can be interpreted as a
discrete probability distribution. Following this probability distribution, a
random surfer currently browsing web site i moves on to other sites.

This transition is described by a stochastic system: at each node i, the transition
from node i to node j occurs with a certain probability. This discrete probability
distribution is represented as a vector whose j-th entry is the probability of
moving to node j. It obeys several principles:

• If the transition probability from i to j is 0, there is no edge starting at i
and ending at j, and vice versa.

1The web pages shown above are downloaded from http://www.math.cornell.edu/mec/Winter2009/RalucaRemus/Lecture3/lecture3.html


• We can have a probability of staying at the current node i; this is
represented by an edge starting and ending at the same node i.

• At any given node, the sum of transition probabilities to other nodes
(including the current node) must be 1.

This way, we can define a stochastic matrix, also known as transition matrix or

Markov matrix, to describe the transitions of a Markov chain. If we assume that

there are n possible nodes, the stochastic matrix M ∈ Rn×n is a square matrix

and each of its entries is a nonnegative real number representing a probability

of moving from the node indexed by its row number to another node indexed

by its column number. The i-th row of the matrix M is the discrete probability

distribution moving from the current node i to other nodes.

Example 6.1

For example, the system in the figure has the following stochastic matrix:

M = [ 0.5  0.5  0    0
      0    0.4  0    0.6
      0.3  0.3  0.3  0.1
      0    0.4  0.5  0.1 ].

Note that each row of the stochastic matrix sums to 1.

Step 3: How to work out the popularity of a collection of web sites given the

stochastic matrix? We need to figure out the probability distribution of surfers

entering this collection of web sites. A site is considered popular if it has a
high probability of being visited.

At a current step k, we assume the probability of visiting the collection of

web sites can be represented by a vector ~x(k). This vector is called the state of

the system at time k. The probability that a web site j will be visited in the next
step k + 1 is the sum over i of the probability of currently visiting a site i and
then following an edge into site j. This is given as

~x(k+1)(j) = Σ_{i=1}^{n} ~x(k)(i) Mij ,    (6.1)

where ~x(k)(i) is the i-th element of the vector ~x(k). This way, the probability

~x(k+1) is given as

~x(k+1)> = ~x(k)>M. (6.2)

Example 6.2

Continuing with the transition matrix in the diagram, we assume all web
sites have equal probabilities of being visited at the beginning, i.e., ~x(0) =
[ 0.25 0.25 0.25 0.25 ]>. The probability of visiting the node 2 in the
next step is given as

Σ_{i=1}^{4} ~x(0)(i)M(i, 2) = ~x(0)>M(:, 2) = [ 0.25 0.25 0.25 0.25 ] [ 0.5 0.4 0.3 0.4 ]> = 0.4.


Starting with an initial probability distribution ~x(0), after k steps the probability
distribution of web sites being visited is

~x(k)> = ~x(0)>Mk. (6.3)

The vector ~x(k) for k → ∞ represents the probability distribution of web sites
being visited after a large number of steps.

Example 6.3

Continuing with the transition matrix in the diagram, we assume all web sites
have equal probabilities of being visited, i.e., ~x(0) = [ 0.25 0.25 0.25 0.25 ]>.
After one step of the transition, we obtain

~x(1)> = ~x(0)>M = [ 0.2 0.4 0.2 0.2 ].

After three steps of the transition,

~x(3)> = ~x(0)>M3 = [ 0.128 0.4 0.188 0.284 ].

After five steps of the transition,

~x(5)> = ~x(0)>M5 = [ 0.11972 0.3922 0.20312 0.28496 ].

After ten steps of the transition,

~x(10)> = ~x(0)>M10 = [ 0.12164 0.3919 0.20269 0.28378 ].

After one hundred steps of the transition,

~x(100)> = ~x(0)>M100 = [ 0.12162 0.39189 0.2027 0.28378 ].

After one thousand steps of the transition,

~x(1000)> = ~x(0)>M1000 = [ 0.12162 0.39189 0.2027 0.28378 ].

We can also randomly choose an initial distribution, e.g.,
~x(0) = [ 0.081295 0.54474 0.30791 0.066051 ]>. After one thousand steps of
the transition,

~x(1000)> = ~x(0)>M1000 = [ 0.12162 0.39189 0.2027 0.28378 ].

Regardless of the initial distribution, the iteration appears to converge to a
stationary distribution.
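The iterates of Example 6.3 can be reproduced with a few lines of NumPy (a sketch for illustration, using the stochastic matrix M of Example 6.1):

```python
import numpy as np

# Right stochastic matrix from Example 6.1 (each row sums to 1)
M = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.4, 0.0, 0.6],
              [0.3, 0.3, 0.3, 0.1],
              [0.0, 0.4, 0.5, 0.1]])

x = np.full(4, 0.25)          # uniform initial distribution x^(0)
for _ in range(1000):
    x = x @ M                 # x^(k+1)> = x^(k)> M, Eq. (6.2)
```

After 1000 steps, x agrees with the stationary distribution [ 0.12162 0.39189 0.2027 0.28378 ] of the example to five decimals, and applying M once more leaves it unchanged.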

Studying the eigenvalues and eigenvectors is crucial for identifying ~x(k) for k → ∞
and understanding its behaviour. In fact, we can show that the largest eigenvalue

of M is one. Under certain conditions, with large k, the vector ~x(k) converges

to a vector ~x(∞). The vector ~x(∞) is called the stationary distribution. The

stationary distribution is invariant under the transition defined by M , which

can be stated in the form of

~x(∞)> = ~x(∞)>M.

In fact, the stationary distribution is the eigenvector of M> associated with

eigenvalue 1.


Remark

Here, we assume that the next state of the system only depends on its current
state. This property is called the Markov property.

The stochastic matrix discussed so far is often called the right stochastic
matrix, as it appears on the right side of the multiplication defining a transition.

Definition 6.4

Given a right stochastic matrix Mr ∈ Rn×n, its entry Mr(i, j) defines the
transition probability from the node i to the node j. Each row of Mr sums to 1.

For computational convenience, in this note we are dealing with the transpose

of the right stochastic matrix, which is often referred to as the left stochastic

matrix.

Definition 6.5

A left stochastic matrix Ml ∈ Rn×n is a stochastic matrix each of whose
entries Ml(i, j) defines the transition probability from the node j to the node i.
Each column of Ml sums to 1. For the same transition diagram, Ml = Mr>.

Example 6.6

The left stochastic matrix of the given diagram is

Ml = [ 0.5  0    0.3  0
       0.5  0.4  0.3  0.4
       0    0    0.3  0.5
       0    0.6  0.1  0.1 ].

Given an initial distribution ~x(0) = [ 0.25 0.25 0.25 0.25 ]>, the probability of
visiting the node 2 in the next step is given as

Σ_{i=1}^{4} Ml(2, i)~x(0)(i) = Ml(2, :) ~x(0) = [ 0.5 0.4 0.3 0.4 ] [ 0.25 0.25 0.25 0.25 ]> = 0.4.

Given ~x(k), the probability ~x(k+1) is given as

~x(k+1) = Ml ~x(k). (6.4)

The stationary distribution has the property

~x(∞) = Ml ~x(∞). (6.5)


6.2 Fundamentals of Eigenvalue Problems

Let A ∈ Rn×n be a square matrix. A non-zero ~x ∈ Cn is called an eigenvector
and λ ∈ C is called its corresponding eigenvalue if

A~x = λ~x. (6.6)

Here we review the basic mathematics of eigenvalues and eigenvectors.

6.2.1 Notations

This note does not deal with the eigenvalue problems of matrices with complex
entries. However, the eigenvalues and eigenvectors of a matrix with real
entries may nevertheless be complex.

Example 6.7

The matrix

A = [  0  1
      −1  0 ]

has eigenvalues ±i and eigenvectors [1, ±i]>/√2.

We need to introduce some special matrix operations and special matrices

involving complex numbers.

Definition 6.8

The conjugate transpose or Hermitian transpose of a matrix A ∈ Cm×n with
complex entries is the n-by-m matrix A∗ obtained from A by taking the
transpose and then taking the complex conjugate of each entry (i.e. negating the
imaginary parts but not the real parts). This takes the form of

A∗ij = conj(Aji).

Definition 6.9

A unitary matrix is a complex square matrix Q ∈ Cn×n whose conjugate
transpose Q∗ is also its inverse. That is,

QQ∗ = Q∗Q = I.

An orthogonal matrix is unitary.

We also introduce some basic matrix operations that will be used in Part II.

1. Given a matrix A, the entry on the i-th row and j-th column is denoted as

Aij .

2. The k-th power of a square matrix A is denoted as Ak.

3. The superscript (k) is used to denote some variables at the k-th iteration

of an algorithm. For example, A(k) is a matrix A in the k-th iteration of

some algorithm. In general, A(k) is not Ak.

4. ~vi is used to denote the i-th vector in a sequence of vectors, and ~vi(j) is

used to denote the j-th entry of the vector ~vi.


5. To be consistent with 4, we also use A(i, j) to denote the entry on the i-th

row and j-th column of a matrix A, i.e., A(i, j) = Aij .

6. Similarly, we use A(k : l,m : n) to denote a submatrix of A—the submatrix

spans the k-th to l-th rows and m-th to n-th columns of A.

6.2.2 Eigenvalue and Eigenvector

Equation (6.6) can be equivalently stated as

(A− λI)~x = 0. (6.7)

For a given eigenvalue λ, there may exist a set of linearly independent
eigenvectors Eλ such that

(A− λI)~v = 0, ∀~v ∈ Eλ.

The subspace spanned by the set Eλ is called an eigenspace. The eigenspace is

the nullspace of the matrix (A− λI).

Equation (6.7) has a non-zero solution ~x if and only if the determinant of the

matrix A− λI is zero. The determinant of the matrix A− λI can be expressed

as a polynomial.

Definition 6.10

The characteristic polynomial of A is a degree n polynomial in the form of

pA(λ) = det(A− λI). (6.8)

The eigenvalues of a matrix A are the roots of the characteristic polynomial.

The fundamental theorem of algebra implies that the characteristic polynomial
of A ∈ Rn×n, being a degree-n polynomial, can be factored (up to a sign) as

pA(λ) = (λ − λ1)(λ − λ2) · · · (λ − λn) = ∏_{i=1}^{n} (λ − λi).

We note that each of the roots of the characteristic polynomial, λi, can be a

complex number. The roots, λ1, λ2, . . . , λn, may not all have distinct values.

This leads to the concept of algebraic multiplicity of an eigenvalue.

Definition 6.11

The algebraic multiplicity of an eigenvalue λi, denoted µA(λi), is the
multiplicity of λi as a root of pA(λ). An eigenvalue is simple if it has multiplicity 1.

Another multiplicity of an eigenvalue λi, the geometric multiplicity, is defined

by the dimension of the nullspace of (A− λiI).

Definition 6.12

The geometric multiplicity of λi, µG(λi), is the number of linearly inde-

pendent eigenvectors associated with λi, or the dimension of the nullspace of

(A− λiI).

The algebraic multiplicity of any eigenvalue of a matrix A ∈ Rn×n is always
greater than or equal to its geometric multiplicity. We will prove this later.


Example 6.13

Consider the two matrices

A = [ a 0 0          B = [ a 1 0
      0 a 0                0 a 1
      0 0 a ],             0 0 a ],    (6.9)

where a > 0. Both A and B have the same characteristic polynomial, (a − λ)^3.
A has three linearly independent eigenvectors, whereas B only has one, namely
the scalar multiples of ~e1.
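The two multiplicities in this example can be checked numerically: the geometric multiplicity is the dimension of the nullspace of A − λI, computable as n minus the matrix rank. (The concrete value a = 2 is an assumption made for the sketch.)

```python
import numpy as np

a = 2.0
A = a * np.eye(3)                    # the diagonal matrix of Example 6.13
B = a * np.eye(3) + np.eye(3, k=1)   # the matrix with ones on the superdiagonal

# geometric multiplicity of the eigenvalue a: dim null(A - a I) = 3 - rank
gm_A = 3 - np.linalg.matrix_rank(A - a * np.eye(3))
gm_B = 3 - np.linalg.matrix_rank(B - a * np.eye(3))
```

Both matrices have algebraic multiplicity 3 for the eigenvalue a, but gm_A = 3 while gm_B = 1, so B is defective in the sense of Definition 6.14.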

Definition 6.14

A defective eigenvalue is an eigenvalue whose algebraic multiplicity exceeds

its geometric multiplicity. A matrix A ∈ Rn×n is called a defective matrix

if it has one or more defective eigenvalues.

6.2.3 Similarity Transformation

Definition 6.15

If a matrix X ∈ is nonsingular, then the map

A→ X−1AX,

is called a similarity transformation. Two matrices A ∈ and B ∈ are called

similar if there exist a nonsingular matrix X ∈ such that

B = X−1AX.

Similar matrices A and X−1AX share many important properties.

Theorem 6.16

Given a matrix A ∈ Cn×n and a nonsingular matrix X ∈ Cn×n, A and X−1AX
have the same characteristic polynomial, eigenvalues, and algebraic and
geometric multiplicities.

Proof. By the definition of the characteristic polynomial, we have

pX−1AX(λ) = det(X−1AX − λI)
          = det(X−1(A − λI)X)
          = det(X−1) det(A − λI) det(X)
          = det(A − λI)
          = pA(λ).

Since A and X−1AX have the same characteristic polynomial, the agreement of
eigenvalues and algebraic multiplicities follows. The dimensions of the nullspaces
of (A − λI) and X−1(A − λI)X are identical because X is nonsingular, and thus
the agreement of geometric multiplicities follows.
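Theorem 6.16 is easy to observe numerically. The sketch below (with an assumed random test pair A, X) compares the characteristic polynomial coefficients and the sorted eigenvalues of A and X−1AX:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
X = rng.standard_normal((5, 5))    # a random matrix is almost surely nonsingular
B = np.linalg.inv(X) @ A @ X       # similarity transformation

coeffs_A = np.poly(A)              # characteristic polynomial coefficients
coeffs_B = np.poly(B)
eigs_A = np.sort_complex(np.linalg.eigvals(A))
eigs_B = np.sort_complex(np.linalg.eigvals(B))
```

Up to rounding error, both the polynomial coefficients and the (possibly complex) eigenvalues agree.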


Similarity transformations can be used to show connections between algebraic
multiplicity and geometric multiplicity.

Theorem 6.17

The algebraic multiplicity of any eigenvalue of a matrix A ∈ Rn×n is always

greater than or equal to the its geometric multiplicity.

Proof. Suppose an eigenvalue λ with geometric multiplicity r has r linearly
independent eigenvectors ~v1, . . . , ~vr; orthonormalising them, we form a matrix

Vr = [ ~v1 · · · ~vr ]

with orthonormal columns such that AVr = λVr. We can extend Vr to a unitary
matrix V = [Vr | V⊥].

Applying the similarity transformation V ∗AV , we obtain

B = V ∗AV = [ λIr  C
              0    D ],

where Ir ∈ Rr×r is an identity matrix, C ∈ Cr×(n−r), and D ∈ C(n−r)×(n−r).
Since A and B have the same characteristic polynomial, we can then express
the characteristic polynomial of A as

pA(z) = det(B − zI) = det((λ − z)Ir) det(D − zIn−r) = (λ − z)^r det(D − zIn−r).

Thus the algebraic multiplicity of λ is greater than or equal to r.

6.2.4 Eigendecomposition, Diagonalisation, and Schur Factorisation

Consider a matrix A ∈ Rn×n that is non-defective, i.e., one for which the algebraic

multiplicity and the geometric multiplicity of each eigenvalue are the same. We have

AV = V Λ, (6.10)

where

V = [~v1 | ~v2 | · · · | ~vn],    Λ = diag(λ1, λ2, . . . , λn)

collect the eigenvectors and the corresponding eigenvalues. This effectively

factorises the matrix A in the form

A = V ΛV −1. (6.11)

This similarity transformation effectively diagonalises the matrix A. In fact, it is

easy to verify that a diagonal matrix is non-defective.

108 Chapter 6. Basic Algorithms for Eigenvalues

Theorem 6.18

A matrix A ∈ Rn×n is non-defective if and only if it has an eigenvalue decom-

position A = V ΛV −1.

Proof. Given an eigenvalue decomposition A = V ΛV −1, we know that A and

Λ are similar. Since the diagonal matrix Λ is non-defective, A is non-defective

by Theorem 6.16.

Conversely, a non-defective matrix must have n linearly independent eigenvectors, because

(1) the number of linearly independent eigenvectors associated with each eigenvalue

equals its geometric multiplicity, which by non-defectiveness equals its algebraic

multiplicity; and (2) eigenvectors associated with different eigenvalues are linearly

independent. Thus, the matrix V formed by all the eigenvectors is nonsingular and

A = V ΛV −1.
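The factorisation A = V ΛV −1 can be checked numerically with numpy's eig (an added sketch; a random matrix is non-defective with probability one):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))          # generically non-defective

lam, V = np.linalg.eig(A)                # columns of V are eigenvectors
# Reassemble A = V diag(lam) V^{-1}; V is nonsingular for non-defective A.
A_rebuilt = V @ np.diag(lam) @ np.linalg.inv(V)
print(np.allclose(A_rebuilt, A))
```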

A normal matrix A (i.e., one satisfying A∗A = AA∗) is unitarily diagonalisable;

that is, there exists a unitary matrix Q such that

A = QΛQ∗.

Real symmetric matrices are special matrices that are orthogonally diagonalis-

able. This leads to many computational advantages in finding their eigenvalues.

Remark 6.19

A real symmetric matrix is orthogonally diagonalisable and its eigenvalues are

real. That is, both Q and Λ are real for a symmetric A.

Not every matrix is unitarily diagonalisable. Furthermore, defective matrices

are not diagonalisable at all. A more general matrix decomposition is the Schur

factorisation.

Definition 6.20

A Schur factorisation of a matrix A ∈ Rn×n takes the form

A = QTQ∗,

where T is upper-triangular and Q is unitary.

Theorem 6.21

Every square matrix A ∈ Rn×n has a Schur factorisation.

Proof.

The case n = 1 is trivial as A is a scalar. Suppose n ≥ 2. Let ~x be any

eigenvector of A with corresponding eigenvalue λ. Take ~x to be normalised and

let it be the first column of a unitary matrix U in the form

U = [~x | U2],

where U2 ∈ Rn×(n−1).

The product U∗AU has the form

U∗AU = [ ~x∗A~x   ~x∗AU2
         U∗2A~x   U∗2AU2 ].


Since ~x is a normalised eigenvector, ~x∗A~x = λ, and U∗2A~x = λU∗2~x = ~0 because

the columns of U2 are orthogonal to ~x. Letting C = ~x∗AU2 and D = U∗2AU2,

the product can be simplified to

U∗AU = [ λ  C
         0  D ].

By induction, there exists a Schur factorisation V TV ∗ of the lower-dimensional

matrix D. Then write the unitary matrix

Q = [ 1  0
      0  V ],

and we have

(Q∗U∗)A(UQ) = [ λ  CV
                0  T ].

Since UQ is unitary and (Q∗U∗)A(UQ) is upper triangular, we obtain the

Schur factorisation.
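Numerically, a Schur factorisation can be computed with scipy.linalg.schur (an added illustration; the notes do not prescribe a routine):

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

# Complex Schur form: A = Q T Q*, with T upper triangular and Q unitary.
# The diagonal of T contains the eigenvalues of A.
T, Q = schur(A, output='complex')
print(np.allclose(Q @ T @ Q.conj().T, A))   # factorisation holds
print(np.allclose(np.tril(T, -1), 0))       # T is upper triangular
```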

Theorem 6.22

The eigenvalues of a triangular matrix are the entries on its main diagonal.

Proof. Let T ∈ Rn×n be upper triangular. The characteristic

polynomial of T can be written as

pT(λ) = det(T − λI).

We can partition T − λIn in the form

T − λIn = [ T2 − λIn−1   ~t1
            ~0>           Tnn − λ ],

where T2 ∈ R(n−1)×(n−1) is the leading principal submatrix of T, which is also upper

triangular. Using the property of the determinant of block matrices, we have

det(T − λI) = det(T2 − λIn−1) det(Tnn − λ − ~0>(T2 − λIn−1)−1~t1)

            = det(T2 − λIn−1)(Tnn − λ).

Since T2 − λIn−1 is also upper triangular, repeatedly applying this procedure leads

to the characteristic polynomial

pT(λ) = ∏_{i=1}^{n} (Tii − λ).

Therefore, the eigenvalues of a triangular matrix are the entries on its main

diagonal.


Remark 6.23

In summary, we have the following important results for identifying eigenvalues

of a matrix.

1. A matrix A is nondefective if and only if there exists an eigenvalue de-

composition A = V ΛV −1.

2. For a symmetric matrix A, there exists an orthogonal diagonalisation

A = QΛQ∗.

3. A unitary triangularisation (Schur factorisation) A = QTQ∗ always

exists.

Theorem 6.24

A real square matrix is symmetric if and only if it has the eigendecomposition

A = QΛQ>, where Q is a real orthogonal matrix and Λ is a real diagonal

matrix whose entries are the eigenvalues of A.

Proof. (The “only if” part =⇒ ): From Theorem 6.21 we know that a

general square matrix has the Schur factorisation A = QTQ>, where T is

upper triangular, so that T = Q>AQ. For a symmetric matrix A,

the matrix T = Q>AQ must also be symmetric. A symmetric upper triangular matrix

must be diagonal. This leads to the decomposition A = QTQ> where T is

diagonal, which is an eigendecomposition. Furthermore, all the eigenvalues

and eigenvectors of a real symmetric matrix are real, so Q and Λ can be taken to be real.

(The “if” part ⇐= ): Given the eigendecomposition A = QΛQ> with a real

orthogonal Q and real diagonal Λ, we have A> = (QΛQ>)> = QΛQ> = A, so A is a

real symmetric matrix.

Therefore the result follows.
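For real symmetric matrices, numpy's eigh returns this orthogonal diagonalisation directly (an added sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                         # a real symmetric matrix

lam, Q = np.linalg.eigh(A)                # real eigenvalues, orthogonal Q
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))  # A = Q Lambda Q^T
print(np.allclose(Q.T @ Q, np.eye(4)))         # Q is orthogonal
```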

6.2.5 Extending Orthogonal Vectors to a Unitary Matrix

In the proofs in the previous subsection, one important step is extending a rect-

angular matrix

Vr = [~v1 | · · · | ~vr],

where Vr ∈ Rn×r, to a unitary matrix

V = [Vr | V⊥],


where V⊥ ∈ Rn×(n−r). Here we explain the details of this operation.

For a given matrix A ∈ Rn×n, suppose it has an eigenvalue λ with geometric

multiplicity r. This way, the eigenvalue λ has r linearly independent eigenvectors,

i.e., A~ui = λ~ui, i = 1, . . . , r. Furthermore, we can show that a sequence of

orthonormal eigenvectors {~v1, ~v2, · · · , ~vr} can be obtained by orthogonalising and

normalising this set of eigenvectors {~u1, ~u2, · · · , ~ur}—using either Gram-Schmidt

or Householder reflection. The vectors {~v1, ~v2, · · · , ~vr} are still in the null space

of A−λI (the eigenspace of λ) as they are linear combinations of {~u1, ~u2, · · · , ~ur},

and thus are eigenvectors. This forms the matrix Vr.

As in the QR factorisation, we can always construct another n-by-(n-r) or-

thonormal matrix

V⊥ = [~vr+1 | · · · | ~vn],

such that each column of V⊥ is orthogonal to all the columns of Vr. Since both

Vr and V⊥ are orthonormal, the matrix V = [Vr|V⊥] is a unitary matrix.
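In practice this extension can be obtained from a complete QR factorisation: the trailing columns of the full Q are orthonormal and orthogonal to range(Vr). A small numpy sketch (our own construction, with hypothetical sizes n = 5, r = 2):

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 5, 2
Vr, _ = np.linalg.qr(rng.standard_normal((n, r)))   # n-by-r, orthonormal columns

# The complete QR of Vr embeds it into an n-by-n orthogonal matrix;
# the last n - r columns form V_perp, orthogonal to range(Vr).
Q_full, _ = np.linalg.qr(Vr, mode='complete')
V_perp = Q_full[:, r:]
V = np.hstack([Vr, V_perp])
print(np.allclose(V.T @ V, np.eye(n)))  # V = [Vr | V_perp] is orthogonal
```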

Now we have

V ∗AV = [ V ∗r
          V ∗⊥ ] A [ Vr | V⊥ ] = [ V ∗r AVr   V ∗r AV⊥
                                   V ∗⊥AVr   V ∗⊥AV⊥ ].

Since AVr = λVr and V⊥ is orthogonal to Vr, the above equation can be written

as

V ∗AV = [ λIr  C
          0    D ],

where C = V ∗r AV⊥ and D = V ∗⊥AV⊥. The resulting matrix V ∗AV is upper

block triangular.

Similarly, we can construct another matrix

U = [V⊥ | Vr],

and repeat the above process. This leads to

U∗AU = [ V ∗⊥
         V ∗r ] A [ V⊥ | Vr ] = [ V ∗⊥AV⊥   V ∗⊥AVr
                                  V ∗r AV⊥   V ∗r AVr ] = [ D  0
                                                           C  λIr ],

with the same C and D defined above. The resulting matrix U∗AU is lower

block triangular.


6.3 Power Iteration and Inverse Iteration

Given a matrix A ∈ Rn×n, we recall that the eigenvalues are the roots of the char-

acteristic polynomial pA(λ) = det(A − λI). In general, this characteristic

polynomial has degree n. For a polynomial of degree up to 4, well-established

formulas can be used to find its roots. However, as shown by Abel, Galois, and

others in the nineteenth century, for a polynomial of degree n ≥ 5 of the form

p(λ) = a0 + ∑_{i=1}^{n} ai λ^i,

where each coefficient ai is a rational number, the roots cannot in general be obtained

by a finite sequence of algebraic operations—addition, subtraction, multiplication,

division, and taking roots. This suggests that we cannot have direct solvers for

finding the eigenvalues of general matrices.

Remark 6.25

Like many root-finding algorithms, eigenvalue solvers must be iterative.

6.3.1 Power Iteration

A straightforward idea is that the sequence

~b/‖~b‖,  A~b/‖A~b‖,  A^2~b/‖A^2~b‖,  · · · ,  A^k~b/‖A^k~b‖,  · · ·

converges to an eigenvector corresponding to the largest eigenvalue (in absolute

value) of the matrix A. This is called the power iteration. It can be formalised

as the following:

Algorithm 6.26: Power Iteration

Input: Matrix A ∈ Rn×n and an initial vector ~b(0) = ~x ∈ Rn, where ‖~x‖ = 1

Output: An eigenvalue λ(m) and its eigenvector ~b(m)

1: for k = 1, 2, . . . , m do

2:     ~t(k) = A~b(k−1)                . Apply A

3:     ~b(k) = ~t(k)/‖~t(k)‖           . Normalise

4:     λ(k) = (~b(k))∗(A~b(k))         . Estimate eigenvalue

5: end for
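A direct transcription of Algorithm 6.26 in numpy (an added illustration; the 2×2 test matrix is our own example, with eigenvalues (5 ± √5)/2):

```python
import numpy as np

def power_iteration(A, x, m):
    """m steps of power iteration (Algorithm 6.26) from a vector x."""
    b = x / np.linalg.norm(x)
    lam = b @ (A @ b)
    for _ in range(m):
        t = A @ b                       # apply A
        b = t / np.linalg.norm(t)       # normalise
        lam = b @ (A @ b)               # eigenvalue estimate, cf. (6.12)
    return lam, b

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, b = power_iteration(A, np.array([1.0, 0.0]), 50)
print(lam)  # approx (5 + sqrt(5))/2 = 3.6180..., the dominant eigenvalue
```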

Repeatedly applying Steps 2 and 3, the vectors ~b(k), k = 0, 1, . . . , m follow the

sequence

~x/‖~x‖,  A~x/‖A~x‖,  A^2~x/‖A^2~x‖,  · · · ,  A^m~x/‖A^m~x‖.

Suppose ~b(k) is an eigenvector of A; then we have

A~b(k) = λ(k)~b(k).

As ~b(k) is normalised, multiplying both sides by (~b(k))∗ leads to the ratio

λ(k) = (~b(k))∗(A~b(k)) / ((~b(k))∗~b(k)) = (~b(k))∗(A~b(k)). (6.12)


Definition 6.27

The ratio

r(~b) = (~b∗A~b)/(~b∗~b), (6.13)

can be understood as follows: given a direction ~b, what scalar λ acts most like an

eigenvalue for ~b, in the sense of minimising f(λ) = ‖A~b − λ~b‖^2? Differentiating

this term with respect to λ, we have

∂f/∂λ = ∂‖A~b − λ~b‖^2/∂λ = −2~b∗(A~b − λ~b).

At the λ such that ∂f/∂λ = 0, f(λ) attains its minimum (as the second derivative is

2~b∗~b = 2‖~b‖^2 > 0), and thus we have λ = r(~b) as defined above. For a symmetric

matrix A ∈ Rn×n, this ratio is called the Rayleigh quotient.

6.3.2 Convergence of Power Iteration

We want to show the convergence of the power iteration in two respects. We first

show that the sequence ~b(k) converges linearly to an eigenvector corresponding

to the largest eigenvalue. Then we prove that, for an estimated eigenvector, the

estimated eigenvalue given by the ratio (6.13) converges linearly to the

corresponding eigenvalue.

Theorem 6.28

Assume a matrix A ∈ Rn×n is non-defective. Suppose its eigenvalues are

ordered so that

|λ1| > |λ2| ≥ · · · ≥ |λn|.

Let ~v1, . . . , ~vn denote (normalised) eigenvectors corresponding to each of the

eigenvalues. Suppose further we have an initial vector ~b(0) = ~x such that

~x∗~v1 ≠ 0. Then the vector ~b(k) in the power iteration satisfies

‖~b(k) − (±~v1)‖ = O(|λ2/λ1|^k),

as k → ∞. The ± indicates that one or the other choice of sign is to be taken.

Proof. We represent ~x as a linear combination of all the (normalised) eigen-

vectors ~v1, . . . , ~vn, which takes the form

~x = ∑_{i=1}^{n} ai ~vi.


Let

V = [~v1 | ~v2 | · · · | ~vn],    Λ = diag(λ1, λ2, . . . , λn),    and ~a = (a1, a2, . . . , an)>;

then A = V ΛV −1 and ~x = V~a, and hence

~b(k) = c(k)A^k~x = c(k)V Λ^kV −1V~a = c(k)V Λ^k~a = c(k) ∑_{i=1}^{n} λi^k ai ~vi,

where c(k) is the scalar that normalises ~b(k).

Now we bring λ1^k outside the summation:

~b(k) = c(k)λ1^k ( ∑_{i=1}^{n} (λi/λ1)^k ai ~vi ) = c(k)λ1^k a1 ~v1 + c(k)λ1^k ∑_{i=2}^{n} (λi/λ1)^k ai ~vi.

Therefore, the convergence of ~b(k) to ~v1 is dominated by the rate at which each

(λi/λ1)^k vanishes, which is of the order of |λ2/λ1|^k.

Theorem 6.29

Assume a non-symmetric matrix A ∈ Rn×n is non-defective. Suppose λK is

an eigenvalue of A with an eigenvector ~vK. The ratio

r(~b) = (~b∗A~b)/(~b∗~b)

is a linearly accurate estimate of the eigenvalue λK:

|r(~b) − λK| = O(‖~b − ~vK‖), as ~b → ~vK.

Proof. We represent ~b as a linear combination of all the eigenvectors ~v1, . . . , ~vn,

which takes the form

~b = ∑_{i=1}^{n} ai ~vi.

As defined in the previous proof, we have A = V ΛV −1 and ~b = V~a, and hence

A~b = V ΛV −1V~a = V Λ~a = ∑_{i=1}^{n} λi ai ~vi.


This way the ratio r(~b) can be written as

r(~b) = (∑_{i=1}^{n} λi ai ~vi)∗~b / (~b∗~b).

Thus, the error in the eigenvalue estimate takes the form

r(~b) − λK = (∑_{i=1}^{n} λi ai ~vi)∗~b / (~b∗~b) − λK (~b∗~b)/(~b∗~b)

           = ( (∑_{i=1}^{n} λi ai ~vi)∗~b − λK (∑_{i=1}^{n} ai ~v∗i)~b ) / (~b∗~b)

           = ( ∑_{i≠K} (λi − λK) ai ~v∗i~b ) / (~b∗~b).

Now, we can express the error as a weighted sum of the ai for i ≠ K:

r(~b) − λK = ∑_{i≠K} ai wi,   where wi = (λi − λK)~v∗i~b / (~b∗~b).

Given ~b = aK ~vK + ∑_{i≠K} ai ~vi, if ~b is close to ~vK, each ai for i ≠ K is of

the order of ‖~b − ~vK‖. Therefore, r(~b) converges linearly to the eigenvalue λK as

~b → ~vK.

Power iteration by itself can be slow. For example, it does not converge

if |λ1| = |λ2|. Nevertheless, it serves as a basis for many powerful eigenvalue

algorithms we will explore in later sections. It also reveals the iterative nature of

eigenvalue solvers.

6.3.3 Shifted Power Method

We have observed that if the first and second largest eigenvalues (in absolute

value) are close, the power iteration suffers from slow convergence. One simple

yet powerful idea to handle this situation is to use a shifted matrix A + µI.

Theorem 6.30

If λ is an eigenvalue of A, then λ+µ is an eigenvalue of A+µI. Furthermore,

if ~v is an eigenvector of A associated with λ, ~v is also an eigenvector of A+µI

associated with λ+ µ.

Using the shifted matrix A + µI, we can enhance the ratio between the first

and second largest eigenvalues.

6.3.4 Inverse Iteration

There also exist alternative ways to enhance the ratio between eigenvalues.


Theorem 6.31

Suppose µ is not an eigenvalue of A ∈ Rn×n. Then the eigenvectors of (A − µI)−1 are

the same as those of A, and the corresponding eigenvalues are (λi − µ)−1, i = 1, . . . , n,

where λi, i = 1, . . . , n are the eigenvalues of A.

This theorem suggests choosing a µ that is close to an eigenvalue

λK. Then the eigenvalue (λK − µ)−1 may be much larger than the other eigenvalues,

(λi − µ)−1, i ≠ K, of the matrix (A − µI)−1. This leads to the inverse iteration.

Algorithm 6.32: Inverse Iteration

Input: Matrix A ∈ Rn×n, an initial vector ~b(0) = ~x ∈ Rn where ‖~x‖ = 1,

and a shift scalar µ ∈ R.

Output: An eigenvalue λ(m) and its eigenvector ~b(m)

1: for k = 1, 2, . . . , m do

2:     Solve (A − µI)~w(k) = ~b(k−1) for ~w(k)     . Apply (A − µI)−1

3:     ~b(k) = ~w(k)/‖~w(k)‖                       . Normalise

4:     λ(k) = (~b(k))∗(A~b(k))                     . Estimate eigenvalue

5: end for
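Algorithm 6.32 in numpy (our own sketch, reusing the 2×2 example with eigenvalues (5 ± √5)/2; the shift µ = 1.3 sits near the smaller eigenvalue):

```python
import numpy as np

def inverse_iteration(A, x, mu, m):
    """m steps of inverse iteration (Algorithm 6.32) with shift mu."""
    n = A.shape[0]
    b = x / np.linalg.norm(x)
    lam = b @ (A @ b)
    for _ in range(m):
        w = np.linalg.solve(A - mu * np.eye(n), b)   # apply (A - mu I)^{-1}
        b = w / np.linalg.norm(w)                    # normalise
        lam = b @ (A @ b)                            # eigenvalue estimate
    return lam, b

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, b = inverse_iteration(A, np.array([1.0, 0.0]), 1.3, 20)
print(lam)  # approx (5 - sqrt(5))/2 = 1.3819..., the eigenvalue closest to mu
```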

6.3.5 Convergence of Inverse Iteration

Theorem 6.33

Given a non-defective matrix A ∈ Rn×n, suppose λK is the closest eigenvalue

to µ and λL is the second closest, that is,

|λK − µ| < |λL − µ| ≤ |λi − µ|, for each i ≠ K.

Let ~v1, . . . , ~vn denote eigenvectors corresponding to each of the eigenvalues of

A. Suppose further we have an initial vector ~x such that ~x∗~vK ≠ 0. Then the

vector ~b(k) in the inverse iteration satisfies

‖~b(k) − (±~vK)‖ = O(|(λK − µ)/(λL − µ)|^k),

and the estimated eigenvalue λ(k) satisfies

|λ(k) − λK| = O(|(λK − µ)/(λL − µ)|^k).

Proof. Using Theorem 6.31, we can show that the matrix B = (A − µI)−1

has eigenvalues zi = (λi − µ)−1, i = 1, . . . , n, associated with (normalised)

eigenvectors ~v1, . . . , ~vn. Note that the eigenvalues are ordered as

|zK| > |zL| ≥ |zi| for each i ≠ K.


Using the same argument as in the proof of Theorem 6.28, we can show that

‖~b(k) − (±~vK)‖ = O(|zL/zK|^k) = O(|(λK − µ)/(λL − µ)|^k).

The estimated eigenvector ~b(k) thus converges to ±~vK at the rate O(|zL/zK|^k).

Applying Theorem 6.29, we can show that λ(k) = r(~b(k)) satisfies

|λ(k) − λK| = O(|(λK − µ)/(λL − µ)|^k).

Remark 6.34

Step 2 of the inverse iteration relies on solving a linear system that is exceed-

ingly ill-conditioned. Will this create a fatal flaw in the algorithm?

Fortunately, this does not introduce a fatal flaw as long as the linear system is

solved by a stable method. Step 2 of the inverse iteration solves

(A− µI)~w(k) = ~b(k−1)

for ~w(k). Suppose µ is close to an eigenvalue λJ with eigenvector ~vJ.

Using Theorem 6.30, we can show that the matrix C = A − µI has eigenvalues

σi = λi − µ, i = 1, . . . , n, associated with (normalised) eigenvectors ~v1, . . . , ~vn.

Given a diagonal matrix D where Dii = σi, the matrix C has the similarity

transformation C = V DV −1. We can express the right-hand-side vector ~b(k−1)

as a linear combination of eigenvectors, ~b(k−1) = V~a. This way, ~w(k) = C−1~b(k−1)

can be written as

~w(k) = V D−1~a = (~a(J)/(λJ − µ)) ~vJ + ∑_{i≠J} (~a(i)/(λi − µ)) ~vi. (6.14)

If µ is close to λJ, this is dominated by the desired eigenvector we want to

approximate.

Now we deal with the ill-conditioning. We want to examine the stability

of ~w(k) under small perturbations to C and ~b(k−1):

(C + δC)(~w(k) + δ~w) = ~b(k−1) + δ~b.

The left-hand side takes the form

(C + δC)(~w(k) + δ~w) = C~w(k) + Cδ~w + δC~w(k) + δCδ~w.

Since the second-order perturbation term δCδ~w can be neglected and C~w(k) = ~b(k−1),

we have

δ~w = C−1(δ~b − δC~w(k)).

Without loss of generality, we can express (δ~b − δC~w(k)) as a linear combination

of eigenvectors, (δ~b − δC~w(k)) = V~d. Using the eigendecomposition of C, we

have

δ~w = V D−1~d = (~d(J)/(λJ − µ)) ~vJ + ∑_{i≠J} (~d(i)/(λi − µ)) ~vi. (6.15)


If µ is close to λJ, the perturbation δ~w to the solution also lies mostly along the

desired eigenvector we want to approximate.

Therefore, as long as the linear system is solved by a stable method (for exam-

ple, LU with pivoting) that produces a solution ~w + δ~w, both ~w and ~w + δ~w

lie close to the same direction ~vJ. One step of normalisation resolves

the difference in size.


6.4 Symmetric Matrices and Rayleigh Quotient Iteration

In this section, we focus on applying the power iteration and the inverse iteration

to symmetric matrices. The eigenvalue estimates for symmetric matrices exhibit

a higher speed of convergence compared with those for unsymmetric matrices.

We will also present a new algorithm that combines eigenvalue estimation using

the Rayleigh quotient with the inverse iteration to further enhance the

convergence speed.

6.4.1 Rate of Convergence

Definition 6.35

Suppose we have a sequence y(1), y(2), . . . converging to a number y. We

say the sequence converges linearly to y if

lim_{k→∞} |y(k+1) − y| / |y(k) − y| = σ,

for some σ ∈ (0, 1).

More generally, suppose the sequence converges with an iteration-dependent ratio

σk ∈ (0, 1),

|y(k+1) − y| / |y(k) − y| = σk.

We say the sequence converges superlinearly to y if σk → 0 as k → ∞, and

sublinearly to y if σk → 1 as k → ∞.

An alternative way of viewing this is to look at the error on a logarithmic

scale:

log(|y(k+1) − y|) − log(|y(k) − y|) = log(σk).

If log(σk) < 0 is a constant, then the logarithm of the error decreases linearly. If

log(σk) → −∞ as k → ∞, then the error decreases superlinearly. If log(σk) → 0

as k → ∞, then the error decreases sublinearly.

Definition 6.36

Suppose we have a sequence y(1), y(2), . . . converging to a number y. We

say the sequence converges with order q to y if

lim_{k→∞} |y(k+1) − y| / |y(k) − y|^q = γ,

for some γ > 0. For example, q = 2 gives quadratic convergence.

On a logarithmic scale:

lim_{k→∞} log(|y(k+1) − y|) − q log(|y(k) − y|) = log(γ).

6.4.2 Power Iteration and Inverse Iteration for Symmetric Matrices

Recall the Rayleigh quotient

r(~b) = (~b∗A~b)/(~b∗~b), (6.16)


for estimating eigenvalues given a vector ~b. Now we want to assess the accuracy

of this eigenvalue estimate for symmetric matrices.

Theorem 6.37

Given a symmetric matrix A ∈ Rn×n, suppose λK is an eigenvalue of A

with eigenvector ~qK. The ratio

r(~b) = (~b∗A~b)/(~b∗~b)

is a quadratically accurate estimate of the eigenvalue λK:

|r(~b) − λK| = O(‖~b − ~qK‖^2), as ~b → ~qK.

Proof. A symmetric matrix A has an eigendecomposition A = QΛQ∗,

where Q is an orthogonal matrix and Λ is a diagonal matrix. Each diagonal

entry λi = Λii is an eigenvalue of A, and the corresponding i-th column of

Q, ~qi = Q(:,i), is an eigenvector associated with λi. We represent ~b as a linear

combination of all the eigenvectors ~q1, . . . , ~qn, which takes the form

~b = ∑_{i=1}^{n} ai ~qi,    or    ~b = Q~a.

Now we have

~b∗A~b = ~a∗Q∗QΛQ∗Q~a = ~a∗Λ~a = ∑_{i=1}^{n} λi ai^2,

since Q is orthogonal.

This way the ratio r(~b) can be written as

r(~b) = (∑_{i=1}^{n} λi ai^2) / (~b∗~b).

Thus, the error in the eigenvalue estimate takes the form

r(~b) − λK = (∑_{i=1}^{n} λi ai^2)/(~b∗~b) − λK (∑_{i=1}^{n} ai^2)/(~b∗~b)

           = (∑_{i≠K} (λi − λK) ai^2)/(~b∗~b).

Now, we can express the error as a weighted sum of the ai^2 for i ≠ K:

r(~b) − λK = ∑_{i≠K} ai^2 wi,   where wi = (λi − λK)/(~b∗~b).

Given ~b = aK ~qK + ∑_{i≠K} ai ~qi, if ~b is close to ~qK, each ai for i ≠ K is of

the order of ‖~b − ~qK‖, and hence ai^2 = O(‖~b − ~qK‖^2) for i ≠ K. Therefore, r(~b)

converges quadratically to the eigenvalue λK as ~b → ~qK.


Not surprisingly, applying the power iteration (Algorithm 6.26) to a symmetric matrix

yields linear convergence in the eigenvector estimate and quadratic conver-

gence in the eigenvalue estimate, provided the ratio between the first and second

largest eigenvalues is not 1. A similar result holds for the inverse iteration as well.

Theorem 6.38

Given a symmetric matrix A ∈ Rn×n, suppose its eigenvalues are ordered so

that

|λ1| > |λ2| ≥ · · · ≥ |λn|.

Let ~q1, . . . , ~qn denote (normalised) eigenvectors corresponding to each of the

eigenvalues. Suppose further we have an initial vector ~b(0) = ~x such that

~x∗~q1 ≠ 0. Then the vector ~b(k) in the power iteration converges as

‖~b(k) − (±~q1)‖ = O(|λ2/λ1|^k),

and the estimated eigenvalue λ(k) converges as

|λ(k) − λ1| = O(|λ2/λ1|^{2k}).

Theorem 6.39

Given a symmetric matrix A ∈ Rn×n, suppose λK is the closest eigenvalue to

µ and λL is the second closest, that is,

|λK − µ| < |λL − µ| ≤ |λi − µ|, for each i ≠ K.

Let ~q1, . . . , ~qn denote eigenvectors corresponding to each of the eigenvalues of

A. Suppose further we have an initial vector ~x such that ~x∗~qK ≠ 0. Then the

vector ~b(k) in the inverse iteration converges as

‖~b(k) − (±~qK)‖ = O(|(λK − µ)/(λL − µ)|^k),

and the estimated eigenvalue λ(k) converges as

|λ(k) − λK| = O(|(λK − µ)/(λL − µ)|^{2k}).

6.4.3 Rayleigh Quotient Iteration

Given a good estimate of an eigenvalue, the inverse iteration finds the corresponding

eigenvector quickly, while the Rayleigh quotient estimates the

eigenvalue for a given vector. It is natural to combine both ideas. This leads to

the Rayleigh quotient iteration.


Algorithm 6.40: Rayleigh Quotient Iteration

Input: Matrix A ∈ Rn×n and an initial vector ~b(0) = ~x ∈ Rn where ‖~x‖ = 1.

Output: An eigenvalue λ(m) and its eigenvector ~b(m)

1: λ(0) = (~b(0))∗(A~b(0))

2: for k = 1, 2, . . . , m do

3:     Solve (A − λ(k−1)I)~w(k) = ~b(k−1) for ~w(k)     . Apply (A − λ(k−1)I)−1

4:     ~b(k) = ~w(k)/‖~w(k)‖                            . Normalise

5:     λ(k) = (~b(k))∗(A~b(k))                          . Estimate eigenvalue

6: end for
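Algorithm 6.40 in numpy (our own sketch on the same 2×2 symmetric example; only a handful of steps are needed thanks to the cubic convergence):

```python
import numpy as np

def rayleigh_quotient_iteration(A, x, m):
    """m steps of Rayleigh quotient iteration (Algorithm 6.40)."""
    n = A.shape[0]
    b = x / np.linalg.norm(x)
    lam = b @ (A @ b)                                # initial Rayleigh quotient
    for _ in range(m):
        w = np.linalg.solve(A - lam * np.eye(n), b)  # shifted inverse step
        b = w / np.linalg.norm(w)                    # normalise
        lam = b @ (A @ b)                            # updated estimate
    return lam, b

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, b = rayleigh_quotient_iteration(A, np.array([1.0, 0.3]), 5)
print(lam)  # converges to an eigenvalue of A in a handful of steps
```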

In the Rayleigh quotient iteration, we first compute an eigenvalue estimate

for the initial vector. Then, in each iteration, we feed the estimated eigenvalue

from the previous step into the shifted inverse iteration (for estimating the eigen-

vector). This leads to spectacular convergence.

Theorem 6.41

Given a symmetric matrix A ∈ Rn×n, suppose the initial vector is close to an

eigenvector ~qK corresponding to an eigenvalue λK. Then the vector ~b(k) in the

Rayleigh quotient iteration converges cubically as

‖~b(k+1) − (±~qK)‖ = O(‖~b(k) − (±~qK)‖^3),

and the estimated eigenvalue λ(k) converges cubically as

|λ(k+1) − λK| = O(|λ(k) − λK|^3).

Note that the ± signs on the two sides are not necessarily the same in the above

equations.

Proof. Here we employ the rather restrictive assumption that the eigenvalue λK

is simple. Let ‖~b(k) − (±~qK)‖ = ε. For sufficiently small ε, using Theorem

6.37 we can show that

|λ(k) − λK| = O(ε^2).

Now consider taking one step of the inverse iteration; the errors of the eigenvector

estimates in adjacent steps satisfy

‖~b(k+1) − (±~qK)‖ / ‖~b(k) − (±~qK)‖ = O(|(λK − λ(k))/(λL − λ(k))|).

Since |λ(k) − λK| = O(ε^2) and the right-hand side of the above equation is of

the order of λK − λ(k), we have

‖~b(k+1) − (±~qK)‖ = O(‖~b(k) − (±~qK)‖ ε^2) = O(ε^3).

This completes the proof of the first equation (convergence of the eigenvector

estimate is cubic). For the eigenvalue estimate at step k + 1, since the Rayleigh

quotient is quadratically accurate, we have

|λ(k+1) − λK| = O(‖~b(k+1) − (±~qK)‖^2) = O(ε^6).


Compared with the accuracy of the eigenvalue estimate at step k, which is O(ε^2),

we conclude that the second equation (convergence of the eigenvalue es-

timate is cubic) also holds.

By a similar reasoning, we can show that the Rayleigh quotient iteration

converges quadratically on non-symmetric matrices.

6.4.4 Summary of Power, Inverse, and Rayleigh Quotient Iterations

The convergence of the power, inverse, and Rayleigh quotient iterations is

summarised in Table 6.1. We note that the Rayleigh quotient iteration may

not always converge for non-symmetric matrices; the quadratic convergence can

be obtained only in limited cases.

Table 6.1: Let a = |λ2/λ1| and b = |(λK − µ)/(λL − µ)| as defined in the power iteration and

the inverse iteration, respectively.

            Symmetric matrices                Non-symmetric matrices

            Eigenvector     Eigenvalue        Eigenvector     Eigenvalue

Power       Linear O(a^k)   Linear O(a^2k)    Linear O(a^k)   Linear O(a^k)

Inverse     Linear O(b^k)   Linear O(b^2k)    Linear O(b^k)   Linear O(b^k)

Rayleigh    Cubic           Cubic             Quadratic †     Quadratic †

In terms of operation counts, the power iteration requires O(n^2) flops per

iteration for the matrix-vector products. The inverse and Rayleigh quo-

tient iterations require solving a linear system for the eigenvector estimate and

an additional matrix-vector product for the eigenvalue estimate. For a general

dense matrix, these two operations require O(n^3) and O(n^2) flops, respectively.

If we first transform the input matrix into a reduced form, namely

a tridiagonal matrix (in the symmetric case) or a Hessenberg matrix (in the

general case), the operation counts can be greatly reduced.


Chapter 7

QR Algorithm for

Eigenvalues

Many general-purpose eigenvalue solvers are based on the Schur factorisation.

Recall that the Schur factorisation of a matrix A ∈ Rn×n takes the form

A = QTQ∗,

where T is upper triangular and Q is unitary. The eigenvalues of T, and hence

the eigenvalues of A, are the entries on the main diagonal of T.

We aim to construct a sequence of elementary unitary similarity transformations Q∗kAQk,

so that the product

Q∗k · · ·Q∗2Q∗1 A Q1Q2 · · ·Qk (7.1)

converges to an upper triangular matrix T as k → ∞. Effectively, we construct

a unitary matrix

Q = Q1Q2 · · ·Qk

in this process. For a real symmetric matrix A ∈ Rn×n, let each Qk ∈ Rn×n be

an orthogonal (real) matrix; then Q∗k · · ·Q∗2Q∗1 A Q1Q2 · · ·Qk is also

symmetric and real. Therefore, the same algorithm produces an upper-

triangular and symmetric matrix T, which is diagonal.

7.1 Two Phases of Eigenvalue Computation

Definition 7.1

A Hessenberg matrix is a nearly triangular square matrix. An upper Hessen-

berg matrix has zero entries below the first subdiagonal, and a lower Hessenberg

matrix has zero entries above the first superdiagonal, as shown below.

× × × × ×
× × × × ×
  × × × ×
    × × ×
      × ×
Upper Hessenberg

× ×
× × ×
× × × ×
× × × × ×
× × × × ×
Lower Hessenberg



The sequence (7.1) is usually split into two phases. In the first phase, a matrix

is transformed to an upper Hessenberg matrix by a direct method. In the second

phase, an iterative process (as described earlier on) is applied to transform the

Hessenberg matrix to an upper triangular matrix. The process looks like the

following:

× × × × ×          × × × × ×          × × × × ×
× × × × ×          × × × × ×            × × × ×
× × × × ×   -->      × × × ×   -->        × × ×
× × × × ×              × × ×                × ×
× × × × ×                × ×                  ×
 A ≠ A∗       Phase 1: Q∗0AQ0      Phase 2: Q∗AQ = T
 (full)         (Hessenberg)          (triangular)

For a real symmetric matrix, Phase 1 will produce an upper Hessenberg and

symmetric matrix, which is tridiagonal. Phase 2 will produce a diagonal matrix

as previously discussed.

× × × × ×          × ×                ×
× × × × ×          × × ×                ×
× × × × ×   -->      × × ×   -->          ×
× × × × ×              × × ×                ×
× × × × ×                × ×                  ×
 A = A∗       Phase 1: Q∗0AQ0      Phase 2: Q∗AQ = T
 (full)        (tridiagonal)          (diagonal)

Phase 1 uses a direct method with an operation count comparable to a QR

or LU factorisation. By transforming the matrix to an upper Hessenberg or

tridiagonal form, the operation count of the matrix factorisations in each iteration

can be reduced by exploiting the Hessenberg or tridiagonal structure. This

greatly reduces the operation count of the iterative process in Phase 2.
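Phase 1 is available directly as scipy.linalg.hessenberg (an added illustration; the routine performs exactly this kind of orthogonal reduction):

```python
import numpy as np
from scipy.linalg import hessenberg

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5))

# Orthogonal reduction to upper Hessenberg form: A = Q H Q^T.
H, Q = hessenberg(A, calc_q=True)
print(np.allclose(Q @ H @ Q.T, A))       # the similarity transformation holds
print(np.allclose(np.tril(H, -2), 0))    # zeros below the first subdiagonal
```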


7.2 Hessenberg Form and Tridiagonal Form

To compute the Schur factorisation A = QTQ∗, we would like to apply unitary

similarity transformations to A so that zeros below the diagonal can be introduced.

× × × × ×            × × × × ×
× × × × ×              × × × ×
× × × × ×    -->         × × ×
× × × × ×                  × ×
× × × × ×                    ×
A = QTQ∗     Q∗AQ     T = Q∗AQ

A first thought could be to apply the Householder reflection to create a unitary Q

that triangularises the matrix A, as in the QR factorisation:

× × × × ×            × × × × ×
× × × × ×              × × × ×
× × × × ×    -->         × × ×
× × × × ×                  × ×
× × × × ×                    ×
 A = QR       Q∗A      R = Q∗A

However, this does not work in general.

Example 7.2

Consider the following symmetric matrix A,

A = [ 34 47  5 18 26
      47 10 13 26 34
       5 13 26 39 47
      18 26 39 42  5
      26 34 47  5 18 ].

It has a QR factorisation A = QR, where

Q = [ −0.51315   0.50931   0.68177  −0.097865 −0.053739
      −0.70936  −0.69517  −0.032877 −0.097865 −0.053739
      −0.075464  0.22987  −0.36671  −0.59133  −0.67625
      −0.27167   0.29808  −0.42781  −0.39282   0.70711
      −0.39241   0.34006  −0.46541   0.69056  −0.19208 ],

and

R = [ −66.2571 −52.5982 −42.7879 −43.9953 −49.4287
         0      39.2866  27.0942  14.2779   8.0216
         0       0      −45.1121 −23.1797 −11.1436
         0       0        0      −40.4136 −23.1985
         0       0        0        0      −34.9301 ].

However, the resulting unitary similarity transformation defined by Q is

Q∗AQ = [ −14.9985  91.9334  68.0421   2.957    7.1403
          36.6527 −30.0301 −14.7725   0.18308  9.0688
         −27.8888   4.3505  37.7858  20.5252   7.1294
           5.2017   5.2017  39.5858  −0.52871 −23.4519
           1.8771   1.8771  23.6216 −24.6996   6.7092 ],


which clearly does not lead to a triangular matrix.

One step of the Householder reflection changes all the rows of A:

× × × × ×          × × × × ×
× × × × ×          0 × × × ×
× × × × ×   -->    0 × × × ×
× × × × ×          0 × × × ×
× × × × ×          0 × × × ×
    A                 Q∗1A

Now we multiply Q∗1A with Q1 to complete the unitary transformation. Since

the Householder reflector is Hermitian (Q1 = Q∗1),

Q∗1AQ1 = (Q∗1(Q∗1A)∗)∗,

so we effectively apply the same Householder reflector to (Q∗1A)∗. This changes all

the rows of (Q∗1A)∗, i.e., all the columns of Q∗1A, so it may destroy the zeros

introduced previously.

× 0 0 0 0          × × × × ×          × × × × ×
× × × × ×          × × × × ×          × × × × ×
× × × × ×   -->    × × × × ×   -->    × × × × ×
× × × × ×          × × × × ×          × × × × ×
× × × × ×          × × × × ×          × × × × ×
 (Q∗1A)∗         Q∗1(Q∗1A)∗        (·)∗ gives Q∗1AQ1

Example 7.3

Consider the same symmetric matrix A; one step of the Householder reflection

(aiming at creating zeros below A(1,1)),

A = [ 34 47  5 18 26
      47 10 13 26 34
       5 13 26 39 47
      18 26 39 42  5
      26 34 47  5 18 ],

leads to the matrix

Q1A = [ −66.2571 −52.5982 −42.7879 −43.9953 −49.4287
           0     −36.6911  −9.4027  −3.0631  −1.3606
           0       8.0329  23.6167  35.9082  43.2382
           0       8.1183  30.4202  30.8695  −8.5423
           0       8.1709  34.607  −11.0774  −1.5612 ],

where

Q1 = I − 2~u1~u∗1,    ~u1 = (0.86981, 0.40776, 0.043379, 0.15617, 0.22557)>.


Now, multiplying $Q_1A$ with $Q_1^*$ on the right, we have
\[
Q_1AQ_1^* = \begin{bmatrix}
105.8884 & 28.1027 & -34.2027 & -13.0886 & -4.7856 \\
28.1027 & -23.5167 & -8.0012 & 1.9824 & 5.9274 \\
-34.2027 & -8.0012 & 21.911 & 29.7675 & 34.3683 \\
-13.0886 & 1.9824 & 29.7675 & 28.5196 & -11.9367 \\
-4.7856 & 5.9274 & 34.3683 & -11.9367 & -2.8022
\end{bmatrix},
\]
which no longer has those zeros introduced by $Q_1A$.

7.2.1 Householder Reduction to Hessenberg Form

Instead of directly transforming a matrix $A$ to a triangular form, we can transform it to a Hessenberg form (Phase 1 of the eigenvalue solvers), and then find other ways to obtain the Schur factorisation of the Hessenberg matrix.

This can be achieved by applying the Householder reflector starting from the second row of the matrix $A$. A square matrix $A \in \mathbb{R}^{n\times n}$ can be partitioned as follows:

\[
A = \begin{bmatrix} A_{11} & \vec a_1^\top \\ \vec b_1 & A_2 \end{bmatrix}.
\]

We first want to find a Householder reflector that transforms $\vec b_1$ to $-\mathrm{sign}(\vec b_1(1))\,\|\vec b_1\|\,\vec e_1$, which effectively creates zeros below the first entry of $\vec b_1$. The Householder transformation is defined by the unit vector
\[
\vec u_1 = \frac{\vec v_1}{\|\vec v_1\|}, \quad\text{where}\quad \vec v_1 = \vec b_1 + \mathrm{sign}(\vec b_1(1))\,\|\vec b_1\|\,\vec e_1,
\]
that determines the reflection hyperplane. This way, we can create a unitary matrix $Q_1 \in \mathbb{R}^{n\times n}$ that leaves the first row of $A$ unchanged,
\[
Q_1 = \begin{bmatrix} 1 & \vec 0^\top \\ \vec 0 & U_1 \end{bmatrix}, \tag{7.2}
\]
where $U_1 = I - 2\vec u_1\vec u_1^*$ is the Householder transformation matrix constructed with respect to $\vec b_1$.

After multiplying $Q_1^*$ on the left of $A$, which gives a matrix of the form
\[
Q_1^*A = \begin{bmatrix} A_{11} & \vec a_1^\top \\[2pt] \pm\|\vec b_1\|\,\vec e_1 & U_1^*A_2 \end{bmatrix}, \tag{7.3}
\]
we multiply $Q_1$ on the right of $Q_1^*A$. This time, the matrix $Q_1$ leaves the first column of $Q_1^*A$ unchanged, and we have
\[
Q_1^*AQ_1 = \begin{bmatrix} A_{11} & \vec a_1^\top U_1 \\[2pt] \pm\|\vec b_1\|\,\vec e_1 & U_1^*A_2U_1 \end{bmatrix}. \tag{7.4}
\]

Let $\tilde A_2 = U_1^*A_2U_1$. We can then take $\vec b_2 = \tilde A_2(:,1)$ and repeat the above process. Here the unitary matrix $Q_2$ should take the form
\[
Q_2 = \begin{bmatrix} I_2 & 0 \\ 0 & U_2 \end{bmatrix}, \tag{7.5}
\]
where $U_2 = I - 2\vec u_2\vec u_2^*$ is the Householder transformation matrix defined by a unit vector $\vec u_2$. The matrix $Q_2$ leaves the first two rows of $Q_1^*AQ_1$ unchanged when multiplied on the left, and the first two columns unchanged when multiplied on the right. This process is called Householder reduction.

Example 7.4

Consider the following symmetric matrix $A$,
\[
A = \begin{bmatrix}
34 & 47 & 5 & 18 & 26 \\
47 & 10 & 13 & 26 & 34 \\
5 & 13 & 26 & 39 & 47 \\
18 & 26 & 39 & 42 & 5 \\
26 & 34 & 47 & 5 & 18
\end{bmatrix}.
\]
One step of the Householder reduction (using a transformation aiming at creating zeros below $A(2,1)$) leads to the matrix
\[
Q_1^*AQ_1 = \begin{bmatrix}
34 & -56.8683 & 0 & 0 & 0 \\
-56.8683 & 63.585 & -42.2045 & -23.7277 & -17.8221 \\
0 & -42.2045 & 20.561 & 26.5925 & 30.0411 \\
0 & -23.7277 & 26.5925 & 23.1555 & -18.7528 \\
0 & -17.8221 & 30.0411 & -18.7528 & -11.3015
\end{bmatrix}.
\]
After two steps we have
\[
Q_2^*Q_1^*AQ_1Q_2 = \begin{bmatrix}
34 & -56.8683 & 0 & 0 & 0 \\
-56.8683 & 63.585 & 51.5932 & 0 & 0 \\
0 & 51.5932 & 48.3358 & -3.7237 & 4.6293 \\
0 & 0 & -3.7237 & 6.0401 & -32.2763 \\
0 & 0 & 4.6293 & -32.2763 & -21.961
\end{bmatrix}.
\]


7.2.2 Implementation and Computational Cost

Remark 7.5

In this section, since we are dealing with real square matrices, each Householder transformation matrix and the resulting $Q_k^*\cdots Q_1^*AQ_1\cdots Q_k$ are real. Thus, the conjugate transpose is equivalent to the transpose here.

To set all the entries below the first subdiagonal of a matrix to zero, i.e., to obtain the Hessenberg form, the Householder reduction has to be applied for $n-2$ steps. The algorithm is formulated below.

Algorithm 7.6: Householder Reduction to Hessenberg Form

Input: A matrix $A \in \mathbb{R}^{n\times n}$
Output: A Hessenberg matrix $A \in \mathbb{R}^{n\times n}$ and a sequence of vectors $\vec u_k$, $k = 1,\ldots,n-2$ that defines the sequence of unitary similarity transformations.
1: for $k = 1,\ldots,n-2$ do
2: $\quad \vec b = A(k{+}1{:}n,\, k)$
3: $\quad \vec v = \vec b + \mathrm{sign}(\vec b(1))\,\|\vec b\|\,\vec e_1$
4: $\quad \vec u_k = \vec v/\|\vec v\|$
5: $\quad A(k{+}1, k) = -\mathrm{sign}(\vec b(1))\,\|\vec b\|$
6: $\quad A(k{+}2{:}n,\, k) = 0$
7: $\quad A(k{+}1{:}n, k{+}1{:}n) = A(k{+}1{:}n, k{+}1{:}n) - (2\vec u_k)\left(\vec u_k^\top A(k{+}1{:}n, k{+}1{:}n)\right)$
8: $\quad A(1{:}n, k{+}1{:}n) = A(1{:}n, k{+}1{:}n) - \left(A(1{:}n, k{+}1{:}n)\,\vec u_k\right)(2\vec u_k^\top)$
9: end for
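Algorithm 7.6 can be transcribed almost line-for-line into NumPy. The sketch below is our own (the function name, the sign(0) = 1 convention, and the zero-column guard are our choices, not part of the unit's code):

```python
import numpy as np

def hessenberg_reduce(A):
    """Householder reduction of a square matrix to Hessenberg form
    (a transcription of Algorithm 7.6). Returns the reduced matrix
    and the unit vectors u_k defining the similarity transformations."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    us = []
    for k in range(n - 2):
        b = H[k+1:, k].copy()
        nb = np.linalg.norm(b)
        if nb == 0.0:                      # column already zero below the subdiagonal
            us.append(np.zeros_like(b))
            continue
        s = np.sign(b[0]) if b[0] != 0 else 1.0
        v = b.copy()
        v[0] += s * nb                     # v = b + sign(b(1)) ||b|| e1
        u = v / np.linalg.norm(v)
        us.append(u)
        H[k+1, k] = -s * nb                # A(k+1, k) = -sign(b(1)) ||b||
        H[k+2:, k] = 0.0
        # Step 7: apply U_k = I - 2 u u^T from the left to the trailing block
        H[k+1:, k+1:] -= 2.0 * np.outer(u, u @ H[k+1:, k+1:])
        # Step 8: apply U_k from the right to columns k+1:n of all rows
        H[:, k+1:] -= 2.0 * np.outer(H[:, k+1:] @ u, u)
    return H, us
```

Applied to the symmetric matrix of Example 7.4, the first subdiagonal entry of the result is $-56.8683$, matching the example, and the eigenvalues are preserved by the similarity transformations.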

Remark 7.7

As in the case of applying Householder reflections for computing the QR factorisation, the sequence of matrices $Q_k$, $k = 1,\ldots,n-2$ are not formed explicitly and can be reconstructed from $\vec u_k$, $k = 1,\ldots,n-2$ if necessary.

At the $k$-th iteration of the above algorithm, the work required to compute the unit vector $\vec u_k$ is proportional to $n-k$ (Steps 2--4). Similarly, the work required to apply the Householder reflection to $\vec b$ is about $n-k$ flops (Steps 5 and 6). The dominating cost lies in the last two lines inside the for loop.

In Step 7, the operations $A(k{+}1{:}n, k{+}1{:}n) - \cdots$ and $(2\vec u_k)(\cdots)$ require $(n-k)^2$ flops each, whereas $\vec u_k^\top A(k{+}1{:}n, k{+}1{:}n)$ requires $2(n-k)^2$ flops (multiplication and addition). Thus, the work of Step 7 is about $4(n-k)^2$ flops. Step 8 needs more work, as the operations $\cdots(2\vec u_k^\top)$ and $A(1{:}n, k{+}1{:}n) - \cdots$ require $n(n-k)$ flops each, whereas $A(1{:}n, k{+}1{:}n)\,\vec u_k$ requires $2n(n-k)$ flops. Thus, the work of Step 8 is about $4n(n-k)$ flops.

This way, the total work of applying the Householder reduction to transform a matrix to the Hessenberg form is about
\[
W = \sum_{k=1}^{n-2}\left[4n(n-k) + 4(n-k)^2 + O(n-k)\right]
= 4n\sum_{k=1}^{n-2}(n-k) + 4\sum_{k=1}^{n-2}(n-k)^2 + O\!\left(\sum_{k=1}^{n-2}(n-k)\right)
= 2n^3 + \frac{4}{3}n^3 + O(n^2)
= \frac{10}{3}n^3 + O(n^2). \tag{7.6}
\]

As expected, the dominant term in the expression for the computational work is proportional to $n^3$. We say that the computational complexity of the transformation to the Hessenberg form is cubic in the size of the square matrix, $n$.

7.2.3 The Symmetric Case: Reduction to Tridiagonal Form

If the matrix is symmetric, the above algorithm produces a tridiagonal matrix.

Theorem 7.8

The Householder reduction of a symmetric matrix produces a symmetric tridiagonal matrix.

Proof. Since $A$ is symmetric, $Q^\top AQ$ is also symmetric. A symmetric Hessenberg matrix $T$ has zero entries below the first subdiagonal (by the definition of a Hessenberg matrix) and zero entries above the first superdiagonal (by symmetry), and thus is tridiagonal.

By using the symmetry, the cost of applying the left and right Householder reflections (Steps 5--8) can be further reduced. The resulting algorithm is formulated below.

Algorithm 7.9: Householder Reduction to Tridiagonal Form

Input: A matrix $A \in \mathbb{R}^{n\times n}$
Output: A tridiagonal matrix $A \in \mathbb{R}^{n\times n}$ and a sequence of vectors $\vec u_k$, $k = 1,\ldots,n-2$ that defines the sequence of unitary similarity transformations.
1: for $k = 1,\ldots,n-2$ do
2: $\quad \vec b = A(k{+}1{:}n,\, k)$
3: $\quad \vec v = \vec b + \mathrm{sign}(\vec b(1))\,\|\vec b\|\,\vec e_1$
4: $\quad \vec u_k = \vec v/\|\vec v\|$
5: $\quad A(k{+}1, k) = -\mathrm{sign}(\vec b(1))\,\|\vec b\|$
6: $\quad A(k, k{+}1) = A(k{+}1, k)$
7: $\quad A(k{+}2{:}n,\, k) = 0$
8: $\quad A(k,\, k{+}2{:}n) = 0$
9: $\quad \vec t = A(k{+}1{:}n, k{+}1{:}n)\,\vec u_k$
10: $\quad \sigma = 2\vec u_k^*\vec t$
11: $\quad \vec p = 2\vec t - \sigma\vec u_k$
12: $\quad A(k{+}1{:}n, k{+}1{:}n) = A(k{+}1{:}n, k{+}1{:}n) - \vec p\,\vec u_k^* - \vec u_k\,\vec p^{\,*}$
13: end for

At iteration $k$, the matrix $Q_{k-1}^*\cdots Q_1^*AQ_1\cdots Q_{k-1}$ is symmetric and is tridiagonal in the submatrix $A(1{:}k{-}1,\, 1{:}k{-}1)$. This way, the left and right multiplication with $Q_k$ effectively creates zeros below $A(k{+}1, k)$ and to the right of $A(k, k{+}1)$, and then multiplies $U_k$ on the left and right of the submatrix $A(k{+}1{:}n, k{+}1{:}n)$. The key to reducing the computational cost is to reformulate the following operation:

\[
\begin{aligned}
U_k^*\,A(k{+}1{:}n, k{+}1{:}n)\,U_k ={}& A(k{+}1{:}n, k{+}1{:}n) + 4\vec u_k\left(\vec u_k^*\,A(k{+}1{:}n, k{+}1{:}n)\,\vec u_k\right)\vec u_k^* \\
&- 2\vec u_k\left(\vec u_k^*\,A(k{+}1{:}n, k{+}1{:}n)\right) - 2\left(A(k{+}1{:}n, k{+}1{:}n)\,\vec u_k\right)\vec u_k^*
\end{aligned}
\]
by introducing
\[
\vec t = A(k{+}1{:}n, k{+}1{:}n)\,\vec u_k, \tag{7.7}
\]
\[
\sigma = 2\,\vec u_k^*\,\vec t, \tag{7.8}
\]
\[
\vec p = 2\vec t - \sigma\vec u_k. \tag{7.9}
\]
This way, we can rewrite $U_k^*\,A(k{+}1{:}n, k{+}1{:}n)\,U_k$ as a rank-2 update in the form
\[
U_k^*\,A(k{+}1{:}n, k{+}1{:}n)\,U_k = A(k{+}1{:}n, k{+}1{:}n) - \vec p\,\vec u_k^* - \vec u_k\,\vec p^{\,*}.
\]

Since we only need to store and operate on half the entries of a symmetric matrix, the work of the above operation is about $2(n-k)^2$ flops, together with the $2(n-k)^2$ flops required to compute $\vec t$. The dominating work in each iteration is about $4(n-k)^2$, which brings the total work estimate to $\sim \frac{4}{3}n^3$.

Remark 7.10

Algorithm 7.9 is provided as background information for interested readers. The key message here is that symmetry can reduce the total work load to $\sim \frac{4}{3}n^3$ by 1) avoiding unnecessary operations on zero entries, and 2) only operating on either the lower or the upper triangular part of the matrix.
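For interested readers, the rank-2 update of Algorithm 7.9 can be sketched in NumPy as follows (our own transcription; for brevity it stores the full matrix, so it shows the flop saving of the rank-2 update but not the halved storage):

```python
import numpy as np

def tridiagonalize(A):
    """Householder reduction of a symmetric matrix to tridiagonal form using
    the rank-2 update T <- T - p u* - u p* (a sketch of Algorithm 7.9)."""
    T = np.array(A, dtype=float)
    n = T.shape[0]
    for k in range(n - 2):
        b = T[k+1:, k].copy()
        nb = np.linalg.norm(b)
        if nb == 0.0:                      # nothing to eliminate in this column
            continue
        s = np.sign(b[0]) if b[0] != 0 else 1.0
        v = b.copy()
        v[0] += s * nb
        u = v / np.linalg.norm(v)
        T[k+1, k] = T[k, k+1] = -s * nb    # Steps 5 and 6
        T[k+2:, k] = 0.0                   # Step 7
        T[k, k+2:] = 0.0                   # Step 8
        t = T[k+1:, k+1:] @ u              # t = A(k+1:n, k+1:n) u_k   (7.7)
        sigma = 2.0 * (u @ t)              # sigma = 2 u_k^* t         (7.8)
        p = 2.0 * t - sigma * u            # p = 2 t - sigma u_k       (7.9)
        T[k+1:, k+1:] -= np.outer(p, u) + np.outer(u, p)
    return T
```

The result is symmetric tridiagonal and shares the eigenvalues of the input, since only unitary similarity transformations are applied.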


7.2.4 QR Factorisation of Hessenberg Matrices

Hessenberg and tridiagonal matrices provide substantial computational advantages in computing matrix factorisations such as LU and QR, compared with applying such factorisations to general square matrices. Here we use the QR factorisation to demonstrate, in terms of operation counts, the computational reduction obtained from Hessenberg and tridiagonal matrices.

Recall that the QR factorisation transforms a matrix $A \in \mathbb{R}^{n\times n}$ into the product of an orthogonal matrix $Q \in \mathbb{R}^{n\times n}$ and an upper-triangular matrix $R \in \mathbb{R}^{n\times n}$. The Householder reflection finds a sequence of matrices $Q_1, Q_2, \ldots$, and hence $Q = Q_1Q_2\cdots$, to achieve this.

Given a Hessenberg matrix $H \in \mathbb{R}^{n\times n}$, we can partition $H$ as
\[
H = \begin{bmatrix}
H_{11} & H_{12} & H_{13} & \times & \cdots & \times \\
H_{21} & H_{22} & H_{23} & \times & \cdots & \times \\
 & H_{32} & \times & \times & \cdots & \times \\
 & & \times & \times & \cdots & \times \\
 & & & \ddots & \ddots & \vdots \\
 & & & & \times & \times
\end{bmatrix}
= \begin{bmatrix} \vec h_1 & \vec a_1^\top \\ \vec 0_1 & H_2 \end{bmatrix}, \tag{7.10}
\]
where $\vec h_1 = H(1{:}2, 1) \in \mathbb{R}^2$, $\vec a_1^\top = H(1, 2{:}\mathrm{end}) \in \mathbb{R}^{n-1}$, $\vec 0_1 \in \mathbb{R}^{n-2}$ and $H_2 = H(2{:}\mathrm{end}, 2{:}\mathrm{end}) \in \mathbb{R}^{(n-1)\times(n-1)}$. Note that $H_2$ is also a Hessenberg matrix.

Applying the first step of the Householder reflection, we aim to find $Q_1$ to create zeros below the first row of $H(:,1)$. We need to have
\[
Q_1H(:,1) = Q_1\begin{bmatrix} H_{11} \\ H_{21} \\ \vec 0_1 \end{bmatrix} = \begin{bmatrix} \pm\|\vec h_1\| \\ 0 \\ \vec 0_1 \end{bmatrix}.
\]
Fortunately, the first column of $H$ is already zero below the second row. Thus we only need to apply a 2-dimensional Householder reflection to $\vec h_1 \in \mathbb{R}^2$. This way, we want to find a 2-by-2 orthogonal matrix $U_1$ such that
\[
U_1\vec h_1 = U_1\begin{bmatrix} H_{11} \\ H_{21} \end{bmatrix} = \begin{bmatrix} \pm\|\vec h_1\| \\ 0 \end{bmatrix}.
\]
Using the procedure introduced in the Householder reflection, we have
\[
\vec t = \vec h_1 - U_1\vec h_1 = \vec h_1 + \mathrm{sign}(\vec h_1(1))\,\|\vec h_1\|\,\vec e_1, \tag{7.11}
\]
\[
\vec s = \vec t/\|\vec t\|, \tag{7.12}
\]
\[
U_1 = I_2 - 2\,\vec s\,\vec s^{\,*}. \tag{7.13}
\]

All the above operations are carried out in a 2-dimensional space. Then the matrix $Q_1$ takes the form
\[
Q_1 = \begin{bmatrix} U_1 & 0 \\ 0 & I \end{bmatrix}.
\]
This way, the first full Householder transformation can be written as
\[
Q_1H = Q_1\begin{bmatrix}
H_{11} & H_{12} & H_{13} & \times & \cdots & \times \\
H_{21} & H_{22} & H_{23} & \times & \cdots & \times \\
 & H_{32} & \times & \times & \cdots & \times \\
 & & \times & \times & \cdots & \times \\
 & & & \ddots & \ddots & \vdots \\
 & & & & \times & \times
\end{bmatrix}
= \begin{bmatrix} r_{11} & \vec r_1^\top \\ \vec 0 & \tilde H_2 \end{bmatrix}. \tag{7.14}
\]
Note that only the first two rows of the matrix $H$ are modified by $Q_1$. The resulting $\tilde H_2$ is also a Hessenberg matrix. In fact, $\tilde H_2$ in (7.14) and $H_2$ in (7.10) only differ in the first row. Then we can repeatedly carry out this operation for $n-1$ steps as in the QR factorisation.

At each step $k$, the dimension of the Hessenberg matrix to be transformed is $n-k+1$; thus the amount of work required to apply $Q_k$ is $\sim 7(n-k+1)+O(1)$ flops. Overall, applying the $n-1$ steps of Householder transformations to an $n$-by-$n$ Hessenberg matrix requires $\sim \frac{7}{2}n^2$ flops. We say that the computational complexity of the QR factorisation of a Hessenberg matrix is quadratic in the size of the matrix.
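The $O(n^2)$ procedure can be sketched in NumPy using the 2-by-2 reflectors described above (our own transcription; `hessenberg_qr` is a name we introduce):

```python
import numpy as np

def hessenberg_qr(H):
    """QR factorisation of an upper Hessenberg matrix using n-1 two-dimensional
    Householder reflectors, so the total work is O(n^2) (our own sketch)."""
    R = np.array(H, dtype=float)
    n = R.shape[0]
    Q = np.eye(n)
    for k in range(n - 1):
        h = R[k:k+2, k].copy()             # only H(k:k+1, k) can be nonzero
        nh = np.linalg.norm(h)
        if nh == 0.0:
            continue
        s = np.sign(h[0]) if h[0] != 0 else 1.0
        v = h.copy()
        v[0] += s * nh
        u = v / np.linalg.norm(v)
        U = np.eye(2) - 2.0 * np.outer(u, u)   # the 2-by-2 reflector U_k
        R[k:k+2, k:] = U @ R[k:k+2, k:]        # only two rows are modified
        Q[:, k:k+2] = Q[:, k:k+2] @ U          # accumulate Q = Q_1...Q_{n-1} (U symmetric)
    return Q, R
```

Each step touches only two rows of the working matrix, which is where the quadratic (rather than cubic) total cost comes from.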

If the matrix $H$ is tridiagonal, only three columns are involved in each multiplication with the Householder matrix $Q_k$, as shown below:
\[
Q_1H = Q_1\begin{bmatrix}
\times & \times & & & & \\
\times & \times & \times & & & \\
 & \times & \times & \times & & \\
 & & \times & \times & \times & \\
 & & & \times & \times & \times \\
 & & & & \times & \times
\end{bmatrix}
= \begin{bmatrix} r_{11} & \vec r_1^\top \\ \vec 0 & \tilde H_2 \end{bmatrix}. \tag{7.15}
\]
Therefore, the work of applying the $n-1$ steps of Householder transformations to an $n$-by-$n$ tridiagonal matrix is linearly proportional to the size of the matrix, $n$. We say that the computational complexity of the QR factorisation of a tridiagonal matrix is linear in the size of the matrix.


7.3 QR algorithm without shifts

The QR algorithm, which iteratively carries out the QR factorisation at its core, is one of the most celebrated algorithms in scientific computing. Here we show its simplest form and look into several fundamental aspects of this algorithm.

Algorithm 7.11: QR Algorithm Without Shifts

Input: Matrix $A \in \mathbb{R}^{n\times n}$.
Output: A unitary matrix $Q^{(k)}$ and a matrix $A^{(k)}$.
1: $A^{(0)} = A$
2: $Q^{(0)} = I$
3: for $k = 1, 2, \ldots$ do
4: $\quad U^{(k)}R^{(k)} = A^{(k-1)}$ $\quad\triangleright$ Apply the QR factorisation to $A^{(k-1)}$
5: $\quad A^{(k)} = R^{(k)}U^{(k)}$ $\quad\triangleright$ Recombine factors in reverse order
6: $\quad Q^{(k)} = Q^{(k-1)}U^{(k)}$
7: end for

At its core, all we do is compute the QR factorisation, multiply $R$ and $U$ in the reverse order $RU$, and repeat. Using the identity $R^{(k)} = (U^{(k)})^*A^{(k-1)}$, it can be shown that this algorithm applies a sequence of unitary similarity transformations to the input matrix $A$, in the form
\[
A^{(k)} = (U^{(k)})^*A^{(k-1)}U^{(k)}
= \underbrace{(U^{(k)})^*(U^{(k-1)})^*\cdots(U^{(1)})^*}_{(Q^{(k)})^*}\,A\,\underbrace{U^{(1)}\cdots U^{(k-1)}U^{(k)}}_{Q^{(k)}}. \tag{7.16}
\]
Under certain assumptions, this simple algorithm converges to the Schur factorisation. That is, $A^{(k)}$ will be upper triangular if $A$ is arbitrary, and diagonal if $A$ is symmetric.
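Algorithm 7.11 is only a few lines of NumPy (our own sketch; `np.linalg.qr` plays the role of the factorisation in Line 4):

```python
import numpy as np

def qr_algorithm(A, iters=200):
    """QR algorithm without shifts (Algorithm 7.11): factorise, recombine
    in reverse order, and accumulate Q^(k) (our own sketch)."""
    Ak = np.array(A, dtype=float)
    Q = np.eye(Ak.shape[0])
    for _ in range(iters):
        U, R = np.linalg.qr(Ak)   # U^(k) R^(k) = A^(k-1)
        Ak = R @ U                # A^(k) = R^(k) U^(k)
        Q = Q @ U                 # Q^(k) = Q^(k-1) U^(k)
    return Ak, Q
```

For a symmetric matrix with distinct eigenvalue magnitudes, $A^{(k)}$ converges to a diagonal matrix of the eigenvalues, and $Q^{(k)}A^{(k)}(Q^{(k)})^\top$ recovers $A$.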

7.3.1 Connection with Simultaneous Iteration

One way to understand the QR algorithm is to relate it to the power iteration. Here we consider applying the power iteration to several vectors simultaneously; this is also often referred to as block power iteration. Suppose we have a set of orthonormal initial vectors $\{\vec p_1, \ldots, \vec p_s\}$. We apply the power iteration to the matrix $P$ collecting this set of vectors (such that $P(:,j) = \vec p_j$), and orthonormalise the new set of vectors $AP^{(k-1)}$ in each iteration using the QR factorisation. This leads to the following algorithm.

Algorithm 7.12: Simultaneous Iteration

Input: Matrix $A \in \mathbb{R}^{n\times n}$ and a set of orthonormal initial vectors $P^{(0)}$.
Output: A matrix $P^{(k)}$
1: $P^{(0)} = I$
2: for $k = 1, 2, \ldots$ do
3: $\quad Z^{(k)} = AP^{(k-1)}$ $\quad\triangleright$ Apply the matrix $A$
4: $\quad P^{(k)}T^{(k)} = Z^{(k)}$ $\quad\triangleright$ QR factorisation
5: end for

As a result of $P^{(k)} = AP^{(k-1)}(T^{(k)})^{-1}$, we have
\[
A^kP^{(0)} = P^{(k)}\,T^{(k)}T^{(k-1)}\cdots T^{(1)}.
\]
Using the following property of triangular matrices, we can show that the product $T^{(k)}T^{(k-1)}\cdots T^{(1)}$ is upper triangular. Therefore the simultaneous iteration effectively computes (in exact arithmetic) the QR factorisation of $A^kP^{(0)}$.
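The iteration above is straightforward to sketch in NumPy (our own transcription of Algorithm 7.12, with the iteration count as a parameter we chose):

```python
import numpy as np

def simultaneous_iteration(A, iters=300):
    """Simultaneous (block power) iteration, Algorithm 7.12, with P^(0) = I
    (our own sketch)."""
    P = np.eye(A.shape[0])
    for _ in range(iters):
        Z = A @ P                # apply the matrix A
        P, T = np.linalg.qr(Z)   # orthonormalise via a QR factorisation
    return P
```

For a symmetric matrix with distinct eigenvalue magnitudes, the columns of $P^{(k)}$ converge (up to signs) to an orthogonal eigenvector basis, so $P^\top AP$ becomes nearly diagonal.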

Remark 7.13: Properties of Triangular Matrices

The product of two upper triangular matrices is upper triangular, and the inverse of a nonsingular upper triangular matrix is upper triangular.

Theorem 7.14

Given the initial matrix $P^{(0)} = I$, the simultaneous iteration is equivalent to the QR algorithm without shifts.

Proof. This can be shown by induction. Throughout the proof, we assume the upper triangular matrices of the QR factorisations used by both the QR algorithm and the simultaneous iteration have positive diagonal entries. We carry out the QR algorithm without shifts and the simultaneous iteration for the first step. This leads to:

QR algorithm:
\[
A^{(0)} = A, \qquad Q^{(0)} = I,
\]
\[
U^{(1)}R^{(1)} = A^{(0)} = A, \tag{7.17}
\]
\[
Q^{(1)} = Q^{(0)}U^{(1)} = U^{(1)}, \tag{7.18}
\]
\[
A^{(1)} = R^{(1)}U^{(1)} = (Q^{(1)})^*AQ^{(1)}. \tag{7.19}
\]

Simultaneous iteration:
\[
P^{(0)} = I,
\]
\[
Z^{(1)} = AP^{(0)} = A, \tag{7.20}
\]
\[
P^{(1)}T^{(1)} = Z^{(1)} = A. \tag{7.21}
\]

After the first iteration, we can verify that $Q^{(1)} = P^{(1)}$ and $R^{(1)} = T^{(1)}$, and thus these two algorithms are equivalent after the first iteration.

In the second iteration, the two algorithms are carried forward as follows:

QR algorithm:
\[
U^{(2)}R^{(2)} = A^{(1)} = (U^{(1)})^*AU^{(1)}, \tag{7.22}
\]
\[
Q^{(2)} = Q^{(1)}U^{(2)} = U^{(1)}U^{(2)}, \tag{7.23}
\]
\[
A^{(2)} = R^{(2)}U^{(2)} = (Q^{(2)})^*AQ^{(2)}. \tag{7.24}
\]

Simultaneous iteration:
\[
Z^{(2)} = AP^{(1)} = AU^{(1)}, \tag{7.25}
\]
\[
P^{(2)}T^{(2)} = Z^{(2)} = AU^{(1)}. \tag{7.26}
\]

Since $A^{(1)} = (Q^{(1)})^*AQ^{(1)}$, we have $A = Q^{(1)}A^{(1)}(Q^{(1)})^*$, and hence Equation (7.26) can be written as
\[
P^{(2)}T^{(2)} = Q^{(1)}A^{(1)}.
\]
Multiplying both sides of the above equation by $(Q^{(1)})^*$ leads to
\[
\left((Q^{(1)})^*P^{(2)}\right)T^{(2)} = A^{(1)}.
\]
From the QR algorithm, we have $U^{(2)}R^{(2)} = A^{(1)}$. This leads to
\[
T^{(2)} = R^{(2)} \quad\text{and}\quad (Q^{(1)})^*P^{(2)} = U^{(2)},
\]
and hence
\[
P^{(2)} = Q^{(1)}U^{(2)} = Q^{(2)}.
\]
Thus, these two algorithms are equivalent after two iterations.

Suppose $P^{(k-1)} = Q^{(k-1)}$ and $T^{(k-1)} = R^{(k-1)}$ hold. At the $k$-th iteration, the two algorithms satisfy the following:

QR algorithm:
\[
U^{(k)}R^{(k)} = A^{(k-1)} = (Q^{(k-1)})^*AQ^{(k-1)}, \tag{7.27}
\]
\[
Q^{(k)} = Q^{(k-1)}U^{(k)}, \tag{7.28}
\]
\[
A^{(k)} = (Q^{(k)})^*AQ^{(k)}. \tag{7.29}
\]

Simultaneous iteration:
\[
Z^{(k)} = AP^{(k-1)} = AQ^{(k-1)}, \tag{7.30}
\]
\[
P^{(k)}T^{(k)} = Z^{(k)} = AQ^{(k-1)}. \tag{7.31}
\]

Since $A^{(k-1)} = (Q^{(k-1)})^*AQ^{(k-1)}$, we have $A = Q^{(k-1)}A^{(k-1)}(Q^{(k-1)})^*$, and hence Equation (7.31) can be written as
\[
P^{(k)}T^{(k)} = Q^{(k-1)}A^{(k-1)}.
\]
Multiplying both sides of the above equation by $(Q^{(k-1)})^*$ leads to
\[
\left((Q^{(k-1)})^*P^{(k)}\right)T^{(k)} = A^{(k-1)}.
\]
Thus we can show that $T^{(k)} = R^{(k)}$ and $(Q^{(k-1)})^*P^{(k)} = U^{(k)}$. The latter leads to $P^{(k)} = Q^{(k-1)}U^{(k)} = Q^{(k)}$.

The above proof employs a property of the QR factorisation:

Theorem 7.15

For any nonsingular matrix $A$, there exists a unique pair of unitary matrix $Q$ and upper triangular matrix $R$ with positive diagonal entries such that $A = QR$.

Remark

The product of two upper triangular matrices with positive diagonal entries is also an upper triangular matrix with positive diagonal entries. The inverse of an upper triangular matrix with positive diagonal entries is also an upper triangular matrix with positive diagonal entries.

Remark 7.16

At this point, we are able to show that the sequence of unitary similarity transformations in the QR algorithm,
\[
A^{(k)} = (Q^{(k)})^*AQ^{(k)}, \tag{7.32}
\]
\[
Q^{(k)} = U^{(1)}\cdots U^{(k-1)}U^{(k)}, \tag{7.33}
\]
can be defined by the QR factorisation of $A^k$ in the form
\[
A^k = Q^{(k)}T^{(k)}, \tag{7.34}
\]
\[
T^{(k)} = R^{(k)}R^{(k-1)}\cdots R^{(1)}. \tag{7.35}
\]
This relation is the key to understanding the QR algorithm and to analysing its convergence.
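This relation is easy to verify numerically (a small check of our own; the identity holds for any QR sign convention):

```python
import numpy as np

# Numerical check of Remark 7.16: the accumulated factors of the QR algorithm
# form a QR factorisation of the matrix power, A^k = Q^(k) T^(k).
A = np.array([[2.0, 1.0], [1.0, 3.0]])
k = 5
Ak, Q, T = A.copy(), np.eye(2), np.eye(2)
for _ in range(k):
    U, R = np.linalg.qr(Ak)   # U^(j) R^(j) = A^(j-1)
    Ak = R @ U                # A^(j) = R^(j) U^(j)
    Q = Q @ U                 # Q^(j) = Q^(j-1) U^(j)
    T = R @ T                 # T^(j) = R^(j) T^(j-1)
assert np.allclose(Q @ T, np.linalg.matrix_power(A, k))
```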

7.3.2 Convergence to Schur Form

The remaining question is why the sequence of transformations $A^{(k)} = (Q^{(k)})^*AQ^{(k)}$ is able to construct a Schur form.

This is not very surprising: if the sequence of orthogonal matrices $Q^{(k)} = U^{(1)}\cdots U^{(k-1)}U^{(k)}$ converges, then $Q^{(k+1)} = Q^{(k)}U^{(k+1)}$ must be arbitrarily close to $Q^{(k)}$ for sufficiently large $k$. This way, we have $U^{(k)} \approx I$ for sufficiently large $k$. Recalling that in each iteration of the QR algorithm we have the QR factorisation $U^{(k)}R^{(k)} = A^{(k-1)}$, we can see that $A^{(k-1)}$ is (approximately) upper triangular if $U^{(k)} \approx I$. We formalise this intuition below.

Theorem 7.17

Let $A \in \mathbb{R}^{n\times n}$ be a real matrix with distinct eigenvalues, all greater than zero,
\[
\lambda_1 > \lambda_2 > \cdots > \lambda_n > 0.
\]
Suppose $A$ has the eigendecomposition $A = V\Lambda V^{-1}$, and the matrix $V$ has the QR factorisation $V = QR$ where $R$ is upper triangular with positive diagonal entries. Then $Q^{(k)}$ converges to the orthogonal factor of the QR factorisation of $V$ as
\[
\|Q^{(k)}D - Q\| = O(\sigma^k),
\]
for some diagonal matrix $D$ with $D_{ii} = \pm 1$, where $\sigma < 1$ is the constant
\[
\sigma = \max\left\{\left|\frac{\lambda_2}{\lambda_1}\right|, \ldots, \left|\frac{\lambda_n}{\lambda_{n-1}}\right|\right\}.
\]

Proof. Given the eigendecomposition $A = V\Lambda V^{-1}$, we have $A^k = V\Lambda^kV^{-1}$. After $k$ steps of the simultaneous iteration, $A^k$ has the QR factorisation $A^k = Q^{(k)}T^{(k)}$. Thus the following relation holds:
\[
V\Lambda^kV^{-1} = Q^{(k)}T^{(k)}.
\]
Considering the LU factorisation of $V^{-1}$ and substituting $V^{-1} = LU$ into the above equation leads to
\[
V\Lambda^kLU = Q^{(k)}T^{(k)},
\]
and then, by multiplying $U^{-1}\Lambda^{-k}$ on the right of both sides of the equation, we have
\[
V\Lambda^kL\Lambda^{-k} = Q^{(k)}T^{(k)}U^{-1}\Lambda^{-k}. \tag{7.36}
\]
Without loss of generality, we can assume that the diagonal entries of the matrix $L$ take the values $\pm 1$, and the diagonal entries of the matrix $U$ are positive. We have
\[
\left(\Lambda^kL\Lambda^{-k}\right)_{ij} =
\begin{cases}
\pm 1, & i = j, \\
0, & i < j, \\
L_{ij}\left(\dfrac{\lambda_i}{\lambda_j}\right)^k, & i > j.
\end{cases}
\]
Thus, $\Lambda^kL\Lambda^{-k}$ converges to a diagonal matrix $D$ with $D_{ii} = L_{ii}$, as $\left(\frac{\lambda_i}{\lambda_j}\right)^k \to 0$ for $i > j$. Since the eigenvalues are ordered, each ratio $\frac{\lambda_i}{\lambda_j}$, $i > j$, is bounded from above by the largest ratio of consecutive eigenvalues, $\max_i \frac{\lambda_{i+1}}{\lambda_i}$. This convergence is therefore of order $O(\sigma^k)$, where $\sigma$ is the largest ratio $\left|\frac{\lambda_i}{\lambda_j}\right|$, $i > j$, between a pair of distinct eigenvalues.

Since the left-hand side of Equation (7.36) converges to $VD$ as $k \to \infty$ and $D^2 = I$, it can be expressed as
\[
V = \left(Q^{(k)}D\right)\left(D\,T^{(k)}U^{-1}\Lambda^{-k}D\right), \quad k \to \infty,
\]
where $T^{(k)}U^{-1}\Lambda^{-k}$ is upper triangular with positive diagonal entries and $D\,T^{(k)}U^{-1}\Lambda^{-k}D$ is also upper triangular with positive diagonal entries. Thus, this determines a unique QR factorisation of $V$ as $k \to \infty$. Therefore $Q^{(k)}D$ converges to the orthogonal matrix of the QR factorisation of the eigenvectors $V$.

The assumption that all eigenvalues of $A$ must be positive can be removed by using the absolute values of the eigenvalues instead of the eigenvalues in constructing $\Lambda^{-k}$. We also do not have to assume that the eigenvalues are non-repeating, as we can specify orthogonal eigenvectors (basis vectors of the eigenspace) for an eigenvalue with geometric multiplicity larger than one.

7.3.3 The Role of Hessenberg Form

As we discussed earlier, transforming a matrix to the Hessenberg form allows for a significant reduction in the cost of computing the QR factorisation: $O(n^2)$ for a general matrix and $O(n)$ for a symmetric matrix. We can use this fact to reduce the operation count in each iteration of the QR algorithm, provided that the Hessenberg form is retained from one iteration to the next. That is, if $A^{(0)}$ is a Hessenberg matrix, then each $A^{(k)}$ is a Hessenberg matrix. Given a Hessenberg matrix $H \in \mathbb{R}^{n\times n}$ and its QR factorisation $H = QR$, we want to verify that $RQ$ retains the Hessenberg form.


QR

Recall that we can partition the matrix $H$ as
\[
H = \begin{bmatrix}
H_{11} & H_{12} & H_{13} & \times & \cdots & \times \\
H_{21} & H_{22} & H_{23} & \times & \cdots & \times \\
 & H_{32} & \times & \times & \cdots & \times \\
 & & \times & \times & \cdots & \times \\
 & & & \ddots & \ddots & \vdots \\
 & & & & \times & \times
\end{bmatrix}
= \begin{bmatrix} \vec h_1 & \vec a_1^\top \\ \vec 0_1 & H_2 \end{bmatrix},
\]
where $\vec h_1 = H(1{:}2, 1) \in \mathbb{R}^2$, $\vec a_1^\top = H(1, 2{:}\mathrm{end}) \in \mathbb{R}^{n-1}$, $\vec 0_1 \in \mathbb{R}^{n-2}$ and $H_2 = H(2{:}\mathrm{end}, 2{:}\mathrm{end}) \in \mathbb{R}^{(n-1)\times(n-1)}$. Note that $H_2$ is also a Hessenberg matrix.

To create zeros below $H(1,1)$, we need to find a Householder matrix $Q_1$ such that
\[
Q_1H(:,1) = Q_1\begin{bmatrix} H_{11} \\ H_{21} \\ \vec 0_1 \end{bmatrix} = \begin{bmatrix} \pm\|\vec h_1\| \\ 0 \\ \vec 0_1 \end{bmatrix}.
\]
Effectively, we only need to apply a 2-dimensional Householder reflection matrix $U_1$ to $\vec h_1 \in \mathbb{R}^2$ such that
\[
U_1\vec h_1 = U_1\begin{bmatrix} H_{11} \\ H_{21} \end{bmatrix} = \begin{bmatrix} \pm\|\vec h_1\| \\ 0 \end{bmatrix}.
\]
Then the matrix $Q_1$ takes the form
\[
Q_1 = \begin{bmatrix} U_1 & 0 \\ 0 & I \end{bmatrix}.
\]

Only the top two rows of the matrix $H$ will be modified in $Q_1H$.

Every iteration of the QR factorisation picks the $k$-th column of the matrix and aims to create zeros below the $(k,k)$ entry of the matrix $H_{k-1} = Q_{k-1}\cdots Q_1H$ (the transformed matrix from the previous iteration), as shown below for $k = 4$ in a 7-by-7 example:
\[
H_{k-1} = \begin{bmatrix}
\times & \times & \times & \times & \times & \times & \times \\
 & \times & \times & \times & \times & \times & \times \\
 & & \times & \times & \times & \times & \times \\
 & & & \times & \times & \times & \times \\
 & & & \times & \times & \times & \times \\
 & & & & \times & \times & \times \\
 & & & & & \times & \times
\end{bmatrix},
\]
where rows $k$ and $k+1$ both start at column $k$. This can be achieved by constructing a Householder matrix $U_k$ with respect to the vector $H_{k-1}(k{:}k{+}1, k)$, since all the entries of $H_{k-1}$ below $(k{+}1, k)$ are zero. This leads to


a Householder matrix that can be applied to the original matrix,
\[
Q_k = \begin{bmatrix} I_1 & & \\ & U_k & \\ & & I_2 \end{bmatrix},
\qquad
U_k = \begin{bmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{bmatrix},
\]
where $I_1$ is a $(k-1)$-dimensional identity matrix and $I_2$ is an $(n-k-1)$-dimensional identity matrix. The following equation demonstrates the multiplication $Q_kH_{k-1}$, in which only rows $k$ and $k+1$ are modified:
\[
Q_kH_{k-1} = \begin{bmatrix} I_1 & & \\ & U_k & \\ & & I_2 \end{bmatrix}
\begin{bmatrix}
\times & \times & \times & \times & \times & \times & \times \\
 & \times & \times & \times & \times & \times & \times \\
 & & \times & \times & \times & \times & \times \\
 & & & \times & \times & \times & \times \\
 & & & \times & \times & \times & \times \\
 & & & & \times & \times & \times \\
 & & & & & \times & \times
\end{bmatrix}
= \begin{bmatrix}
\times & \times & \times & \times & \times & \times & \times \\
 & \times & \times & \times & \times & \times & \times \\
 & & \times & \times & \times & \times & \times \\
 & & & \times & \times & \times & \times \\
 & & & 0 & \times & \times & \times \\
 & & & & \times & \times & \times \\
 & & & & & \times & \times
\end{bmatrix}.
\]
This way, we have the QR factorisation of the matrix $H$ defined by
\[
R = \underbrace{Q_{n-1}\cdots Q_1}_{Q^*}\,H.
\]
Note that we leave $Q_n$ out of the standard Householder reflection process, as it would only flip the sign of the bottom-right entry of the matrix $H_{n-1}$.

RQ

Using this identity, we can express the matrix $RQ$ as
\[
RQ = RQ_1Q_2\cdots Q_{n-1}.
\]
Note that we drop the $(\cdot)^*$ here, as each Householder reflection matrix $Q_j$ is symmetric. Denote $R_k = RQ_1Q_2\cdots Q_k$ and set $R_0 = R$. In each multiplication $R_{k-1}Q_k$, only the two columns $R_{k-1}(:, k{:}k{+}1)$ are modified by the matrix $U_k$. This is summarised in the following equations.


In the first step, the entries of $R_0(:, 1{:}2)$ below the second row are zero, as $R$ is upper triangular. The resulting matrix $R_1$ has a Hessenberg form with the submatrix $R_1(2{:}n, 2{:}n)$ upper triangular, as shown in Equation (7.37):
\[
R_1 = R_0Q_1 =
\begin{bmatrix}
\times & \times & \times & \times & \times & \times & \times \\
 & \times & \times & \times & \times & \times & \times \\
 & & \times & \times & \times & \times & \times \\
 & & & \times & \times & \times & \times \\
 & & & & \times & \times & \times \\
 & & & & & \times & \times \\
 & & & & & & \times
\end{bmatrix}
\begin{bmatrix} U_1 & \\ & I \end{bmatrix}
=
\begin{bmatrix}
\times & \times & \times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times & \times & \times \\
 & & \times & \times & \times & \times & \times \\
 & & & \times & \times & \times & \times \\
 & & & & \times & \times & \times \\
 & & & & & \times & \times \\
 & & & & & & \times
\end{bmatrix}. \tag{7.37}
\]

If the matrix $R_{k-1}$ has a Hessenberg form and the submatrix $R_{k-1}(k{:}n, k{:}n)$ is upper triangular, multiplying with $Q_k$ produces a Hessenberg matrix $R_k$ with an upper triangular submatrix $R_k(k{+}1{:}n, k{+}1{:}n)$: only the two columns $R_{k-1}(:, k{:}k{+}1)$ are modified by the matrix $U_k$ in this step, and $R_k(k{+}2{:}n, k{:}k{+}1)$ remain zero because the corresponding entries of $R_{k-1}$ are zero.
\[
R_k = R_{k-1}Q_k =
\begin{bmatrix}
\times & \times & \times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times & \times & \times \\
 & \times & \times & \times & \times & \times & \times \\
 & & \times & \times & \times & \times & \times \\
 & & & 0 & \times & \times & \times \\
 & & & & & \times & \times \\
 & & & & & & \times
\end{bmatrix}
\begin{bmatrix} I_1 & & \\ & U_k & \\ & & I_2 \end{bmatrix}
=
\begin{bmatrix}
\times & \times & \times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times & \times & \times \\
 & \times & \times & \times & \times & \times & \times \\
 & & \times & \times & \times & \times & \times \\
 & & & \times & \times & \times & \times \\
 & & & & & \times & \times \\
 & & & & & & \times
\end{bmatrix}. \tag{7.38}
\]

Also from this process, we can conclude that computing $QR = H$ and combining the factors in the reverse order $RQ$ have the same total work. In computing $QR = H$, the number of flops is about $O(n-k)$ in iteration $k$, and hence a total of $O(n^2)$. Therefore, each step of the QR algorithm applied to a Hessenberg matrix requires $O(n^2)$ operations.
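A quick numerical check of the Hessenberg-preservation property (our own sketch, on a random example):

```python
import numpy as np

# If H = QR is upper Hessenberg, the recombined RQ is again upper Hessenberg,
# so every iterate of the QR algorithm stays in Hessenberg form.
rng = np.random.default_rng(0)
H = np.triu(rng.standard_normal((6, 6)), -1)   # a random Hessenberg matrix
Q, R = np.linalg.qr(H)
H_next = R @ Q
assert np.allclose(np.tril(H_next, -2), 0, atol=1e-12)
```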


7.4 Shifted QR algorithm

The QR algorithm without shifts is able to iteratively decompose a matrix into a Schur factorisation. Using its equivalence with the simultaneous iteration, we can deduce its convergence property: they are equally slow. Like the Rayleigh quotient iteration, this algorithm can be modified to incorporate shifted inverse iteration and eigenvalue estimates. The new algorithm is outlined as follows.

Algorithm 7.18: Shifted QR Algorithm

Input: Matrix $A \in \mathbb{R}^{n\times n}$.
Output: A unitary matrix $Q^{(k)}$ and a matrix $A^{(k)}$
1: $A^{(0)} = (Q^{(0)})^*AQ^{(0)}$ $\quad\triangleright$ Transform $A$ to Hessenberg form
2: for $k = 1, 2, \ldots$ do
3: $\quad$ Pick a shift $\mu^{(k)}$ $\quad\triangleright$ E.g., $\mu^{(k)} = A^{(k-1)}(n, n)$
4: $\quad U^{(k)}R^{(k)} = A^{(k-1)} - \mu^{(k)}I$ $\quad\triangleright$ QR factorisation of $A^{(k-1)} - \mu^{(k)}I$
5: $\quad A^{(k)} = R^{(k)}U^{(k)} + \mu^{(k)}I$ $\quad\triangleright$ Recombine factors in reverse order
6: $\quad$ if any subdiagonal entry $A^{(k)}(j{+}1, j)$ is sufficiently close to 0 then
7: $\quad\quad$ Set $A^{(k)}(j{+}1, j) = 0$, partition $A^{(k)}$ as
\[
A^{(k)} = \begin{bmatrix} A_1 & A_3 \\ 0 & A_2 \end{bmatrix},
\]
$\quad\quad$ and apply the same QR algorithm to $A_1$ and $A_2$ separately.
8: $\quad$ end if
9: $\quad Q^{(k)} = Q^{(k-1)}U^{(k)}$
10: end for

Here Line 3 picks the shift value, Lines 4 and 5 perform one step of inverse iteration, and Lines 6--8 perform an operation called deflation. These steps will be explained in the rest of this section. To keep the concepts simple, we assume in the rest of this section that the matrix $A \in \mathbb{R}^{n\times n}$ is symmetric (and tridiagonal) and invertible. We will also focus only on the eigenvalues. The material in this section is based on [Trefethen and Bau III, 1997].
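Putting the pieces together, a minimal NumPy sketch of Algorithm 7.18 for a symmetric matrix might look as follows. This is our own simplification (the function name, tolerances, and deflating only the last row and column are our choices), using the Rayleigh quotient shift $\mu^{(k)} = A^{(k-1)}(n,n)$:

```python
import numpy as np

def shifted_qr_eigvals(A, tol=1e-12, max_iter=1000):
    """Eigenvalues of a symmetric matrix via the shifted QR algorithm with the
    Rayleigh quotient shift mu = A(n,n), deflating the last row and column once
    its off-diagonal entry is small (a simplified sketch of Algorithm 7.18;
    as discussed later, this shift can stall on specially structured matrices)."""
    T = np.array(A, dtype=float)
    eigs = []
    while T.shape[0] > 1:
        n = T.shape[0]
        for _ in range(max_iter):
            mu = T[-1, -1]                           # Rayleigh quotient shift
            U, R = np.linalg.qr(T - mu * np.eye(n))  # U^(k) R^(k) = A^(k-1) - mu I
            T = R @ U + mu * np.eye(n)               # A^(k) = R^(k) U^(k) + mu I
            if abs(T[-1, -2]) < tol:                 # deflation condition
                break
        eigs.append(T[-1, -1])
        T = T[:-1, :-1]                              # deflate the converged eigenvalue
    eigs.append(T[0, 0])
    return np.sort(np.array(eigs))
```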

7.4.1 Connection with Inverse Iteration

To understand this algorithm, we will first find its connection with the power iteration applied to the inverse matrix $A^{-1}$, i.e., inverse iteration without shift. Recalling the results from the last section, we have
\[
A^k = Q^{(k)}T^{(k)}, \quad\text{where}\quad T^{(k)} = R^{(k)}R^{(k-1)}\cdots R^{(1)},
\]
as the result of the QR algorithm without shifts. Inverting the above equation and taking the transpose, we have
\[
\left(A^{-k}\right)^\top = \left(\left(T^{(k)}\right)^{-1}\left(Q^{(k)}\right)^\top\right)^\top.
\]
Using the fact that $A$ is symmetric, this leads to
\[
A^{-k} = Q^{(k)}\left(T^{(k)}\right)^{-\top}, \tag{7.39}
\]
where the term $\left(T^{(k)}\right)^{-\top}$ is lower triangular.

Consider a permutation matrix $P$ that reverses the row or column order,
\[
P = \begin{bmatrix}
0 & \cdots & 0 & 1 \\
0 & \cdots & 1 & 0 \\
\vdots & & & \vdots \\
1 & 0 & \cdots & 0
\end{bmatrix}.
\]

Remark 7.19

Multiplying $P$ on the right of a matrix reverses its column order, and multiplying $P$ on the left of a matrix reverses its row order. This takes the form
\[
A = \begin{bmatrix} \vec a_1 & \vec a_2 & \cdots & \vec a_{n-1} & \vec a_n \end{bmatrix},
\qquad
AP = \begin{bmatrix} \vec a_n & \vec a_{n-1} & \cdots & \vec a_2 & \vec a_1 \end{bmatrix},
\]
and
\[
B = \begin{bmatrix} \vec b_1^\top \\ \vec b_2^\top \\ \vdots \\ \vec b_{n-1}^\top \\ \vec b_n^\top \end{bmatrix},
\qquad
PB = \begin{bmatrix} \vec b_n^\top \\ \vec b_{n-1}^\top \\ \vdots \\ \vec b_2^\top \\ \vec b_1^\top \end{bmatrix}.
\]
We also have $P^2 = I$; together with the symmetry of $P$, this shows that $P$ is orthogonal.
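These reversal identities are one-liners to check in NumPy (our own check):

```python
import numpy as np

# Remark 7.19 in NumPy: P with ones on the anti-diagonal reverses columns when
# applied on the right, rows when applied on the left, and squares to I.
n = 4
P = np.eye(n)[::-1]                      # rows of the identity in reverse order
A = np.arange(n * n, dtype=float).reshape(n, n)
assert np.allclose(A @ P, A[:, ::-1])    # column order reversed
assert np.allclose(P @ A, A[::-1, :])    # row order reversed
assert np.allclose(P @ P, np.eye(n))     # P^2 = I, so P is orthogonal
```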

Multiplying both sides of Equation (7.39) by the permutation matrix $P$ on the right, we have
\[
A^{-k}P = \left(Q^{(k)}P\right)\left(P\left(T^{(k)}\right)^{-\top}P\right). \tag{7.40}
\]
The first factor $Q^{(k)}P$ is orthogonal, and the second factor $P\left(T^{(k)}\right)^{-\top}P$ is upper triangular (obtained by reversing the column and row orders of a lower triangular matrix). Thus, Equation (7.40) can be interpreted as the QR factorisation of $A^{-k}P$. The QR algorithm without shifts therefore also effectively carries out the simultaneous iteration on $A^{-1}$ with the initial matrix $P$. This can be expressed as
\[
A^{-k}P = \underbrace{\begin{bmatrix} \vec q_n^{(k)} & \vec q_{n-1}^{(k)} & \cdots & \vec q_2^{(k)} & \vec q_1^{(k)} \end{bmatrix}}_{Q^{(k)}P}\;\underbrace{\left(P\left(T^{(k)}\right)^{-\top}P\right)}_{\text{upper triangular}}.
\]
The last column of $Q^{(k)}$ is the result of applying the inverse iteration to $\vec e_n$.
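Equation (7.40) can be verified numerically for a small symmetric matrix (our own check; the identity holds for any QR sign convention, since $A^k = Q^{(k)}T^{(k)}$ does):

```python
import numpy as np

# Check A^{-k} P = (Q^(k) P)(P (T^(k))^{-T} P): the unshifted QR algorithm
# implicitly runs simultaneous inverse iteration started from P.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
n, k = A.shape[0], 6
P = np.eye(n)[::-1]                       # the reversal permutation
Ak, Q, T = A.copy(), np.eye(n), np.eye(n)
for _ in range(k):
    U, R = np.linalg.qr(Ak)
    Ak = R @ U
    Q = Q @ U
    T = R @ T                             # T^(k) = R^(k) ... R^(1)
lhs = np.linalg.matrix_power(np.linalg.inv(A), k) @ P
rhs = (Q @ P) @ (P @ np.linalg.inv(T).T @ P)
assert np.allclose(lhs, rhs)
```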


7.4.2 Connection with Shifted Inverse Iteration

The significance of the inverse iteration is that it can be shifted to amplify the differences between eigenvalues. Since the QR algorithm is both a simultaneous iteration $A^k = Q^{(k)}T^{(k)}$ and a simultaneous inverse iteration $A^{-k}P = \left(Q^{(k)}P\right)\left(P(T^{(k)})^{-\top}P\right)$, we are able to incorporate shifts into the QR algorithm by simply carrying out the QR factorisation of the shifted matrix $A - \mu I$.

Let $\mu^{(k)}$ denote the shift used in the $k$-th step; one step of the shifted QR algorithm proceeds as follows:
\[
U^{(k)}R^{(k)} = A^{(k-1)} - \mu^{(k)}I, \tag{7.41}
\]
\[
A^{(k)} = R^{(k)}U^{(k)} + \mu^{(k)}I. \tag{7.42}
\]

This implies
\[
A^{(k)} = \left(U^{(k)}\right)^\top A^{(k-1)}U^{(k)},
\]
and by induction
\[
A^{(k)} = \left(Q^{(k)}\right)^\top AQ^{(k)}, \tag{7.43}
\]
\[
Q^{(k)} = U^{(1)}U^{(2)}\cdots U^{(k)}. \tag{7.44}
\]
Note that here each pair $U^{(k)}$ and $R^{(k)}$ is different from that of the QR algorithm without shifts. Using a similar proof as in Theorem 7.14, we can show that the shifted QR algorithm also has the following factorisation:
\[
(A - \mu^{(k)}I)(A - \mu^{(k-1)}I)\cdots(A - \mu^{(1)}I) = Q^{(k)}T^{(k)}, \tag{7.45}
\]
\[
T^{(k)} = R^{(k)}R^{(k-1)}\cdots R^{(1)}. \tag{7.46}
\]

Using the connection between the QR algorithm and the simultaneous shifted inverse iteration, we can show that
\[
\prod_{j=1}^{k}\left(A - \mu^{(j)}I\right)^{-1}P = \underbrace{\begin{bmatrix} \vec q_n^{(k)} & \vec q_{n-1}^{(k)} & \cdots & \vec q_2^{(k)} & \vec q_1^{(k)} \end{bmatrix}}_{Q^{(k)}P}\;\underbrace{\left(P\left(T^{(k)}\right)^{-\top}P\right)}_{\text{upper triangular}}.
\]
$Q^{(k)}$ is the orthogonalisation of $\prod_{j=k}^{1}(A - \mu^{(j)}I)$, while $Q^{(k)}P$ is the orthogonalisation of $\prod_{j=1}^{k}(A - \mu^{(j)}I)^{-1}$. That is, the last column of $Q^{(k)}$ is the result of applying inverse iteration (using the shifts $\mu^{(k)}$ to $\mu^{(1)}$) to the vector $\vec e_n$. Generally speaking, the last column of $Q^{(k)}$ converges fast to an eigenvector.

7.4.3 Connection with Rayleigh Quotient Iteration

To complete the loop, we need to pick a shift value that achieves fast convergence in the last column of $Q^{(k)}$. A natural choice is the Rayleigh quotient of the last column of $Q^{(k-1)}$,
\[
\mu^{(k)} = \frac{(\vec q_n^{(k-1)})^*A\,\vec q_n^{(k-1)}}{(\vec q_n^{(k-1)})^*\vec q_n^{(k-1)}} = (\vec q_n^{(k-1)})^*A\,\vec q_n^{(k-1)},
\]
where the second equality holds as $Q^{(k-1)}$ is orthogonal. Furthermore, since $\vec q_n^{(k-1)} = Q^{(k-1)}\vec e_n$ and
\[
A = Q^{(k-1)}A^{(k-1)}\left(Q^{(k-1)}\right)^\top,
\]
we have
\[
(\vec q_n^{(k-1)})^*A\,\vec q_n^{(k-1)} = \vec e_n^\top A^{(k-1)}\vec e_n = A^{(k-1)}(n, n).
\]
Thus the $(n,n)$ entry of the matrix $A^{(k-1)}$ gives an eigenvalue estimate associated with the last column of $Q^{(k-1)}$ without any additional work. This is usually referred to as the Rayleigh quotient shift.

7.4.4 Wilkinson Shift

The Rayleigh quotient shift does not guarantee convergence; it may stall for certain types of matrices, for example, the matrix
\[
A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
\]
Applying the QR algorithm without shifts to this matrix does not converge, as
\[
A = QR = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\quad\text{and}\quad
RQ = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = A.
\]
The Rayleigh quotient shift is $A(2,2) = 0$, and hence it does not shift the matrix either. The problem is that the matrix $A$ has two eigenvalues, $1$ and $-1$, and the eigenvalue estimate $0$ lies exactly between them: it has an equal tendency towards both eigenvalues.

One particular method that can break the symmetry is call Wilkinson shift.

Instead of using the lower-rightmost entry of A, it uses the lower-rightmost 2-by-

2 submatrix of A, denoted by B = A(n-1:n, n-1:n). Suppose B takes the form

of

B =

[

a1 b1

b2 a2

]

The Wilkinson shift is the eigenvalue of B that is closer to a2. If there is a tie, we pick one of the two eigenvalues arbitrarily. A numerically stable formula for it is

µ = a2 − sign(δ) b1b2 / (|δ| + √(δ² + b1b2)), where δ = (a1 − a2)/2,

where sign(δ) is set arbitrarily to either 1 or −1 if δ = 0. The Wilkinson shift provides the same convergence rate as the Rayleigh quotient shift, cubic for symmetric matrices and quadratic for general matrices, and its convergence is guaranteed.
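The stable formula above can be sketched directly in NumPy (the function name is ours; the notes do not prescribe an implementation):

```python
import numpy as np

def wilkinson_shift(B):
    """Wilkinson shift from the trailing 2-by-2 submatrix B = [a1 b1; b2 a2].

    mu = a2 - sign(delta) * b1*b2 / (|delta| + sqrt(delta^2 + b1*b2)),
    with delta = (a1 - a2)/2 and sign(0) taken as +1 (arbitrary tie-break).
    """
    (a1, b1), (b2, a2) = B
    delta = (a1 - a2) / 2.0
    sgn = 1.0 if delta >= 0 else -1.0
    return a2 - sgn * b1 * b2 / (abs(delta) + np.sqrt(delta**2 + b1 * b2))

# For A = [0 1; 1 0] the Rayleigh quotient shift A(2,2) = 0 stalls, whereas the
# Wilkinson shift returns one of the two eigenvalues (here -1), breaking the tie.
mu = wilkinson_shift(np.array([[0.0, 1.0], [1.0, 0.0]]))
print(mu)  # -1.0, an eigenvalue of the 2x2 block
```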

7.4.5 Deflation

In Lines 6-8 of Algorithm 7.18, if any off-diagonal entry A(j+1, j) is sufficiently close to 0, then we can set A(j+1, j) = 0 and partition the matrix as follows:

A(k) = [A1 A3; 0 A2].


This technique is called deflation. It divides the problem into sub-problems and tackles them individually. Here we briefly explain the concept for general matrices. Since det(A(k) − λI) = det(A1 − λI) det(A2 − λI), finding the eigenvalues (or computing the Schur form) of A(k) reduces to computing the Schur forms of A1 and A2

separately. Suppose we have computed the Schur factorisation of A1 and A2 in

the form of

A1 = U1 T1 U1∗, (7.47)

A2 = U2 T2 U2∗, (7.48)

respectively. We can construct an n-by-n unitary matrix

U(k+1) = [U1 0; 0 U2],

so that

(U(k+1))∗ A(k) U(k+1) = [U1∗ 0; 0 U2∗] [A1 A3; 0 A2] [U1 0; 0 U2] = [T1 Ã3; 0 T2],

where Ã3 = U1∗ A3 U2.
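The deflation test itself is simple to sketch (an illustrative helper; the names and the relative tolerance choice are ours, not from Algorithm 7.18):

```python
import numpy as np

def deflate(A, tol=1e-12):
    """Zero negligible subdiagonal entries of A and return its diagonal blocks.

    A is assumed upper triangular apart from its subdiagonal, as produced by
    QR iteration on a Hessenberg matrix. Zeroing A[j+1, j] splits A into
    independent eigenvalue sub-problems.
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    splits = [0]
    for j in range(n - 1):
        # A common criterion: compare against the neighbouring diagonal entries.
        if abs(A[j + 1, j]) <= tol * (abs(A[j, j]) + abs(A[j + 1, j + 1])):
            A[j + 1, j] = 0.0
            splits.append(j + 1)
    splits.append(n)
    blocks = [A[i:k, i:k] for i, k in zip(splits[:-1], splits[1:])]
    return A, blocks

A = np.array([[2.0, 1.0, 3.0],
              [1e-15, 4.0, 1.0],
              [0.0, 0.5, 5.0]])
A_def, blocks = deflate(A)
print([b.shape for b in blocks])   # [(1, 1), (2, 2)]
```

The eigenvalues of the blocks, taken together, are exactly the eigenvalues of the deflated matrix, which is the point of the block-triangular argument above.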


Chapter 8

Singular Value Decomposition

8.1 Singular Value Decomposition

The singular value decomposition of a matrix is often referred to as the SVD.

The SVD factorizes a matrix A ∈ Rm×n into the product of three matrices

A = UΣV >

where U and V are orthogonal, and Σ is diagonal. Here A can be any matrix,

e.g., non-symmetric or rectangular.

8.1.1 Understanding SVD

A matrix A ∈ Rm×n is a linear transformation taking a vector ~x ∈ Rn in its row

space (or preimage), row(A), to a vector ~y = A~x in its column space (or range),

col(A). The SVD is motivated by the following geometric fact: the image of the

unit sphere under any m-by-n matrix is a hyper-ellipse.

Remark 8.1

The hyper-ellipse is a generalisation of an ellipse. In the space Rm, a

hyper-ellipse can be viewed as the surface obtained by stretching a unit

sphere in Rm by some factors σ1, σ2, . . . , σm, along some orthogonal direc-

tions ~u1, ~u2, . . . , ~um. Here each of the ~ui, i = 1, . . . ,m is a unit vector. The

vectors {σi~ui} are the principal semiaxes of the hyper-ellipse.

Figure 8.1.1: Geometrical interpretation of a linear transformation.


Figure 8.1.1 shows a unit sphere and the hyper-ellipse that is the image of

the unit sphere transformed by a matrix A ∈ Rm×n. Assume that m > n and that the matrix A has rank r ≤ min(m, n). Three key components of the SVD can then be defined:

• The singular values of the matrix A are the lengths of the principal semiaxes, σ1, σ2, . . . , σr. We often assume that the singular values are non-negative and ordered as σ1 ≥ σ2 ≥ . . . ≥ σr > 0.

• The left singular vectors of A are orthogonal unit vectors ~u1, ~u2, . . . , ~ur that are in the column space of A and oriented in the directions of the principal semiaxes.

• We also have the right singular vectors ~v1, ~v2, . . . , ~vr that are orthogonal

unit vectors in the row space of the matrix A such that

A~vi = ~uiσi, i = 1, . . . , r. (8.1)

The relationship between the right singular vectors, left singular vectors, and

singular values can be understood as the following: the first right singular vector

is a unit vector ~v such that the 2-norm of the vector A~v is maximised. This way,

we have

~v1 = argmax_{‖~v‖=1} ‖A~v‖.

The corresponding first singular value is defined as σ1 = ‖A~v1‖ and the first left singular vector is ~u1 = A~v1/σ1. Then, the second right singular vector is defined as

the next unit vector ~v that is orthogonal to ~v1 and maximises the 2-norm of the

vector A~v. We have

~v2 = argmax_{~v>~v1 = 0, ‖~v‖=1} ‖A~v‖.

The corresponding second singular value is σ2 = ‖A~v2‖ and the second left singular vector is ~u2 = A~v2/σ2. Repeating this process we can define all the singular

values and singular vectors.

In summary, transforming a right singular vector ~vi using the matrix A leads to the left singular vector ~ui multiplied by σi. Thus, right singular vectors and left singular vectors characterise the principal directions (in row space and column space) of the linear transformation defined by A. Singular values characterise

the “stretching” effect of this linear transformation.

Remark 8.2

The right and left singular vectors also satisfy the following duality:

A~vi = ~uiσi, A>~ui = ~viσi,

for i = 1, . . . , r.
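This duality is easy to check numerically with NumPy's SVD routine (a small verification sketch on a random matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Reduced SVD: numpy returns U (m x r), the singular values s, and V^T (r x n).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

for i in range(len(s)):
    u_i, v_i, sigma_i = U[:, i], Vt[i, :], s[i]
    assert np.allclose(A @ v_i, sigma_i * u_i)      # A v_i = sigma_i u_i
    assert np.allclose(A.T @ u_i, sigma_i * v_i)    # A^T u_i = sigma_i v_i
```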


8.1.2 Full SVD and Reduced SVD

Assuming m ≥ n, the collection of the equations (8.1) for all i = 1, . . . , r can be

expressed as a matrix equation

A [~v1 ~v2 . . . ~vr] = [~u1 ~u2 . . . ~ur] diag(σ1, σ2, . . . , σr), (8.2)

or

AVˆ = Uˆ Σˆ, (8.3)

in matrix form. Here, Vˆ ∈ Rn×r and Uˆ ∈ Rm×r are matrices

with orthonormal columns, and Σˆ ∈ Rr×r is a diagonal matrix.

Columns of the matrices Vˆ ∈ Rn×r and Uˆ ∈ Rm×r are orthonormal vectors,

however, they do not form complete bases of Rn and Rm unless m = n = r. By

adding m− r unit vectors that are orthogonal to columns of Uˆ and adding n− r

unit vectors that are orthogonal to columns of Vˆ , we can extend the matrix Uˆ

to an orthogonal matrix U ∈ Rm×m and the matrix Vˆ to an orthogonal matrix

V ∈ Rn×n.

If Uˆ and Vˆ are replaced by U and V in Equation (8.3), then Σˆ will have to

change too. We can add an (m − r)× r block of zeros under the matrix Σˆ and

an m× (n− r) block of zeros on the right of Σˆ to form a new matrix Σ. This is

demonstrated as the following:

This way, we have

AV = UΣ, (8.4)

where both U and V are orthogonal matrices. This is exactly the same as

Equation (8.3) as those additional columns in U are multiplied with zeros, and

those additional columns of V are in the null space of A. Multiplying both sides of Equation (8.4) by V> on the right, we obtain the full SVD.


Definition 8.3

For a matrix A ∈ Rm×n, where m > n, the full singular value decompo-

sition is defined by an orthogonal matrix U ∈ Rm×m, an orthogonal matrix

V ∈ Rn×n, and a diagonal matrix Σ ∈ Rm×n with non-negative diagonal en-

tries in the form of

A = UΣV >. (8.5)

Definition 8.4

By eliminating those columns in U and V that are multiplied with zeros in Σ in

the full SVD, we can also define the reduced singular value decomposition

as

A = Uˆ ΣˆVˆ> = ∑_{i=1}^{r} σi ~ui ~vi>. (8.6)

The full SVD is often useful in deriving properties of a matrix, whereas the

reduced SVD is often very valuable for computational tasks. The full SVD and

the reduced SVD can be summarised by the following figure:
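In NumPy, np.linalg.svd returns either variant depending on the full_matrices flag, with min(m, n) playing the role of r (a quick shape check on a small m > n example):

```python
import numpy as np

A = np.arange(12.0).reshape(4, 3)            # m = 4, n = 3

# Full SVD: U is m x m, V is n x n, and Sigma must be padded to m x n.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)
assert U.shape == (4, 4) and Vt.shape == (3, 3)
assert np.allclose(A, U @ Sigma @ Vt)

# Reduced SVD: Uhat is m x min(m, n) with orthonormal columns, Sigmahat square.
Uh, sh, Vth = np.linalg.svd(A, full_matrices=False)
assert Uh.shape == (4, 3)
assert np.allclose(A, Uh @ np.diag(sh) @ Vth)
```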

Remark 8.5

Considering the full SVD factorization, A = UΣV >, the linear transformation

defined by A can be decomposed into several steps (as shown in Figure 8.1.2):

1. Start with a unit sphere in the row space of the matrix A.

2. Multiplication with V >. This is a rotation, since V is an orthogonal

matrix.

3. Multiplication with Σ. The diagonal matrix Σ stretches the new unit

sphere along its canonical basis vectors (grey lines) with singular values

σ1, σ2, . . ..

4. Multiplication with U . This is another rotation, since U is also an or-

thogonal matrix.

Thus, SVD connects the four fundamental subspaces of a linear transformation:

1. ~v1, ~v2, . . . , ~vr: an orthonormal basis for the row space of A, row(A)

2. ~u1, ~u2, . . . , ~ur: an orthonormal basis for the column space of A, col(A)

3. ~vr+1, . . . , ~vn: an orthonormal basis for the null space of A, null(A)

4. ~ur+1, . . . , ~um: an orthonormal basis for the left null space of A, null(A>)


Figure 8.1.2: Geometrical interpretation of the SVD.

Remark 8.6: The m < n case

For a matrix A ∈ Rm×n, where m < n, both reduced SVD and full SVD can

also be defined—a quick way of doing so is to apply the above process to the

matrix A>.

8.1.3 Properties of SVD

It is important to know that SVD exists for any general matrix A ∈ Rm×n.

Theorem 8.7

Every matrix A ∈ Rm×n has a singular value decomposition.

Proof. This can be shown using induction; we omit the proof here.

As stated in Remark 8.5, SVD can characterise all four fundamental sub-

spaces of a matrix. Here we use the full SVD of a matrix to explore some

important properties of a matrix.

Theorem 8.8

The rank of a matrix A ∈ Rm×n is equal to the number of its nonzero singular

values.

Proof. Consider the full SVD of A = UΣV >. Suppose that there are r

nonzero singular values, and hence rank(Σ) = r as the rank of a diagonal

matrix is equal to the number of nonzero entries. Since U and V are full rank,

we have rank(A) = rank(Σ) = r.


Theorem 8.9

The Frobenius norm of a matrix A ∈ Rm×n is equal to the square root of the sum of the squares of its nonzero singular values, i.e.,

‖A‖F = √(∑_{i=1}^{r} σi²).

Proof. Consider the full SVD of A = UΣV >. Since the Frobenius norm

is preserved under multiplication with orthogonal matrices, we have ‖A‖F =

‖Σ‖F . Given that ‖Σ‖F =

√∑r

i=1 σ

2

i , we have ‖A‖F =

√∑r

i=1 σ

2

i .

Theorem 8.10

The 2-norm of a matrix A ∈ Rm×n is equal to the largest singular value of the

matrix A, i.e.,

‖A‖2 = σ1.

Proof. Consider the full SVD of A = UΣV >. Since the 2-norm is preserved

under multiplication with orthogonal matrices, we have ‖A‖2 = ‖Σ‖2. Given

that ‖Σ‖2 = σ1, we have ‖A‖2 = σ1.
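Theorems 8.8-8.10 can be verified numerically in a few lines (a sketch on a random matrix built to have rank 2; the tolerance for counting nonzero singular values is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))  # rank 2 by construction
s = np.linalg.svd(A, compute_uv=False)

rank = int(np.sum(s > 1e-10))                                  # Theorem 8.8
assert rank == np.linalg.matrix_rank(A) == 2
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))  # Theorem 8.9
assert np.isclose(np.linalg.norm(A, 2), s[0])                       # Theorem 8.10
```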

8.1.4 Comparing SVD to Eigendecomposition

The theme of diagonalising a matrix by expressing it in terms of a new basis is

not new—it has already been discussed in eigendecomposition. A nondefective

square matrix can be transformed to a diagonal matrix of eigenvalues using a

similarity transformation defined by its eigenvectors. For a general nondefective

square matrix A ∈ Rn×n, its eigendecomposition takes the form of

A = WΛW−1,

where W is the matrix whose columns are n linearly independent eigenvectors of A and Λ is the diagonal matrix consisting of the eigenvalues of A.

SVD is fundamentally different from the eigendecomposition in several aspects:

1. The SVD uses two bases U and V , whereas the eigendecomposition only

uses one.

2. The matrix W in the eigendecomposition may not be orthogonal, but the

matrices U and V in the SVD are always orthogonal.

3. The SVD does not require the matrix A to be square; it exists for any matrix.

In applications, the eigendecomposition is usually more relevant to matrix functions, e.g., A^k and exp(tA). The SVD is usually more relevant to the matrix

itself and its inverse.


Real and symmetric matrices have a special eigendecomposition. We know

that (by Theorem 6.24) if A ∈ Rn×n is symmetric and real-valued, it has orthog-

onal eigenvectors and the eigendecomposition

A = QΛQ>

where Q is the orthogonal matrix whose columns are n orthonormal eigenvectors of A and Λ is the diagonal matrix consisting of the eigenvalues of A. In this case, the singular values of A are just the absolute values of the eigenvalues of A. Using the eigendecomposition of A, we

can express the SVD as

A = Q|Λ|sign(Λ)Q> = Q|Λ| (Q sign(Λ))> .

The left singular vectors are the same as eigenvectors and the right singular

vectors are eigenvectors flipped by the sign of the eigenvalues—if an eigenvalue

is negative, we set the singular value to be the absolute value of the eigenvalue,

and multiply the corresponding eigenvector(s) by -1 to obtain the right singular

vectors.
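A quick numerical check of this relationship for a random symmetric matrix (a sketch; with probability one no eigenvalue is zero here, so sign(Λ) is well defined):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                      # a real symmetric matrix

lam, Q = np.linalg.eigh(A)             # A = Q diag(lam) Q^T
s = np.linalg.svd(A, compute_uv=False)

# Singular values are the absolute eigenvalues, sorted decreasingly.
assert np.allclose(np.sort(np.abs(lam))[::-1], s)

# U = Q, Sigma = |Lambda|, V = Q sign(Lambda) gives a valid SVD of A.
U, Sigma, V = Q, np.diag(np.abs(lam)), Q @ np.diag(np.sign(lam))
assert np.allclose(A, U @ Sigma @ V.T)
```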


8.2 Computing SVD

8.2.1 Connection with Eigenvalue Solvers

Computing orthonormal bases {ui}ri=1 and {vi}ri=1 for the column space and the row space of a matrix A ∈ Rm×n is easy, e.g., the Gram-Schmidt process can be used for this purpose. However, in general, there is no reason to expect the matrix A to transform an arbitrarily chosen basis {vi}ri=1 into another orthogonal basis. For a general rank-r matrix A with m rows and n columns, the SVD aims at finding an orthonormal basis {vi}ri=1 for the row space of A that is transformed into an orthonormal basis {ui}ri=1 for the column space of A, stretched by some factors {σi}ri=1, i.e.,

A~vi = ~uiσi, σi > 0, i = 1, . . . , r.

The key step towards finding the orthonormal matrices U and V is to use

the full SVD

A = UΣV >.

Rather than solving for U , V and Σ simultaneously, we can take the following

steps to obtain the SVD of a matrix A (assuming m > n):

1. Multiply both sides on the left by A> = V ΣU> to get

A>A = V ΣU>UΣV> = V Σ²V> = [~v1 ~v2 . . . ~vn] diag(σ1², σ2², . . . , σn²) [~v1 ~v2 . . . ~vn]>.

This problem can be solved by the eigendecomposition of the symmetric,

n×n matrix A>A, where {~vi}ni=1 are the eigenvectors and {σ2i }ni=1 are the

eigenvalues.

2. Compute the eigendecomposition of A>A = V ΛV>. Then set the columns of V as the right singular vectors and Σ = √Λ as the diagonal matrix of singular values.

3. We can solve the linear system UΣ = AV to obtain the left singular vectors

U . In the absence of numerical error, this is equivalent to solving the

eigendecomposition of AA> = UΣ2U>.

Note that we have at most min{m,n} nonzero eigenvalues.
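The three steps can be sketched directly in NumPy (an illustration on a well-conditioned random matrix; the function name is ours, and the method assumes full column rank so that the division by the singular values is safe):

```python
import numpy as np

def svd_via_normal_equations(A):
    """SVD via the eigendecomposition of A^T A (assumes full column rank).

    Fine for dominant singular values, but not numerically stable for
    singular values much smaller than ||A||.
    """
    lam, V = np.linalg.eigh(A.T @ A)     # step 2: eigendecomposition of A^T A
    idx = np.argsort(lam)[::-1]          # order eigenvalues decreasingly
    lam, V = lam[idx], V[:, idx]
    s = np.sqrt(np.maximum(lam, 0.0))    # Sigma = sqrt(Lambda)
    U = (A @ V) / s                      # step 3: solve U Sigma = A V columnwise
    return U, s, V

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
U, s, V = svd_via_normal_equations(A)
assert np.allclose(A, U @ np.diag(s) @ V.T)
assert np.allclose(s, np.linalg.svd(A, compute_uv=False))
```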

Remark 8.11

The above method is widely used in many areas for computing the SVD of a matrix, for example, in principal component analysis. However, a major shortfall of this method is that it is not numerically stable for computing singular values σi ≪ ‖A‖.

Suppose we have an input matrix A. The floating point representation of A has an error on the order of ε_machine ‖A‖. A numerically stable algorithm requires that the error of an estimated singular value σ̃i is on the order of ε_machine ‖A‖. That is,

|σ̃i − σi| = O(ε_machine ‖A‖).

Consider the above process: the error in estimating the eigenvalues of A>A (singular values squared) using a numerically stable eigenvalue solver is about

|σ̃i² − σi²| = O(ε_machine ‖A>A‖).

The error of computing the square root to find σ̃i is on the order of |σ̃i² − σi²| / σi.

Thus, the error of an estimated singular value σ̃i using the above process is

|σ̃i − σi| = O(ε_machine ‖A>A‖ / σi) = O(ε_machine ‖A‖² / σi).

An intuitive way to understand this is the following: the product A>A am-

plifies the numerical error quadratically in the eigenvalue estimation step, and

then the absolute error in computing a singular value (by solving the square root

of an eigenvalue) is on the order of the error of estimated eigenvalue divided by

σi. This way, the above method is usually fine for computing dominant singular values, i.e., σi ≫ 0. However, for computing those singular values σi ≪ ‖A‖, the resulting singular value estimate will be dominated by the error.

8.2.2 A Different Connection with Eigenvalue Solvers

An alternative way to compute the SVD of A ∈ Rm×n using the eigendecomposition is to consider the following (n+m)-by-(n+m) matrix

S = [0 A>; A 0].

The eigenvectors and eigenvalues of the matrix S satisfy

[0 A>; A 0] [~v; ~u] = λ [~v; ~u],

where ~v ∈ Rn and ~u ∈ Rm. This equation leads to

A>~u = λ~v, A~v = λ~u,

which implies A>A~v = λ²~v and AA>~u = λ²~u. Thus, if the matrix S has an eigenvalue λ ≥ 0, then the corresponding eigenvector [~v; ~u] defines a pair of right and left singular vectors, given that both ~v and ~u are unit vectors. The eigenvalue λ ≥ 0 defines the corresponding singular value.

We note that if λ is an eigenvalue of S, then −λ is also an eigenvalue, associated with the eigenvector [~v; −~u]. This can be easily verified by

[0 A>; A 0] [~v; −~u] = −λ [~v; −~u].


Thus, both the singular values of a matrix A and their negatives are eigenvalues of S.

Now we can express the eigendecomposition of the matrix S in terms of the SVD of A = UΣV>, and vice versa. We consider the case where the matrix A is square (i.e., m = n)—in fact, computing the SVD of a general matrix with m ≠ n can be effectively reduced to computing the SVD of a square matrix, as will be shown later in this section. This way, we have

[0 A>; A 0] [V V; U −U] = [V V; U −U] [Σ 0; 0 −Σ].

Since singular vectors are unit vectors, we can normalise an eigenvector [~v; ~u] or [~v; −~u] by scaling it by a factor of 1/√2. Thus, using the orthogonal matrix

Q = (1/√2) [V V; U −U],

we can express the eigendecomposition of S in the form of

S = Q [Σ 0; 0 −Σ] Q>.

Therefore, the SVD can be obtained by computing the eigendecomposition of

the matrix S. In contrast to the method using the eigendecomposition A>A,

the new method is numerically stable as it does not involve the square root of

eigenvalues.

Remark 8.12

In practice, the matrix S is never formed explicitly. Factorisations of S, such

as the QR factorisation and the eigendecomposition, can be obtained by using

the matrix A and the symmetry.
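The correspondence between the eigenvalues of S and the singular values of A is easy to check numerically (purely for illustration we form S explicitly here, which, as the remark above notes, one would never do in practice):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n))                 # square case, m = n

S = np.block([[np.zeros((n, n)), A.T],
              [A, np.zeros((n, n))]])

lam = np.linalg.eigvalsh(S)                     # S is symmetric
s = np.linalg.svd(A, compute_uv=False)

# The eigenvalues of S are exactly {+sigma_i} together with {-sigma_i}.
assert np.allclose(np.sort(lam), np.sort(np.concatenate([s, -s])))
```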

8.2.3 Bidiagonalisation

As with eigenvalue solvers, algorithms for computing the SVD often have two phases. In phase 1, the matrix A is reduced to a bidiagonal form, in order to save floating point operations in computing the eigendecomposition of S or A>A. In phase 2, eigenvalue solvers such as the shifted QR algorithm can be used to diagonalise S or A>A, and hence A, to find the singular values. This process is shown as the following:

A (full) −−Phase 1−→ U0>AV0 (bidiagonal B) −−Phase 2−→ U>BV (diagonal Σ)

We will focus on the phase 1 of this process and omit details of the phase 2.


Remark 8.13

Suppose the matrix A ∈ Rm×n and m > n. In the bidiagonalisation step, both

U0 ∈ Rm×m and V0 ∈ Rn×n are orthogonal matrices, and the last m− n rows

of B have zero values, which can be shown as the following:

Consider the nonzero block of the matrix B, denoted by Bˆ, and its SVD, Bˆ = UB ΣˆVB>, where UB, Σˆ, VB ∈ Rn×n. Constructing the orthogonal matrix

QB = [UB 0; 0 I],

and the zero-padded matrix

Σ = [Σˆ; 0],

we can define the SVD of the matrix B as

B = QB Σ VB>.

This is demonstrated as the following:

Since B = U0>AV0, we have A = U0BV0>, and thus

A = U0 QB Σ VB> V0> = (U0QB) Σ (V0VB)> = UΣV>,

with U = U0QB and V = V0VB.

This way, computing the SVD of the original matrix A can be effectively reduced to computing the SVD of an n-by-n matrix Bˆ.

Golub-Kahan Bidiagonalisation

The goal of bidiagonalisation is to multiply the matrix A by a sequence of uni-

tary/orthogonal matrices on the left, and another sequence of unitary/orthogo-

nal matrices on the right to obtain a bidiagonal matrix that has zeros below its

diagonal and zeros above its first superdiagonal.


This process is significantly different from the reduction of a matrix to the

tridiagonal form. In the reduction to the tridiagonal form, the input matrix

should be square and the same sequence of unitary/orthogonal matrices are ap-

plied on both sides of the matrix. In the bidiagonalisation, the input matrix does

not need to be a square matrix, and two different sequences of unitary/orthogo-

nal matrices are applied on the left and on the right of the matrix—the numbers

of matrices applied in the two sequences are not necessarily the same.

The simplest method for accomplishing this is the Golub-Kahan bidiagonalisation. It applies Householder reflections alternately on the left and on the right

of a matrix. The left Householder reflection aims to introduce zeros below the

diagonal, whereas the right Householder reflection aims to introduce zeros to the

right of the first superdiagonal. This way, zeros introduced by the left House-

holder reflection will not be modified by the right Householder reflection, and

previously introduced zeros will not be modified by later Householder reflections.

This process can be demonstrated by the following example.

Example 8.14

Consider a matrix A ∈ R7×4, applying Householder reflection alternately on

the left and on the right of A produces a bidiagonal form. This Golub-Kahan

bidiagonalisation can be shown as:

A −−U1>(·)−→ U1>A −−(·)V1−→ U1>AV1 −−U2>(·)−→ U2>U1>AV1 −−(·)V2−→ U2>U1>AV1V2 −−U3>(·)−→ U3>U2>U1>AV1V2 −−U4>(·)−→ U4>U3>U2>U1>AV1V2,

at which point the matrix is in bidiagonal form.

The four left multiplications introduce zeros below the diagonal, and the two right

multiplications introduce zeros above the first superdiagonal.

For a matrix A ∈ Rm×n, n Householder reflections have to be applied on the

left and n − 2 Householder reflections have to be applied on the right. The total work of the Golub-Kahan bidiagonalisation is about double the work of the QR factorisation—the left Householder reflections have the same cost as computing the QR factorisation of A, and the right Householder reflections have the same cost as computing the QR factorisation of A> excluding the first row. Thus the total work of the Golub-Kahan bidiagonalisation is ∼ 4mn² − (4/3)n³ flops.
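A compact, unoptimised NumPy sketch of the Golub-Kahan bidiagonalisation, accumulating the orthogonal factors explicitly. The helper names are ours, and for clarity the code applies full reflectors rather than the flop-efficient form counted above:

```python
import numpy as np

def householder(x):
    """Unit Householder vector v such that (I - 2 v v^T) x is a multiple of e1."""
    v = x.astype(float).copy()
    v[0] += (1.0 if v[0] >= 0 else -1.0) * np.linalg.norm(x)
    nrm = np.linalg.norm(v)
    return v / nrm if nrm > 0 else v

def golub_kahan_bidiag(A):
    """Reduce A (m x n, m >= n) to bidiagonal form B = U0^T A V0."""
    B = A.astype(float).copy()
    m, n = B.shape
    U0, V0 = np.eye(m), np.eye(n)
    for k in range(n):
        # Left reflection: zero out B[k+1:, k], below the diagonal.
        v = householder(B[k:, k])
        B[k:, :] -= 2.0 * np.outer(v, v @ B[k:, :])
        U0[:, k:] -= 2.0 * np.outer(U0[:, k:] @ v, v)
        if k < n - 2:
            # Right reflection: zero out B[k, k+2:], right of the superdiagonal.
            v = householder(B[k, k + 1:])
            B[:, k + 1:] -= 2.0 * np.outer(B[:, k + 1:] @ v, v)
            V0[:, k + 1:] -= 2.0 * np.outer(V0[:, k + 1:] @ v, v)
    return U0, B, V0

rng = np.random.default_rng(5)
A = rng.standard_normal((7, 4))                    # same shape as Example 8.14
U0, B, V0 = golub_kahan_bidiag(A)
assert np.allclose(U0 @ B @ V0.T, A)               # A = U0 B V0^T
assert np.allclose(np.tril(B, -1), 0) and np.allclose(np.triu(B, 2), 0)
```

This performs n left and n − 2 right reflections, as stated above; the zeros introduced earlier are untouched by the later reflections.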


Lawson-Hanson-Chan Bidiagonalisation

For the case where m ≫ n, the total work of the Golub-Kahan bidiagonalisation is unnecessarily high. If we know the matrix in bidiagonal form has zeros below its n-th row, then the right Householder reflections in the bidiagonalisation process should avoid modifying those entries. This can be accomplished by first applying a QR factorisation to the input matrix, and then applying the Golub-Kahan bidiagonalisation to the upper triangular matrix to reduce it to the bidiagonal form. This procedure is called the Lawson-Hanson-Chan (LHC) bidiagonalisation. It can be demonstrated as the following:

In the LHC bidiagonalisation, the work of the QR step is ∼ 2mn² − (2/3)n³ flops, and the work of the subsequent bidiagonalisation of the upper triangular matrix is ∼ 4n³ − (4/3)n³ = (8/3)n³ flops. Thus, the total work of the LHC bidiagonalisation is ∼ 2mn² + 2n³ flops. This requires fewer operations than the Golub-Kahan bidiagonalisation if m > (5/3)n.

From Bidiagonal Form of A to Tridiagonal Form of A>A and S

We have seen that in phase 1 of an eigenvalue solver, a symmetric matrix can be reduced to a tridiagonal matrix. In computing the SVD, reducing a matrix to a bidiagonal form is the analogue of phase 1 of an eigenvalue solver. In fact,

reducing a matrix A to a bidiagonal form is equivalent to reducing the matrices

S and A>A to a tridiagonal form.

As shown in Remark 8.13, computing SVD of a general matrix A ∈ Rm×n

with m > n can be effectively reduced to computing SVD of a square bidiagonal

matrix B ∈ Rn×n.

This way, computing SVD using the eigendecomposition of A>A is reduced

to finding the eigendecomposition of B>B. It is easy to verify that the matrix

B>B is a symmetric tridiagonal matrix.

For computing the SVD using the eigendecomposition of the matrix S, we are effectively solving the eigendecomposition of the matrix

SB = [0 B>; B 0].

This matrix SB can be brought to a tridiagonal form by swapping rows and columns using an orthogonal similarity transformation defined by some permutation matrix. Modified shifted QR algorithms (which can adapt to the structure of SB) have been developed to solve the eigendecomposition of SB. We leave it at this.


8.3 Low Rank Matrix Approximation using SVD

Recall the reduced singular value decomposition of a rank-r matrix A ∈ Rm×n,

A = Uˆ ΣˆVˆ> = ∑_{i=1}^{r} σi ~ui ~vi>.

This decomposition into a summation of rank-one matrices, σi ~ui ~vi>, has a celebrated property: the k-th partial sum captures as much of the energy of the matrix A as possible. Here the “energy” is defined by either the 2-norm or the Frobenius norm.

Definition 8.15

Given the SVD of the matrix A ∈ Rm×n, the truncated singular value decomposition is defined by retaining only the first k singular values and the first k left and right singular vectors. Let A = UΣV>, and then the truncated SVD takes the form of

A ≈ Ak := Uk Σk Vk> = ∑_{i=1}^{k} σi ~ui ~vi>, (8.7)

where Uk = U(:, 1:k), Σk = Σ(1:k, 1:k), and Vk = V(:, 1:k), for k < r. The matrix Ak = ∑_{i=1}^{k} σi ~ui ~vi> is a rank-k approximation to A.

Theorem 8.16

Given a matrix A and its SVD, the rank-k approximation Ak where k < r

defined by the truncated SVD provides the best approximation to A in either

the 2-norm or the Frobenius norm. That is,

‖A−Ak‖2 ≤ ‖A−B‖2 for all B ∈ Rm×n of rank k,

and

‖A−Ak‖F ≤ ‖A−B‖F for all B ∈ Rm×n of rank k.

Proof. This can be shown by contradiction. We omit the proof here.

Example 8.17

A natural application of this theorem is that we can compress a data set or a

picture using the truncated SVD. A matrix A ∈ Rm×n requires mn floating-

point numbers of memory to store, whereas its truncated SVD only requires

mk+nk+k = (m+n+1)k floating-point numbers. Following Theorem 8.9 and

8.10, the compression error, in terms of the Frobenius norm and the 2-norm

can be given by the residual singular values after the truncation, i.e.,

‖A−Ak‖F = √(∑_{i=k+1}^{r} σi²), ‖A−Ak‖2 = σk+1.


Considering the following grey scale picture (on the left) that consists of 900×

703 pixels,

we can treat it as a matrix, and hence the truncated SVD can be applied to

compress this image. The picture on the right shows the compressed image

created by the truncated SVD with k = 30.
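The compression-error formulas above are easy to verify numerically (a small random matrix stands in for the image here; the sizes and the choice k = 10 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((90, 70))      # a random matrix standing in for the image
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 10
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k truncated SVD

# The compression errors match the residual singular values exactly.
assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])
assert np.isclose(np.linalg.norm(A - Ak, 'fro'), np.sqrt(np.sum(s[k:]**2)))
# Storage: 90*70 = 6300 numbers versus (90 + 70 + 1)*10 = 1610 for A_10.
```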


8.4 Pseudo Inverse and Least Square Problems using SVD

Recall the linear least-squares (LS) problem:

Definition 8.18: Least-Squares Problem

Let A ∈ Rm×n with m > n. Find ~x that minimizes f(~x) = ‖~b−A~x‖22.

Example 8.19: Polynomial Least Square Fitting

Suppose we have m distinct points, s1, s2, . . . , sm ∈ R and data b1, b2, . . . , bm ∈

R observed at these points. We aim to find a polynomial of degree n − 1,

p(s) = x1 + x2 s + · · · + xn s^(n−1) = ∑_{i=1}^{n} xi s^(i−1),

defined by coefficients {xi}ni=1, that best fits the data in the least square sense.

The relationship of the data {si}mi=1, {bi}mi=1 to the coefficients {xi}ni=1 can be

expressed by the Vandermonde system as:

[ 1  s1  s1²  · · ·  s1^(n−1) ] [ x1 ]   [ b1 ]
[ 1  s2  s2²  · · ·  s2^(n−1) ] [ x2 ]   [ b2 ]
[ 1  s3  s3²  · · ·  s3^(n−1) ] [ x3 ] = [ b3 ]
[ ⋮                           ] [ ⋮  ]   [ ⋮  ]
[ 1  sm  sm²  · · ·  sm^(n−1) ] [ xn ]   [ bm ]

where the m-by-n Vandermonde matrix is denoted by A, the coefficient vector by ~x, and the data vector by ~b.

To determine the coefficients {xi}ni=1 from data, we can solve a least square system A~x = ~b. The following figure presents an example of this process. We have 51 data points, which are the function sin(10s) observed at the discrete points 0, 0.02, 0.04, . . . , 1, represented by crosses. We construct a polynomial of degree 11 to fit this data set.
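This fit can be reproduced with a few lines of NumPy; np.vander with increasing=True builds exactly the matrix above, and np.linalg.lstsq solves the least square system:

```python
import numpy as np

s = np.linspace(0, 1, 51)               # 51 points 0, 0.02, ..., 1
b = np.sin(10 * s)                      # data observed at these points

n = 12                                  # n coefficients: a polynomial of degree 11
A = np.vander(s, n, increasing=True)    # Vandermonde matrix, A[i, j] = s_i**j

x, *_ = np.linalg.lstsq(A, b, rcond=None)
residual = np.max(np.abs(A @ x - b))    # a degree-11 polynomial fits sin(10s) well
```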


Pseudoinverse

One way to solve the least square problem is to solve the normal equation

A>A~x = A>~b. (8.8)

This leads to the definition of the pseudoinverse of a matrix.

Definition 8.20

For a full rank matrix A ∈ Rm×n, the matrix (A>A)−1A> is called the pseudoinverse of A, denoted by A+,

A+ = (A>A)−1A> ∈ Rn×m.

Using the pseudoinverse, the solution of the normal equation can be expressed

as

~x = A+~b.

Defining the projector P = AA+, which is an orthogonal projector onto range(A), the solution ~x minimising the least square problem satisfies

A~x = P~b,

where the right hand side is the data projected onto the range of A.

Theorem 8.21

Given the pseudoinverse of matrix A, denoted by A+, the matrix P = AA+ is

an orthogonal projector onto range(A).

QR

Solving Equation (8.8) is computationally fast but can be numerically unstable.

The practical method for solving the least square problem uses the reduced QR

factorisation A = QˆRˆ. This way, the projection onto the range of A is defined

by P = QˆQˆ>. Then the equation A~x = P~b can be expressed as

QˆRˆ~x = QˆQˆ>~b,

and left-multiplication by Qˆ> leads to

Rˆ~x = Qˆ>~b. (8.9)

Remark 8.22

Multiplying by Rˆ−1 leads to an alternative definition of pseudoinverse in the

form of

A+ = Rˆ−1Qˆ>. (8.10)

SVD

Alternatively, SVD provides a geometrically intuitive way to understand and

solve the least square problem. This is particularly useful for rank-deficient systems and the case m < n (e.g., X-ray imaging). Suppose the matrix

A ∈ Rm×n has a rank-r reduced SVD

A = Uˆ ΣˆVˆ> = ∑_{i=1}^{r} σi ~ui ~vi>.

Recall that the columns of Vˆ span the row space of A, the columns of Uˆ span the column space (range) of A, and Σˆ represents the stretching effect of the linear transformation.

The left singular vectors define an orthogonal projector P = Uˆ Uˆ>. The data

~b can be projected onto the range of A, spanned by the columns of Uˆ . The

projected data, P~b, can be expressed as a linear combination of the columns of

Uˆ—the associated coefficients are given by the vector Uˆ>~b ∈ Rr.

Then the equation A~x = P~b can be expressed as

Uˆ ΣˆVˆ >~x = Uˆ Uˆ>~b,

and left-multiplication by Uˆ> leads to

ΣˆVˆ >~x = Uˆ>~b. (8.11)

Solving this equation we obtain the least square solution

~x = Vˆ Σˆ−1Uˆ>~b. (8.12)

This way, we know that the least square solution ~x is a linear combination of the columns of Vˆ—the associated coefficients are given by the vector Vˆ>~x ∈ Rr—and hence it is in the row space of A.

The least square system can be understood as the following: projecting the

data to the range of the matrix A (defining ~q = Uˆ>~b ∈ Rr), we seek a solution

~x to the least square problem in the row space of A. Expressing the solution ~x

as a linear combination of the columns of Vˆ ,

~x = Vˆ ~p, where ~p = Vˆ >~x ∈ Rr,

the least square problem reduces to an r-dimensional linear system

Σˆ~p = ~q.

Recall the geometric interpretation of SVD (Figure 8.1.2), solving the least

square problem effectively inverts the stretching effect of a linear transform

within the rank-r row space and column space of A. This will be the key to

understanding X-ray imaging in the next section.

Remark 8.23

SVD also defines the pseudoinverse of A in the form of

A+ = Vˆ Σˆ−1Uˆ>. (8.13)


Algorithm 8.24: Least Squares via SVD

Given a matrix A ∈ Rm×n and the data ~b ∈ Rm, the solution ~x of the least

square problem f(~x) = ‖~b−A~x‖22 can be obtained as follows:

1. Compute the reduced SVD, A = Uˆ ΣˆVˆ >.

2. Compute the vector ~q = Uˆ>~b ∈ Rr.

3. Solve the linear system Σˆ~p = ~q.

4. Set ~x = Vˆ ~p.
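Algorithm 8.24 translates almost line by line into NumPy (a sketch; the rank threshold used to determine r is an illustrative choice):

```python
import numpy as np

def lstsq_via_svd(A, b):
    """Least squares via the reduced SVD, following the four steps above."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # step 1: reduced SVD
    r = int(np.sum(s > 1e-12 * s[0]))                 # numerical rank r
    q = U[:, :r].T @ b                                # step 2: q = Uhat^T b
    p = q / s[:r]                                     # step 3: solve Sigma p = q
    return Vt[:r, :].T @ p                            # step 4: x = Vhat p

rng = np.random.default_rng(7)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
x = lstsq_via_svd(A, b)
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_ref)
```

Because only the first r singular values are inverted, this also handles rank-deficient A, which is exactly the situation in the X-ray imaging example of the next section.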


8.5 X-Ray Imaging using SVD

In this section, we use an industrial process imaging problem as the example to demonstrate X-ray imaging. The setup of the problem is shown

in Figure 8.5.1. The true object consists of three circular inclusions, each of

uniform density, inside an annulus. Ten X-ray sources are positioned on one

side of a circle, and each source sends a fan of 100 X-rays that are measured by

detectors on the opposite side of the object. Here, the 10 sources are distributed

evenly so that they form a total illumination angle of 90 degrees, resulting in a

limited-angle X-ray problem. The goal is to reconstruct the density of the object

(as an image) from measured X-ray signals.

Figure 8.5.1: Left: discretised domain, true object, sources (red dots), and de-

tectors corresponding to one source (black dots). The fan transmitted by one

source is illustrated in gray. The density of the object is 0.006 in the outer ring

and 0.004 in the three inclusions; the background density is zero. Right: the

noise free measurements (black line) and the noisy measurements (red dots) for

one source.

8.5.1 Mathematical Model

When an X-ray travels through a physical object along a straight line l(s), where

s is the spatial coordinate, interaction between radiation and matter lowers the

intensity of the ray. Suppose that an X-ray has initial intensity I0 at the radiation

source. The intensity measured at the detector I1 is smaller than I0, as the

intensity of the X-ray decreases proportionally to the relative intensity loss of

the matter along the line l. We can represent the relative intensity loss of the

matter by an attenuation coefficient function f(s), whose value gives the relative

intensity loss of the X-ray within a small distance ds,

dI/I = −f(s) ds.


Density of material is often correlated with the relative intensity loss. Material

with a higher density (e.g., metal) often has a higher attenuation coefficient than

material with a lower density (e.g., wood). Thus, recovering the unknown at-

tenuation coefficient function f(s) from X-ray signals is used as a surrogate for

reconstructing the actual material density.

Integration from the initial state to the final state along a line l(s) gives

∫l(s) I′(s)/I(s) ds = −∫l(s) f(s) ds,

where the left hand side gives log(I1)− log(I0) = log(I1/I0). Thus we have

log(I0)− log(I1) = ∫l(s) f(s) ds.

Now the left hand side of the above equation is known from measurements (I0 by

the equipment setup and I1 from detector), whereas the right hand side consists

of integrals of the unknown function f(s) over straight lines.

8.5.2 Computational Model

Figure 8.5.2: Left: discretised object and an X-ray travelling through it. Right:

four pixels from the left side picture and the distances (in these pixels) travelled

by the X-ray corresponding to the measurement d7. Distance ai,j corresponds

to the element on the i-th row and j-th column of matrix F .

Computationally we can represent the continuous function f(s) by n pix-

els (or voxels in 3D), as shown in Figure 8.5.2. Now each component of ~x =

[x1, x2, . . . , xn]

> represents the value of the unknown attenuation coefficient func-

tion f(s) in the corresponding pixel. Assuming we have a measurement di of the

line integral of f(s) over line li(s), we can approximate

di = ∫li(s) f(s) ds = ∑_{j=1}^{n} ai,j xj ,

where ai,j is the distance that the line li(s) “travels” in the j-th pixel correspond-

ing to xj . If we have m measurements (m X-rays travel through the object),
then we have the linear system

~d = F~x,

where Fij = ai,j and ~d = [d1, d2, . . . , dm]>.


8.5.3 Image Reconstruction

We move from the problem of computing the observables ~d for a given attenuation coefficient function to the image reconstruction. The measurement process can be expressed as

~d = F~x+ ~e,

where ~e represents possible measurement noise of the instrument (as all real-world measurements are noisy) and other sources of error in the modelling process.

Remark 8.25

The error in the measurement process is not negligible.

The process of determining ~d given a known ~x is called the forward problem.

In contrast, image reconstruction is an inverse problem where we aim to recover

~x from measured data ~d.

In many cases, especially in industrial imaging, the X-rays travel through the physical object only from a restricted angle of view and we often have m < n. As a result, the reconstruction process is very sensitive to measurement error. To understand the reconstruction process and the role of measurement error, we generate noise-free data ~dt = F~xt and its noise-corrupted version ~dn for a

given “true” test image ~xt, as shown in Figure 8.5.1. Furthermore, the reduced

SVD of F , F = Uˆ ΣˆVˆ >, will also be used.

Inverse Crime

Figure 8.5.3: Left: reconstruction from noise free data. Right: reconstruction

from noisy data.

Given a measured data set ~dn, a natural thing to try is to recover ~x by using

the pseudoinverse of F (as discussed in the previous section) as F ∈ Rm×n may

not be invertible. This way, we have the reconstructed image,

~x+ = F+~dn = Vˆ Σˆ−1Uˆ>~dn.

To demonstrate the impact of measurement error, we consider the following

experiments:

1. Reconstruct ~x using the noise-free data—this is not realistic in practice.

2. Reconstruct ~x using the noisy data—the realistic case.


Figure 8.5.3 shows the reconstructed image for both experiments. Experiment 1 is often referred to as the inverse crime, or a too-good-to-be-true reconstruction. It is a reconstruction given perfect knowledge of the measurement process and noise-free data. In practice, a small error in the data can lead to a rather large error in the reconstruction, as shown in Experiment 2. Thus, we aim to find reconstructions that are robust to error.

Reconstruction using truncated SVD

Consider the noisy data generated from a true image ~xt, ~dn = F~xt + ~e. The reconstruction using the pseudoinverse can be expressed as

~x+ = F+~dn = (Vˆ Σˆ−1Uˆ>)(Uˆ ΣˆVˆ>)~xt + (Vˆ Σˆ−1Uˆ>)~e,

where Vˆ Σˆ−1Uˆ> = F+ and Uˆ ΣˆVˆ> = F . Thus, we have

~x+ = Vˆ Vˆ>~xt + F+~e.

The reconstructed image ~x+ consists of the true image ~xt projected onto the row space of F , plus the noise multiplied by the pseudoinverse, F+~e. This way the reconstruction error ‖~x+ − ~xt‖ can be bounded as

‖~x+ − ~xt‖ = ‖(Vˆ Vˆ> − I)~xt + F+~e‖ ≤ ‖(I − Vˆ Vˆ>)~xt‖+ ‖F+‖‖~e‖, (8.14)

by the triangle inequality.

We know that the 2-norm of the pseudoinverse, ‖F+‖, is given by 1/σr. Then

the error bound can be expressed as

‖~x+ − ~xt‖ ≤ ‖(I − Vˆ Vˆ>)~xt‖+ (1/σr)‖~e‖. (8.15)

Thus, the reconstruction error is governed by the smallest nonzero singular value σr. The singular values of the example used here are shown in Figure 8.5.4.

Figure 8.5.4: Singular values of F .


To control the reconstruction error, one can use the truncated SVD to define

the approximated pseudoinverse of F . Given the truncated SVD

F ≈ ∑_{i=1}^{k} σi ~ui ~v>i ,

for k < r, the rank-k approximated pseudoinverse can be defined as

F+k = VkΣ−1k U>k = ∑_{i=1}^{k} (1/σi) ~vi ~u>i .

This way, the corresponding reconstruction error bound of the reconstructed image

~x+k = VkΣ−1k U>k ~d

takes the form

‖~x+k − ~xt‖ ≤ ‖(I − VkV>k )~xt‖+ (1/σk)‖~e‖, (8.16)

where the first term is the representation error.

Figure 8.5.5 shows the reconstructed image using k = 50, k = 500, and k = 940.

The left reconstructed image in Figure 8.5.5 has a rather large representation

Figure 8.5.5: Left: k = 50. Middle: k = 500. Right: k = 940.

error as we truncated the SVD too aggressively. The right reconstructed image in

Figure 8.5.5 is not robust to noise as the last singular value σk in the truncated

SVD is too small. The middle reconstructed image in Figure 8.5.5 seems to achieve a suitable balance.
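The effect of the truncation level can be reproduced in a small numpy experiment. The operator below is a synthetic stand-in for F with geometrically decaying singular values; all sizes, decay rates, and noise levels are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ill-conditioned forward operator (stand-in for the X-ray matrix F):
# singular values decay geometrically, so F+ amplifies noise strongly.
m, n = 80, 100
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, m)))
s = 10.0 ** np.linspace(0, -6, m)          # sigma_1 = 1 down to 1e-6
F = U @ np.diag(s) @ V.T

x_true = rng.standard_normal(n)
d_noisy = F @ x_true + 1e-4 * rng.standard_normal(m)

def tsvd_reconstruct(k):
    """Rank-k truncated-SVD reconstruction x_k^+ = V_k diag(1/s_k) U_k^T d."""
    return V[:, :k] @ ((U[:, :k].T @ d_noisy) / s[:k])

# A moderate k beats using all m singular values: the small sigma_i
# multiply the noise by 1/sigma_i, exactly as in the bound (8.16).
err_mid  = np.linalg.norm(tsvd_reconstruct(40) - x_true)
err_full = np.linalg.norm(tsvd_reconstruct(m)  - x_true)
```

On this example the moderately truncated reconstruction has a smaller error than the full pseudoinverse reconstruction, mirroring the middle versus right panels of Figure 8.5.5.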

L-curve

We aim to find a k such that the reconstruction is robust with respect to the noise ~e, while keeping the representation error minimal. If the true image ~xt and the noise ~e were known, we could compute the reconstruction error exactly and pick the best k. However, both ~xt and ~e are unknown, so we have to derive heuristics for choosing the best k.

We can measure how well the reconstructed image fits the noisy data in the form of

‖F~x+k − ~dn‖,

and measure the robustness of the reconstruction by ‖~x+k ‖, which is bounded as

‖~x+k ‖ ≤ ‖VkV>k ~xt‖+ (1/σk)‖~e‖.


The smaller the former is, the better the reconstructed image explains the data; the smaller the latter is, the more robust the reconstruction. For a suitably chosen k, the norm ‖F~x+k − ~dn‖ should be close to the norm of the measurement noise ‖~e‖. For a rather small k, we expect the norm ‖F~x+k − ~dn‖ to be rather large. If we increase k, then the norm ‖F~x+k − ~dn‖ should decrease until it reaches the order of the measurement noise ‖~e‖. However, at the same time, the robustness of the reconstruction decreases if k is too large. Thus, we expect the norm of ~x+k to increase drastically if k is chosen such that σk is too small.

We often plot the norm ‖~x+k ‖ (on the horizontal axes) versus the norm ‖F~x+k −

~dn‖ (on the vertical axes) with different k values. This leads to the so-called L-

curve. Figure 8.5.6 shows the L-curve computed for the example used here.

The corner (represented by the black dot) represents a reasonable k value that

balances the norm ‖F~x+k −~dn‖ (fit to the data) and the norm ‖~x+k ‖ (reconstruction robustness).
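Both norms entering the L-curve can be computed for all k at once from the SVD. The sketch below uses a synthetic stand-in for the imaging operator; all sizes and noise levels are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic ill-posed problem: geometrically decaying singular values + noise.
m, n = 60, 80
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, m)))
s = 10.0 ** np.linspace(0, -5, m)
F = U @ np.diag(s) @ V.T
d = F @ rng.standard_normal(n) + 1e-3 * rng.standard_normal(m)

q = U.T @ d
ks = range(1, m + 1)
x_norms   = [np.linalg.norm(q[:k] / s[:k]) for k in ks]          # ||x_k^+||
residuals = [np.linalg.norm(d - U[:, :k] @ q[:k]) for k in ks]   # ||F x_k^+ - d||

# Plotting x_norms (horizontal) against residuals (vertical) for all k
# traces out the L-curve; the corner balances the two quantities.
```

As k grows, the data misfit ‖F~x+k − ~dn‖ decreases monotonically while ‖~x+k ‖ grows, which is exactly what produces the two arms of the L.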

Figure 8.5.6: Top left: the L-curve. Top right: the reconstructed image using k = 840, which is the corner on the L-curve. Bottom left: reconstruction robustness

‖~x+k ‖ versus the rank k. Bottom right: fit to the data ‖F~x+k − ~dn‖ versus the

rank k.


Chapter 9

Krylov Subspace Methods

for Eigenvalues

In this chapter we will consider solving eigenvalue problems for very large matrices A ∈ Rn×n. For example, in the X-ray imaging case, a 3-D image discretised into 100 intervals in each dimension (not even a very fine resolution) has a million voxels to reconstruct, and we may use the same order of number of X-rays in the reconstruction. This leads to an eigenvalue problem with a million-dimensional matrix. In this scenario, it is no longer computationally feasible to directly apply eigenvalue solvers such as the QR algorithm that operate on the full matrix (with operation counts O(n3) in Phase 1 and O(n2) in each iteration of Phase 2).

Instead of solving the original eigenvalue problem in Rn, we seek to project

the original problem onto a lower dimensional subspace, the Krylov subspace, and

then solve a reduced dimensional eigenvalue problem. In this chapter, we will

discuss two algorithms for computing eigenvalues using the Krylov subspace, the

Arnoldi method and the Lanczos method, which are designed for general square

matrices and symmetric matrices, respectively.

9.1 The Arnoldi Method for Eigenvalue Problems

Objective

We recall that the CG method and the GMRES method for solving linear systems A~x = ~b minimise the residual A~x−~b projected onto the Krylov subspace

generated by the matrix A and the vector ~b:

Kk+1(~b,A) = span{~b,A~b,A2~b, . . . , Ak~b}.

Given a general square matrix A ∈ Rn×n, the goal of the Arnoldi method is to

construct an orthonormal basis Qk+1 of the Krylov subspace Kk+1(~b,A) for some

k > 0 such that the projection of the matrix A onto Kk+1(~b,A) with respect to

the basis of columns of Qk+1,

Hk+1 = Q∗k+1AQk+1, Hk+1 ∈ R(k+1)×(k+1),

is a Hessenberg matrix. Under certain technical conditions, the eigenvalues of

the Hessenberg matrix Hk+1 (the so-called Arnoldi eigenvalue estimates) can be

good approximations of the eigenvalues of A.


In the rest of this section, we will show the Arnoldi procedure for constructing

such a matrix Qk+1 and some of its important properties for solving eigenvalue

problems and linear systems.

Arnoldi Procedure

Recall that a complete reduction of A ∈ Rn×n to a Hessenberg form by a unitary

similarity transformation can be written as

H = Q∗AQ, or AQ = QH.

In Phase 1 of the eigenvalue solvers we learned in Chapters 6 and 7, the matrix

Q is constructed by a sequence of n−2 Householder reflections. For large n, it is not feasible to apply this process, which requires O(n3) operations. Instead, we focus only on the first k + 1 columns of AQ = QH.

Furthermore, recall that for computing the QR factorisation of A, QR = A,

we have discussed two methods: Householder reflection and (modified) Gram-

Schmidt. While the former is more numerically stable, the (modified) Gram-

Schmidt has the advantage that it can be stopped part-way, leaving one with

a reduced QR factorisation. The process of using the Arnoldi procedure to construct the first k + 1 columns of AQ = QH is analogous to this.

Arnoldi generates an orthonormal basis for the Krylov space Kk+1(~b,A) by

setting

~q0 = ~b/‖~b‖,

and applying modified Gram-Schmidt to orthogonalise the vectors

{~q0, A~q0, A~q1, . . . , A~qk}.

In every iteration, the Arnoldi method computes a vector A~qk and orthogonalises this vector against the previous {~q0, ~q1, . . . , ~qk} using the modified Gram-

Schmidt process to generate a new vector ~qk+1. This is essentially subtracting

from A~qk the components in the directions of the previous ~qj :

~vk+1 = A~qk − h0,k~q0 − h1,k~q1 − . . .− hk,k~qk,

where the projection coefficients hj,k are determined as hj,k = (A~qk)∗~qj . The

new orthonormal vector ~qk+1 is then determined by normalising ~vk+1:

~qk+1 = ~vk+1/hk+1,k

where hk+1,k = ‖~vk+1‖. So the basis vectors {~q0, ~q1, . . . , ~qk, ~qk+1} satisfy

hk+1,k~qk+1 = A~qk − h0,k~q0 − h1,k~q1 − · · · − hk,k~qk,

or

A~qk = h0,k~q0 + h1,k~q1 + · · ·+ hk,k~qk + hk+1,k~qk+1.

This procedure to generate an orthonormal basis of the Krylov space is called the Arnoldi procedure. It can easily be shown that the resulting set of Arnoldi vectors, {~q0, ~q1, . . . , ~qk}, is a basis for Kk+1(~b,A) = span{~b,A~b,A2~b, . . . , Ak~b}.

Theorem 9.1

Let {~q0, . . . , ~qk} be the vectors generated by the Arnoldi procedure. Then

span{~q0, . . . , ~qk} = span{~b,A~b,A2~b, . . . , Ak~b}.


Proof. This can be shown by induction. The case for k = 0 is trivial. For

k > 0, suppose that

span{~q0, . . . , ~qk} = span{~b,A~b,A2~b, . . . , Ak~b},

holds. Given the relationship between A~qk and the Arnoldi vectors:

A~qk = h0,k~q0 + h1,k~q1 + · · ·+ hk,k~qk + hk+1,k~qk+1,

we know that the vector A~qk is a linear combination of {~q0, . . . , ~qk, ~qk+1}. Thus,

we have

span{~q0, . . . , ~qk, ~qk+1} = span{~b,A~b,A2~b, . . . , Ak~b,Ak+1~b}.

The Arnoldi procedure is given by:

Algorithm 9.2: Arnoldi Procedure for an Orthonormal Basis of Kk+1(~b0, A)

Input: matrix A ∈ Rn×n; vector ~b0

Output: vectors ~q0, . . . , ~qk that form an orthonormal basis of Kk+1(~b0, A)

1: ~q0 = ~b0/‖~b0‖

2: for i = 0 : (k − 1) do

3: ~v = A~qi

4: for j = 0 : i do

5: hj,i = ~q∗j ~v

6: ~v = ~v − hj,i~qj

7: end for

8: hi+1,i = ‖~v‖

9: if hi+1,i < tol then

10: Stop

11: end if

12: ~qi+1 = ~v/hi+1,i

13: end for
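Algorithm 9.2 translates almost line by line into numpy. The following is a sketch (function name and random test matrix are our own choices); the usage checks the relation AQk+1 = Qk+2H˜k+1 derived in the next paragraphs:

```python
import numpy as np

def arnoldi(A, b, k, tol=1e-12):
    """Arnoldi procedure (Algorithm 9.2): returns Q with orthonormal columns
    spanning the Krylov subspace and the rectangular Hessenberg matrix H~."""
    n = len(b)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for i in range(k):
        v = A @ Q[:, i]
        for j in range(i + 1):                 # modified Gram-Schmidt
            H[j, i] = Q[:, j] @ v
            v -= H[j, i] * Q[:, j]
        H[i + 1, i] = np.linalg.norm(v)
        if H[i + 1, i] < tol:                  # breakdown: stop early,
            return Q[:, :i + 1], H[:i + 1, :i] # drop the incomplete column
        Q[:, i + 1] = v / H[i + 1, i]
    return Q, H

# Usage: verify A Q_{k+1} = Q_{k+2} H~_{k+1} on a random matrix.
rng = np.random.default_rng(2)
A = rng.standard_normal((30, 30))
b = rng.standard_normal(30)
Q, H = arnoldi(A, b, 10)    # Q is 30-by-11, H is 11-by-10
```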

We can express the vectors and coefficients computed during the Arnoldi procedure in a matrix form:

A [~q0 ~q1 · · · ~qk] = [~q0 ~q1 · · · ~qk ~qk+1] H˜k+1,

where the first matrix of Arnoldi vectors is n-by-(k+1), the second is n-by-(k+2), and H˜k+1 is the (k+2)-by-(k+1) matrix

H˜k+1 =
[ h0,0  h0,1  h0,2  . . .   h0,k  ]
[ h1,0  h1,1  h1,2  . . .   h1,k  ]
[       h2,1  h2,2  . . .   h2,k  ]
[             h3,2  . . .    ...  ]
[                  hk,k−1   hk,k  ]
[   0                      hk+1,k ] ,


or

AQk+1 = Qk+2H˜k+1.

Projection onto Krylov Subspaces

We can partition the matrix H˜k+1 as:

H˜k+1 = [     Hk+1
          hk+1,k ~e>k+1 ] ,

where Hk+1 is the (k + 1)-by-(k + 1) square Hessenberg matrix formed by the first k + 1 rows, and the last row is hk+1,k ~e>k+1, with ~ek+1 the last standard basis vector of Rk+1.

Note that the product Q∗k+1Qk+2 = [ I ~0 ], which is a (k + 1)-by-(k + 2) identity-like matrix, i.e., a matrix with 1 on its main diagonal and zero elsewhere. Then, we have

Q∗k+1AQk+1 = Q∗k+1Qk+2H˜k+1 = Hk+1.

The matrix Hk+1 can be interpreted as the representation in the basis of columns

of Qk+1 of the matrix A projected onto the Krylov subspace Kk+1.

Since the Hessenberg matrix Hk+1 is a projection of A, one might imagine that the eigenvalues of Hk+1 can be related to the eigenvalues of A. In fact, under certain conditions, the eigenvalues of Hk+1 (the so-called Arnoldi eigenvalue

estimates) can be very accurate approximations of the eigenvalues of A. This

will be shown in later sections.
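As a preview, the sketch below (with a compressed copy of Algorithm 9.2 and an arbitrary symmetric test matrix having one extreme eigenvalue) computes the Arnoldi eigenvalue estimates as the eigenvalues of the projected matrix:

```python
import numpy as np

def arnoldi(A, b, k):
    # (Same procedure as Algorithm 9.2, compressed; no breakdown check.)
    n = len(b)
    Q = np.zeros((n, k + 1)); H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for i in range(k):
        v = A @ Q[:, i]
        for j in range(i + 1):
            H[j, i] = Q[:, j] @ v
            v -= H[j, i] * Q[:, j]
        H[i + 1, i] = np.linalg.norm(v)
        Q[:, i + 1] = v / H[i + 1, i]
    return Q, H

rng = np.random.default_rng(3)
# Symmetric test matrix with one extreme eigenvalue (10) far from the rest.
eigs = np.concatenate([[10.0], np.linspace(0.0, 1.0, 49)])
W, _ = np.linalg.qr(rng.standard_normal((50, 50)))
A = W @ np.diag(eigs) @ W.T

Q, H = arnoldi(A, rng.standard_normal(50), 20)
Hk = Q.T @ A @ Q                  # projection H_{k+1} = Q*_{k+1} A Q_{k+1}
ritz = np.linalg.eigvals(Hk)      # Arnoldi eigenvalue estimates
```

After 20 iterations the estimate of the well-separated eigenvalue 10 is accurate to high precision, while the clustered eigenvalues in [0, 1] are only roughly approximated.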

Breakdown

Note also that, when k+ 1 = n, the process terminates with hn,n−1 = ‖~vn‖ = 0,

because there cannot be more than n orthogonal vectors in Rn. At this point,

we obtain

AQ = QH,

where Q ∈ Rn×n is orthogonal and H ∈ Rn×n is a square Hessenberg matrix

with zeros below the first subdiagonal:

H =
[ h0,0  h0,1  h0,2   . . .  h0,n−1 ]
[ h1,0  h1,1  h1,2   . . .    ...  ]
[       h2,1  h2,2   . . .         ]
[             h3,2   . . .         ]
[  0       hn−1,n−2  hn−1,n−1      ] .


In practice, the Arnoldi procedure is terminated if the value hk+1,k = ‖~vk+1‖ is close to zero, say, below a certain threshold (Lines 9-11 in Algorithm 9.2). This is called a breakdown of the Arnoldi procedure. Ideally, a breakdown occurs well before k + 1 = n. A breakdown means that exact eigenvalues of A (up to some numerical error) can be obtained from the matrix Hk+1, and that exact solutions of the linear system A~x = ~b can be obtained.

Remark 9.3

Once a breakdown occurs, we have hk+1,k = 0, and then

H˜k+1 = [ Hk+1
           0>  ] .

It then follows that

AQk+1 = Qk+2H˜k+1 = [Qk+1 | ~qk+1] [ Hk+1
                                      0>  ] = Qk+1Hk+1.

Remark 9.4

Consider that Qk+1 is the first k + 1 columns of a unitary matrix Q that can

reduce the matrix A to a Hessenberg form, i.e., AQ = QH or Q∗AQ = H.

In this case the matrix H for the full Hessenberg reduction has the following

structure

H = [ Hk+1  H12
        0   H22 ] ,

where H12 is a potentially full (k + 1) × (n − k − 1) matrix and H22 is an

(n − k − 1) × (n − k − 1) upper Hessenberg matrix. Thus, H is block upper

triangular. Then the union of the eigenvalues of Hk+1 and the eigenvalues of

H22 are the eigenvalues of A.

Remark 9.5

It is easy to verify that if Hk+1 has an eigenvalue λ with an eigenvector ~v, then λ is an eigenvalue of A and A has a corresponding eigenvector Qk+1~v.

Proof. Let λ be an eigenvalue of Hk+1 with corresponding eigenvector ~v,

i.e., Hk+1~v = λ~v. Let ~y = Qk+1~v, then

A~y = AQk+1~v = Qk+1Hk+1~v, by Remark 9.3. Since Hk+1~v = λ~v, we have

A~y = λQk+1~v = λ~y,

Since ~v ≠ 0, and since the columns of Qk+1 are linearly independent, it follows that ~y ≠ 0, and hence λ is an eigenvalue of A with eigenvector ~y.


Theorem 9.6

Once a breakdown occurs at an iteration k, the Krylov subspace Kk+1(~b,A) =

span{~b,A~b,A2~b, . . . , Ak~b} is an invariant subspace of A, i.e., AKk+1 ⊆ Kk+1.

Proof. Let ~y be an arbitrary vector in AKk+1, then there exists a vector

~z ∈ Kk+1 such that ~y = A~z. Since Kk+1 = span{~q0, · · · , ~qk}, we can express

~z as a linear combination of {~q0, · · · , ~qk} in the form of ~z = Qk+1 ~w for some

~w ∈ Rk+1. It follows that ~y = AQk+1 ~w = Qk+1Hk+1 ~w. This implies that

~y ∈ span{~q0, · · · , ~qk}. Since ~y is arbitrary it follows that AKk+1 ⊆ Kk+1.

Theorem 9.7

Once a breakdown occurs at an iteration k, the Krylov subspaces of A gener-

ated by b, Kk+1(~b,A) = span{~b,A~b,A2~b, . . . , Ak~b}, have the following property:

Kk+1 = Kk+2 = Kk+3 = · · · .

Proof. First we have that Kk+1 ⊆ Kk+2 by the definition of Krylov subspace.

The Krylov subspace Kk+2 is the union of span{~q0} and the subspace AKk+1,

i.e.,

Kk+2 = span{~q0} ∪AKk+1.

After the breakdown, we have AKk+1 ⊆ Kk+1 as the result of Theorem 9.6.

Since span{~q0} ⊆ Kk+1 by definition, we have Kk+2 ⊆ Kk+1. Thus, Kk+1 =

Kk+2. Then, we can prove this theorem by induction.

Theorem 9.8

Suppose that the matrix A is nonsingular. Once a breakdown occurs, the solu-

tion to the linear system A~x = ~b lies in Kk+1.

Proof. If A is nonsingular, then by the result of Remark 9.5, zero cannot be an eigenvalue of Hk+1. Therefore, Hk+1 is an invertible matrix and AQk+1H−1k+1 = Qk+1. Since

~b = Qk+1(~e1‖~b‖), it follows that

AQk+1H−1k+1(~e1‖~b‖) = Qk+1(~e1‖~b‖) = ~b.

Multiplying both sides on the left by A−1 we obtain

Qk+1H−1k+1(~e1‖~b‖) = A−1~b = ~x.

Thus, we have ~x ∈ Kk+1.


9.2 Lanczos Method for Eigenvalue Problems

The Lanczos method is the Arnoldi method specialised to the case where the

matrix A is symmetric. If A = AT , then the Hessenberg matrix obtained by the

Arnoldi process (for the case k + 1 = n) satisfies

HT = (QTAQ)T = QTATQ = QTAQ = H,

so H is symmetric, which implies that it is tridiagonal.

Therefore, the Arnoldi update formula simplifies from

hk+1,k~qk+1 = A~qk − h0,k~q0 − h1,k~q1 − . . .− hk,k~qk.

to a three-term recursion relation

hk+1,k~qk+1 = A~qk − hk−1,k~qk−1 − hk,k~qk,

with

A [~q0 ~q1 · · · ~qk] = [~q0 ~q1 · · · ~qk ~qk+1] H˜k+1,

where the first matrix of Lanczos vectors is n-by-(k+1), the second is n-by-(k+2), and the (k+2)-by-(k+1) matrix H˜k+1 is now tridiagonal apart from its last row:

H˜k+1 =
[ h0,0  h0,1                      ]
[ h1,0  h1,1  h1,2                ]
[       h2,1  h2,2  h2,3          ]
[             h3,2  h3,3  . . .   ]
[                 hk,k−1   hk,k   ]
[   0                    hk+1,k   ] .

Taking the symmetry further into account and using the notation T˜ instead of H˜ to denote a tridiagonal matrix, we have

T˜k+1 =
[ α0  β0                       ]
[ β0  α1  β1                   ]
[     β1  α2  β2               ]
[         . . .  . . .  . . .  ]
[         0    βk−1    αk      ]
[                      βk      ] .

In a matrix form, we have

AQk+1 = Qk+2T˜k+1.

The Arnoldi procedure thus simplifies to computing the orthonormal basis {~q0, . . . , ~qk} of the Krylov space based on

A~qk = βk−1~qk−1 + αk~qk + βk~qk+1,


where αk = (A~qk)∗~qk, βk−1 = (A~qk)∗~qk−1, and βk is the 2-norm of A~qk − βk−1~qk−1 − αk~qk. Note that βk−1 was obtained in the previous iteration. This

procedure is called the Lanczos procedure. It can be shown that the Lanczos

procedure is related to the CG algorithm (just like Arnoldi is used by GMRES).

Properties of the Arnoldi procedure, for example, Remark 9.3 - Theorem 9.7 still

hold for the Lanczos procedure.

The Lanczos procedure is given by:

Algorithm 9.9: Lanczos Procedure for an Orthonormal Basis of Kk+1(~b0, A)

Input: a symmetric matrix A ∈ Rn×n; vector ~b0

Output: vectors ~q0, . . . , ~qk that form an orthonormal basis of Kk+1(~b0, A)

1: β−1 = 0, ~q−1 = 0, ~q0 = ~b0/‖~b0‖

2: for i = 0 : (k − 1) do

3: ~v = A~qi

4: αi = ~q∗i ~v

5: ~v = ~v − αi~qi − βi−1~qi−1

6: βi = ‖~v‖

7: if βi < tol then

8: Stop

9: end if

10: ~qi+1 = ~v/βi

11: end for
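Algorithm 9.9 in numpy (note that Line 10 divides by βi, the norm just computed in Line 6; the symmetric test matrix below is arbitrary). The usage verifies that the projected matrix is tridiagonal with the αi and βi as its entries:

```python
import numpy as np

def lanczos(A, b, k):
    """Lanczos procedure (Algorithm 9.9) for symmetric A: a three-term
    recursion producing orthonormal Q and tridiagonal coefficients."""
    n = len(b)
    Q = np.zeros((n, k + 1))
    alpha = np.zeros(k); beta = np.zeros(k)
    Q[:, 0] = b / np.linalg.norm(b)
    for i in range(k):
        v = A @ Q[:, i]
        alpha[i] = Q[:, i] @ v
        v -= alpha[i] * Q[:, i]
        if i > 0:
            v -= beta[i - 1] * Q[:, i - 1]
        beta[i] = np.linalg.norm(v)
        Q[:, i + 1] = v / beta[i]       # divide by beta_i, the norm above
    return Q, alpha, beta

rng = np.random.default_rng(5)
M = rng.standard_normal((40, 40))
A = M + M.T                             # symmetric test matrix
Q, alpha, beta = lanczos(A, rng.standard_normal(40), 10)

# Q^T A Q is tridiagonal: alpha on the diagonal, beta on the off-diagonals.
T = Q[:, :10].T @ A @ Q[:, :10]
```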

Remark 9.10

Note that each iteration of the Lanczos procedure only operates with three vectors, as opposed to the i + 1 vectors used in the Arnoldi iteration.


9.3 How Arnoldi/Lanczos Locates Eigenvalues

The eigenvalues of a matrix are defined by its characteristic polynomial. The Arnoldi/Lanczos procedure implicitly constructs a sequence of polynomials that approximate the characteristic polynomial of the matrix.

The use of Arnoldi/Lanczos procedure for computing eigenvalues proceeds as

follows. For a matrix A ∈ Rn×n, after k Arnoldi/Lanczos iterations, the eigenvalues and eigenvectors of the resulting Hessenberg matrix Hk+1 are computed by standard eigenvalue solvers such as the shifted QR algorithm. These are the Arnoldi estimates of the eigenvalues.

For a large matrix, we can often only perform k ≪ n iterations of the Arnoldi/Lanczos procedure. In this case, we can only obtain estimates of at most k + 1 eigenvalues. Some of these eigenvalue estimates converge faster to an eigenvalue of A and some converge slower. Typically, estimates of the “extreme” eigenvalues converge faster, that is, eigenvalues near the edge of the spectrum, or eigenvalues that have a big gap to adjacent eigenvalues.

Here we want to illustrate the idea behind the Arnoldi/Lanczos procedure and why it tends to find those extreme eigenvalues first.

Arnoldi and Polynomial Approximation

Let ~x be a vector in the Krylov subspace Kk(~b,A) generated by the matrix A

and the vector ~b:

Kk(~b,A) = span{~b,A~b,A2~b, . . . , Ak−1~b}.

Such an ~x can be expressed as a linear combination of powers of A times ~b,

~x = c0~b+ c1A~b+ · · ·+ ck−1Ak−1~b = ∑_{j=0}^{k−1} cj Aj~b.

This expression can also be viewed as a polynomial of A multiplied by ~b. If p(z) is the polynomial c0 + c1z + · · ·+ ck−1zk−1, then we have the matrix polynomial of A in the form

p(A) = c0I + c1A+ · · ·+ ck−1Ak−1 = ∑_{j=0}^{k−1} cj Aj.

This way, we have ~x = p(A)~b. Krylov subspace methods can be analysed in

terms of matrix polynomials.

Definition 9.11

A monic polynomial of degree k is defined as a polynomial

pk(z) = c0 + c1z + · · ·+ ck−1zk−1 + zk.

That is, the coefficient associated with degree k is 1.

Remark 9.12

The characteristic polynomial of a matrix A, pA(λ), is a monic polynomial.


Theorem 9.13

Consider the characteristic polynomial of a matrix A, pA(λ). The Cayley-Hamilton Theorem asserts that the matrix polynomial pA(A) = 0.

Proof. This can be easily verified for the case where the matrix A has an eigendecomposition, A = V ΛV −1. We omit the general proof here.

Remark 9.14

The Arnoldi/Lanczos procedure finds a monic polynomial pk(·) such that

‖pk(A)~b‖ is minimised. (9.1)

Once a breakdown occurs, it is not hard to show that the Arnoldi procedure

obtains a monic polynomial such that ‖pk(A)~b‖ = 0. Here we want to look into

this problem before a breakdown.

Theorem 9.15

As long as the Arnoldi procedure does not break down (i.e., the Krylov subspace Kk(~b,A) has dimension k), the characteristic polynomial of Hk defines the polynomial solving the problem (9.1).

Proof. We first note that if pk is a monic polynomial, then the vector pk(A)~b can be written as

pk(A)~b = ∑_{j=0}^{k−1} cj Aj~b + Ak~b = Ak~b−Qk~y,

where the sum ∑_{j=0}^{k−1} cj Aj~b = −Qk~y lies in Kk(~b,A),

for some ~y ∈ Rk. Since Qk is full rank (of rank k), the problem (9.1) becomes

a least-squares problem of finding ~y such that

‖Ak~b−Qk~y‖

is minimised. The solution can be obtained at Q∗k(Ak~b−Qk~y) = 0, or equivalently

Q∗kpk(A)~b = 0.

Now the problem boils down to find the monic polynomial that solves the

above equation.

Consider the following unitary matrix

Q = [ Qk  U ] ,

where Qk is the matrix consisting of the Arnoldi vectors, the first column of U is the next Arnoldi vector ~qk, and the other columns of U are orthonormal


vectors. This way, we have the unitary similarity transformation of the matrix

A, which takes the form of

Q∗AQ = [ Q∗kAQk  Q∗kAU
         U∗AQk   U∗AU ] .

Since AQk = Qk+1H˜k, we have Q∗kAQk = Hk, and X1 = U∗AQk = U∗Qk+1H˜k is a matrix of dimension (n− k)-by-k, with all but the upper-right entry equal to 0. Let X2 = Q∗kAU and X3 = U∗AU ; we have

Q∗AQ = [ Hk  X2
         X1  X3 ] = H,

which is block Hessenberg.

Since A = QHQ∗, we can show that pk(A) = Qpk(H)Q∗. Thus, we have

Q∗kpk(A)~b = Q∗kQpk(H)Q∗~b.

Given ~b = Q~e1‖~b‖ and Q∗kQ = [ Ik 0 ], the above equation can be written as

Q∗kpk(A)~b = [ Ik 0 ] pk(H)~e1‖~b‖,

which is essentially the first k entries of the first column of pk(H).

Because of the block Hessenberg structure of H, the first k entries of the first column of pk(H) are given by pk(Hk). If pk(·) is the characteristic polynomial of Hk, i.e., pk(λ) = pHk(λ), then by the Cayley-Hamilton Theorem, the matrix polynomial pk(Hk) equals 0. This way, the characteristic polynomial of Hk defines a polynomial solving the problem (9.1).

How Arnoldi/Lanczos Locates Eigenvalues

By projecting the matrix A onto the Krylov subspace Kk(~b,A) represented by Qk, we obtain a matrix Hk. The characteristic polynomial of Hk effectively solves a polynomial approximation problem, or equivalently, a least-squares problem involving the Krylov subspace.

What does the characteristic polynomial of Hk have to do with the eigen-

values of A, or equivalently, the characteristic polynomial of A? There is a

connection between these. If a polynomial pk(·) has the property that pk(A)~b is small, then effectively the roots of pk(·) are close to the roots of pA(·).

Remark 9.16

We can express the vector ~b as a linear combination of eigenvectors ~v1, ~v2, . . . with associated coefficients a1, a2, . . ., in the form

~b = ∑_{i=1}^{n} ai~vi.


Since

p(A)~vi = ∑_{j=0}^{k−1} cj Aj~vi = ∑_{j=0}^{k−1} cj λ^j_i ~vi = p(λi)~vi,

the vector p(A)~b can be written as

p(A)~b = ∑_{i=1}^{n} ai p(λi)~vi.

Thus, the eigenvalue estimates obtained from the Arnoldi procedure depend on the quality of the approximation to p(λi), weighted by ai.

Remark 9.17

Suppose the vector ~b is a linear combination of a limited number of eigenvectors. Then Arnoldi will find a monic polynomial such that ‖p(A)~b‖ = 0 as soon as p(A)~b is contained in the Krylov subspace, which is exactly the Krylov subspace after the breakdown, or equivalently, the subspace spanned by all the eigenvectors used for constructing ~b.

Example

In general, the shape of the characteristic polynomial is dominated by “extreme”

eigenvalues. Here we illustrate this idea using the following example. Let A be

a 19-dimensional matrix

A = diag([0.1, 0.5, 0.6, 0.7, . . . , 1.9, 2.0, 2.5, 3.0]).

The spectrum of A consists of a dense collection of eigenvalues in the interval

[0.5, 2.0] and some outliers 0.1, 2.5, and 3.0, as shown below.

The crosses are the eigenvalues and the blue line is the characteristic polynomial.

We carry out the Lanczos procedure with a random starting vector ~b0.

Figure 9.3.1 plots the monic polynomials obtained in selected iterations of the

Lanczos procedure and their roots. We can observe that the outlier eigenval-

ues are identified first, followed by the eigenvalues on the edge of the interval

[0.5, 2.0]. Those eigenvalues in the middle of the cluster are identified last. In summary, eigenvalue estimates in regions where the characteristic polynomial changes more rapidly converge faster than those in regions where the characteristic polynomial is flat.
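The experiment can be reproduced in outline with a short numpy sketch; the starting vector and the number of iterations below are arbitrary choices:

```python
import numpy as np

# The 19 eigenvalues of the example: a dense cluster plus three outliers.
eigs = np.concatenate([[0.1], np.arange(0.5, 2.01, 0.1), [2.5, 3.0]])
A = np.diag(eigs)
b = np.random.default_rng(6).standard_normal(19)

def lanczos_T(A, b, k):
    # Lanczos procedure (Algorithm 9.9), returning the square tridiagonal T_k.
    Q = np.zeros((len(b), k + 1)); a = np.zeros(k); be = np.zeros(k)
    Q[:, 0] = b / np.linalg.norm(b)
    for i in range(k):
        v = A @ Q[:, i] - (be[i - 1] * Q[:, i - 1] if i > 0 else 0.0)
        a[i] = Q[:, i] @ v
        v -= a[i] * Q[:, i]
        be[i] = np.linalg.norm(v)
        Q[:, i + 1] = v / be[i]
    return np.diag(a) + np.diag(be[:k - 1], 1) + np.diag(be[:k - 1], -1)

# After 10 of a possible 19 iterations, the outliers 0.1 and 3.0 are
# typically already located accurately, while the cluster [0.5, 2.0]
# is still poorly resolved.
ritz = np.sort(np.linalg.eigvalsh(lanczos_T(A, b, 10)))
```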


Figure 9.3.1: Estimated monic polynomials obtained by the Lanczos procedure.


Chapter 10

Other Eigenvalue Solvers

So far, all the eigenvalue solvers we have learned involve some polynomial of a

matrix A. For example, the power iteration, inverse iteration, or more advanced

QR algorithms raise the matrix to some power, and Krylov subspace methods

implicitly construct a complicated matrix polynomial. There is more to the

computation of eigenvalues than using matrix polynomials. Here we introduce

some alternatives for computing eigenvalues.

10.1 Jacobi Method

One of the oldest ideas for computing the eigenvalues of a matrix is the Jacobi method, introduced by Jacobi in 1845. Consider a symmetric matrix of dimension 5 or larger. We know that we must apply an iterative method to approximate the eigenvalues. We also know that a real-valued symmetric matrix A has an eigendecomposition A = QΛQ>, where Q is orthogonal and Λ is diagonal. Now the question is: can we create a sequence of orthogonal similarity transformations such that each transformation brings the matrix to a “more diagonal”

form? This way, the sequence of transformations will eventually produce a diag-

onal matrix.

The Jacobi method uses a sequence of 2-by-2 rotation matrices, called Jacobi rotations, which are chosen to eliminate off-diagonal elements while preserving the eigenvalues. Whilst successive rotations will undo previously introduced zeros, the

off-diagonal elements get smaller until eventually we are left with a diagonal

matrix. By accumulating products of the transformations as we proceed we

obtain the eigenvectors of the matrix.

Consider a 2-by-2 symmetric matrix,

A = [ a  d
      d  b ] ,

we aim to find a rotation matrix J such that

J>AJ = [ ≠ 0   0
          0   ≠ 0 ] .


Definition 10.1

A 2-by-2 rotation matrix is an orthogonal matrix

J = [  cos(θ)  sin(θ)
      − sin(θ)  cos(θ) ] = [  c  s
                             −s  c ] ,

for some θ.

It can be shown that for

θ = 0.5 tan−1( 2d/(b− a) ),

the resulting rotation matrix J can diagonalise the 2-by-2 matrix A.
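This formula is easy to verify numerically (the entries a, b, d below are arbitrary; arctan2 is used so that the case b = a is also handled):

```python
import numpy as np

# Arbitrary symmetric 2x2 example (values are made up).
a, b, d = 2.0, 1.0, 0.5
A = np.array([[a, d],
              [d, b]])

# Rotation angle theta = 0.5 * arctan(2d / (b - a)).
theta = 0.5 * np.arctan2(2 * d, b - a)
c, s = np.cos(theta), np.sin(theta)
J = np.array([[c, s],
              [-s, c]])

D = J.T @ A @ J   # the off-diagonal entries of D vanish
```

The similarity transformation also leaves the eigenvalues of A on the diagonal of D.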

For a large matrix A ∈ Rn×n where n > 4, we cannot directly diagonalise the matrix. However, we can diagonalise a 2-by-2 submatrix at a time using the abovementioned Jacobi rotation. Suppose we want to rotate the 2-by-2 submatrix of A formed by rows and columns p and q. We first create a 2-by-2 Jacobi rotation matrix based on the angle θ evaluated using that submatrix. Then, we embed this Jacobi matrix in an n-dimensional identity matrix to obtain

Qp,q,θ = [ 1
             ⋱
               c  · · ·  s
               ⋮    ⋱    ⋮
              −s  · · ·  c
                          ⋱
                            1 ],

where all diagonal elements are 1 apart from the two elements c in rows p and q, and all off-diagonal elements are zero apart from the elements s and −s in rows and columns p and q. The orthogonal similarity transformation Ã = Q>p,q,θAQp,q,θ then modifies only the p-th and q-th rows and columns of the matrix A. This transformation has several important properties:

1. Eigenvalues are preserved as it is a similarity transformation.

2. Frobenius norm is preserved as it is an orthogonal transformation.

3. From 2, applied to the 2-by-2 submatrices of Ã and A formed by rows and columns p and q, the sum of squares of the four affected entries is preserved:

Ãpp² + Ãpq² + Ãqp² + Ãqq² = App² + Apq² + Aqp² + Aqq².

Thus, as Ãpq = Ãqp = 0, the p-th and q-th diagonal elements of Ã and A satisfy

Ãpp² + Ãqq² ≥ App² + Aqq².


Each orthogonal similarity transformation preserves the eigenvalues and the Frobenius norm, while moving weight from the off-diagonal entries onto the diagonal, so the transformed matrix is "more diagonal" than the previous one. If we repeatedly apply Jacobi rotations to the matrix, it will eventually be diagonalised.
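The whole procedure can be sketched as follows (a plain cyclic-sweep variant in NumPy; the function name, sweep count, and tolerance are our own choices, not the classical threshold strategy):

```python
import numpy as np

def jacobi_eig(A, sweeps=10):
    """Cyclic Jacobi: zero each off-diagonal pair (p, q) in turn with a
    Jacobi rotation; Ak stays similar to A, V accumulates the rotations."""
    Ak = np.array(A, dtype=float)
    n = Ak.shape[0]
    V = np.eye(n)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(Ak[p, q]) < 1e-14:
                    continue
                theta = 0.5 * np.arctan2(2.0 * Ak[p, q], Ak[q, q] - Ak[p, p])
                c, s = np.cos(theta), np.sin(theta)
                J = np.eye(n)                  # Q_{p,q,theta}
                J[p, p] = J[q, q] = c
                J[p, q], J[q, p] = s, -s
                Ak = J.T @ Ak @ J              # orthogonal similarity
                V = V @ J                      # product of the rotations
    return np.sort(np.diag(Ak)), V

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
S = M + M.T                                    # random symmetric test matrix
w, V = jacobi_eig(S)
```

Forming the full n-by-n rotation and multiplying costs O(n³) per rotation here; an efficient implementation updates only rows and columns p and q.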

One benefit of the Jacobi method is that it usually achieves better accuracy than QR algorithms. The Jacobi method is also very easy to parallelise, as each rotation only modifies two rows and two columns. However, matrix reductions such as tridiagonalisation cannot be used, as a Jacobi rotation would destroy the tridiagonal structure. In general, the Jacobi method is computationally less efficient than the QR algorithm with tridiagonal reduction.


10.2 Divide-and-Conquer

The divide-and-conquer algorithm, based on a recursive subdivision of a symmetric tridiagonal eigenvalue problem into problems of smaller dimension, represents one of the most important advances in eigenvalue computations since the 1960s. For symmetric matrices, the divide-and-conquer algorithm outperforms the shifted QR algorithm, particularly when both eigenvalues and eigenvectors are desired, and it became the industry standard in the late 1990s. Here we illustrate the idea behind this powerful method.

Consider an n-by-n symmetric tridiagonal matrix,

T = [ a1   b1
      b1   a2   b2
           b2   a3   ⋱
                ⋱    ⋱     bn−1
                     bn−1  an ],

where all the entries on the subdiagonal and superdiagonal are nonzero, so that the eigenvalue problem cannot be deflated. The matrix T can be split and partitioned as

T = [ T̂1   0
       0   T̂2 ] + β ~y ~y>.

Here T1 = T(1:k, 1:k) and T2 = T(k+1:n, k+1:n) are the upper-left principal submatrix and lower-right principal submatrix of T, respectively, and β = T(k+1, k) = T(k, k+1). The only difference between T̂1 and T1 is that the lower-right entry of T1 is replaced by T1(k, k) − β. A similar modification of the upper-left entry is applied to T2 to obtain T̂2.

Now we can write the tridiagonal matrix T as the sum of a 2-by-2 block-diagonal matrix with tridiagonal blocks and a rank-one update. Since the eigenvalue problems for T̂1 and T̂2 can be solved separately, we can first find the eigenvectors and eigenvalues of the two reduced-dimension matrices, and then express the eigenvalues of T as a function of the eigenvalues of T̂1 and T̂2 and the rank-one update. Since the submatrices T̂1 and T̂2 are also symmetric and tridiagonal, we can recursively apply this procedure to divide the problem into eigenvalue problems for small matrices, to which we can apply either analytical formulas or other computational methods that are efficient for small matrices.
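The divide step can be verified numerically on a small example (a NumPy sketch with our own 4-by-4 tridiagonal matrix; the split point k and all names are ours):

```python
import numpy as np

# 4x4 symmetric tridiagonal T with diagonal a and off-diagonal b
a = np.array([2.0, 3.0, 4.0, 5.0])
b = np.array([1.0, 1.0, 1.0])
T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)

k = 2                                          # split: rows 0..k-1 and k..n-1 (0-based)
beta = T[k, k - 1]                             # the coupling entry
T1h = T[:k, :k].copy(); T1h[-1, -1] -= beta    # T-hat-1: corner entry adjusted
T2h = T[k:, k:].copy(); T2h[0, 0] -= beta      # T-hat-2: corner entry adjusted

y = np.zeros(4); y[k - 1] = 1.0; y[k] = 1.0    # y = [e_k; e_1]
B = np.block([[T1h, np.zeros((k, 4 - k))],
              [np.zeros((4 - k, k)), T2h]]) + beta * np.outer(y, y)
```

B reproduces T exactly: the rank-one term restores the two adjusted diagonal entries and the two coupling entries.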

The key step in this recursive process is to identify the eigendecomposition of T given the eigendecompositions of T̂1 and T̂2. Suppose we have computed the eigendecompositions

T̂1 = Q1D1Q>1,   and   T̂2 = Q2D2Q>2.

We can express the matrix T as

T = [ T̂1   0
       0   T̂2 ] + β ~y ~y>,


where ~y = [~e>k  ~e>1 ]> is a vector whose elements are all zero except for the value 1 in the k-th and (k + 1)-th entries. Introducing the orthogonal matrix

Q = [ Q1   0
       0   Q2 ],

the matrix Q>TQ can be written as

Q>TQ = [ Q>1   0  ] ( [ T̂1   0  ] + β [ ~ek ] [ ~ek ]> ) [ Q1   0  ]
       [ 0    Q>2 ] ( [ 0    T̂2 ]     [ ~e1 ] [ ~e1 ]  ) [ 0    Q2 ]

     = [ D1   0  ] + β [ ~z1 ] [ ~z1 ]>
       [ 0    D2 ]     [ ~z2 ] [ ~z2 ]

     = D + β ~z ~z>,

where ~z1 is the last row of Q1 and ~z2 is the first row of Q2 (written as column vectors). Now the problem is reduced to finding the eigenvalues and eigenvectors of

D + β~z~z>,

which is a diagonal matrix plus a rank-one update.

Suppose all the entries of the vector ~z are nonzero; otherwise the eigenvalue problem can be deflated. Let dj = D(j, j) and zj = ~z(j). The eigenvalues of this matrix are then the roots of the secular function

f(λ) = 1 + β Σ_{j=1}^{n} zj² / (dj − λ).

Assuming d1 < d2 < · · · < dn, each interval (dj, dj+1) contains exactly one root (for β > 0 there is one further root to the right of dn). The roots can be rapidly identified using methods such as Newton's method, since we know exactly the interval in which each eigenvalue lies, and the function f(λ) is monotone on each interval.
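Even plain bisection makes this concrete, since for β > 0 the function f increases from −∞ to +∞ between consecutive poles. The following NumPy sketch (our own small example; d, z, and β are hypothetical values) recovers the eigenvalues of D + β~z~z> from the secular function and compares them with a dense eigensolver:

```python
import numpy as np

d = np.array([1.0, 2.0, 3.0])            # diagonal of D, strictly increasing
z = np.array([0.5, 0.4, 0.3])            # all entries nonzero
beta = 0.7                               # beta > 0

def f(lam):
    """Secular function f(lambda) = 1 + beta * sum z_j^2 / (d_j - lambda)."""
    return 1.0 + beta * np.sum(z**2 / (d - lam))

def root_in(lo, hi, iters=200):
    """Bisection on (lo, hi), where f increases from -inf to +inf."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

eps = 1e-9
ub = d[-1] + beta * (z @ z)              # upper bound for the largest eigenvalue
roots = np.array([root_in(d[0] + eps, d[1] - eps),
                  root_in(d[1] + eps, d[2] - eps),
                  root_in(d[2] + eps, ub)])
```

The last eigenvalue lies to the right of dn, in (dn, dn + β~z>~z]; a production code would use a faster, specially safeguarded root finder instead of plain bisection.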

The above assertion can be justified by considering an eigenpair (λ, ~q) of D + β~z~z>:

(D + β~z~z>)~q = λ~q,

which leads to

(D − λI)~q + β~z(~z>~q) = 0.

Remark 10.2

Here ~z>~q cannot be zero. We can show this by contradiction. If ~z>~q = 0, then we have (D − λI)~q = 0, so the vector ~q is an eigenvector of D; as D is diagonal (with distinct diagonal entries, repeated entries being removed by deflation), ~q has only one nonzero element. But then ~z>~q ≠ 0, as all entries of ~z are nonzero, which contradicts the assumption ~z>~q = 0.

Remark 10.3

We also note that λ cannot be an eigenvalue of D. We can show this by contradiction. If λ is an eigenvalue of D, then D − λI has a zero entry on the diagonal, say in position j, as the eigenvalues of D are its diagonal entries. Then the j-th entry of the vector (D − λI)~q is zero. Since ~z>~q ≠ 0 (Remark 10.2) and all entries of ~z are nonzero, all entries of the vector β~z(~z>~q) are nonzero, in particular the j-th. Thus, the j-th entry of (D − λI)~q + β~z(~z>~q) is nonzero, which contradicts the assumption that λ and ~q are an eigenvalue and eigenvector of D + β~z~z>.

Using Remark 10.3, we know that D − λI is invertible, so multiplying both sides of the above equation by (D − λI)−1 (on the left) gives

~q + β(D − λI)−1~z(~z>~q) = 0.

Multiplying both sides by ~z> (on the left) then leads to

~z>~q + β~z>(D − λI)−1~z(~z>~q) = (~z>~q) (1 + β~z>(D − λI)−1~z) = 0,

where the second factor is exactly f(λ). Since ~z>~q ≠ 0 by Remark 10.2, we have f(λ) = 0 for every eigenvalue λ of D + β~z~z>.

Appendix A

Appendices



A.1 Notation

A.1.1 Vectors and Matrices

• ~x is a column vector in Rn; xi is the ith component of ~x.

• We may also write ~x = (x1, . . . , xn)T , where ~xT = (x1, . . . , xn) is a row

vector.

• A is a matrix in Rm×n. The element of A in row i and column j is referred

to by aij .

• The jth column of matrix A is referred to by ~aj . So

A = [~a1| . . . |~an ] .

• Sometimes we use the notation (A)ij for the element of A in position ij.

For example, we can say (AT )ij = aji. We can also use this to refer to a

row of A (as a row vector): the ith row of A can be indicated by (A)i∗,

where the ∗ means all columns j.

Something to remember . . .

In these notes, all vectors ~x are column vectors.

A.1.2 Inner Products

We express the standard Euclidean inner product of vectors ~x, ~y ∈ Rn, in

one of the following equivalent ways:

~xT~y = < ~x, ~y > .

Of course, we also have

~xT~y = ~yT~x =< ~x, ~y >=< ~y, ~x > .

Similarly,

< ~x,A~y > = ~xTA~y

= (AT~x)T~y

=< AT~x, ~y >,

since (AB)T = BTAT .

A.1.3 Block Matrices

Example A.1: Matrix-Matrix Product in Block Form

Let E ∈ R5×7 and F ∈ R7×6. When performing the matrix product E F ,

we can divide E and F in blocks with compatible dimensions, and write the

matrix-matrix product in block form as

E F = [ E11  E12 ] [ F11  F12 ]  =  [ E11F11 + E12F21   E11F12 + E12F22 ]
      [ E21  E22 ] [ F21  F22 ]     [ E21F11 + E22F21   E21F12 + E22F22 ].

A.2 Vector Norms

A.2.1 Vector Norms

Definition A.2: Norm on a Vector Space

Let V be a vector space over R. The function ‖ · ‖ : V → R is a norm on V

if ∀ ~x, ~y ∈ V and ∀ a ∈ R, the following hold:

1. ‖~x‖ ≥ 0, and ‖~x‖ = 0 iff ~x = 0

2. ‖a~x‖ = |a|‖~x‖

3. ‖~x+ ~y‖ ≤ ‖~x‖+ ‖~y‖

Definition A.3: p-Norms on Rn

Let ~x ∈ Rn. We consider the following vector norms ‖~x‖p, for p = 1, 2,∞:

‖~x‖2 = ( Σ_{i=1}^{n} xi² )^{1/2} = √(~xT~x)

‖~x‖1 = Σ_{i=1}^{n} |xi|

‖~x‖∞ = max_{1≤i≤n} |xi|

Theorem A.4: Cauchy-Schwarz Inequality

Let ~x, ~y ∈ Rn. Then

|~xT~y| ≤ ‖~x‖2‖~y‖2.

A.2.2 A-Norm

The vector 2-norm is induced by the Euclidean inner product:

‖~x‖2 = √(~xT~x) = √(< ~x, ~x >).

More generally, if A ∈ Rn×n is symmetric positive definite, it can be used to

define an A-inner product, which induces the A-norm.

Definition A.5: A-Inner Product

Let A ∈ Rn×n be symmetric positive definite, and ~x, ~y ∈ Rn. Then

< ~x, ~y >A =< ~x,A~y >

= ~xTA~y

is called the A-inner product of ~x and ~y.


Definition A.6: A-Norm on Rn

Let A ∈ Rn×n be symmetric positive definite. Then

‖~x‖A = √(< ~x, ~x >A) = √(~xTA~x)

is a norm on Rn, called the A-norm.

Note that we recover the 2-norm for A = I.
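A minimal numerical sketch (Python; the SPD matrix and vector are our own examples, not from the notes):

```python
import numpy as np

def a_norm(x, A):
    """||x||_A = sqrt(x^T A x), assuming A is symmetric positive definite."""
    return np.sqrt(x @ A @ x)

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # SPD
x = np.array([1.0, 1.0])
```

Here x^T A x = 6, so ||x||_A = sqrt(6), while ||x||_2 = sqrt(2); with A = I the two norms coincide.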


A.3 Orthogonality

Definition A.7: Orthogonal Vectors

~x, ~y ∈ Rn are orthogonal if

~xT~y = 0.

We may also write ~xT~y = 0 as < ~x, ~y >= 0.

Theorem A.8: Pythagorean Law

If ~x and ~y are orthogonal, then

‖~x+ ~y‖22 = ‖~x‖22 + ‖~y‖22.

Proof.

‖~x+ ~y‖2² = < ~x+ ~y, ~x+ ~y >
           = < ~x, ~x > + 2 < ~x, ~y > + < ~y, ~y >
           = ‖~x‖2² + ‖~y‖2²,

where the middle term vanishes since < ~x, ~y > = 0.

Definition A.9: Orthogonal Matrices

A ∈ Rn×n is called an orthogonal matrix if

ATA = I.

This means that the columns of A are of length 1 and mutually orthogo-

nal. (So the term ‘orthogonal matrix’ is really a misnomer; in a perfect world

these matrices would be called ‘orthonormal matrices’.) ATA = I implies that

det(A)2 = 1, so A−1 exists and AT = A−1. Also, then, AAT = I, meaning that

the rows of an orthogonal matrix are also orthogonal.


A.4 Matrix Rank and Fundamental Subspaces

Definition A.10: Range and Nullspace

Let A ∈ Rm×n.

The range or column space of A is defined as

range(A) = { ~y ∈ Rm | ~y = A~x = Σ_{i=1}^{n} ~ai xi for some ~x ∈ Rn }.

The kernel or null space of A is defined as

null(A) = {~x ∈ Rn|A~x = 0}.

Similarly, the row space of A (the space spanned by the rows of A) is, in

fact, the column space of AT , i.e., range(AT ).

The rank r of a matrix A is the dimension of the column space:

Definition A.11: Rank

rank(A) = dim(range(A))

This is the number of linearly independent columns of A. It can be shown that this equals the number of linearly independent rows of A, i.e., r = dim(range(A)) = dim(range(AT)).

Theorem A.12: Dimensions of Fundamental Subspaces

Let A ∈ Rm×n. Then

1. dim(range(A)) + dim(null(AT )) = m

2. dim(range(AT )) + dim(null(A)) = n

Theorem A.13: Orthogonality of Fundamental Subspaces

range(A) and null(AT ) are orthogonal subspaces of Rm

Proof. If ~yr ∈ range(A), then ~yr = A~x for some ~x. If ~yn ∈ null(AT ), then

AT~yn = 0. Then

(~yr)T~yn = (A~x)T~yn = ~xT(AT~yn) = 0.


A.5 Matrix Determinants

Definition A.14

The determinant of a matrix A ∈ Rn×n is given by

det(A) = Σ_{j=1}^{n} (−1)^{i+j} aij det(Aij),   for fixed i,

where the matrix Aij is the (n − 1) × (n − 1) matrix obtained by removing row i and column j from the original matrix A.

Theorem A.15

If A ∈ Rn×n is a triangular matrix, then

det(A) = ∏_{i=1}^{n} aii.


A.6 Eigenvalues

We consider square real matrices A ∈ Rn×n.

A.6.1 Eigenvalues and Eigenvectors

Definition A.16: Eigenvalues and Eigenvectors

Let A ∈ Rn×n. λ is called an eigenvalue of A if there is a vector ~x ≠ 0 such

that

A~x = λ~x,

where ~x is called an eigenvector associated with λ.

Notes:

• The eigenvalue may equal zero, but the eigenvector is required to be

nonzero.

• If ~x is an eigenvector of A with associated eigenvalue λ, then a~x, for any a ∈ R \ {0}, is also an eigenvector of A, associated with the same eigenvalue.

Definition A.17: Characteristic Polynomial

Let A ∈ Rn×n. The degree-n polynomial

p(λ) = det(A− λI)

is called the characteristic polynomial of A.

The characteristic polynomial can be factored as p(λ) = (λ1−λ) . . . (λn−λ),

where λ1, . . . , λn are the n eigenvalues of A, which we order as

|λ1| ≤ |λ2| ≤ . . . ≤ |λn|.

Note that some eigenvalues may occur multiple times, and some may be complex

(in which case they occur in complex conjugate pairs).

Definition A.18: Algebraic and Geometric Multiplicity

Let A ∈ Rn×n. The algebraic multiplicity of an eigenvalue λi of A, µA(λi),

is the multiplicity of λi as a root of p(λ).

The geometric multiplicity of λi, µG(λi), is the number of linearly indepen-

dent eigenvectors associated with λi.

In other words, the geometric multiplicity µG(λi) = dim(E), where E =

{~x | (A− λiI)~x = 0} is the eigenspace associated with λi.

Theorem A.19: Relation of Algebraic and Geometric Multiplicities

Let A ∈ Rn×n. The algebraic and geometric multiplicities of the eigenvalues

satisfy the following properties.

1. µA(λi) ≥ µG(λi) ≥ 1 for all i = 1, . . . , n

2. A has n linearly independent eigenvectors iff µA(λi) = µG(λi) for all

i = 1, . . . , n.

If A has n linearly independent eigenvectors, it can be diagonalised.


A.6.2 Similarity Transformations

Definition A.20: Similarity Transformation

Let A,B ∈ Rn×n with B nonsingular. Then the transformation from A to

B−1AB is called a similarity transformation of A. A and B−1AB are

called similar.

Theorem A.21: Eigenvalues of Similar Matrices

Let A,B ∈ Rn×n with B nonsingular. Then A and B−1AB have the same

eigenvalues (with the same algebraic and geometric multiplicities).

This can be shown using the fact that

A~x = λ~x, ~x ≠ 0

is equivalent to

AB~y = λB~y, ~y ≠ 0,

for ~y given by ~y = B−1~x. This is equivalent to

(B−1AB)~y = λ~y, ~y ≠ 0,

so any eigenvalue of A is also an eigenvalue of B−1AB, and vice versa.

A.6.3 Diagonalisation

Definition A.22: Diagonalisable and Defective Matrices

Let A ∈ Rn×n. A is called diagonalisable if it has n linearly independent

eigenvectors; otherwise, it is called defective.

Suppose A ∈ Rn×n has n linearly independent eigenvectors ~xi. Let X be the

matrix with the eigenvectors as its columns:

X = [~x1| . . . |~xn] .

Then

AX = X Λ,

with

Λ = [ λ1   0    · · ·   0
      0    λ2   · · ·   0
      ⋮     ⋮      ⋱     ⋮
      0    0    · · ·   λn ],

or

X−1AX = Λ,

i.e., the similarity transformation with X diagonalises A.

If A is defective, it can be transformed into the so-called Jordan form (which, in

some sense, is almost diagonal), using its n generalised eigenvectors. We won’t

need to consider the Jordan form in these notes.
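The diagonalisation X−1AX = Λ is easy to check numerically (a NumPy sketch; the matrix is our own example, chosen with distinct eigenvalues so that it is diagonalisable):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])               # eigenvalues 2 and 3 (distinct)
lam, X = np.linalg.eig(A)                # columns of X are eigenvectors
Lam = np.diag(lam)                       # the diagonal matrix of eigenvalues
```

Both identities AX = XΛ and X−1AX = Λ then hold up to roundoff.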


A.6.4 Singular Values of a Square Matrix

Let A ∈ Rn×n. Let λi(ATA) and λi(AAT ), i = 1, . . . , n, be the eigenvalues of

ATA and AAT , respectively, numbered in order of decreasing magnitude. Note

that ATA and AAT are symmetric, so their eigenvalues are real, and they are

positive semi-definite, so their eigenvalues are nonnegative. It can be shown they

have the same eigenvalues.

Definition A.23: Singular Values of a Square Matrix

Let A ∈ Rn×n. Then

σi(A) = √(λi(ATA)) = √(λi(AAT)),   i = 1, . . . , n,

are called the singular values of A.
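For a concrete matrix, these characterisations agree with a direct singular value computation (NumPy sketch; the matrix is our own example):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
ev = np.linalg.eigvalsh(A.T @ A)         # eigenvalues of A^T A, ascending, >= 0
sigma = np.sqrt(ev[::-1])                # singular values, decreasing order
```

The same values come from the eigenvalues of AAT, and from np.linalg.svd directly.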


A.7 Symmetric Matrices

We consider square matrices A ∈ Rn×n.

Definition A.24

A ∈ Rn×n is called symmetric if

A = AT .

Theorem A.25: Eigenvalues and Eigenvectors of a Symmetric Matrix

Let A ∈ Rn×n. If A is symmetric, then the eigenvalues of A are real, and A has n linearly independent eigenvectors that can be chosen mutually orthogonal.

Definition A.26

A ∈ Rn×n is called symmetric positive definite (SPD) if

A is symmetric and ~xTA~x > 0 for all ~x ≠ 0.

Theorem A.27: Eigenvalues of an SPD Matrix

A symmetric matrix A ∈ Rn×n is SPD iff

λi > 0 for all i = 1, . . . , n.

Proof.

⇒ Assume A is SPD. Then ~xTA~x > 0 for all ~x ≠ 0. Thus, ~xiTA~xi = λi‖~xi‖2² > 0 for any eigenvalue λi with associated eigenvector ~xi, since ~xi ≠ 0. This implies that λi > 0.

⇐ Assume λi > 0 for all i. A has n mutually orthogonal eigenvectors ~xi since it is symmetric, and any ~x ≠ 0 can be expressed in the basis of the orthogonal eigenvectors. So ~x = Σ_{i=1}^{n} ci~xi, where at least one of the ci ≠ 0. Thus, for any ~x ≠ 0,

~xTA~x = ( Σ_{i=1}^{n} ci~xiT )( Σ_{j=1}^{n} cjA~xj )
       = ( Σ_{i=1}^{n} ci~xiT )( Σ_{j=1}^{n} cjλj~xj )
       = Σ_{i=1}^{n} Σ_{j=1}^{n} cicjλj ~xiT~xj
       = Σ_{i=1}^{n} ci²λi ~xiT~xi        (due to orthogonality)
       = Σ_{i=1}^{n} ci²λi ‖~xi‖2² > 0,

so A is SPD.


Note that an SPD matrix A is nonsingular (it does not have a zero eigenvalue).
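In practice one can check positive definiteness via the eigenvalues (Theorem A.27) or, more cheaply, via an attempted Cholesky factorisation, which succeeds exactly for SPD matrices. A NumPy sketch with our own example matrices:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])               # symmetric; eigenvalues (7 +- sqrt(5))/2 > 0
B = np.array([[1.0, 2.0],
              [2.0, 1.0]])               # symmetric; eigenvalues 3 and -1

is_A_spd = bool(np.all(np.linalg.eigvalsh(A) > 0))   # True: A is SPD
is_B_spd = bool(np.all(np.linalg.eigvalsh(B) > 0))   # False: B is indefinite

L = np.linalg.cholesky(A)                # succeeds because A is SPD: A = L L^T
```

np.linalg.cholesky raises an exception for a matrix that is not positive definite, so it doubles as a practical SPD test.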

Definition A.28

A ∈ Rn×n is called symmetric positive semi-definite (SPSD) if

A is symmetric and ~xTA~x ≥ 0 for all ~x ≠ 0.

Theorem A.29: Eigenvalues of an SPSD Matrix

A symmetric matrix A ∈ Rn×n is SPSD iff

λi ≥ 0 for all i = 1, . . . , n.


A.8 Matrices with Special Structure or Properties

Some matrices have a special structure, which may imply special properties.

A.8.1 Diagonal Matrices

Definition A.30

Let A ∈ Rn×n. Then

1. A is called a diagonal matrix if aij = 0 for all i ≠ j. With ~a the

diagonal of a diagonal matrix A, we also write A = diag(~a). For any

matrix A (also nondiagonal), we indicate its diagonal by ~a = diag(A).

2. A is called a tridiagonal matrix if aij = 0 for all i, j satisfying |i−j| >

1.

A.8.2 Triangular Matrices

Definition A.31

1. U ∈ Rn×n is called an upper triangular matrix if uij = 0 for all i > j.

2. L ∈ Rn×n is called a unit lower triangular matrix if lij = 0 for all

i < j, and lii = 1 for all i.

Note that det(U) = ∏_{i=1}^{n} uii and det(L) = 1.

A.8.3 Permutation Matrices

Definition A.32

P ∈ Rn×n is called a permutation matrix if P can be obtained from the

n× n identity matrix I by exchanging rows.

Note that P has exactly one 1 in each row and column, and is otherwise 0.

Note also that permutation matrices are orthogonal, i.e., PPT = I, or P−1 =

PT , and det(P ) = ±1, depending on the parity of the permutation.

A.8.4 Projectors

Definition A.33

P ∈ Rn×n is called a projector if

P 2 = P.

I − P is also a projector, called the complementary projector to P .

Note: P separates Rn into two subspaces, S1 = range(P ) and S2 = null(P ).

We have ~x = P~x+(I−P )~x, where P~x ∈ S1, and (I−P )~x ∈ S2 since P (I−P )~x =

(P − P 2)~x = 0. P projects ~x into S1 along S2. For example, P (~x + ~y) = P~x if

~y ∈ S2 = null(P ).


Definition A.34

P ∈ Rn×n is called an orthogonal projector if

P 2 = P and PT = P.

If P is an orthogonal projector, S1 = range(P ) and S2 = null(P ) are orthog-

onal: (P~x)T~y = ~xTPT~y = ~xTP~y = 0 if ~y ∈ null(P ). So P projects ~x into S1

along S2, where S2 is orthogonal to S1.
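A standard construction of an orthogonal projector is P = QQT, where Q has orthonormal columns; the sketch below (our own example) checks the defining properties and the orthogonality of range(P) and null(P):

```python
import numpy as np

M = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
Q, _ = np.linalg.qr(M)                   # orthonormal basis for range(M)
P = Q @ Q.T                              # orthogonal projector onto range(M)
x = np.array([1.0, 2.0, 3.0])
```

P @ x and (I − P) @ x split x into orthogonal pieces in range(P) and null(P).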


A.9 Big O Notation

A.9.1 Big O as h→ 0

Consider scalar functions f(x) and g(x) of a real variable x.

Definition A.35

f(h) = O(g(h)) as h→ 0+ if

∃ c > 0, ∃h0 > 0: |f(h)| ≤ c |g(h)| ∀h with 0 ≤ h ≤ h0

Example A.36

Let

f(h) = 3h² + 4h³.

Then

f(h) = O(h²) as h → 0,
f(h) ≠ O(h³),
f(h) = O(h).

In words: f(h) approaches 0 at least as fast as h² (up to a multiplicative constant), but not as fast as h³, and, clearly, also at least as fast as h. Note that 3h² is the dominant term in f(h) as h → 0.

A.9.2 Big O as n→∞

Consider scalar functions f(n) and g(n) of an integer variable n.

Definition A.37

f(n) = O(g(n)) as n→∞ if

∃ c > 0, ∃N0 ≥ 0: |f(n)| ≤ c |g(n)| ∀n ≥ N0

Example A.38

Let

f(n) = 3n² + 4n³.

Then

f(n) = O(n³) as n → ∞,
f(n) ≠ O(n²),
f(n) = O(n⁴).

In words: f(n) grows no faster than n³ (up to a multiplicative constant), but faster than n², and, clearly, also no faster than n⁴. Note that 4n³ is the dominant term in f(n) as n → ∞.


A.10 Sparse Matrix Formats

When matrices are sparse, it is often advantageous to store them in computer

memory using sparse matrix formats. This can save large amounts of memory

space, and it can also make computations faster if one implements methods

that eliminate multiplications or additions with 0 (e.g., when computing matrix-

vector or matrix-matrix products).

Consider, for example, the following sparse matrix, of which we will only

store the nonzero elements and their locations:

A = [ 16   0  −18   0
       0  12    0   0
       0   0   14  18
       0  12   11  10 ].        (A.1)

In all that follows, i refers to rows, and j refers to columns.

A.10.1 Simple List Storage

A simple sparse storage format is to store the (i, j, value) triplets in a list, e.g.,

ordered by row starting from row 1 and from left to right:

val 16 -18 12 14 18 12 11 10

i 1 1 2 3 3 4 4 4

j 1 3 2 3 4 2 3 4

A.10.2 Compressed Sparse Column Format

An alternative with some advantages is the Compressed Sparse Column (CSC)

format, which Matlab uses internally.

In this format, the val array stores the nonzero values, ordered by column,

starting from column 1, and from top to bottom within a column. The i val

array stores the row index for each nonzero value.

The j ptr array saves on storage versus the j array in the simple list storage,

as follows: j ptr has one entry per column, and the entry indicates for each

column where it starts in the val and i val arrays. The j ptr array has one

additional entry at the end, which contains nnz(A) + 1.

val 16 12 12 -18 14 11 18 10

i val 1 2 4 1 3 4 3 4

j ptr 1 2 4 7 9

As such, j ptr(k) indicates where column k starts in the val and i val arrays, and j ptr(k+1) − j ptr(k) indicates how many nonzero elements there are in column k.

Some advantages of the Compressed Sparse Column format:

• saves on storage space versus dense format, and, in many practical cases,

versus simple list storage

• finding all nonzeros in a given column of A is very fast

Note, however, that finding all nonzero elements in a row of a sparse Matlab

matrix can be very time-consuming! (Because the elements are stored per col-

umn.) So if one needs to access rows of a sparse A repeatedly, it can be much

faster to store AT as a sparse matrix instead and access its columns.
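The CSC arrays for the matrix (A.1) can be built directly (a pure-Python sketch; here indices are 0-based, whereas the tables above use Matlab-style 1-based indexing, so this j_ptr ends with nnz(A) rather than nnz(A) + 1):

```python
# Dense form of the example matrix (A.1)
A = [[16,  0, -18,  0],
     [ 0, 12,   0,  0],
     [ 0,  0,  14, 18],
     [ 0, 12,  11, 10]]

val, i_val, j_ptr = [], [], [0]
for j in range(4):                        # column by column
    for i in range(4):                    # top to bottom within the column
        if A[i][j] != 0:
            val.append(A[i][j])           # nonzero value
            i_val.append(i)               # its row index
    j_ptr.append(len(val))                # start of the next column
```

Adding 1 to every index entry reproduces the 1-based tables above.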

