Lecture Notes for

MTH3320

Computational Linear Algebra

Tiangang Cui & Hans De Sterck

May 13, 2019


Contents

Preface vii

I Linear Systems of Equations 1

1 Introduction and Model Problems 3

1.1 A Simple 1D Example from Structural Mechanics . . . . . . 3

1.1.1 Discretising the ODE . . . . . . . . . . . . . . . 4

1.1.2 Formulation as a Linear System . . . . . . . . . 5

1.1.3 Solving the Linear System . . . . . . . . . . . . . 6

1.2 A 2D Example: Poisson’s Equation for Heat Conduction . . 7

1.2.1 Discretising the PDE . . . . . . . . . . . . . . . 8

1.2.2 Formulation as a Linear System . . . . . . . . . 8

1.2.3 Solving the Linear System . . . . . . . . . . . . . 10

1.3 An Example from Data Analytics: Netflix Movie Recommendation . . . 12

1.3.1 Movie Recommendation using Linear Algebra and Optimisation . . . 12

1.3.2 An Alternating Least Squares Approach to Solving the Optimisation Problem . . . 14

2 LU Decomposition for Linear Systems 17

2.1 Gaussian Elimination and LU Decomposition . . . . . . . . 17

2.1.1 Gaussian Elimination . . . . . . . . . . . . . . . 17

2.1.2 LU Decomposition . . . . . . . . . . . . . . . . . 18

2.1.3 Implementation of LU Decomposition and Computational Cost . . . 20

2.2 Banded LU Decomposition . . . . . . . . . . . . . . . . . . . 23

2.3 Matrix Norms . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.3.1 Definition of Matrix Norms . . . . . . . . . . . . 25

2.3.2 Matrix Norm Formulas . . . . . . . . . . . . . . 26

2.3.3 Spectral Radius . . . . . . . . . . . . . . . . . . 28

2.4 Floating Point Number System . . . . . . . . . . . . . . . . 29

2.4.1 Floating Point Numbers . . . . . . . . . . . . . . 29

2.4.2 Rounding and Unit Roundoff . . . . . . . . . . . 29

2.4.3 IEEE Double Precision Numbers . . . . . . . . . 30

2.4.4 Rounding and Basic Arithmetic Operations . . . 32

2.5 Conditioning of a Mathematical Problem . . . . . . . . . . . 32


2.5.1 Conditioning of a Mathematical Problem . . . . 32

2.5.2 Conditioning of Elementary Operations . . . . . 33

2.5.3 Conditioning of Solving a Linear System . . . . . 38

2.6 Stability of a Numerical Algorithm . . . . . . . . . . . . . . 40

2.6.1 A Simple Example of a Stable and an Unstable Algorithm . . . 40

2.6.2 Stability of LU Decomposition . . . . . . . . . . 42

3 Least-Squares Problems and QR Factorisation 45

3.1 Gram-Schmidt Orthogonalisation and QR Factorisation . . . 45

3.1.1 Gram-Schmidt Orthogonalisation . . . . . . . . . 45

3.1.2 QR Factorisation . . . . . . . . . . . . . . . . . . 47

3.1.3 Modified Gram-Schmidt Orthogonalisation . . . 47

3.2 QR Factorisation using Householder Transformations . . . . 48

3.2.1 Householder Reflections . . . . . . . . . . . . . . 49

3.2.2 Using Householder Reflections to Compute the QR Factorisation . . . 51

3.2.3 Computing Q . . . . . . . . . . . . . . . . . . . . 52

3.2.4 Computational Work . . . . . . . . . . . . . . . 53

3.3 Overdetermined Systems and Least-Squares Problems . . . . 54

3.3.1 The Normal Equations – A Geometric View . . . 55

3.3.2 The Normal Equations . . . . . . . . . . . . . . 55

3.3.3 Computational Work for Forming and Solving the Normal Equations . . . 57

3.3.4 Numerical Stability of Using the Normal Equations 57

3.4 Solving Least-Squares Problems using QR Factorisation . . . 57

3.4.1 Geometric Interpretation in Terms of Projection Matrices . . . 58

3.5 Alternating Least-Squares Algorithm for Movie Recommendation . . . 59

3.5.1 Least-Squares Subproblems for Movie Recommendation . . . 60

4 The Conjugate Gradient Method for Sparse SPD Systems 63

4.1 An Optimisation Problem Equivalent to SPD Linear Systems 64

4.2 The Steepest Descent Method . . . . . . . . . . . . . . . . . 64

4.3 The Conjugate Gradient Method . . . . . . . . . . . . . . . 68

4.4 Properties of the Conjugate Gradient Method . . . . . . . . 70

4.4.1 Orthogonality Properties of Residuals and Step Directions . . . 70

4.4.2 Optimal Error Reduction in the A-Norm . . . . 73

4.4.3 Convergence Speed . . . . . . . . . . . . . . . . . 75

4.5 Preconditioning for the Conjugate Gradient Method . . . . . 77

4.5.1 Preconditioning for Solving Linear Systems . . . 77

4.5.2 Left Preconditioning for CG . . . . . . . . . . . 78

4.5.3 Preconditioned CG (PCG) Algorithm . . . . . . 79

4.5.4 Preconditioners for PCG . . . . . . . . . . . . . 82

4.5.5 Using Preconditioners as Stand-Alone Iterative Methods . . . 83


5 The GMRES Method for Sparse Nonsymmetric Systems 87

5.1 Minimising the Residual . . . . . . . . . . . . . . . . . . . . 87

5.2 Arnoldi Orthogonalisation Procedure . . . . . . . . . . . . . 88

5.3 GMRES Algorithm . . . . . . . . . . . . . . . . . . . . . . . 90

5.4 Convergence Properties of GMRES . . . . . . . . . . . . . . 92

5.5 Preconditioned GMRES . . . . . . . . . . . . . . . . . . . . 93

5.6 Lanczos Orthogonalisation Procedure for Symmetric Matrices 93

II Eigenvalues and Singular Values 97

6 Basic Algorithms for Eigenvalues 99

6.1 Example: Page Rank and Stochastic Matrix . . . . . . . . . 99

6.2 Fundamentals of Eigenvalue Problems . . . . . . . . . . . . . 104

6.2.1 Notations . . . . . . . . . . . . . . . . . . . . . . 104

6.2.2 Eigenvalue and Eigenvector . . . . . . . . . . . . 105

6.2.3 Similarity Transformation . . . . . . . . . . . . . 106

6.2.4 Eigendecomposition, Diagonalisation, and Schur Factorisation . . . 107

6.2.5 Extending Orthogonal Vectors to a Unitary Matrix . . . 110

6.3 Power Iteration and Inverse Iteration . . . . . . . . . . . . . 112

6.3.1 Power Iteration . . . . . . . . . . . . . . . . . . . 112

6.3.2 Convergence of Power Iteration . . . . . . . . . . 113

6.3.3 Shifted Power Method . . . . . . . . . . . . . . . 115

6.3.4 Inverse Iteration . . . . . . . . . . . . . . . . . . 115

6.3.5 Convergence of Inverse Iteration . . . . . . . . . 116

6.4 Symmetric Matrices and Rayleigh Quotient Iteration . . . . 119

6.4.1 Rate of Convergence . . . . . . . . . . . . . . . . 119

6.4.2 Power Iteration and Inverse Iteration for Symmetric Matrices . . . 119

6.4.3 Rayleigh Quotient Iteration . . . . . . . . . . . . 121

6.4.4 Summary of Power, Inverse, and Rayleigh Quotient Iterations . . . 123

7 QR Algorithm for Eigenvalues 125

7.1 Two Phases of Eigenvalue Computation . . . . . . . . . . . . 125

7.2 Hessenberg Form and Tridiagonal Form . . . . . . . . . . . . 127

7.2.1 Householder Reduction to Hessenberg Form . . . 129

7.2.2 Implementation and Computational Cost . . . . 131

7.2.3 The Symmetric Case: Reduction to Tridiagonal Form . . . 132

7.2.4 QR Factorisation of Hessenberg Matrices . . . . 134

7.3 QR algorithm without shifts . . . . . . . . . . . . . . . . . . 136

7.3.1 Connection with Simultaneous Iteration . . . . . 136

7.3.2 Convergence to Schur Form . . . . . . . . . . . . 139

7.3.3 The Role of Hessenberg Form . . . . . . . . . . . 140

7.4 Shifted QR algorithm . . . . . . . . . . . . . . . . . . . . . . 145

7.4.1 Connection with Inverse Iteration . . . . . . . . 145

7.4.2 Connection with Shifted Inverse Iteration . . . . 147

7.4.3 Connection with Rayleigh Quotient Iteration . . 147


7.4.4 Wilkinson Shift . . . . . . . . . . . . . . . . . . . 148

7.4.5 Deflation . . . . . . . . . . . . . . . . . . . . . . 148

8 Singular Value Decomposition 151

8.1 Singular Value Decomposition . . . . . . . . . . . . . . . . . 151

8.1.1 Understanding SVD . . . . . . . . . . . . . . . . 151

8.1.2 Full SVD and Reduced SVD . . . . . . . . . . . 153

8.1.3 Properties of SVD . . . . . . . . . . . . . . . . . 155

8.1.4 Compare SVD to Eigendecomposition . . . . . . 156

8.2 Computing SVD . . . . . . . . . . . . . . . . . . . . . . . . . 158

8.2.1 Connection with Eigenvalue Solvers . . . . . . . 158

8.2.2 A Different Connection with Eigenvalue Solvers . 159

8.2.3 Bidiagonalisation . . . . . . . . . . . . . . . . . . 160

8.3 Low Rank Matrix Approximation using SVD . . . . . . . . . 164

8.4 Pseudo Inverse and Least Square Problems using SVD . . . 166

8.5 X-Ray Imaging using SVD . . . . . . . . . . . . . . . . . . . 170

8.5.1 Mathematical Model . . . . . . . . . . . . . . . . 170

8.5.2 Computational Model . . . . . . . . . . . . . . . 171

8.5.3 Image Reconstruction . . . . . . . . . . . . . . . 172

9 Krylov Subspace Methods for Eigenvalues 177

9.1 The Arnoldi Method for Eigenvalue Problems . . . . . . . . 177

9.2 Lanczos Method for Eigenvalue Problems . . . . . . . . . . . 183

9.3 How Arnoldi/Lanczos Locates Eigenvalues . . . . . . . . . . 185

10 Other Eigenvalue Solvers 191

10.1 Jacobi Method . . . . . . . . . . . . . . . . . . . . . . . . . . 191

10.2 Divide-and-Conquer . . . . . . . . . . . . . . . . . . . . . . . 194

A Appendices 197

A.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198

A.1.1 Vectors and Matrices . . . . . . . . . . . . . . . 198

A.1.2 Inner Products . . . . . . . . . . . . . . . . . . . 198

A.1.3 Block Matrices . . . . . . . . . . . . . . . . . . . 198

A.2 Vector Norms . . . . . . . . . . . . . . . . . . . . . . . . . . 200

A.2.1 Vector Norms . . . . . . . . . . . . . . . . . . . . 200

A.2.2 A-Norm . . . . . . . . . . . . . . . . . . . . . . . 200

A.3 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . 202

A.4 Matrix Rank and Fundamental Subspaces . . . . . . . . . . 203

A.5 Matrix Determinants . . . . . . . . . . . . . . . . . . . . . . 204

A.6 Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

A.6.1 Eigenvalues and Eigenvectors . . . . . . . . . . . 205

A.6.2 Similarity Transformations . . . . . . . . . . . . 206

A.6.3 Diagonalisation . . . . . . . . . . . . . . . . . . . 206

A.6.4 Singular Values of a Square Matrix . . . . . . . . 207

A.7 Symmetric Matrices . . . . . . . . . . . . . . . . . . . . . . . 208

A.8 Matrices with Special Structure or Properties . . . . . . . . 210

A.8.1 Diagonal Matrices . . . . . . . . . . . . . . . . . 210

A.8.2 Triangular Matrices . . . . . . . . . . . . . . . . 210

A.8.3 Permutation Matrices . . . . . . . . . . . . . . . 210


A.8.4 Projectors . . . . . . . . . . . . . . . . . . . . . . 210

A.9 Big O Notation . . . . . . . . . . . . . . . . . . . . . . . . . 212

A.9.1 Big O as h→ 0 . . . . . . . . . . . . . . . . . . . 212

A.9.2 Big O as n→∞ . . . . . . . . . . . . . . . . . . 212

A.10 Sparse Matrix Formats . . . . . . . . . . . . . . . . . . . . . 213

A.10.1 Simple List Storage . . . . . . . . . . . . . . . . 213

A.10.2 Compressed Sparse Column Format . . . . . . . 213

Bibliography 215


Preface

This document contains lecture notes for MTH3320 – Computational

Linear Algebra. Since MTH3320 is offered for the first time in 2017 S2, the

notes will be built up as the term progresses.

• Part I of the unit covers numerical methods for solving linear systems

A~x = ~b (weeks 1-6).

• Part II of the unit covers numerical methods for computing eigenvalues

and singular values (weeks 7-12).

• The Appendix of the notes covers a brief and condensed review of back-

ground material in linear algebra (which may be reviewed in the lectures

with some more detail, as needed).

These notes are intended to be used in conjunction with the lectures. In their

first incarnation, these notes will be quite dense, and, depending on the topic,

more details and explanations may be provided in the lectures.

Useful reference books on numerical linear algebra include [Saad, 2003],

[Trefethen and Bau III, 1997], [Björck, 2015], [Linge and Langtangen, 2016],

[Gander et al., 2014], [Demmel, 1997], [Saad, 2011], [Quarteroni et al., 2010],

[Ascher and Greif, 2011].


Synopsis of MTH3320

The overall aim of this unit is to study the numerical methods for matrix com-

putations that lie at the core of a wide variety of large-scale computations and

innovations in the sciences, engineering, technology and data science. Students

will receive an introduction to the mathematical theory of numerical methods

for linear algebra (with derivations of the methods and some proofs). This will

broadly include methods for solving linear systems of equations, least-squares

problems, eigenvalue problems, and other matrix decompositions. Special at-

tention will be paid to conditioning and stability, dense versus sparse problems,

and direct versus iterative solution techniques. Students will learn to imple-

ment the computational methods efficiently, and will learn how to thoroughly

test their implementations for accuracy and performance. Students will work on

realistic matrix models for applications in a variety of fields. Applications may

include, for example: computation of electrostatic potentials and heat conduc-

tion problems; eigenvalue problems for electronic structure calculation; ranking

algorithms for webpages; algorithms for movie recommendation, classification of

handwritten digits, and document clustering; and principal component analysis

in data science.

Part I

Linear Systems of Equations

Chapter 1

Introduction and Model Problems

Objectives of this chapter

1. Motivation: In Part I of the unit we will develop, analyse, and implement

numerical methods to solve large linear systems A~x = ~b.

2. This introductory chapter gives some examples of linear systems in real-

life applications.

3. These examples will be used as model problems throughout Part I of the

unit.

1.1 A Simple 1D Example from Structural Mechanics

Consider a string of unit length under tension T, which is subjected to a trans-

verse distributed load of magnitude p(x) per unit length (see figure). Let u(x)

denote the vertical displacement at point x. We choose signs such that both p(x)

and u(x) are positive in the upward direction.

For small displacements, the vertical displacement u(x) is governed by the

ordinary differential equation (ODE)

d²u(x)/dx² = −p(x)/T.


Since the string is fixed on the left and right, we can use boundary conditions

u(0) = 0 and u(1) = 0.

The problem of finding the displacement u(x) is fully specified by the follow-

ing ODE boundary value problem (BVP):

BVP

d²u(x)/dx² = −p(x)/T,   x ∈ [0, 1]

u(0) = 0

u(1) = 0

We can approximate the solution to this problem numerically by discretising

the ODE and solving the resulting linear system A~v = ~b.

1.1.1 Discretising the ODE

We discretise the ODE by deriving a finite difference approximation for the second

derivative in the ODE, using Taylor series expansions:

u(x+ h) = u(x) + u′(x)h+ u′′(x)h2/2 + u′′′(x)h3/6 +O(h4),

u(x− h) = u(x)− u′(x)h+ u′′(x)h2/2− u′′′(x)h3/6 +O(h4).

Summing these up gives

u(x+ h) + u(x− h) = 2u(x) + u′′(x)h2 +O(h4),

from which we obtain

u′′(x) = [u(x+h) − 2u(x) + u(x−h)] / h² + O(h²).   (1.1)
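As a quick numerical check of Eq. (1.1), one can apply the centred second difference to a smooth test function and watch the error shrink by a factor of about 4 each time h is halved. The following is a small Python sketch (the unit itself works in MATLAB; the test function u(x) = exp(x) and the evaluation point are illustrative choices):

```python
import math

def second_difference(u, x, h):
    """Centred finite difference approximation of u''(x), as in Eq. (1.1)."""
    return (u(x + h) - 2.0 * u(x) + u(x - h)) / h**2

u = math.exp          # test function with u''(x) = exp(x)
x = 0.5
for h in [0.1, 0.05, 0.025]:
    err = abs(second_difference(u, x, h) - math.exp(x))
    print(f"h = {h:6.3f}   error = {err:.3e}")
# The error decreases by roughly a factor of 4 each time h is halved,
# consistent with the O(h^2) truncation error derived above.
```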

We consider a grid that divides the problem domain [0, 1] into N + 1 intervals

of equal length

∆x = h = 1/(N + 1)

with N + 2 equally spaced grid points xi given by

xi = ih i = 0, . . . , N + 1

(i.e., there are two boundary points x0 and xN+1 at x = 0 and x = 1, and there

are N interior points). We then approximate the unknown function u(x) (the

exact solution to the BVP) at the grid points by discrete approximations vi:

vi ≈ u(xi),

using the finite difference formula. That is, we solve the following discretised

BVP for the unknown numerical approximation values vi:

discretised BVP

(vi+1 − 2vi + vi−1) / h² = −p(xi)/T   (i = 1, . . . , N)

xi = ih

v0 = 0

vN+1 = 0.


1.1.2 Formulation as a Linear System

This discretised BVP can be written as a linear system

A~v = ~b

with N equations for the N unknowns vi (i = 1, . . . , N) at the interior points

of the problem domain. We normally consider square matrices of size n× n, so

for this problem, the total number of unknowns n equals the number of interior

grid points, i.e., we have n = N .

We write the discretised BVP as A~v = ~b with the matrix A ∈ Rn×n given by

the so-called 1-dimensional (1D) Laplacian matrix:

Definition 1.1: 1D Laplacian Matrix (Model Problem 1)

A =

[ −2   1                  ]
[  1  −2   1              ]
[      1  −2   1          ]
[          ⋱    ⋱    ⋱    ]
[              1  −2   1  ]
[                  1  −2  ]   (1.2)

(all entries not shown are zero).

The vectors ~v and ~b in A~v = ~b are given by

~v = [v1, v2, . . . , vn−1, vn]^T,   ~b = −(h²/T) [p(x1), p(x2), . . . , p(xn−1), p(xn)]^T,

with h = 1/(n + 1).

Note that the matrix A is tridiagonal, and it is very sparse: it has very few

nonzero elements (close to 3 per row, on average).

Definition 1.2: Sparse Matrix

Let A ∈ Rm×n.

1. nnz(A) is the number of nonzero elements of A

2. A is called a sparse matrix if

nnz(A) ≪ mn.

Otherwise, A is called a dense matrix.

Efficient numerical methods for this problem should exploit this sparsity,

and the study of efficient numerical methods for sparse matrix problems is an

important focus of this unit.


1.1.3 Solving the Linear System

Suppose the transverse load in the above problem is given specifically by

p(x) = −(3x+ x2) exp(x),

and T = 100.

The figure below shows the numerical approximation ~v obtained from solving

A~v = ~b, for n = N = 2, 4, 8, 16.

[Figure: four plots of the numerical approximation ~v on [0, 1] for n = 2, 4, 8, 16; the computed displacements range from about −5 × 10⁻³ to 0.]

As it happens, the exact solution to this problem can also be obtained in

closed form:

u(x) = x (x− 1) exp(x)/100,

(it is shown in blue in the figure). This allows us to verify the accuracy of

the numerical approximation, and it can be shown theoretically and verified

numerically that the error u(xi)− vi = O(h2).
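The whole pipeline for Model Problem 1 — assemble the tridiagonal system, solve A~v = ~b, compare against the closed-form solution — fits in a few lines. Below is a Python sketch (an illustration; the unit's own implementations use MATLAB). It solves the tridiagonal system with the classic three-term forward-elimination/back-substitution recurrence (the Thomas algorithm, a special case of the banded LU decomposition of Chapter 2) rather than dense Gaussian elimination:

```python
import math

def solve_string(N, T=100.0):
    """Solve the discretised BVP A v = b for the loaded string (Model Problem 1)."""
    h = 1.0 / (N + 1)
    p = lambda x: -(3.0 * x + x**2) * math.exp(x)   # transverse load p(x)
    # Tridiagonal system: sub-, main- and super-diagonals plus right-hand side.
    a = [1.0] * N            # subdiagonal (a[0] unused)
    d = [-2.0] * N           # main diagonal
    c = [1.0] * N            # superdiagonal
    b = [-h**2 * p((i + 1) * h) / T for i in range(N)]
    # Thomas algorithm: forward elimination ...
    for i in range(1, N):
        mfac = a[i] / d[i - 1]
        d[i] -= mfac * c[i - 1]
        b[i] -= mfac * b[i - 1]
    # ... then back substitution.
    v = [0.0] * N
    v[-1] = b[-1] / d[-1]
    for i in range(N - 2, -1, -1):
        v[i] = (b[i] - c[i] * v[i + 1]) / d[i]
    return v, h

def exact(x):
    """Closed-form solution u(x) = x (x - 1) exp(x) / 100."""
    return x * (x - 1.0) * math.exp(x) / 100.0

for N in (8, 16, 32):
    v, h = solve_string(N)
    err = max(abs(exact((i + 1) * h) - vi) for i, vi in enumerate(v))
    print(f"N = {N:3d}   max error = {err:.3e}")
# The maximum error shrinks by roughly a factor of 4 as h is halved,
# verifying the O(h^2) accuracy claimed above.
```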

Problem 1.3: Vertical Displacement in a String (not examinable)

Can you show that the vertical displacement u(x) is governed by the ODE

uxx = −p(x)/T?

(Hint: Assume that displacements are small, so that the tension T can be

taken as constant over the whole string, and so that the angle θ can be considered

small (θ is measured from the horizontal in counter-clockwise direction).

Consider vertical force equilibrium.)


1.2 A 2D Example: Poisson’s Equation for Heat Conduction

We first consider models for heat flow in a metal plate.

The flow of heat in a metal plate can be modeled by the heat equation,

which is a partial differential equation (PDE) that describes the evolution of the

temperature in the plate, u(x, y, t), in space and time:

∂u/∂t = κ (∂²u/∂x² + ∂²u/∂y²) + g(x, y).   (1.3)

Here, κ is the heat conduction coefficient, and g(x, y) is a heat source or sink.

We consider the specific problem of determining the stationary temperature

distribution in a square domain Ω of length 1 m, (x, y) ∈ Ω = [0, 1] × [0, 1],

with the temperature on the four boundaries fixed at u = u0 where u0 = 600

Kelvin, and with a heat source g(x, y) with Gaussian profile centered at (x, y) =

(3/4, 3/4) :

g(x, y) = 10,000 exp( −[(x − 3/4)² + (y − 3/4)²] / 0.01 ).

For simplicity we set the heat conduction coefficient to κ = 1. Since we seek a

stationary solution, we can set the time derivative in Eq. (1.3) equal to zero, and

solve

∂²u/∂x² + ∂²u/∂y² = f(x, y),   (1.4)

with f(x, y) = −g(x, y)/κ.

The problem of finding the stationary temperature profile u(x, y) is then fully

specified by the following PDE boundary value problem (BVP):

BVP

∂²u/∂x² + ∂²u/∂y² = −g(x, y),   (x, y) ∈ Ω = [0, 1] × [0, 1]

u(x, y) = u0 on ∂Ω,

where ∂Ω denotes the boundary of the spatial domain Ω.

We can approximate the solution to this problem numerically by discretising

the PDE and solving the resulting linear system A~v = ~b.

Eq. (1.4) is called Poisson’s equation, and it arises in many areas of applica-

tion, including Newtonian gravity, electrostatics, or elasticity. When g(x, y) = 0,

the equation is called Laplace’s equation. The symbol ∆ is often used as a short-

hand notation for the differential operator in Eq. (1.4), and

∆u = ∂²u/∂x² + ∂²u/∂y²

is called the Laplacian of u. Note that the 1D string problem described in the

previous section features the 1D version of the Laplacian operator. The Laplacian

operator can clearly also be extended to dimension 3 and higher.


1.2.1 Discretising the PDE

We discretise the PDE by using finite difference approximations for the second-

order partial derivatives that are similar to Eq. (1.1):

∂²u(x, y)/∂x² = [u(x+h, y) − 2u(x, y) + u(x−h, y)] / h² + O(h²),

∂²u(x, y)/∂y² = [u(x, y+h) − 2u(x, y) + u(x, y−h)] / h² + O(h²).

We consider a regular Cartesian grid that partitions the problem domain into

squares of equal size by dividing both the x-range and the y-range into N + 1

intervals of equal length

∆x = ∆y = h = 1/(N + 1).

The grid points xi and yj are given by

xi = ih i = 0, . . . , N + 1,

yj = jh j = 0, . . . , N + 1,

(i.e., there are layers of boundary points at x0, xN+1, y0, and yN+1, and there

are N2 interior points). We then approximate the unknown function u(x, y) (the

exact solution to the BVP) at the grid points by discrete approximations wi,j :

wi,j ≈ u(xi, yj),

using the finite difference formula. That is, we solve the following discretised

BVP for the unknown numerical approximation values wi,j :

discretised BVP

(wi+1,j + wi,j+1 − 4wi,j + wi−1,j + wi,j−1) / h² = −g(xi, yj)   (i, j = 1, . . . , N)

xi = ih,   yj = jh

w0,j = wN+1,j = wi,0 = wi,N+1 = u0.   (1.5)

1.2.2 Formulation as a Linear System

Similar to the 1D model problem, the 2D discretised BVP can be written as a

linear system

A~v = ~b,

with now N2 equations for the N2 unknowns wi,j (i, j = 1, . . . , N) at the interior

points of the problem domain. Here, A ∈ Rn×n with total number of unknowns

n = N2.

We first have to assemble the N2 unknowns wi,j (i, j = 1, . . . , N) into a

single vector ~v. We can do this using lexicographic ordering by rows, in which

we assemble rows of wi,j in the spatial domain into ~v, from top to bottom


starting from row j = 1, and from left to right within each row. For example,

when N = 3, the vector ~v is given by

~v = [w1,1, w2,1, w3,1, w1,2, w2,2, w3,2, w1,3, w2,3, w3,3]^T.

Next, if we want to write the BVP as a linear system A~v = ~b of the N2

interior unknowns, the values of wi,j at boundary points of the domain need to

be moved to the right-hand side (RHS) of the discretised PDE in Eq. (1.5). If

we do this, the system matrix in A~v = ~b is given by the so-called 2-dimensional

(2D) Laplacian matrix:

Definition 1.4: 2D Laplacian Matrix (Model Problem 2)

A =

[ T  I              ]
[ I  T  I           ]
[    I  T  I        ]
[       ⋱  ⋱  ⋱     ]
[          I  T  I  ]
[             I  T  ]  ∈ Rn×n,   (1.6)

where n = N² and T and I are block matrices ∈ RN×N:

T =

[ −4   1                  ]
[  1  −4   1              ]
[      1  −4   1          ]
[          ⋱    ⋱    ⋱    ]
[              1  −4   1  ]
[                  1  −4  ]  ∈ RN×N,   (1.7)

(all entries not shown are zero), and I is the N × N identity matrix.

The vector ~b in A~v = ~b is given by −h²g(x, y) evaluated in xi and yj, plus a

contribution of −u0 for every neighbour of wi,j that lies on the boundary. For

the simple example with N = 3 (where only the midpoint of the grid does not

have neighbour points on the boundary), ~b is given by

~b = [ −h²g1,1 − 2u0,  −h²g2,1 − u0,  −h²g3,1 − 2u0,
       −h²g1,2 − u0,   −h²g2,2,       −h²g3,2 − u0,
       −h²g1,3 − 2u0,  −h²g2,3 − u0,  −h²g3,3 − 2u0 ]^T,

where gi,j = g(xi, yj) and h = 1/(N + 1).

Note that the matrix A is block tridiagonal, and it is very sparse: it has very

few nonzero elements (close to 5 per row, on average). Again, it is essential that

efficient numerical methods for this problem exploit this sparsity, and the study

of efficient numerical methods for sparse matrices like the 2D Laplacian matrix

is an important focus of this unit.
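To make the block structure concrete, the sketch below (plain Python; an illustration only, since the unit itself works in MATLAB) assembles the n × n 2D Laplacian matrix directly from its 5-point stencil, using the same lexicographic ordering as above, and counts its nonzeros to confirm the "close to 5 per row" estimate:

```python
def laplacian_2d(N):
    """Assemble the 2D Laplacian (Model Problem 2) as a dense n x n list of
    lists, n = N^2, using lexicographic ordering by rows of the grid."""
    n = N * N
    A = [[0.0] * n for _ in range(n)]
    for j in range(N):          # grid row index (y-direction)
        for i in range(N):      # grid column index (x-direction)
            k = j * N + i       # lexicographic index of unknown w_{i+1,j+1}
            A[k][k] = -4.0
            if i > 0:     A[k][k - 1] = 1.0   # west neighbour
            if i < N - 1: A[k][k + 1] = 1.0   # east neighbour
            if j > 0:     A[k][k - N] = 1.0   # south neighbour
            if j < N - 1: A[k][k + N] = 1.0   # north neighbour
    return A

A = laplacian_2d(3)
nnz = sum(1 for row in A for a in row if a != 0.0)
print(nnz)   # nnz = 5*N^2 - 4*N, i.e. just under 5 nonzeros per row
```

Storing all n² entries like this is exactly what one should not do for large N; sparse storage formats that keep only the nnz(A) nonzeros are discussed in Appendix A.10.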

For example, a 2D resolution of 1000 × 1000 grid points is quite modest for

scientific applications on current-day computers. In this case, A ∈ Rn×n with

n = N² = 10⁶. Using Gaussian elimination (or, equivalently, LU decomposition)

in a naive fashion (without taking advantage of the zeros in the sparse matrix),

the number of floating point operations required, W, would scale like W =

O(n³) = O(10¹⁸), which would take a very large amount of time. In this unit we

will pursue methods for sparse matrices with work complexity approaching W =

O(n). Such methods power many of today’s advances in science, engineering and

technology.

1.2.3 Solving the Linear System

When considering the linear system in Matlab using N = 64, we obtain the

following plots for the source term and for the approximation of the temperature

profile (surface and contour plots, using Matlab’s mesh and contour):

[Figures: surface and contour plots of the source term g(x, y) (peaking near 10,000 at (3/4, 3/4)) and of the approximate temperature profile (ranging from u0 = 600 up to about 680 Kelvin near the source).]


1.3 An Example from Data Analytics: Netflix Movie Recommendation

In 2006, the online DVD-rental and video streaming company Netflix launched a

competition for the best collaborative filtering algorithm to predict user ratings

for films, based on a training data set of previous ratings.

Netflix provided a training data set of 100,480,507 ratings that 480,189 users

gave to 17,770 movies, with ratings from 1 to 5 (integer) stars. Let the number

of users be given by m = 480, 189, and the number of movies by n = 17, 770.

Each rating consists of a triplet (i, j, v), where i is the user ID, j is the movie ID,

and v is the rating value in the range 1–5. The training ratings can be stored in

a sparse ratings matrix R ∈ Rm×n. The set of matrix indices with known values

is indicated by index set R = {(i, j)}. For example, a simple ratings matrix R

with m = 7 users and n = 4 movies could be given by

R =

[ ·  2  ·  · ]
[ ·  3  ·  · ]
[ 5  ·  ·  · ]
[ ·  1  ·  · ]
[ ·  ·  1  5 ]
[ 1  ·  ·  5 ]
[ ·  ·  2  · ]

(a dot marks an unknown entry), with index set R = {(1, 2), (2, 2), (3, 1), (4, 2), (5, 3), (5, 4), (6, 1), (6, 4), (7, 3)}.

(To be precise, the ratings matrix is actually not a usual sparse matrix, in which

values that are not stored are assumed to be zero, but rather an incomplete

matrix, with values that are not stored considered unknown.)

The goal of a collaborative filtering algorithm is to predict the unknown

ratings in R based on the training data in R. These predicted ratings can then

be used to recommend movies to users. In linear algebra, this type of problem

is known as a matrix completion problem.

The recommendation problem (for movies, music, books, . . . ) can be seen as a

problem in the field of machine learning, which studies algorithms that can learn

from and make predictions on data. In the sub-category of supervised learning,

the computer is presented with example inputs and their desired outputs (the

training data set), and the goal is to learn a general rule that maps inputs to

outputs.

1.3.1 Movie Recommendation using Linear Algebra and Optimisation

A powerful approach to attack the matrix completion problem is to seek matrices

U ∈ Rf×m and M ∈ Rf×n, with f a small integer ≪ m, n, such that UTM

approximates the ratings matrix R on the set of known ratings, R. Pictorially,


we seek U and M such that

R ≈ UTM.   (1.8)

In practice, we will seek U and M that are dense, and we will allow their elements

to assume any real value. Each row in these matrices represents a latent feature

or factor of the data. The UTM decomposition of R effectively seeks to provide

a model that with a small number of features, f (typically chosen ≤ 50), is able

to provide good predictions for the unknown values in R.

The user and movie matrices U and M have shape

U = [ ~u1  ~u2  · · ·  ~um ],   (1.9)

M = [ ~m1  ~m2  · · ·  ~mn ].   (1.10)

The column vectors of U , ~ui ∈ Rf , are called the user feature vectors, and the

column vectors of M , ~mj ∈ Rf , are called the movie feature vectors. With

f ≪ m, n, the interpretation of the approximation UTM is that, for each user i

and movie j, their affinity for each of the f latent ‘feature categories’ is encoded

in the vectors ~ui and ~mj . (For instance, if feature k were to represent the

‘comedy’ category, uk,i would express to which degree user i is into comedies,

and mk,j would express to which degree movie j is a comedy.)

The approximation UTM ofR with small f is called a low-rank approximation

of R, since

UTM = ∑_{k=1}^{f} (UT)∗k (M)k∗,   (1.11)

where (UT )∗k is the kth column of UT and (M)k∗ is the kth row of M , and the

terms (UT )∗k(M)k∗ are m× n matrices of matrix rank 1.

We can seek user and movie matrices U and M that optimally approximate

the rating matrix R, if we choose a specific sense in which UTM should approx-

imate R. We define the Frobenius norm of a matrix by


Definition 1.5: Frobenius Norm of a Matrix

Let A ∈ Rm×n. Then the Frobenius norm of A is given by

‖A‖F = ( ∑_{i=1}^{m} ∑_{j=1}^{n} aij² )^{1/2}.

It is natural, then, to seek U and M such that the following measure of the

difference between UTM and R is minimised:

g(U,M) = ‖R − UTM‖²F,R ,   (1.12)

where the ‖·‖F,R norm is a partial Frobenius norm, summed only over the known

entries of R, as given by the index set R.
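The partial Frobenius norm ‖·‖F,R is straightforward to evaluate when the known ratings are stored as a dictionary keyed by (i, j). The following Python sketch is an illustration (the unit's own code uses MATLAB), using the 7 × 4 example ratings from above:

```python
import math

# Known ratings of the 7 x 4 example, stored as {(user, movie): rating}
# with 1-based indices matching the index set R in the text.
R = {(1, 2): 2, (2, 2): 3, (3, 1): 5, (4, 2): 1,
     (5, 3): 1, (5, 4): 5, (6, 1): 1, (6, 4): 5, (7, 3): 2}

def partial_frobenius(R, predict):
    """Partial Frobenius norm: the root of the sum of squared differences
    between ratings and predictions, taken only over the known entries."""
    return math.sqrt(sum((r - predict(i, j))**2 for (i, j), r in R.items()))

# Example: the (useless) all-zero prediction gives the root of the sum of
# the squared known ratings.
print(partial_frobenius(R, lambda i, j: 0.0))
```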

In practice, it is necessary to add a regularisation term to g(U,M), to ensure

the optimisation problem is well-posed and gives useful results. So the final

optimization problem we seek to solve for the recommendation task is

min_{U,M} g(U,M) = ∑_{(i,j)∈R} ( rij − ~uTi ~mj )²

+ λ ( ∑_{i=1}^{m} nnz((R)i∗) ‖~ui‖₂² + ∑_{j=1}^{n} nnz((R)∗j) ‖~mj‖₂² ),   (1.13)

where nnz((R)i∗) is the number of movies ranked by user i, and nnz((R)∗j) is

the number of users that ranked movie j. The regularisation parameter λ is

a fixed number that can be chosen by trial-and-error or by techniques such as

cross-validation.

1.3.2 An Alternating Least Squares Approach to Solving the Optimisation Problem

We seek U ∈ Rf×m and M ∈ Rf×n that minimise g(U,M). A popular way of

solving the optimisation problem is to determine U and M in an alternating

fashion: starting from an initial guess for M , determine the optimal U with M

fixed, then determine the optimal M with U fixed, and so forth. As it turns

out, each subproblem of determining U with fixed M (and vice versa) in this

alternating algorithm boils down to a (regularized) linear least-squares problem,

and the resulting procedure is called Alternating Least Squares (ALS). Also,

with fixed M , each column of U can be determined independent of the other

columns (and vice versa for M , with fixed U). This means that ALS can be

executed efficiently in parallel, which makes it suitable for big data sets.
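For intuition, here is a minimal Python sketch of ALS in the simplest case f = 1, where each feature vector is a single scalar and each least-squares subproblem has a one-line closed-form solution (set the derivative of Eq. (1.13) with respect to ui, respectively mj, to zero). The tiny data set and the value of λ are illustrative assumptions; the general-f formulas are derived later in the unit, where MATLAB is used:

```python
# Known ratings {(user, movie): rating} for a tiny m = 7, n = 4 example.
R = {(1, 2): 2, (2, 2): 3, (3, 1): 5, (4, 2): 1,
     (5, 3): 1, (5, 4): 5, (6, 1): 1, (6, 4): 5, (7, 3): 2}
m, n, lam = 7, 4, 0.1

def objective(u, mv):
    """Regularised objective g(U, M) of Eq. (1.13) for f = 1."""
    fit = sum((r - u[i] * mv[j])**2 for (i, j), r in R.items())
    reg = lam * (sum(sum(1 for (a, b) in R if a == i) * u[i]**2 for i in u)
               + sum(sum(1 for (a, b) in R if b == j) * mv[j]**2 for j in mv))
    return fit + reg

u = {i: 1.0 for i in range(1, m + 1)}      # initial guess for user features
mv = {j: 1.0 for j in range(1, n + 1)}     # initial guess for movie features

for sweep in range(5):
    # Update each u_i with M fixed: u_i = (sum r_ij m_j) / (sum m_j^2 + lam*nnz_i)
    for i in u:
        num = sum(r * mv[j] for (a, j), r in R.items() if a == i)
        den = (sum(mv[j]**2 for (a, j) in R if a == i)
               + lam * sum(1 for (a, j) in R if a == i))
        u[i] = num / den
    # ... then each m_j with U fixed, by the symmetric formula.
    for j in mv:
        num = sum(r * u[i] for (i, b), r in R.items() if b == j)
        den = (sum(u[i]**2 for (i, b) in R if b == j)
               + lam * sum(1 for (i, b) in R if b == j))
        mv[j] = num / den
    print(f"sweep {sweep + 1}   g = {objective(u, mv):8.4f}")
# Each half-sweep exactly minimises g over one factor with the other fixed,
# so the printed objective values are non-increasing.
```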

The figure below (from [Winlaw et al., 2015]) shows the performance of ALS

on a small ratings matrix of size 400 × 80. Typically ALS requires quite a few

iterations to reach high accuracy, and it is possible to improve its convergence

behaviour, for example, as shown for the ALS-NCG method.


One of the focus areas of Part I of this unit is to solve least-squares problems

in accurate and efficient ways. We will return to the movie recommendation

problem in that context. In particular, we will learn how to derive the formulas

for determining U with fixed M , and vice versa, and will use them to solve movie

recommendation problems.

PS: On September 21, 2009, the grand prize of US$1,000,000 for the Netflix prize competition was awarded to the BellKor's Pragmatic Chaos team, which bested Netflix's own algorithm for predicting ratings by 10.06% (using a blend of approaches, including multiple variations of the matrix factorization approach).

PPS: Although the Netflix prize data sets were constructed to preserve customer privacy, in 2007, two researchers showed it was possible to identify individual users by matching the data sets with film ratings on the Internet Movie Database. On December 17, 2009, four Netflix users filed a class action lawsuit against Netflix, alleging that Netflix had violated U.S. fair trade laws and the Video Privacy Protection Act by releasing the data sets. The sequel to the Netflix prize was canceled. We are living in a crazy world.


Chapter 2

LU Decomposition for Linear Systems

2.1 Gaussian Elimination and LU Decomposition

We consider nonsingular linear systems A~x = ~b where A ∈ R^{n×n}. We recall the following theorem about solvability of linear systems.

Theorem 2.1

Let A ∈ R^{n×n} be nonsingular (i.e., det(A) ≠ 0), and let ~b ∈ R^n. Then the linear system A~x = ~b has a unique solution, given by ~x = A^{−1}~b.

If A is singular, A~x = ~b either has infinitely many solutions (if ~b ∈ range(A)), or no solution (if ~b ∉ range(A)).

2.1.1 Gaussian Elimination

We first consider standard Gaussian elimination (GE) and assume that no zero

pivot elements are encountered, so no pivoting (switching of rows) is required.

Example 2.2: One Step of Gaussian Elimination

Let

    A = [ 2  3  4
          6  8  4
          8  9  0 ].

In the first step of GE, 2 is the pivot element, and we add −6/2 times row 1 to row 2, and add −8/2 times row 1 to row 3, resulting in the transformed matrix

    [ 2   3    4
      0  −1   −8
      0  −3  −16 ].



For the case of a general system A~x = ~b, we can write the result of one step of GE for

    [a11, ~r1^T; ~c1, A^(2)] [x1; ~x^(2)] = [b1; ~b^(2)],

as

    [a11, ~r1^T; 0, A^(2) − (~c1/a11)~r1^T] [x1; ~x^(2)] = [b1; ~b^(2) − (b1/a11)~c1].

Here the blocks are written in Matlab-like [row1; row2] notation: ~r1^T is the first row of A with a11 removed, and ~c1 is the first column of A with a11 removed.

2.1.2 LU Decomposition

The following theorem and its proof show us that Gaussian elimination on A ∈ R^{n×n} (when no zero pivots are encountered) is equivalent to decomposing A as the product LU of two triangular matrices, and tell us how to construct the L and U factors.

Theorem 2.3: LU Decomposition

Let A ∈ R^{n×n} be a nonsingular matrix. Assume no zero pivots arise when applying standard Gaussian elimination (without pivoting) to A. Then A can be decomposed as A = LU, where L ∈ R^{n×n} is unit lower triangular, and U ∈ R^{n×n} is upper triangular and nonsingular.

Proof. The proof proceeds by mathematical induction on n.

Base case: the statement holds for n = 1, since for any a ∈ R^1 = R with a ≠ 0 (i.e., 1/a exists, so a is nonsingular), the LU decomposition a = l·u exists, with l = 1 and u = a nonsingular.

Induction step: we show that, if the statement of the theorem holds for n − 1, then it holds for n.

We perform one step of Gaussian elimination on the n × n matrix

    A = [a11, ~r1^T; ~c1, A^(2)],

which is assumed nonsingular and such that no zero pivots arise when applying GE to it.


Since a11 ≠ 0 we can define the Gauss transformation matrix

    M^(1) = [1, 0; ~m1, I^(2)],

with ~m1 = −~c1/a11 and I^(2) the identity matrix of size (n−1) × (n−1). Then the first step of Gaussian elimination can be written as

    M^(1)A = [a11, ~r1^T; 0, A^(2) + ~m1~r1^T] = [a11, ~r1^T; 0, Ã],

where Ã is an (n−1) × (n−1) matrix for which no zero pivots arise when applying GE to it (since the same holds for A); this also implies Ã is nonsingular.

By the induction hypothesis, Ã can be decomposed as L̃Ũ, which leads to

    M^(1)A = [a11, ~r1^T; 0, L̃Ũ] = [1, 0; 0, L̃] [a11, ~r1^T; 0, Ũ],

or

    A = (M^(1))^{−1} [1, 0; 0, L̃] [a11, ~r1^T; 0, Ũ].

The inverse of M^(1) is easily obtained from observing that

    [1, 0; −~m1, I^(2)] [1, 0; ~m1, I^(2)] = I = (M^(1))^{−1} M^(1).

Then

    A = [1, 0; −~m1, I^(2)] [1, 0; 0, L̃] [a11, ~r1^T; 0, Ũ] = LU,


with

    L = [1, 0; −~m1, L̃]  and  U = [a11, ~r1^T; 0, Ũ].    (2.1)

Matrix U is nonsingular: det(U) = a11 det(Ũ) ≠ 0, since Ũ is nonsingular and a11 ≠ 0.

This proves the induction step and completes the proof.

Eq. (2.1) shows that L can be obtained by inserting the multiplier elements

−~m1 = ~c1/a11 in its columns, for every step of Gaussian elimination.

Note also that, by the construction in the proof, the LU decomposition is unique

(when no pivoting is performed).

If pivoting is employed during Gaussian elimination (e.g., when zero pivots

arise), a similar theorem holds:

Theorem 2.4: LU Decomposition

For any A ∈ R^{n×n}, a decomposition PA = LU exists where P ∈ R^{n×n} is a permutation matrix, L ∈ R^{n×n} is unit lower triangular, and U ∈ R^{n×n} is upper triangular.

Here, P encodes the row permutations of the pivoting operations. The PA =

LU decomposition is unique when P is fixed, and this theorem also holds for

singular A.

Remark 2.5: Solving a Linear System using LU Decomposition

We can solve A~x = ~b in three steps:

1. compute L and U in the decomposition A = LU , leading to LU~x = ~b

2. solve L~y = ~b using forward substitution

3. solve U~x = ~y using backward substitution
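The three steps can be sketched in a few lines of Python (the notes use Matlab-style pseudocode; this translation, with the test matrix from Example 2.2 and an arbitrary right-hand side, is only for illustration):

```python
def lu(A):
    """LU decomposition (Doolittle, no pivoting): returns unit lower
    triangular L and upper triangular U with A = L*U."""
    n = len(A)
    U = [row[:] for row in A]
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U

def forward_sub(L, b):
    """Solve L*y = b; L is unit lower triangular, so no divisions needed."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    return y

def backward_sub(U, y):
    """Solve U*x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[2.0, 3.0, 4.0], [6.0, 8.0, 4.0], [8.0, 9.0, 0.0]]   # from Example 2.2
b = [1.0, 2.0, 3.0]
L, U = lu(A)
x = backward_sub(U, forward_sub(L, b))
```

Once L and U are computed (O(n³) work), each additional right-hand side costs only the two O(n²) substitutions.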

2.1.3 Implementation of LU Decomposition and Computational Cost

Implementation of LU Decomposition

A basic implementation of LU decomposition in Matlab-like pseudo-code is given

by


Algorithm 2.6: LU decomposition, kij version

Input: Matrix A
Output: L and U

U = A;
L = I;
for k = 1:n-1            % pivot k
    for i = k+1:n        % row i
        m = U(i,k)/U(k,k);
        U(i,k) = 0;
        for j = k+1:n    % column j
            U(i,j) = U(i,j) - m*U(k,j);
        end
        L(i,k) = m;
    end
end

However, we can also implement LU decomposition in-place:

Algorithm 2.7: LU decomposition, kij version, in-place

Input: Matrix A
Output: L and U stored in A

for k = 1:n-1            % pivot k
    for i = k+1:n        % row i
        a(i,k) = a(i,k)/a(k,k);
        for j = k+1:n    % column j
            a(i,j) = a(i,j) - a(i,k)*a(k,j);
        end
    end
end
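A direct Python transcription of the in-place kij loop may help; L and U are recovered afterwards from the single array (an illustrative sketch with 0-based indexing, using the matrix of Example 2.2):

```python
def lu_inplace(a):
    """kij LU decomposition; the multipliers (L below the diagonal) and U
    are stored in a itself, overwriting the original entries."""
    n = len(a)
    for k in range(n - 1):              # pivot k
        for i in range(k + 1, n):       # row i
            a[i][k] = a[i][k] / a[k][k]          # multiplier, kept in place
            for j in range(k + 1, n):   # column j
                a[i][j] -= a[i][k] * a[k][j]
    return a

A = [[2.0, 3.0, 4.0], [6.0, 8.0, 4.0], [8.0, 9.0, 0.0]]   # Example 2.2
lu_inplace(A)
n = len(A)
L = [[1.0 if i == j else (A[i][j] if j < i else 0.0) for j in range(n)]
     for i in range(n)]
U = [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
```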

Also, we can depart from the standard order of operations in Gaussian elimination and consider doing all the operations for row i of A at once, in the so-called ikj version of the algorithm:

Algorithm 2.8: LU decomposition, ikj version, in-place

Input: Matrix A
Output: L and U stored in A

for i = 2:n              % row i
    for k = 1:i-1        % pivot k
        a(i,k) = a(i,k)/a(k,k);
        for j = k+1:n    % column j
            a(i,j) = a(i,j) - a(i,k)*a(k,j);
        end
    end
end


Computational Work for LU Decomposition

We now consider the amount of computational work that is spent by the LU

decomposition algorithm, in terms of the number of floating point operations

(flops) performed to decompose an n × n matrix A. We count the number

of additions and subtractions (which we indicate by A) and the number of

multiplications, divisions, and square roots (indicated by M). We assume that

these operations take the same amount of work, which is a reasonable assumption

for modern computer processors.

The following summation identities are useful when determining computational work:

    Σ_{p=1}^{n−1} 1 = n − 1,
    Σ_{p=1}^{n−1} p = n(n − 1)/2,
    Σ_{p=1}^{n−1} p² = n(n − 1)(2n − 1)/6.

We consider the kij version of the algorithm and sum over the three nested loops to determine the work W of LU decomposition:

    W = Σ_{k=1}^{n−1} Σ_{i=k+1}^{n} ( 1M + Σ_{j=k+1}^{n} (1M + 1A) )
      = Σ_{k=1}^{n−1} Σ_{i=k+1}^{n} ( 1 + 2(n − k) )
      = Σ_{k=1}^{n−1} (1 + 2n − 2k)(n − k)
      = Σ_{k=1}^{n−1} ( (n + 2n²) − k(4n + 1) + 2k² )
      = (n − 1)(n + 2n²) − (4n + 1)n(n − 1)/2 + 2n(n − 1)(2n − 1)/6
      = (2 − 2 + 2/3)n³ + O(n²)
      = (2/3)n³ + O(n²) flops.

As expected, the dominant term in the expression for the computational work is proportional to n³, since LU decomposition entails three nested loops that are of (average) length proportional to n, roughly speaking. We say that the computational complexity of LU decomposition is cubic in the number of unknowns, n. For example, for the 2D model problem of Eq. (1.6), with n = N², we have W = O(n³) = O(N⁶). For large problems, cubic complexity is often prohibitive, and we will seek to exploit structural properties like sparsity to obtain methods with lower computational complexity.
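The closed form can be checked against the loops themselves; a small Python sketch, counting one flop per division and two per inner update, exactly as in the derivation:

```python
def lu_flops(n):
    """Count flops by walking the kij LU loops directly."""
    w = 0
    for k in range(1, n):                # pivot k = 1..n-1
        for i in range(k + 1, n + 1):    # row i = k+1..n
            w += 1                       # one division (1M)
            for j in range(k + 1, n + 1):
                w += 2                   # one multiplication + one subtraction
    return w

def lu_flops_formula(n):
    """Closed form derived above (all three terms are integers)."""
    return ((n - 1) * (n + 2 * n * n)
            - (4 * n + 1) * n * (n - 1) // 2
            + n * (n - 1) * (2 * n - 1) // 3)
```

For n = 3 both give 13 flops, and the ratio lu_flops(n)/n³ approaches 2/3 as n grows.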

A similar computation shows that forward substitution (L~y = ~b) and backward substitution (U~x = ~y) each have computational work

    W = n² + O(n) flops.


LU Decomposition for Symmetric Positive Definite Matrices

Finally, we note that if the matrix A is symmetric positive definite (SPD), pivoting is never required in the LU decomposition, and the symmetry can be exploited to save about half the work.

Theorem 2.9

If A ∈ R^{n×n} is SPD, the decomposition A = LU, where L is unit lower triangular and U is upper triangular, exists and is unique.

The above theorem implies that no zero pivot elements can occur in the LU

decomposition algorithm for SPD matrices (in exact arithmetic).

Theorem 2.10: Cholesky decomposition

If A ∈ R^{n×n} is SPD, the decomposition A = L̂L̂^T, where L̂ is a lower triangular matrix with strictly positive diagonal elements, exists and is unique.

In fact, it can be shown that L̂ = L√D and L̂^T = √D^{−1} U, where D is the diagonal matrix containing the diagonal elements of U. These are strictly positive for an SPD matrix, so their square roots can be taken to form the diagonal matrix √D. The work to compute the Cholesky decomposition is W = (1/3)n³ + O(n²).

2.2 Banded LU Decomposition

In this section, we consider special versions of the LU algorithm that save work

for sparse matrices that are zero outside a band around the diagonal.

Definition 2.11

A banded matrix A ∈ R^{n×n} is a sparse matrix whose nonzero entries are confined to a band around the main diagonal, i.e.,

    ∃ K < n such that a_{ij} = 0 for all i, j with |i − j| > K.

The smallest such K is called the bandwidth of A.

For example, for a diagonal matrix we have K = 0. For a tridiagonal matrix,

we have K = 1. For our 2D model problem, we have K = N − 1.

It turns out that, if A has bandwidth B, then we need to compute the U and L factors only within the band. This can be proved formally, but it can also be seen intuitively by considering, e.g., the kij version as in Algorithm 2.7. First, the statement a(i,k)=a(i,k)/a(k,k) cannot create new nonzeros. Second, the statement a(i,j)=a(i,j)-a(i,k)*a(k,j) can only create new nonzeros if both the multiplier a(i,k) and the element a(k,j) in the pivot row are nonzero. But a(i,k)=0 when column k lies outside the band of row i, and a(k,j)=0 when column j lies outside the band of row k. So the banded structure maintains additional zero elements in L and U according to

    l_{ij} = 0 if i − j > B,
    u_{ij} = 0 if j − i > B,

so nonzeros in row i of L don't occur before column j = i − B, and nonzeros in row i of U don't occur after column j = i + B.


This means that, for banded matrices with bandwidth B, we can safely modify the ranges of the loops in the ikj version of the LU algorithm as follows:

Algorithm 2.12: Banded LU decomposition, ikj version, in-place

Input: Matrix A with bandwidth B
Output: L and U stored in A

for i = 2:n                      % row i
    for k = max(1,i-B):i-1       % pivot k
        a(i,k) = a(i,k)/a(k,k);
        for j = k+1:min(i+B,n)   % column j
            a(i,j) = a(i,j) - a(i,k)*a(k,j);
        end
    end
end
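For the tridiagonal 1D model matrix (B = 1) the banded algorithm is easy to exercise; a Python sketch with 0-based indexing, checking that L·U reproduces A and that no fill-in occurs outside the band:

```python
def banded_lu_inplace(a, B):
    """ikj banded LU (Algorithm 2.12): loop ranges restricted to the band."""
    n = len(a)
    for i in range(1, n):                      # row i
        for k in range(max(0, i - B), i):      # pivots within the band
            a[i][k] = a[i][k] / a[k][k]
            for j in range(k + 1, min(i + B + 1, n)):
                a[i][j] -= a[i][k] * a[k][j]
    return a

# Tridiagonal matrix of the 1D model problem type (bandwidth B = 1).
n, B = 6, 1
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
orig = [row[:] for row in A]
banded_lu_inplace(A, B)
L = [[1.0 if i == j else (A[i][j] if j < i else 0.0) for j in range(n)]
     for i in range(n)]
U = [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
```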

Computational Work for Banded LU Decomposition

The amount of computational work for banded LU can be estimated as follows. We sum over the three nested loops and obtain an upper bound for the work:

    W = Σ_{i=2}^{n} Σ_{k=max(1,i−B)}^{i−1} ( 1 + Σ_{j=k+1}^{min(i+B,n)} 2 ) flops
      ≤ Σ_{i=2}^{n} Σ_{k=max(1,i−B)}^{i−1} ( 1 + Σ_{j=k+1}^{i+B} 2 )
      = Σ_{i=2}^{n} Σ_{k=max(1,i−B)}^{i−1} ( 1 + 2(i + B − k) )
      ≤ Σ_{i=2}^{n} Σ_{k=i−B}^{i−1} ( 1 + 2(i + B) − 2k )
      = Σ_{i=2}^{n} ( B(1 + 2(i + B)) − 2 Σ_{k=i−B}^{i−1} k )
      = Σ_{i=2}^{n} ( B(1 + 2(i + B)) − 2( Σ_{k=1}^{i−1} k − Σ_{k=1}^{i−B−1} k ) )
      = Σ_{i=2}^{n} ( B(1 + 2(i + B)) − ( i(i − 1) − (i − B)(i − B − 1) ) )
      = Σ_{i=2}^{n} ( B(1 + 2B + 2i) − (2Bi − B − B²) )
      = Σ_{i=2}^{n} ( 3B² + 2B )
      ≤ n(3B² + 2B),

so

    W = O(B²n).

Notes:

• For the 1D model problem, B = 1, so we get W ≤ n(3 + 2) = 5n, i.e., W = O(n). (This boils down to the so-called Thomas algorithm.)

• For the 2D model problem, with n = N², we have B = N, so

    W = O(B²n) = O(N²n) = O(n²) = O(N⁴),

which is much better than the W = O(n³) = O(N⁶) cost of the regular LU algorithm. (E.g., compare for N = 10³: you save a factor of 10⁶ in work.)

• Further improvements in cost for the 2D model problem can be obtained using more advanced techniques, which reorder the variables and equations to minimize the bandwidth, or, more generally, to minimize the fill-in (i.e., the creation of new non-zeros) in the L and U factors. For example, the so-called nested dissection algorithm obtains W = O(n^{3/2}) for the 2D model problem.

Still, it is possible to do better (up to W = O(n)) using iterative methods. Rather than solving A~x = ~b exactly (in exact arithmetic) after n steps, as direct methods like Gaussian elimination do, iterative methods start from an initial guess ~x0 that is improved over a number of steps until a desired accuracy is reached, typically in far fewer than n steps. These iterative methods are the subject of the last three chapters of Part I of these notes.

2.3 Matrix Norms

In order to discuss accuracy and stability of algorithms for solving linear systems,

we need to define ways to measure the size of a matrix. For this reason, we

consider the following matrix norms.

2.3.1 Definition of Matrix Norms

Definition 2.13: Natural or Vector-Induced Matrix Norm

Let ‖·‖_p be a vector p-norm. Then for A ∈ R^{n×n}, the matrix norm induced by the vector norm is given by

    ‖A‖_p = max_{~x ≠ 0} ‖A~x‖_p / ‖~x‖_p.


Note: alternatively, we may also write

    ‖A‖_p = max_{‖~x‖_p = 1} ‖A~x‖_p.

Theorem 2.14

Let A ∈ R^{n×n}. The vector-induced matrix norm ‖A‖_p is a norm on the vector space of real n × n matrices over R. That is, ∀A,B ∈ R^{n×n} and ∀a ∈ R, the following hold:

1. ‖A‖ ≥ 0, and ‖A‖ = 0 iff A = 0
2. ‖aA‖ = |a| ‖A‖
3. ‖A + B‖ ≤ ‖A‖ + ‖B‖.

In addition, the following properties also hold:

Theorem 2.15

1. ‖A~x‖_p ≤ ‖A‖_p ‖~x‖_p
2. ‖AB‖_p ≤ ‖A‖_p ‖B‖_p.

Here we only prove part 1.

Proof. If ~x = 0, the inequality holds trivially. For any ~x ≠ 0, we have

    ‖A~x‖_p / ‖~x‖_p ≤ max_{~x ≠ 0} ‖A~x‖_p / ‖~x‖_p = ‖A‖_p,

by the definition of the matrix norm. Hence ‖A~x‖_p ≤ ‖A‖_p ‖~x‖_p.

Note: the Frobenius norm introduced in Def. 1.5 is an example of a matrix

norm that is not induced by a vector norm.

2.3.2 Matrix Norm Formulas

We can derive the following specific expressions for some commonly used matrix

p-norms.


Theorem 2.16

Let A ∈ R^{n×n}.

1. ‖A‖_∞ = max_{1≤i≤n} Σ_{j=1}^{n} |a_{ij}|    ("maximum absolute row sum")

2. ‖A‖_1 = max_{1≤j≤n} Σ_{i=1}^{n} |a_{ij}|    ("maximum absolute column sum")

3. ‖A‖_2 = max_{1≤i≤n} √λ_i(A^T A) = max_{1≤i≤n} √λ_i(AA^T) = max_{1≤i≤n} σ_i,

where λ_i(A^T A) are the eigenvalues of A^T A and σ_i are the singular values of A.

Here we only prove part 1.

Proof. We will derive the formula for the matrix infinity norm using the second variant of the definition,

    ‖A‖_∞ = max_{‖~x‖_∞ = 1} ‖A~x‖_∞.

Also, observe that ‖~x‖_∞ = 1 iff max_{1≤i≤n} |x_i| = 1.

Let

    r = max_{1≤i≤n} Σ_{j=1}^{n} |a_{ij}|

(the maximum absolute row sum).

We first show that ‖A‖_∞ ≤ r. This follows from ‖A~x‖_∞ ≤ r if ‖~x‖_∞ = 1, since then

    |(A~x)_i| = | Σ_{j=1}^{n} a_{ij} x_j | ≤ Σ_{j=1}^{n} |a_{ij}| |x_j| ≤ Σ_{j=1}^{n} |a_{ij}| ≤ r    for any i.

Now, to show that ‖A‖_∞ = r, it is sufficient to find a specific ~y s.t. ‖~y‖_∞ = 1 and ‖A~y‖_∞ = r. Let ν be the index of a row of A with maximum absolute row sum, meaning that

    Σ_{j=1}^{n} |a_{νj}| = r.


Define ~y as follows:

    y_j := sign(a_{νj}) = {  1 if a_{νj} > 0,
                             0 if a_{νj} = 0,
                            −1 if a_{νj} < 0.

This ~y converts each a_{νj} y_j into |a_{νj}| in the formula for the νth component of the product A~y, so we have:

    |(A~y)_ν| = | Σ_{j=1}^{n} a_{νj} y_j | = Σ_{j=1}^{n} |a_{νj}| = r.

Therefore ‖A~y‖_∞ = r with ‖~y‖_∞ = 1, and so ‖A‖_∞ = r.
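The construction in the proof can be replayed numerically; a small Python sketch with an arbitrary illustrative matrix:

```python
def inf_norm(A):
    """Maximum absolute row sum (Theorem 2.16, part 1)."""
    return max(sum(abs(a) for a in row) for row in A)

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

A = [[1.0, -2.0, 3.0],
     [0.0,  4.0, -1.0],
     [2.0,  2.0, 2.0]]
r = inf_norm(A)                                   # row 1 attains it: 1+2+3 = 6
nu = max(range(len(A)), key=lambda i: sum(abs(a) for a in A[i]))
y = [1.0 if a > 0 else -1.0 if a < 0 else 0.0 for a in A[nu]]   # y_j = sign(a_nu_j)
# ||y||_inf = 1 and ||A y||_inf = r, so the maximum in the definition is attained.
```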

2.3.3 Spectral Radius

Definition 2.17

Let A ∈ R^{n×n} with eigenvalues λ_i, i = 1, ..., n. The spectral radius ρ(A) of A is given by

    ρ(A) = max_{1≤i≤n} |λ_i|.

Theorem 2.18

Let A ∈ R^{n×n}. For any matrix p-norm, it holds that

    ρ(A) ≤ ‖A‖_p.

Remark 2.19

The matrix 2-norm formula simplifies as follows when A is symmetric:

    ‖A‖_2 = max_{1≤i≤n} √λ_i(A^T A) = max_{1≤i≤n} √λ_i(A²) = max_{1≤i≤n} √(λ_i(A)²)
          = max_{1≤i≤n} |λ_i(A)| = ρ(A).


2.4 Floating Point Number System

2.4.1 Floating Point Numbers

Definition 2.20

The floating point number system F(β, t, L, U) consists of the set of floating point numbers x of the format

    x = ±d₁.d₂d₃···d_t × β^e = m β^e,

where m = ±d₁.d₂d₃···d_t is called the mantissa, β the base, e the exponent, and t the number of digits in the mantissa. The digits d_i satisfy

    d_i ∈ {0, 1, ..., β − 1}  (i = 2, ..., t),
    d₁ ∈ {1, ..., β − 1},

and the exponent satisfies

    L ≤ e ≤ U.

Note: The mantissa is normalised, by requiring d1 to be nonzero.

2.4.2 Rounding and Unit Roundoff

Definition 2.21

Let x ∈ R. The rounded representation of x in F (β, t, L, U) is indicated by

fl(x).

Most computer systems use the rounding rule round to nearest, tie to even,

as in the following example. (The tie-to-even part serves to avoid bias up or

down.)

Example 2.22

Consider the floating point number system F(β = 10, t = 4, L = −10, U = 10). Some examples illustrating the round to nearest, tie to even rule:

    x = 123.749    fl(x) = 1.237 × 10²
    x = 123.751    fl(x) = 1.238 × 10²
    x = 123.750    fl(x) = 1.238 × 10²   (tie!)
    x = 123.850    fl(x) = 1.238 × 10²   (tie!)
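Python's decimal module implements exactly this rounding rule, so the example can be replayed; a sketch mimicking F(10, 4, −10, 10) (the exponent bounds are ignored, since they play no role here):

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

# 4 significant digits, round to nearest, ties to even.
ctx = Context(prec=4, rounding=ROUND_HALF_EVEN)

def fl(s):
    """Rounded representation of the decimal string s in the toy system."""
    return ctx.plus(Decimal(s))   # unary plus applies the context rounding

print(fl("123.749"))  # 123.7
print(fl("123.751"))  # 123.8
print(fl("123.750"))  # 123.8  (tie: round up to the even digit 8)
print(fl("123.850"))  # 123.8  (tie: last kept digit 8 is already even)
```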


Theorem 2.23

Consider a floating point number system F(β, t, L, U) with a round-to-nearest rule. Let fl(x) be the rounded representation of x ∈ R, x ≠ 0. Then the relative error in the representation of x is bounded by

    |x − fl(x)| / |x| ≤ µ = (1/2) β^{−t+1}.

Here, µ is called the unit roundoff (also, sometimes, machine precision or machine epsilon).

Proof. Let x = m β^e and fl(x) = m̄ β^e. Since

    m = ±d₁.d₂d₃···d_t d_{t+1}...
      = ±( d₁ + d₂β^{−1} + d₃β^{−2} + ··· + d_tβ^{−t+1} + d_{t+1}β^{−t} + ... ),

and rounding to nearest with t digits is used, we have

    |m − m̄| ≤ (1/2) β^{−t+1},

so

    |x − fl(x)| ≤ (1/2) β^{−t+1} β^e,

or

    |x − fl(x)| / |x| ≤ (1/2) β^{−t+1} β^e / ( |m| β^e ) ≤ (1/2) β^{−t+1} = µ,

because |m| ≥ 1.

Note: We can also write

    fl(x) = x(1 + ν)  with |ν| ≤ µ,

because ν = (fl(x) − x)/x, so |ν| ≤ µ.

2.4.3 IEEE Double Precision Numbers

The IEEE double precision standard is used on most computers for representing floating point numbers in hardware and carrying out computations with them. For instance, Matlab normally uses double precision numbers. Higher precision numbers can be represented in software, but are much slower to work with than the native hardware representations.


Example 2.24: IEEE Double Precision Numbers

The IEEE double precision floating point number system is based on F(β = 2, t = 53, L = −1022, U = 1023). It is a binary system with 53 digits in the mantissa, and exponent range from −1022 to 1023. It represents numbers in the format

    x = 1.01001···001 × 2^e = m β^e.

Here, the first digit of the mantissa m = ±1.f does not need to be stored because it is always 1 (due to the normalisation). The fraction f has 52 digits. The sign of the mantissa is stored in a sign bit s. A shifted form of the exponent is stored:

    E = e + 1023,

such that E is an integer between 1 and 2046, which can be represented by 11 bits (2¹¹ = 2048). In total, storing an IEEE double precision number in computer memory requires 64 bits (i.e., 8 bytes):

    s (1 bit) | f (52 bits) | E (11 bits)

Numbers with E in the range 1 ≤ E ≤ 2046 represent the standard normalised numbers. The values E = 0 and E = 2047 are used to represent special numbers:

    1 ≤ E ≤ 2046 : x = (−1)^s (1.f) 2^{E−1023}   (normalised numbers)
    E = 2047     : f ≠ 0 ⟹ x = NaN              (not a number, e.g. 0/0)
                   f = 0 ⟹ x = (−1)^s Inf       (infinity, e.g. 1/0)
    E = 0        : f = 0 ⟹ x = 0
                   f ≠ 0 ⟹ denormalised numbers (mantissa is not normalised),
                            e.g. x = 0.0001011010...0 × 2^{−1022}

With β = 2 and t = 53, the unit roundoff is

    µ = (1/2) β^{−t+1} = (1/2) 2^{−53+1} = 2^{−53} ≈ 1.1 × 10^{−16},

which is roughly equivalent to β = 10, t = 16 (then µ = 0.5 × 10^{−16+1} = 5 × 10^{−16}). We say that double precision binary numbers have about 16 decimal digits of (relative) accuracy.

The smallest positive nonzero (normalised) number (realmin in Matlab) is

    1.0...0 × 2^{−1022} ≈ 2.2 × 10^{−308},

and the largest positive number (realmax in Matlab) is

    1.1...1 × 2^{1023} = (2 − 2^{−52}) × 2^{1023} ≈ 2^{1024} ≈ 1.8 × 10^{308}.

Note: in Matlab, eps is the distance from 1 to the next larger floating point number. We have eps = 2µ = 2^{−52}.
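These constants can be inspected directly in Python, whose floats are IEEE doubles:

```python
import sys

# Python floats are IEEE double precision numbers (beta = 2, t = 53).
mu = 2.0 ** -53                      # unit roundoff
eps = sys.float_info.epsilon         # gap from 1.0 to the next float: 2^-52 = 2*mu

# realmax and realmin (normalised), as quoted above:
assert 1.7e308 < sys.float_info.max < 1.8e308
assert 2.2e-308 < sys.float_info.min < 2.3e-308
# 1 + mu rounds back to 1 (tie to even), while 1 + 2*mu is the next float:
assert 1.0 + mu == 1.0
assert 1.0 + 2 * mu > 1.0
assert eps == 2 * mu
```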


2.4.4 Rounding and Basic Arithmetic Operations

Basic arithmetic operations such as addition, subtraction, multiplication, division, and square root are implemented in computer hardware such that the rounded representation of the exact result is obtained. (This is achieved by using additional digits of precision when computing intermediate results.)

More precisely, assume x and y are floating point numbers stored in computer memory, after rounding (i.e., x = fl(x) and y = fl(y)). Let x ⊕ y denote the result of the addition computed and stored by the computer (after rounding). Then the IEEE standard requires that the + operation be implemented in computer hardware such that

    x ⊕ y = fl(x + y),

i.e., the result of x + y evaluated on the computer is the exact x + y, rounded to its floating point representation. This is a stringent requirement! It also implies

    x ⊕ y = (x + y)(1 + ν)  with |ν| ≤ µ.

Similarly, for the other basic operations the computed results satisfy

    x ⊖ y = fl(x − y),  x ⊗ y = fl(x · y),  x ⊘ y = fl(x/y),

and the computed √x equals fl(√x).

Other standard functions like sin(x) and exp(x) are typically implemented in software, and don't have the same accuracy guarantees. When they are evaluated, we can normally assume that the computed results satisfy relative error bounds like

    computed sin(x) = sin(x)(1 + c₁ν),   computed exp(x) = exp(x)(1 + c₂ν),

with |ν| ≤ µ and c₁, c₂ constants not much larger than 1; see, e.g., https://blogs.mathworks.com/cleve/2017/01/23/ulps-plots-reveal-math-function-accurary.
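A short Python illustration of the exactly-rounded guarantee, and of the fact that the inputs themselves are already rounded:

```python
# Each basic operation returns the exactly rounded result: the computed x + y
# equals fl(x + y). Exactness is therefore only lost through the roundings.
assert 0.5 + 0.25 == 0.75            # all three values are machine numbers: exact

# 0.1 and 0.2 are already rounded inputs (fl(0.1), fl(0.2)), so the computed
# sum fl(fl(0.1) + fl(0.2)) need not equal fl(0.3):
s = 0.1 + 0.2
assert s != 0.3
assert abs(s - 0.3) < 1e-15          # the discrepancy is a few units of roundoff
```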

2.5 Conditioning of a Mathematical Problem

2.5.1 Conditioning of a Mathematical Problem

Consider the mathematical problem P of finding output ~z from input ~x, with the relation between ~z and ~x given by the function f:

Problem 2.25: Mathematical Problem P

    P: ~z = f(~x)

The concept of "conditioning" of problem P relates to the sensitivity of ~z to changes in ~x. We perturb ~x by ∆~x and investigate the effect of this perturbation on ~z:

    ~z + ∆~z = f(~x + ∆~x).


Definition 2.26

Consider mathematical problem P: ~z = f(~x) with perturbed input: ~z + ∆~z = f(~x + ∆~x).

1. Problem P is called ill-conditioned with respect to absolute errors if the absolute condition number

       κ_A = ‖∆~z‖ / ‖∆~x‖    (∆~x ≠ 0)

   satisfies κ_A ≫ 1. P is called well-conditioned otherwise.

2. Problem P is called ill-conditioned with respect to relative errors if the relative condition number

       κ_R = ( ‖∆~z‖ / ‖~z‖ ) / ( ‖∆~x‖ / ‖~x‖ )    (∆~x ≠ 0, ~z ≠ 0, ~x ≠ 0)

   satisfies κ_R ≫ 1. P is called well-conditioned otherwise.

Note: Ill-conditioning is often considered relative to the precision of the computer and number system being used. For example, for double precision numbers, the unit roundoff is µ ≈ 1.1 × 10^{−16}, indicating that number representation and elementary computations have a relative accuracy of about 16 decimal digits. If the problem is ill-conditioned with κ_R ≈ 1/µ ≈ 10^{16}, you cannot expect any correct digits in your computation. If κ_R ≈ √(1/µ) ≈ 10⁸, you can expect about half of the digits in the computed result to be correct (if you use an algorithm that is numerically stable, see the next section). If κ_R ≈ 1, you can expect almost all digits to be correct when using a stable algorithm.

Note: We did not specify in which norm to evaluate the condition numbers.

Depending on the problem, some norms may be easier to work with than others.

2.5.2 Conditioning of Elementary Operations

Example 2.27: Conditioning of the Sum Operation

We investigate the conditioning of mathematical problem

P: z = x+ y.

We have

z + ∆z = x+ ∆x+ y + ∆y,

leading to

∆z = ∆x+ ∆y.


Using the 1-norm, we find for the absolute condition number

    κ_A = |∆z| / ‖(∆x, ∆y)‖₁ = |∆z| / ( |∆x| + |∆y| ) = |∆x + ∆y| / ( |∆x| + |∆y| ) ≤ 1,

so addition is well-conditioned w.r.t. the absolute error: the absolute error in

z is never much larger than the absolute errors in x or y.

However, again using the 1-norm, we find for the relative condition number

    κ_R = ( |∆z| / |z| ) / ( ‖(∆x, ∆y)‖₁ / ‖(x, y)‖₁ )
        = ( |∆x + ∆y| / |x + y| ) · ( ‖(x, y)‖₁ / ‖(∆x, ∆y)‖₁ )
        = ( (|x| + |y|) / |x + y| ) · ( |∆x + ∆y| / (|∆x| + |∆y|) )
        ≤ (|x| + |y|) / |x + y|.

The upper bound for κ_R shows that the problem is well-conditioned as long as x + y ≉ 0. However, the relative condition number can be arbitrarily large when x + y ≈ 0, i.e., when one subtracts two numbers of almost equal size,

x ≈ −y. In this case, the relative error in z can be much greater than the

relative error in x and y. When x ≈ −y, addition is ill-conditioned w.r.t.

the relative error. This blow-up of the relative error, and the loss of relative

accuracy that goes along with it, is referred to as catastrophic cancellation.

Example 2.28: An Example of Catastrophic Cancellation

Compute z = x + y with

    x = 1.000002,   ∆x = 10^{−6},       x + ∆x = 1.000003,   |∆x|/|x| ≈ 10^{−6},
    y = −1.000013,  ∆y = −2 × 10^{−6},  y + ∆y = −1.000015,  |∆y|/|y| ≈ 2 × 10^{−6},

where ∆x and ∆y may be due, for example, to floating point rounding on a computer.

2.5. Conditioning of a Mathematical Problem 35

We have

    z = x + y = −0.000011,
    ∆z = ∆x + ∆y = −10^{−6},
    z + ∆z = −0.000012,

so |∆z|/|z| ≈ 0.09, i.e., we have a 9% relative error in z, whereas the relative error in x and y was only of the order of 0.0001%. This blow-up in relative error is due to catastrophic cancellation.
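Exact rational arithmetic makes the blow-up visible without any extra rounding noise; a Python replay of the numbers above:

```python
from fractions import Fraction

# Example 2.28 in exact rational arithmetic.
x, dx = Fraction("1.000002"), Fraction(1, 10 ** 6)
y, dy = Fraction("-1.000013"), Fraction(-2, 10 ** 6)

z = x + y            # -0.000011
dz = dx + dy         # -0.000001
rel_in = max(abs(dx / x), abs(dy / y))   # about 2e-6
rel_out = abs(dz / z)                    # exactly 1/11, about 9%
```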

Example 2.29: A Second Example of Catastrophic Cancellation

In the context of perturbations due to rounding in a floating point system, we

can consider the following example of catastrophic cancellation.

Consider the floating point system

    F(β = 10, t = 5, L = −10, U = 10),

with t = 5 digits in the mantissa and unit roundoff

    µ = (1/2) β^{−t+1} = 0.00005.

We compute z = x − y for the following numbers x ≈ y with rounded floating point representations fl(x) and fl(y):

    x = 1.23456789,  fl(x) = 1.2346,
    y = 1.23111111,  fl(y) = 1.2311.

The absolute and relative errors in x and y due to rounding are

    ∆x = fl(x) − x ≈ 3.2 × 10^{−5},    |∆x|/|x| ≈ 2.6 × 10^{−5} ≤ µ,
    ∆y = fl(y) − y ≈ −1.1 × 10^{−5},   |∆y|/|y| ≈ 9.0 × 10^{−6} ≤ µ.

Computing the difference of x and y, we have for the exact z and the computed floating point result z̄:

    z = x − y = 0.00345678,
    z̄ = fl( fl(x) − fl(y) ) = fl(0.0035) = 0.0035,

so

    ∆z = z̄ − z ≈ 4.3 × 10^{−5},    |∆z|/|z| ≈ 0.013,

i.e., we obtain a result with a 1.3% relative error in z, whereas the relative error in x and y was only of the order of 0.005%. This blow-up of the relative error is due to catastrophic cancellation. Equivalently, we can see that we only have two correct digits in z̄, while we had 5 correct digits in fl(x) and fl(y), and the computer used can represent 5 correct digits. So when computing z, 3 of the 5 digits of relative accuracy were lost due to catastrophic cancellation.

Something to remember . . .

When devising numerical algorithms, avoid steps where two almost equal numbers are subtracted, if you can. (This ill-conditioned step in the algorithm may cause the algorithm to be numerically unstable, due to blow-up of the relative error, as explained in Section 2.6.)

Example 2.30: Conditioning of the Division Operation

We investigate the conditioning of mathematical problem

    P: z = x/y    (y ≠ 0).

We have

    z + ∆z = (x + ∆x) / (y + ∆y),

or

    ∆z = −z + x(1 + ∆x/x) / ( y(1 + ∆y/y) ),

which leads to

    ∆z/z = −1 + (1 + ∆x/x)/(1 + ∆y/y) = ( −1 − ∆y/y + 1 + ∆x/x ) / ( 1 + ∆y/y ),

or

    ∆z/z = ( ∆x/x − ∆y/y ) / ( 1 + ∆y/y ).    (2.2)

In terms of relative conditioning, Eq. (2.2) shows immediately that ∆z/z can only be large if ∆x/x or ∆y/y is large, which means that the relative error does not blow up in a division operation and the problem is well-conditioned. (Note that the relative condition number

    κ_R = ( |∆z| / |z| ) / ( ‖(∆x, ∆y)‖ / ‖(x, y)‖ )

does not easily lead to a useful bound in this case.)

In terms of absolute conditioning, however, we have

    κ_A = |∆z| / ‖(∆x, ∆y)‖ = |x/y| · |∆x/x − ∆y/y| / ( |1 + ∆y/y| (|∆x| + |∆y|) ).

Assuming that ∆x/x and ∆y/y are small, κ_A can be arbitrarily large if y approaches 0. This means that the absolute error may blow up if y ≈ 0 (as can also be seen directly from Eq. (2.2)). Note that large |x| may also lead to large κ_A, but if x is large, |∆x| can often also be expected to be large proportional to x (in particular, if ∆x is due to rounding in a floating point number system), which would make κ_A small again.

In summary, division is ill-conditioned with respect to absolute error when y ≈ 0; in that case the absolute error of the result blows up.

(Note that this can be seen very easily by considering division by a small y without error: in that case

    z + ∆z = (x + ∆x)/y,

so

    ∆z = ∆x/y,

and ∆z clearly blows up when y ≈ 0.)

Example 2.31: Ill-Conditioning when Dividing by a Small Number

Compute z = x/y with

    x = 1,        ∆x = 10^{−3},   x + ∆x = 1.001,    |∆x|/|x| = 10^{−3},
    y = 10^{−6},  ∆y = 10^{−12},  y + ∆y ≈ 10^{−6},  |∆y|/|y| = 10^{−6}.

Then

    z = x/y = 10⁶,
    z + ∆z = (x + ∆x)/(y + ∆y) ≈ 10⁶ + 10³,

so

    ∆z ≈ 10³ ≈ ∆x/y,

while ∆x = 10^{−3}, i.e., the absolute error in x/y is 10⁶ times greater than the absolute error in x.


Something to remember . . .

When devising numerical algorithms, avoid steps where you divide by a number

that is small in absolute value, if you can. (This ill-conditioned step in the

algorithm may cause the algorithm to be numerically unstable, due to blow-up

of the absolute error, see also Section 2.6.)

2.5.3 Conditioning of Solving a Linear System

We investigate the conditioning of solving linear system A~x = ~b for ~x, given A

and ~b.

Example 2.32: Conditioning of A~x = ~b, case ∆A = 0, ∆~b ≠ 0

We consider mathematical problem

    P: ~x = A^{−1}~b = f(A, ~b).

We perturb A and ~b in

    (A + ∆A)(~x + ∆~x) = ~b + ∆~b.

For simplicity, we first consider the case ∆A = 0, ∆~b ≠ 0. In this case, we have

    A(~x + ∆~x) = ~b + ∆~b,

or

    A∆~x = ∆~b.

We want to find an upper bound for

    κ_R = ( ‖∆~x‖ / ‖~x‖ ) / ( ‖∆~b‖ / ‖~b‖ ).    (2.3)

From ∆~x = A−1∆~b we have

‖∆~x‖ = ‖A−1∆~b‖ ≤ ‖A−1‖ ‖∆~b‖, (2.4)

and from A~x = ~b we have

‖~b‖ = ‖A~x‖ ≤ ‖A‖ ‖~x‖,

or

1/‖~x‖ ≤ ‖A‖/‖~b‖. (2.5)

Plugging Eqs. (2.4) and (2.5) into Eq. (2.3), we obtain the upper bound

κR ≤ ( ‖A^−1‖ ‖∆~b‖ · ‖A‖/‖~b‖ ) / ( ‖∆~b‖/‖~b‖ ) = ‖A‖ ‖A^−1‖. (2.6)


Definition 2.33: Matrix Condition Number

Let A ∈ Rn×n be a nonsingular matrix. Then

κ(A) = ‖A‖ ‖A^−1‖

is called the condition number of A.

Theorem 2.34

Let A ∈ Rn×n be a nonsingular matrix. The following property holds:

κ(A) = ‖A‖ ‖A^−1‖ ≥ 1.

Proof. This simply follows from

1 = ‖I‖ = ‖AA^−1‖ ≤ ‖A‖ ‖A^−1‖ = κ(A),

for any vector-induced matrix norm.

We see that the relative condition number of problem P : ~x = A^−1~b is
bounded above by the matrix condition number, ‖A‖ ‖A^−1‖ (if we assume ∆A =

0). The matrix condition number also appears in a bound for κR for the general

problem when both A and ~b are perturbed:

Example 2.35: Conditioning of A~x = ~b, case ∆A ≠ 0, ∆~b ≠ 0

We consider mathematical problem

P: ~x = A^−1~b = f(A,~b),

perturbing A and ~b as in

(A+ ∆A)(~x+ ∆~x) = ~b+ ∆~b.

It can be shown that

κR = ( ‖∆~x‖/‖~x‖ ) / ( ‖∆A‖/‖A‖ + ‖∆~b‖/‖~b‖ ) ≤ κ(A) · 1/(1 − τ), (2.7)

if

τ = ‖A^−1‖ ‖∆A‖ < 1.

We say that matrix A is ill-conditioned when κ(A) ≫ 1, and

well-conditioned otherwise. Linear systems with a well-conditioned matrix

can be solved accurately on computers (because rounding errors in the input

do not disproportionally affect the computed result). Linear systems with ill-

conditioned matrices, however, are prone to inaccurate numerical solutions on

computers.
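As an illustration (an addition here, not part of the notes), the following Python sketch contrasts solving a linear system with an ill-conditioned matrix (the Hilbert matrix, a standard test case chosen for this demo) and a well-conditioned one:

```python
import numpy as np

# Solving A x = b with an ill-conditioned matrix (Hilbert) vs a
# well-conditioned one; both right-hand sides are built from x_true = ones.
n = 10
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])  # Hilbert matrix
x_true = np.ones(n)

x_hilbert = np.linalg.solve(H, H @ x_true)
err_ill = np.linalg.norm(x_hilbert - x_true) / np.linalg.norm(x_true)

A = np.eye(n) + 0.1 * np.ones((n, n))   # a well-conditioned matrix
x_well = np.linalg.solve(A, A @ x_true)
err_well = np.linalg.norm(x_well - x_true) / np.linalg.norm(x_true)

print(np.linalg.cond(H), err_ill)    # huge condition number, noticeable error
print(np.linalg.cond(A), err_well)   # small condition number, error near machine precision
```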

For the 2-norm matrix condition number we have the following explicit for-

mulas:

40 Chapter 2. LU Decomposition for Linear Systems

Theorem 2.36

Let A ∈ Rn×n be a nonsingular matrix. Then

κ2(A) = ‖A‖2 ‖A^−1‖2 = √λmax(AA^T) / √λmin(AA^T) = σmax(A)/σmin(A).

If A is symmetric, then

κ2(A) = |λ|max(A) / |λ|min(A).

2.6 Stability of a Numerical Algorithm

If a mathematical problem is well-conditioned, it should be possible in principle

to obtain its solution accurately on a computer using finite-precision calculations.

For ill-conditioned problems, by contrast, this is precarious: rounding
errors in the input data, or introduced while the steps of the computation are performed,
may easily lead to large inaccuracies in the computed approximate solution.

But even for problems that are well-conditioned and that are in principle

accurately computable using finite precision, it still depends on our choice of

algorithm whether an accurate result is indeed obtained.

Some algorithms use steps that are by themselves ill-conditioned, causing er-

rors in those steps that may be magnified by error propagation and/or may ac-

cumulate, leading to inaccurate results for an otherwise well-conditioned mathe-

matical problem. When the problem itself is ill-conditioned such ill-conditioned

steps tend to be unavoidable, but when the problem is well-conditioned, it is

often possible to devise alternative algorithms that avoid these ill-conditioned

steps and lead to an accurate result. We call algorithms that obtain accurate

results for well-conditioned problems numerically stable algorithms. On the

contrary, algorithms that lead to unnecessary accuracy loss for well-conditioned

problems, e.g., because they employ avoidable ill-conditioned steps, are called

numerically unstable algorithms.

2.6.1 A Simple Example of a Stable and an Unstable Algorithm

Example 2.37: Stable Algorithm for the Roots of a Quadratic Polynomial

Consider the following mathematical problem:

P : compute the roots of p(x) = x2 − 400x+ 2.

The solution of this problem is given with high accuracy by

x1 ≈ 399.9950, x2 ≈ 0.005000063.

We assume the problem is well-conditioned (this can be shown).

We illustrate the stability of two possible algorithms for computing the roots


in the floating point number system F (β = 10, t = 4, L = −10, U = 10), with

unit roundoff

µ = (1/2) β^(−t+1) = 0.5 · 10^−3 = 0.0005.

Algorithm 1: We use the standard formulas for computing the roots of a

quadratic polynomial

ax2 + bx+ c = 0,

i.e.,

x1,2 = ( −b ± √(b^2 − 4ac) ) / (2a),

or, in our case, for

x^2 + 2fx + c = 0,

we have

x1,2 = −f ± √d,

with

f = −200, c = 2, d = f^2 − c.

In our floating point system we have

fl(200) = 200, fl(200^2) = 200^2 = 40 000, fl(2) = 2,

or, using symbols,

fl(f) = f, fl(f^2) = f^2, fl(c) = c,

so we will not explicitly write the fl(·) operation for f, f^2 and c in what follows.

For the discriminant d, we get

fl(f^2 − c) = fl(40 000 − 2) = 40 000 = fl(f^2),

and we note that the contribution of c = 2 is lost in this operation due to

rounding. So we get for the computed approximate roots x̄1 and x̄2

x̄1 = fl[−f + fl(√fl(f^2 − c))] = fl[200 + fl(√40 000)] = fl[200 + 200] = 400,

x̄2 = fl[−f − fl(√fl(f^2 − c))] = fl[200 − fl(√40 000)] = fl[200 − 200] = 0,

with relative errors

|x̄1 − x1| / |x1| ≈ 1.25 · 10^−5 ≈ µ,   |x̄2 − x2| / |x2| = 1 ≫ µ.

We see that the result for x2 is highly inaccurate: we obtain a relative error of

100%. We note that catastrophic cancellation has occurred in computing x2:

all accuracy was lost in computing the difference between two almost equal
numbers in the expression −f − √(f^2 − c). (The contribution of c, which is

essential for the relative accuracy of the solution, was entirely lost.) We say

that Algorithm 1 is numerically unstable, in this case because it clearly

contains an ill-conditioned step in which accuracy is lost.


Algorithm 2: A more stable algorithm can be obtained as follows. We compute

x1, the largest root in absolute value, as above, but we compute x2 using an

alternative formula. Observe that

x1x2 = c,

because p(x) can be factored as

p(x) = ax2 + bx+ c = a(x− x1)(x− x2).

So we compute x2 from

x2 = c/x1.

This step is well-conditioned unless x1 is small. So we have

x̄2 = fl(c/x̄1) = fl(2/400) = fl(0.005) = 0.005,

with relative error

|x̄2 − x2| / |x2| ≈ 1.2 · 10^−5 ≈ µ.

Algorithm 2 is numerically stable (it avoids the ill-conditioned step).
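In IEEE double precision the polynomial of Example 2.37 is computed accurately, so the sketch below (a hypothetical illustration added here, not from the notes) uses a more extreme polynomial, x^2 − 10^8 x + 1, to make the cancellation in Algorithm 1 visible:

```python
import math

# For p(x) = x^2 - b*x + c with b = 1e8, c = 1, Algorithm 1 suffers
# catastrophic cancellation in double precision; Algorithm 2 (x2 = c/x1) does not.
b, c = 1e8, 1.0
sq = math.sqrt(b * b - 4.0 * c)

x1 = (b + sq) / 2.0          # larger root: no cancellation here
x2_naive = (b - sq) / 2.0    # Algorithm 1: difference of nearly equal numbers
x2_stable = c / x1           # Algorithm 2: uses x1 * x2 = c

rel_diff = abs(x2_naive - x2_stable) / x2_stable
print(x2_naive, x2_stable, rel_diff)   # the naive root has lost most of its digits
```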

2.6.2 Stability of LU Decomposition

It can be shown that the standard LU decomposition algorithm is somewhat un-

stable: the algorithm contains steps in which divisions occur by a small number,

when pivot elements are used that are close to zero in absolute value. This may

lead to large elements in the L and U factors, and may lead to inaccuracies, also

for well-conditioned problems.

For this reason, the following partial pivoting variant of LU decomposition is

often used: in every stage (indexed by k) of Gaussian elimination, determine the

pivot element in position (k, k) as follows. In column k (starting from position

(k, k) and below) one determines the largest element in absolute value, and

switches the row with the largest element with the current row k. As such, one

chooses, in each stage, the pivot element with the largest absolute value. This

extra operation is easy to implement and is computationally inexpensive (it

does not change the asymptotic cost of the algorithm). The resulting algorithm

tempers the growth of elements in L and U and is numerically more stable.


Algorithm 2.38: LU decomposition with partial pivoting, kij version, in-place

Input: A matrix A ∈ Rn×n

Output: L and U stored in A, and a vector ~p storing the pivoting rows

1: ~p = (1, 2, . . . , n)^T . Initialise the permutation vector

2: for k = 1, . . . , n− 1 do

3: Determine µ with k ≤ µ ≤ n such that |A(µ, k)| = ‖A(k : n, k)‖∞

4: if |A(µ, k)| < τ then . τ is a small pivot tolerance

5: Break . Stop the loop if near zero pivot found

6: end if

7: Swap elements of the permutation vector ~p(k) and ~p(µ)

8: Swap the rows A(k, :) and A(µ, :)

9: rows = k + 1 : n . Update all the rows below k

10: A(rows, k) = A(rows, k)/A(k, k)

11: A(rows, rows) = A(rows, rows)−A(rows, k)A(k, rows)

12: end for
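A Python sketch of Algorithm 2.38 might look as follows; the NumPy translation and the default tolerance tau are assumptions of this illustration, not part of the pseudocode above:

```python
import numpy as np

# LU decomposition with partial pivoting, stored in-place: the strict lower
# triangle holds L (unit diagonal implied), the upper triangle holds U.
def lu_partial_pivoting(A, tau=1e-14):
    A = A.astype(float)
    n = A.shape[0]
    p = np.arange(n)                           # permutation vector
    for k in range(n - 1):
        mu = k + np.argmax(np.abs(A[k:, k]))   # row of largest pivot candidate
        if abs(A[mu, k]) < tau:
            break                              # near-zero pivot: stop
        p[[k, mu]] = p[[mu, k]]                # swap permutation entries
        A[[k, mu], :] = A[[mu, k], :]          # swap the rows
        A[k+1:, k] /= A[k, k]                  # multipliers (column of L)
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])  # Schur complement update
    return A, p

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
LU, p = lu_partial_pivoting(M)
L = np.tril(LU, -1) + np.eye(5)
U = np.triu(LU)
print(np.allclose(L @ U, M[p]))   # PA = LU
```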

It can be seen as follows that this type of stability problem cannot occur when

applying Cholesky to SPD matrices A. The Cholesky algorithm decomposes an SPD
matrix A as A = L̂L̂^T, after which A~x = ~b is solved by forward and backward
substitutions L̂~y = ~b and L̂^T~x = ~y.

We consider the matrix 2-norm and recall that, for SPD matrices A,

‖A‖2 = √λmax(AA^T) = √λmax(A^2) = λmax(A).

For the Cholesky factor L̂ we obtain

‖L̂‖2 = √λmax(L̂L̂^T) = √λmax(A) = √‖A‖2,

and, similarly,

‖L̂^T‖2 = √‖A‖2.

This indicates that the matrix elements in A = L̂L̂T cannot grow strongly.

Cholesky decomposition is numerically stable (without need for pivoting).


Chapter 3

Least-Squares Problems and QR Factorisation

3.1 Gram-Schmidt Orthogonalisation and QR Factorisation

In this chapter, we generally consider real rectangular matrices A with more

rows than columns:

A ∈ Rm×n with m ≥ n.

As we will see in Section 3.3, such matrices arise in overdetermined linear systems

(with more equations than unknowns), which may be solved in the least-squares

(LS) sense.

When solving a LS problem, it will be useful to construct an orthonormal

basis for range(A) = span{~a1, . . . ,~an}, the vector space spanned by the columns

of

A = [~a1 · · · ~an].

In this section we will consider the Gram-Schmidt algorithm to orthogonalise

the columns of A, which will lead to the so-called QR factorisation of A.

3.1.1 Gram-Schmidt Orthogonalisation

For now, we will assume that A ∈ Rm×n, with m ≥ n, has full rank, i.e., its

columns are linearly independent. We seek to construct an orthonormal basis

for range(A).

We first recall the concept of expansion of a vector in an orthonormal basis.

Example 3.1

Let {~e1, ~e2} be a standard orthonormal basis for R2, i.e., ~eTi ~ej = δij for all i, j.

Then any ~x ∈ R2 can be expanded in the basis as

~x = (~eT1 ~x)~e1 + (~e

T

2 ~x)~e2.

In the Gram-Schmidt procedure, we begin by constructing an orthogonal set

of vectors {~v1, . . . , ~vn} that spans range(A) = span{~a1, . . . ,~an}, by taking the


vectors ~ai and subtracting their components in the directions of the previous ~vj .

For example, for the case where A has 3 columns (n = 3):

~v1 = ~a1,

~v2 = ~a2 − ( ~v1^T ~a2 / ‖~v1‖^2 ) ~v1,

~v3 = ~a3 − ( ~v1^T ~a3 / ‖~v1‖^2 ) ~v1 − ( ~v2^T ~a3 / ‖~v2‖^2 ) ~v2.

In this chapter, all vector norms denote 2-norms. We then obtain the set of

orthonormal vectors {~q1, . . . , ~qn} such that span{~q1, . . . , ~qn} = span{~a1, . . . ,~an},

by normalising the vectors ~vi to unit length:

~qi = ~vi / ‖~vi‖.

For the n = 3 case, this results in

~q1 ‖~v1‖ = ~v1 = ~a1

~q2 ‖~v2‖ = ~v2 = ~a2 − (~q1^T ~a2) ~q1

~q3 ‖~v3‖ = ~v3 = ~a3 − (~q1^T ~a3) ~q1 − (~q2^T ~a3) ~q2. (3.1)

We rewrite this as

~q1 r11 = ~a1

~q2 r22 = ~a2 − r12 ~q1

~q3 r33 = ~a3 − r13 ~q1 − r23 ~q2,

which leads to the factorisation of matrix A as

A = [~a1 ~a2 ~a3] = [~q1 ~q2 ~q3] [ r11 r12 r13 ; 0 r22 r23 ; 0 0 r33 ] = Q̂R̂.

This factorisation A = Q̂R̂ of A is known as the reduced QR factorisation, see

below.

This leads to the following algorithm for Gram-Schmidt orthogonalisation:

Algorithm 3.2: Gram-Schmidt Orthogonalisation

Input: matrix A ∈ Rm×n

Output: the factor matrices Q̂ and R̂ in the reduced QR factorisation A = Q̂R̂

Q̂ = 0
R̂ = 0
for j = 1:n do
    ~vj = ~aj
    for i = 1:j−1 do
        r̂ij = ~̂qi^T ~aj
        ~vj = ~vj − r̂ij ~̂qi
    end for
    r̂jj = ‖~vj‖
    ~̂qj = ~vj / r̂jj
end for
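Algorithm 3.2 translates almost line by line into Python; the NumPy version below is a sketch under the assumption that A has full column rank (so that no r_jj vanishes):

```python
import numpy as np

# Classical Gram-Schmidt orthogonalisation: Q has orthonormal columns,
# R is upper triangular, and A = Q R.
def gram_schmidt(A):
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # project the ORIGINAL column a_j
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)       # assumes full column rank: R[j, j] > 0
        Q[:, j] = v / R[j, j]
    return Q, R

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
Q, R = gram_schmidt(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(4)))
```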


3.1.2 QR Factorisation

We now consider the general case of A ∈ Rm×n, with m ≥ n, but where the

columns of A are not necessarily linearly independent.

Definition 3.3

Let A ∈ Rm×n. The reduced QR factorisation of A is given by

A = Q̂R̂, (3.2)

where Q̂ ∈ Rm×n has orthonormal columns:

Q̂^T Q̂ = In,

and R̂ ∈ Rn×n is upper triangular.

The n columns of Q̂ form an orthonormal basis for an n-dimensional subspace

of Rm. It is possible to expand this to a basis of the entire Rm by expanding Q̂ on

the right with m−n additional columns that contain m−n further orthonormal

vectors in Rm, leading to the (full) QR factorisation:

Definition 3.4

Let A ∈ Rm×n. The (full) QR factorisation of A is given by

A = QR = Q [ R̂ ; 0 ], (3.3)

where Q ∈ Rm×m is an orthogonal matrix:

Q^T Q = Im = QQ^T,

and R̂ ∈ Rn×n is upper triangular.

Theorem 3.5

Every A ∈ Rm×n has a full QR factorisation A = QR, and hence also a

reduced QR factorisation A = Q̂R̂.

This can be shown, for the reduced QR factorisation, using the observation

that in the Gram-Schmidt algorithm, if a zero ~vj is obtained and ~qj cannot

be computed, one can instead choose any vector ~qj that is orthonormal with

respect to the previous ~qi (for example, by repeating the orthogonalisation step

for determining ~vj starting from a random vector ~aj ∈ Rm, instead of the original

jth column ~aj of A). For the full QR factorisation, the additional orthogonal

columns of Q can be determined in a similar manner.

3.1.3 Modified Gram-Schmidt Orthogonalisation

It can be observed in example computations, and shown theoretically, that the

Gram-Schmidt algorithm is numerically unstable. If the orthonormal basis is

computed as in Eq. (3.1), the resulting vectors ~qi may suffer from loss of orthog-

onality due to rounding errors.

The stability can be improved substantially by the following small modifi-

cation to the algorithm. For example, for ~v3 in Eq. (3.1), one subtracts the


component in the direction of ~q2 by projecting the original column vector ~a3

onto ~q2. Even though in exact arithmetic ~q1 is orthogonal to ~q2 and the com-

ponent of ~a3 in the direction of ~q2 is equal to the component of ~a3 − (~qT1 ~a3) ~q1

in the direction of ~q2, it turns out that ~a3 − (~qT1 ~a3) ~q1 may have a slightly dif-

ferent component in the direction of ~q2 due to rounding, and it is better for

stability to subtract the component of ~a3 − (~qT1 ~a3) ~q1 in the direction of ~q2. In

a similar manner we repeatedly subtract, as terms are added to determine each

~vj , the components in direction ~qi of the intermediate result for ~vj , instead of

the components of ~aj in direction ~qi. This results in the following modified

Gram-Schmidt algorithm:

Algorithm 3.6: Modified Gram-Schmidt Orthogonalisation

Input: matrix A ∈ Rm×n

Output: the factor matrices Q̂ and R̂ in the reduced QR factorisation A = Q̂R̂

Q̂ = 0
R̂ = 0
for j = 1:n do
    ~vj = ~aj
    for i = 1:j−1 do
        r̂ij = ~̂qi^T ~vj
        ~vj = ~vj − r̂ij ~̂qi
    end for
    r̂jj = ‖~vj‖
    ~̂qj = ~vj / r̂jj
end for

It can be shown that this modified version is substantially more stable than

the original Gram-Schmidt procedure, but for ill-conditioned problems loss of

orthogonality can still occur and a more stable approach is desired. In the next

section we consider a procedure using orthogonal transformations of Householder

reflection type that will accomplish this goal.
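The difference between the two variants can be observed numerically. The sketch below (an illustration added here, not from the notes) runs both versions on a Hilbert matrix, a standard ill-conditioned test case, and measures the loss of orthogonality ‖Q^T Q − I‖:

```python
import numpy as np

# Classical vs modified Gram-Schmidt: the only difference is whether the
# projection coefficient uses the original column a_j or the running v.
def gs(A, modified):
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ (v if modified else A[:, j])
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

n = 10
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])  # Hilbert
Qc, _ = gs(H, modified=False)
Qm, _ = gs(H, modified=True)
err_c = np.linalg.norm(Qc.T @ Qc - np.eye(n))
err_m = np.linalg.norm(Qm.T @ Qm - np.eye(n))
print(err_c, err_m)   # modified GS loses much less orthogonality
```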

3.2 QR Factorisation using Householder Transformations

Since the Gram-Schmidt orthogonalisation and its modified version have deficient
numerical stability properties, we seek a more stable approach

to compute the QR decomposition.

It turns out that an approach based on applying orthogonal transformations

to A results in a method with more favourable stability properties. One reason

why such methods have good stability properties is that multiplying A with an

orthogonal matrix Q preserves the Euclidean length of the columns of A:

Theorem 3.7

Orthogonal matrices preserve Euclidean length.


Proof. Let Q ∈ Rn×n with Q^TQ = I. Suppose ~y = Q~x. Then

‖~y‖ = ‖Q~x‖ = √((Q~x)^T Q~x) = √(~x^T Q^T Q ~x) = √(~x^T ~x) = ‖~x‖,

where, as in the rest of this chapter, ‖ · ‖ indicates the vector 2-norm.

This means that the matrix element sizes of QA cannot be much larger than

those of A.

A useful further property of orthogonal matrices is the following:

Theorem 3.8

The product of orthogonal matrices is orthogonal.

Proof. Let Q = Q1Q2, where Q1, Q2 ∈ Rn×n are orthogonal. Then

Q^TQ = (Q1Q2)^T Q1Q2 = Q2^T Q1^T Q1 Q2 = I.

3.2.1 Householder Reflections

We want to transform A ∈ Rm×n (with m ≥ n) into an upper-triangular matrix

R ∈ Rm×n by applying orthogonal transformations to A.

Our approach will be to multiply A by a sequence of orthogonal transfor-

mation matrices Qj that create zeros in column j below the element in position

(j, j). This aim is similar to LU decomposition, but we know that each or-

thogonal transformation preserves the Euclidean length of the matrix columns

it operates on.

Let’s consider the first orthogonal transformation, Q1 ∈ Rm×m, which is

applied to

A = [~a1 · · · ~an],

such that Q1A has zeros in its first column below the first element. Since the

length of column ~a1 is preserved, we know that the first element in the trans-

formed column has to be ±‖~a1‖:

Q1A = [ ±‖~a1‖ ~r1^T ; 0 Ã2 ].


We choose for now a transformation Q1 that results in a transformed first column

with a negative value as its first element:

Q1~a1 = ( −‖~a1‖, 0, . . . , 0 )^T.

The specific type of transformation we choose for Q1 (and all subsequent Qjs) is

a reflection in Rm about a hyperplane that is orthogonal to the line from ~a1 to

Q1~a1 and intersects the line in the middle between ~a1 and Q1~a1. This reflection

operation is called a Householder reflection. Let ~v1 be the vector pointing from

Q1~a1 to ~a1:

~v1 = ~a1 −Q1~a1,

and let ~u1 be the unit vector in that direction:

~u1 = ~v1 / ‖~v1‖.

The vector ~u1 is called a Householder vector. The operation of the Householder

reflection Q1 onto a vector ~x ∈ Rm can then be expressed as

Q1~x = ~x − 2(~u1^T ~x)~u1,

and, since (~u1^T ~x)~u1 = (~u1~u1^T)~x, the matrix form of the Householder orthogonal
transformation is given by

Q1 = Im − 2~u1~u1^T.

Theorem 3.9

Let ~u ∈ Rm with ‖~u‖ = 1. Then the Householder reflection matrix

Q~u = Im − 2~u~u^T

is a symmetric and orthogonal matrix.

Proof. Clearly, Q~u^T = Q~u. Then

Q~u^T Q~u = Q~u^2 = (Im − 2~u~u^T)(Im − 2~u~u^T) = Im − 4~u~u^T + 4~u(~u^T~u)~u^T = Im.

Finally, we note that the sign in

Q1~a1 = ( ±‖~a1‖, 0, . . . , 0 )^T


is chosen in practical implementations based on numerical stability concerns.

For numerical stability reasons, we choose the sign of ±‖~a1‖ opposite to the

sign of the first component of the original column ~a1 of A, (~a1)1: this avoids

catastrophic cancellation in computing ~v1 = ~a1−Q1~a1 that may otherwise arise

when |(~a1)1| ≈ ‖~a1‖. In other words, we choose the sign of ±‖~a1‖ such that the

size of ~v1 is as large as possible.
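A single Householder reflection is easy to check numerically; the following Python sketch (with an arbitrarily chosen vector a1) builds u1 as above and verifies that Q1 a1 has zeros below its first entry:

```python
import numpy as np

# One Householder reflection: Q1 maps a1 to (-sign * ||a1||, 0, ..., 0)^T,
# where the sign is chosen opposite to (a1)_1 to avoid cancellation in v.
a1 = np.array([3.0, 1.0, 2.0])
sign = 1.0 if a1[0] >= 0 else -1.0
y = np.zeros_like(a1)
y[0] = -sign * np.linalg.norm(a1)   # target vector Q1 a1
v = a1 - y                          # v = a1 - Q1 a1
u = v / np.linalg.norm(v)           # Householder vector

Q1 = np.eye(3) - 2.0 * np.outer(u, u)
print(Q1 @ a1)                      # (-||a1||, 0, 0) up to rounding
```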

3.2.2 Using Householder Reflections to Compute the QR

Factorisation

We now use a sequence of n Householder reflections to compute the QR decom-

position of A ∈ Rm×n. The first transformation creates the desired zeros in the

first column of A:

Q1A = [ r11 ~r1^T ; 0 Ã2 ] ∈ Rm×n,

and is followed by a second orthogonal transformation of Householder type that

creates zeros in the first column of A˜2 ∈ R(m−1)×(n−1):

Q̃2Ã2 = [ r22 ~r2^T ; 0 Ã3 ] ∈ R(m−1)×(n−1).

Defining

Q2 = [ 1 0 ; 0 Q̃2 ] ∈ Rm×m,

these steps can be combined as

Q2Q1A = [ r11 ~r1^T ; 0 Q̃2Ã2 ] = [ r11 ~r1^T ; 0 [ r22 ~r2^T ; 0 Ã3 ] ] ∈ Rm×n,

and so on, with, in the next step,

Q̃3Ã3 = [ r33 ~r3^T ; 0 Ã4 ] ∈ R(m−2)×(n−2),

and

Q3 = [ I2 0 ; 0 Q̃3 ] ∈ Rm×m,


etc.

After n transformations this results in

Qn Qn−1 · · · Q2 Q1 A = [ R̂ ; 0 ],

where R̂ ∈ Rn×n is upper triangular, and

Q^T = Qn Qn−1 · · · Q2 Q1

is an orthogonal matrix. Finally, the QR factorisation of A results as

A = Q1 Q2 · · · Qn−1 Qn [ R̂ ; 0 ] = QR.

3.2.3 Computing Q

In many cases, forming the m×m matrix Q is not needed explicitly. For example,

if only matrix-vector products Q~x are required, one can save the Householder

vectors ~ui (i = 1, . . . , n) and evaluate Q~x as

Q~x = Q1Q2 . . . Qn−1Qn ~x.

If Q is desired explicitly, there are several options for constructing it:

• The transpose of Q can be formed as the loop over the columns of A

progresses:

Q^T = Qn Qn−1 · · · Q2 Q1 Im,

starting with the Q1 multiplication, and then Q2, etc., and Q can be ob-

tained by taking the transpose at the end. This is the approach used in the

pseudocode for the QR decomposition by Householder reflections below.

However, this approach is more costly than necessary because Q1 is typi-

cally dense and does not have leading rows that are zero below the diagonal.

Therefore, all subsequent Householder reflections with Q˜2, . . . , Q˜n−1 need

to be carried out on all n columns of the relevant rows of the intermediate

result (in forming R, in contrast, the transformations do not need to be

carried out on the leading zero columns). The reverse order used in the

next option avoids these extra flops.

• One can store the Householder vectors ~ui (i = 1, . . . , n) and form Q at the

end as

Q = Q1Q2 . . . Qn−1QnIm,

starting with the Qn multiplication, and then Qn−1, etc. This is more

efficient since the Q˜k don’t need to be applied to the leading columns of

the intermediate results that are zero below the diagonal.

This is pseudocode for computing the QR decomposition by Householder

reflections:


Algorithm 3.10: QR Factorisation using Householder Transformations

Input: matrix A ∈ Rm×n

Output: the factor matrices Q and R in the (full) QR factorisation A = QR

1: R = A

2: Qt = Im . Qt will be the transpose of Q

3: for k=1:n do

. first determine the Householder vector ~uk

4: ~x = R(k : m, k)

5: ~y = zeros(m− k + 1, 1)

6: if x1 < 0 then

7: y1 = ‖~x‖

8: else

9: y1 = −‖~x‖

10: end if

11: ~v = ~x− ~y

12: ~uk = ~v/‖~v‖

. apply the Householder transformation to the relevant part of R

13: R(k : m, k : n) = R(k : m, k : n) − 2~uk (~uk^T R(k : m, k : n))

. finally, update Qt (note: we need *all* columns here!)

14: Qt(k : m, 1 : m) = Qt(k : m, 1 : m) − 2~uk (~uk^T Qt(k : m, 1 : m))

15: end for

16: Q = Qt^T

The following more compact version of the pseudocode computes the QR factorisation
using Householder transformations without forming Q, performing the operations in place.

Algorithm 3.11: QR Factorisation using Householder Transformations

(without forming Q)

Input: A matrix A ∈ Rm×n, m > n

Output: The factor matrix R ∈ Rn×n and a sequence of vectors ~uk, k =
1, . . . , n, that defines the sequence of Householder transformations.

1: for k = 1, . . . , n do

2: ~b = A(k:m, k)

3: ~v = ~b+ sign(b1) ‖~b‖~e1

4: ~uk = ~v/‖~v‖

5: A(k, k) = −sign(b1)‖~b‖

6: A(k+1:m, k) = 0

7: A(k:m, k+1:n) = A(k:m, k+1:n) − (2~uk) (~uk^T A(k:m, k+1:n))

8: end for
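A Python sketch of Algorithm 3.11 is given below; storing the Householder vectors in a list and the helper apply_qt for forming Q^T x are implementation assumptions of this illustration:

```python
import numpy as np

# Householder QR without forming Q: R is accumulated in A, and the
# Householder vectors u_k are kept so that Q^T x can be applied later.
def householder_qr(A):
    A = A.astype(float)
    m, n = A.shape
    us = []
    for k in range(n):
        b = A[k:, k].copy()
        sign = 1.0 if b[0] >= 0 else -1.0
        v = b.copy()
        v[0] += sign * np.linalg.norm(b)     # v = b + sign(b1) ||b|| e1
        u = v / np.linalg.norm(v)
        us.append(u)
        A[k, k] = -sign * np.linalg.norm(b)  # new diagonal entry of R
        A[k+1:, k] = 0.0                     # zeros below the diagonal
        A[k:, k+1:] -= 2.0 * np.outer(u, u @ A[k:, k+1:])
    return np.triu(A[:n, :]), us

def apply_qt(us, x):
    # compute Q^T x = Q_n ... Q_1 x using the stored Householder vectors
    x = x.astype(float).copy()
    for k, u in enumerate(us):
        x[k:] -= 2.0 * u * (u @ x[k:])
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
R, us = householder_qr(A)
QtA = np.column_stack([apply_qt(us, A[:, j]) for j in range(4)])
print(np.allclose(QtA[:4], R), np.allclose(QtA[4:], 0.0))   # Q^T A = [R; 0]
```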

3.2.4 Computational Work

When implementing the Householder algorithm, it is essential to implement the

reflection by first computing the row vector ~zk^T = ~uk^T R(k : m, k : n) in

R(k : m, k : n) = R(k : m, k : n) − 2~uk (~uk^T R(k : m, k : n)),


rather than first constructing the rank-1 matrix ~uk~uk^T and multiplying it with
R(k : m, k : n), which is much more expensive.

When implemented in this order, it can be shown that the dominant terms

in the computational work are given by

W ≈ 2mn^2 − (2/3) n^3 flops.

Notes:

• For the case of square matrices, m = n, we have

W ≈ 2n^3 − (2/3) n^3 = (4/3) n^3 flops,

which is twice the work of LU decomposition.

• The QR decomposition can be used to solve linear systems as follows:

1. Compute Q and R in A = QR, e.g., using Householder transforma-

tions.

2. The system A~x = ~b can be solved by backward substitution as can be

seen from the following equivalences:

A~x = ~b

QR~x = ~b

Q^TQR~x = Q^T~b

R~x = Q^T~b.

Solving the system in this way is more stable than using the LU

decomposition, but comes at twice the cost.

• The QR decomposition using Householder transformations is really useful

for solving least-squares problems in a numerically stable way, as will be

explained in the next sections.
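The second note above (solving A~x = ~b via the QR decomposition followed by backward substitution) can be sketched in Python as follows; numpy.linalg.qr is used for brevity in place of a hand-written Householder routine:

```python
import numpy as np

# Solve A x = b via QR: form R x = Q^T b and back-substitute on R.
def solve_via_qr(A, b):
    Q, R = np.linalg.qr(A)          # for a square A, Q is n x n orthogonal
    y = Q.T @ b
    n = A.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):  # backward substitution
        x[i] = (y[i] - R[i, i+1:] @ x[i+1:]) / R[i, i]
    return x

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
x_true = rng.standard_normal(5)
x = solve_via_qr(A, A @ x_true)
print(np.allclose(x, x_true))
```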

3.3 Overdetermined Systems and Least-Squares Problems

Let A ∈ Rm×n, where m > n. Such overdetermined linear systems, where there

are more equations (m) than unknowns (n), are common in applications.

Example 3.12

Consider the linear regression problem of finding the “best” linear relation

y(t) = c t + d between m observations {(ti, yi)} for i = 1, . . . ,m. We aim to

solve the following linear system

[ t1 1 ; t2 1 ; · · · ; tm 1 ] (c, d)^T = (y1, y2, . . . , ym)^T. (3.4)

However, the above linear system is overdetermined for m > 2 and in general
does not have an exact solution.


Exact solutions do not generally exist for overdetermined systems

A~x = ~b, A ∈ Rm×n, m > n.

Instead, one can seek the “optimal” ~x that minimizes the residual vector

~r = ~b−A~x ∈ Rm

in some norm. A popular choice for the norm, which can be justified, e.g., in

statistical applications, is the 2-norm. This leads to the following definition of

an overdetermined linear least-squares (LS) problem:

Definition 3.13: Least-Squares Problem

Let A ∈ Rm×n with m > n. Find ~x that minimises f(~x) = ‖~b − A~x‖2^2.

Note that

f(~x) = ‖~b − A~x‖2^2 = ‖~r‖2^2 = ∑_{k=1}^{m} rk^2,

which explains that the solution is indeed sought that provides the least sum of

squares of the residual components.

3.3.1 The Normal Equations – A Geometric View

Let A ∈ Rm×n with m > n. The columns of A span a subspace of Rm. The

solution of the LS problem is the vector ~x ∈ Rn such that the vector A~x ∈

range(A) is the best approximation of ~b in range(A), in the sense that ~x minimises
the residual, ~r = ~b − A~x. The residual ~r = ~b − A~x is minimal if it is orthogonal to
range(A) (or, equivalently, if A~x is the orthogonal projection of ~b onto range(A)).

If we specify this geometric condition, we find a linear system of equations that

specifies the solution of the LS problem:

~r ⊥ A~z ∀~z ⇐⇒ (A~z)^T~r = 0 ∀~z

⇐⇒ ~z^T A^T (~b − A~x) = 0 ∀~z

⇐⇒ A^T~b − A^T A~x = 0

⇐⇒ A^T A~x = A^T~b,

where A^T A ∈ Rn×n. The equations

A^T A~x = A^T~b

are called the normal equations, the first way to compute the LS solution. One

problem in this approach is that A^T A can be ill-conditioned, more so than A

(see below).

3.3.2 The Normal Equations

The following theorem shows that linear least-squares problems can be solved

by finding the solution of a square linear system with matrix A^T A.


Theorem 3.14

Let A ∈ Rm×n with m > n.

1. Any minimiser of f(~x) = ‖~b − A~x‖2^2 satisfies

A^T A~x = A^T~b.

2. Any solution of the normal equations is a minimiser of f(~x).

3. If A has linearly independent columns, then A^T A~x = A^T~b (and the
least-squares problem) has a unique solution.

Proof.

1. Consider

f(~x) = ∑_{k=1}^{m} rk^2 = ∑_{k=1}^{m} ( bk − ∑_{j=1}^{n} akj xj )^2.

If ~x is a minimiser of f(~x), then ~x satisfies the optimality equations

∂f/∂xi = 0 (i = 1, . . . , n).

This gives

∂f/∂xi = −∑_{k=1}^{m} 2 aki ( bk − ∑_{j=1}^{n} akj xj ) = 0 (i = 1, . . . , n),

or

∑_{k=1}^{m} ∑_{j=1}^{n} aki akj xj = ∑_{k=1}^{m} aki bk (i = 1, . . . , n).

It is easy to see that this corresponds to A^T A~x = A^T~b. (Check!) Note

that solutions of this equation could also be maximisers of f(~x), which

we exclude in the next part.

2. Let ~x satisfy A^T A~x = A^T~b, and ~r = ~b − A~x. Then f(~x + ~u) ≥ f(~x) ∀~u ∈
Rn, since

f(~x + ~u) = (~b − A(~x + ~u))^T (~b − A(~x + ~u))

= (~r − A~u)^T (~r − A~u)

= ~r^T~r − ~r^T A~u − ~u^T A^T~r + ~u^T A^T A~u

= ~r^T~r − 2~u^T A^T~r + ~u^T A^T A~u

= f(~x) + ‖A~u‖2^2,

where the last step uses A^T~r = A^T~b − A^T A~x = 0.

3. If A has linearly independent columns, then A~x ≠ 0 for all ~x ≠ 0.
Therefore, ~x^T A^T A~x = ‖A~x‖2^2 > 0 for all ~x ≠ 0 and A^T A is SPD. This
implies that A^T A is nonsingular, so A^T A~x = A^T~b has a unique solution.


Note: If A has linearly dependent columns, then A^T A is singular and the

normal equations have infinitely many solutions.

3.3.3 Computational Work for Forming and Solving the Normal Equations

If A ∈ Rm×n with m > n has linearly independent columns, then the LS solution

of A~x = ~b can be computed efficiently using Cholesky decomposition applied to
the normal equations, since A^T A ∈ Rn×n is SPD. The dominant terms in the
computational work, including the cost of forming A^T A, are W ≈ n^3/3 + n^2(2m − 1)
flops, where the cost of forming A^T A dominates strongly for m ≫ n.
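A sketch of this approach in Python (using NumPy's Cholesky factorisation and dense triangular solves for brevity) might look as follows; the random test problem is an assumption of the illustration:

```python
import numpy as np

# Least squares via the normal equations: A^T A x = A^T b, solved with
# a Cholesky factorisation A^T A = L L^T and two triangular solves.
rng = np.random.default_rng(4)
m, n = 20, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

G = A.T @ A                       # SPD when A has full column rank
L = np.linalg.cholesky(G)         # G = L L^T
y = np.linalg.solve(L, A.T @ b)   # forward substitution (dense solve for brevity)
x = np.linalg.solve(L.T, y)       # backward substitution

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x, x_ref))
```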

3.3.4 Numerical Stability of Using the Normal Equations

Regarding conditioning, we have for general non-symmetric square A ∈ Rn×n

κ2(A) = σmax(A)/σmin(A) = √λmax(A^T A) / √λmin(A^T A).

This can be extended to rectangular matrices A ∈ Rm×n with m > n and linearly
independent columns, using the same expressions for κ2 (since λi(A^T A) > 0 for
all i = 1, . . . , n in this case).

The condition number of the matrix A^T A arising in the normal equations is
given by

κ2(A^T A) = σmax(A^T A)/σmin(A^T A) = √λmax((A^T A)^T A^T A) / √λmin((A^T A)^T A^T A).

Since

σmax(A^T A) = √λmax((A^T A)^T A^T A) = √λmax((A^T A)^2) = λmax(A^T A),

σmin(A^T A) = √λmin((A^T A)^T A^T A) = √λmin((A^T A)^2) = λmin(A^T A),

we obtain

κ2(A^T A) = κ2(A)^2.

This indicates that solving the normal equations squares the condition number

of the original matrix A, and may thus be ill-conditioned. In the next section we

will see how the QR decomposition of A, e.g. using Householder transformations,

can be used to solve the LS problem. This avoids the squaring of the condition

number of A, and is more numerically stable than solving the normal equations.
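The identity κ2(A^T A) = κ2(A)^2 is easy to confirm numerically; the sketch below (an illustration added here) uses an arbitrary random rectangular matrix:

```python
import numpy as np

# Numerical check that forming the normal equations squares the condition number.
rng = np.random.default_rng(5)
A = rng.standard_normal((8, 3))
kA = np.linalg.cond(A)            # sigma_max(A) / sigma_min(A)
kAtA = np.linalg.cond(A.T @ A)
print(kA**2, kAtA)                # the two values agree up to rounding
```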

3.4 Solving Least-Squares Problems using QR Factorisation

We consider overdetermined system

A~x = ~b with A ∈ Rm×n (m ≥ n),~b ∈ Rm, ~x ∈ Rn.

We seek to solve the system in the least-squares sense, i.e., we minimize

‖~r‖2 = ‖~b−A~x‖2. (3.5)


We use the (full) QR decomposition of A, with

A = Q [ R̂ ; 0 ], where Q = [ Q̂ | Q̄ ] ∈ Rm×m, Q̂ ∈ Rm×n, and R̂ ∈ Rn×n.

The factors Q̂ and R̂ can be obtained using the Householder algorithm.

Then we observe that

‖~r‖2^2 = ‖Q^T~r‖2^2

= ‖Q^T(~b − A~x)‖2^2

= ‖Q^T(~b − Q [ R̂ ; 0 ] ~x)‖2^2

= ‖ [ Q̂^T~b ; Q̄^T~b ] − [ R̂~x ; 0 ] ‖2^2

= ‖Q̂^T~b − R̂~x‖2^2 + ‖Q̄^T~b‖2^2,

where the second term is independent of ~x.

Thus ‖~r‖2^2 is minimal when Q̂^T~b − R̂~x = 0, or

R̂~x = Q̂^T~b. (3.6)

We solve this n×n system by backward substitution to find the optimal ~x. This

is numerically more stable than solving the normal equations.
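A Python sketch of this procedure (using numpy.linalg.qr for the reduced QR factorisation, rather than the Householder code of Section 3.2) might look as follows:

```python
import numpy as np

# Least squares via Eq. (3.6): R_hat x = Q_hat^T b, solved by backward substitution.
rng = np.random.default_rng(6)
m, n = 10, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

Qhat, Rhat = np.linalg.qr(A)      # reduced QR: Qhat is m x n, Rhat is n x n
y = Qhat.T @ b
x = np.zeros(n)
for i in range(n - 1, -1, -1):    # backward substitution on R_hat x = Q_hat^T b
    x[i] = (y[i] - Rhat[i, i+1:] @ x[i+1:]) / Rhat[i, i]

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))
```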

3.4.1 Geometric Interpretation in Terms of Projection Matrices

Equation (3.6) can be interpreted geometrically as follows. We know that the

vector ~x minimising Eq. (3.5) satisfies

A~x = the orthogonal projection of ~b onto range(A).

The columns of

Q̂ = [ ~q1 · · · ~qn ]

form an orthonormal basis of range(A). The product

Q̂^T~b = ( ~q1^T~b, . . . , ~qn^T~b )^T

contains the projection coefficients of ~b onto the basis vectors ~qi. Then

Q̂Q̂^T~b = (~q1^T~b)~q1 + . . . + (~qn^T~b)~qn = (~q1~q1^T)~b + . . . + (~qn~qn^T)~b

is the orthogonal projection of ~b onto range(A). So we conclude that the LS
solution ~x satisfies

A~x = Q̂Q̂^T~b

Q̂R̂~x = Q̂Q̂^T~b

Q̂^TQ̂R̂~x = Q̂^TQ̂Q̂^T~b


or

R̂~x = Q̂^T~b,

since Q̂^TQ̂ = In. This is a geometric way to derive result (3.6).

Note that the matrix

P = Q̂Q̂^T ∈ Rm×m

is an orthogonal projection matrix, since it satisfies P^2 = P and P^T = P.
The matrix-vector product Q̂Q̂^T~z projects any vector ~z ∈ Rm orthogonally onto
range(A). The orthogonal projector

Q̂Q̂^T = ~q1~q1^T + . . . + ~qn~qn^T

is composed of the sum of n rank-one orthogonal projection matrices

Pi = ~qi~qi^T ∈ Rm×m,

with each Pi satisfying Pi^2 = Pi and Pi^T = Pi.
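These properties of P = Q̂Q̂^T are easy to verify numerically; the following sketch (an illustration added here) uses an arbitrary random A:

```python
import numpy as np

# The projector P = Qhat Qhat^T satisfies P^2 = P and P^T = P, and
# the residual b - P b is orthogonal to range(A).
rng = np.random.default_rng(7)
A = rng.standard_normal((6, 2))
Qhat, _ = np.linalg.qr(A)          # orthonormal basis for range(A)
P = Qhat @ Qhat.T

b = rng.standard_normal(6)
r = b - P @ b                      # residual after projecting onto range(A)
print(np.allclose(P @ P, P), np.allclose(P.T, P), np.allclose(A.T @ r, 0.0))
```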

3.5 Alternating Least-Squares Algorithm for Movie Recommendation

Continuing the discussion on algorithms for movie recommendation from Section 1.3, we now proceed with formulating a least-squares-based optimisation algorithm to compute matrices U ∈ R^{f×m} and M ∈ R^{f×n}, with f a small integer, f ≪ m, n, such that U^T M approximates the ratings matrix R on the set of known ratings, R:

  R ≈ U^T M.  (3.7)

In particular, we seek U and M that minimise

  g(U, M) = ‖R − U^T M‖_{F,R}^2 + λ N(U, M),  (3.8)

where the ‖·‖_{F,R} norm is a partial Frobenius norm, summed only over the known entries of R, as given by the index set R, and N(U, M) is a regularisation term.

We will now explain the details of the Alternating Least Squares (ALS) algorithm for solving minimisation problem (3.8). The algorithm determines U and M in an alternating fashion: starting from an initial guess for U, determine the optimal M with U fixed, then determine the optimal U with M fixed, and so forth. Each subproblem of determining M with fixed U (and vice versa) in this alternating algorithm boils down to a (regularised) linear least-squares problem, as we will now describe.


3.5.1 Least-Squares Subproblems for Movie Recommendation

For each user i, let Ji = {j1, j2, j3, . . .} be the set of the indices j of the movies

ranked by user i, and for each movie j, let Ij = {i1, i2, i3, . . .} be the set of the

indices i of the users who have ranked movie j. Let |Ji| be the number of movies

ranked by user i, and let |Ij | be the number of users who have ranked movie j.

Then the function (3.8) we want to minimise is given specifically by

  min_{U,M} g(U, M) = Σ_{(i,j)∈R} ( r_{ij} − ~u_i^T ~m_j )^2
                      + λ ( Σ_{i=1}^m |J_i| ‖~u_i‖_2^2 + Σ_{j=1}^n |I_j| ‖~m_j‖_2^2 ),  (3.9)

with λ a fixed regularisation parameter.

We first rewrite the first part of the objective function g(U, M) as a sum over all movies:

  min_{U,M} g(U, M) = Σ_{j=1}^n ‖~r_j − U^T ~m_j‖_{2,I_j}^2
                      + λ ( Σ_{i=1}^m |J_i| ‖~u_i‖_2^2 + Σ_{j=1}^n |I_j| ‖~m_j‖_2^2 ),  (3.10)

where ‖·‖_{2,I_j} is a partial 2-norm, summed only over the vector entries that correspond to users who have ranked movie j, as given by the index set I_j. That is,

  ‖~r_j − U^T ~m_j‖_{2,I_j}^2 = ‖~r_{I_j} − U_{I_j}^T ~m_j‖_2^2,

where ~r_{I_j} is the vector containing all the known ratings for movie j (the elements of column ~r_j of R that contain ratings, by the users in the index set I_j), and U_{I_j}^T is a submatrix of the user matrix U^T that contains only the rows of the users that have ranked movie j. We rewrite Eq. (3.10) as

  min_{U,M} g(U, M) = Σ_{j=1}^n ‖~r_{I_j} − U_{I_j}^T ~m_j‖_2^2
                      + λ ( Σ_{i=1}^m |J_i| ‖~u_i‖_2^2 + Σ_{j=1}^n |I_j| ‖~m_j‖_2^2 ).  (3.11)

In the first half of an ALS iteration, we fix U , and find the optimal M given

that fixed U . To this end, we set the gradient of g(U,M) with respect to the

elements of M equal to zero. It is convenient to express this for each of the

columns ~mj of M :

  ∇_{~m_j} g(U, M) = ∇_{~m_j} ‖~r_{I_j} − U_{I_j}^T ~m_j‖_2^2 + λ|I_j| ∇_{~m_j} ‖~m_j‖_2^2 = 0   (j = 1, …, n).  (3.12)

These are n independent (regularised) linear least-squares problems for the n

columns ~mj of movie matrix M (with fixed user matrix U).

To compute the gradients in these expressions, the proof of Theorem 3.14 shows that

  ∇_{~x} ‖~b − A~x‖_2^2 = −2A^T(~b − A~x) = 2(A^T A ~x − A^T ~b),


and we also have (e.g., as a special case of the above) that

  ∇_{~x} ‖~x‖_2^2 = 2~x.

Applying these to Eq. (3.12) gives the n (regularised) normal equation conditions

  2(U_{I_j} U_{I_j}^T ~m_j − U_{I_j} ~r_{I_j}) + 2λ|I_j| ~m_j = 0,

  (U_{I_j} U_{I_j}^T + λ|I_j| I) ~m_j = U_{I_j} ~r_{I_j}   (j = 1, …, n).  (3.13)

Solving these small f × f linear systems for the columns ~mj of M (which can be

done in parallel) updates M in the first half of an ALS iteration.

The second half of the ALS iteration fixes M and updates U in a manner

completely analogous to the first half of the iteration. Specifically, we define the

transpose, Q, of the ratings matrix R,

  Q = R^T

and write

  Q ≈ M^T U.  (3.14)

We rewrite Eq. (3.9) as

  min_{U,M} g(U, M) = Σ_{i=1}^m ‖~q_{J_i} − M_{J_i}^T ~u_i‖_2^2
                      + λ ( Σ_{i=1}^m |J_i| ‖~u_i‖_2^2 + Σ_{j=1}^n |I_j| ‖~m_j‖_2^2 ),  (3.15)

where ~q_{J_i} is the vector containing all the known ratings given by user i (the elements of column ~q_i of Q that contain ratings, for the movies in the index set J_i), and M_{J_i}^T is a submatrix of the movie matrix M^T that contains only the rows of the movies that are ranked by user i.

Setting the gradient with respect to the elements of U equal to zero, column-by-column, gives

  ∇_{~u_i} g(U, M) = ∇_{~u_i} ‖~q_{J_i} − M_{J_i}^T ~u_i‖_2^2 + λ|J_i| ∇_{~u_i} ‖~u_i‖_2^2 = 0   (i = 1, …, m).  (3.16)

This gives the m (regularised) normal equations

  (M_{J_i} M_{J_i}^T + λ|J_i| I) ~u_i = M_{J_i} ~q_{J_i}   (i = 1, …, m).  (3.17)

Solving these small f × f linear systems for the columns ~ui of U (which, again,

can be done in parallel) updates U in the second half of an ALS iteration.
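One full ALS sweep, solving the f×f systems (3.13) and then (3.17), can be sketched in plain Python. The tiny 3×3 ratings example, the factor size f = 2, the λ value, and the initial guesses below are all made up for illustration:

```python
def solve(A, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            fac = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= fac * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# Known ratings: R[(i, j)] = rating of movie j by user i (m = n = 3 here)
R = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 2): 1.0, (2, 1): 2.0, (2, 2): 5.0}
m, n, f, lam = 3, 3, 2, 0.1
U = [[1.0, 0.5, 0.2], [0.3, 1.0, 0.4]]      # f x m user matrix
M_ = [[0.7, 0.1, 1.0], [0.2, 0.9, 0.3]]     # f x n movie matrix

def g(U, M_):
    """Objective (3.9): data misfit plus weighted regularisation."""
    Ji = [[j for (i, j) in R if i == u] for u in range(m)]
    Ij = [[i for (i, j) in R if j == v] for v in range(n)]
    val = sum((R[(i, j)] - sum(U[k][i] * M_[k][j] for k in range(f))) ** 2
              for (i, j) in R)
    val += lam * sum(len(Ji[i]) * sum(U[k][i] ** 2 for k in range(f)) for i in range(m))
    val += lam * sum(len(Ij[j]) * sum(M_[k][j] ** 2 for k in range(f)) for j in range(n))
    return val

def als_sweep(U, M_):
    # First half: fix U, update each movie column m_j via (3.13)
    for j in range(n):
        Ij = [i for (i, jj) in R if jj == j]
        G = [[lam * len(Ij) * (a == b) + sum(U[a][i] * U[b][i] for i in Ij)
              for b in range(f)] for a in range(f)]
        rhs = [sum(U[a][i] * R[(i, j)] for i in Ij) for a in range(f)]
        col = solve(G, rhs)
        for a in range(f):
            M_[a][j] = col[a]
    # Second half: fix M, update each user column u_i via (3.17)
    for i in range(m):
        Ji = [j for (ii, j) in R if ii == i]
        G = [[lam * len(Ji) * (a == b) + sum(M_[a][j] * M_[b][j] for j in Ji)
              for b in range(f)] for a in range(f)]
        rhs = [sum(M_[a][j] * R[(i, j)] for j in Ji) for a in range(f)]
        col = solve(G, rhs)
        for a in range(f):
            U[a][i] = col[a]

g0 = g(U, M_)
als_sweep(U, M_)
g1 = g(U, M_)   # each half-step minimises g over its block, so g cannot increase
```

Since each half-step solves its block subproblem exactly, the objective g decreases monotonically over ALS sweeps.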


Chapter 4

The Conjugate Gradient Method for Sparse SPD Systems

In this chapter we will consider iterative methods for solving linear systems

  A~x = ~b,

for the specific case where A ∈ R^{n×n} is a symmetric positive definite (SPD) matrix.

When using direct solvers for linear systems A~x = ~b, such as Gaussian elimination / LU decomposition and Cholesky decomposition, the algorithm is executed until completion, at which time one obtains the exact solution (in exact arithmetic), and the algorithm does not generate approximate solutions along the way.

In contrast, iterative methods start from an initial guess ~x0 for the solution

that one seeks to improve in a sequence of approximations

~x0, ~x1, ~x2, ~x3, . . .

until some convergence criterion is attained that typically prescribes a desired

accuracy in the approximation.

Iterative methods can be advantageous in terms of computational cost, in

particular for large-scale problems that involve highly sparse matrices. For ex-

ample, the matrix A ∈ Rn×n in our 2D model problem has about 5 nonzeros per

row. The cost for a matrix-vector product is therefore O(n) flops (in particular,

about 9n flops). The cost per iteration of iterative solvers is often proportional

to the cost of a matrix-vector product. So if an iterative solver can solve A~x = ~b

up to a desired accuracy in a number of iterations that does not grow strongly

with n, then it can often beat direct solvers. For example, for the 2D model

problem, iterative solvers exist, with O(n) cost per iteration, that converge to

the accuracy with which the PDE was discretised in a number of iterations that

does not grow with problem size. Those iterative solvers can obtain an accurate

answer in O(n) work, which, for large problems, is much faster than the O(n3)

cost of LU decomposition, or the O(n2) cost of banded LU decomposition. In

this chapter we will start exploring such iterative methods for solving A~x = ~b, for the particular case that A is SPD, which arises in many applications.


4.1 An Optimisation Problem Equivalent to SPD Linear Systems

Theorem 4.1

Let A ∈ R^{n×n} be an SPD matrix. Then

  φ(~x) = (1/2) ~x^T A ~x − ~b^T ~x + c,  (4.1)

with c an arbitrary constant, has a unique global minimum, ~x^*, which is the unique solution of A~x = ~b.

Proof. Since A is SPD, it is nonsingular and A~x = ~b has a unique solution,

which we call ~x∗.

Given an approximation ~x of ~x∗, we define the error ~e by

~e = ~x∗ − ~x.

Considering the squared A-norm of the error, ~e = ~x^* − ~x, we find

  ‖~x^* − ~x‖_A^2 = ‖~e‖_A^2
                  = ~e^T A ~e
                  = (~x^* − ~x)^T A (~x^* − ~x)
                  = ~x^{*T} A ~x^* − ~x^{*T} A ~x − ~x^T A ~x^* + ~x^T A ~x
                  = ~x^{*T} A ~x^* − 2 ~x^T A ~x^*+ ~x^T A ~x   (since A = A^T)
                  = ~x^{*T} ~b − 2 ~x^T ~b + ~x^T A ~x
                  = ~x^{*T} ~b − 2c + 2φ(~x).

Since taking ~x = ~x^* uniquely minimises the LHS of this equality, ~x^* is also the unique minimiser of the RHS, and hence of φ(~x), because ~x^{*T} ~b − 2c is independent of ~x. This shows that φ(~x) has a unique global minimiser, which is the solution of A~x = ~b.
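A small numeric sanity check of Theorem 4.1 (pure Python; the 2×2 SPD matrix and the perturbations below are arbitrary choices for illustration):

```python
# SPD matrix A and right-hand side b (arbitrary small example)
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

# Exact solution of A x = b, using A^{-1} = (1/5) [[3, -1], [-1, 2]]
xstar = [(3 * b[0] - b[1]) / 5, (-b[0] + 2 * b[1]) / 5]   # = (1/5, 3/5)

def phi(x, c=0.0):
    """phi(x) = 1/2 x^T A x - b^T x + c."""
    quad = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
    return 0.5 * quad - sum(b[i] * x[i] for i in range(2)) + c

# phi is strictly larger at any perturbed point than at x* = A^{-1} b
perturbations = [(0.1, 0.0), (0.0, -0.2), (0.05, 0.05), (-1.0, 2.0)]
larger = all(phi([xstar[0] + dx, xstar[1] + dy]) > phi(xstar)
             for (dx, dy) in perturbations)
```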

4.2 The Steepest Descent Method

The first iterative method we consider here for solving A~x = ~b is based on a

basic optimisation method for solving the optimisation problem

min

~x

φ(~x).

Recall that the gradient of φ(~x), ∇φ(~x), points in the direction of steepest

ascent of φ(~x), and is orthogonal to the level surfaces of φ(~x). The direction of

steepest descent is given by −∇φ(~x). In the case of φ(~x) corresponding to the


SPD linear system A~x = ~b (Eq. (4.1)), we find

  −∇φ(~x) = −(A~x − ~b) = ~r,

where the residual ~r is defined as

  ~r = ~b − A~x.

Here, we have used that

  ∇(~x^T A ~x) = A~x + A^T ~x,

or

  ∇(~x^T A ~x) = 2A~x

when A is symmetric.

The steepest descent optimisation method proceeds as follows. Suppose we

are given an initial approximation ~x0. We seek a new, improved approxima-

tion ~x1 by considering φ(~x) along a line in the direction of steepest descent,

−∇φ(~x0) = ~r0, where we define, for approximation ~xi,

~ri = ~b−A~xi.

That is, we determine the next approximation ~x1 of the form

~x1 = ~x0 + α1~r0,

where ~r0 = −∇φ(~x0) is called the search direction.

Considering ~x1(α1) as a function of α1, we determine the optimal step length

α1 from ~x0 along the search direction from the condition

d

dα1

φ(~x1(α1)) = 0,

which leads to

  0 = (d/dα_1) φ(~x_1(α_1))
    = ∇φ(~x_1)^T (d~x_1/dα_1)
    = −~r_1^T ~r_0.

This has the natural interpretation that the optimal step length is obtained at the point ~x_1 where the line on which we seek the new approximation is tangent to a level surface, i.e., at the new point ~x_1 the new gradient, −~r_1, is orthogonal to the search direction ~r_0.

This condition leads to an expression for the optimal step length as follows:

  0 = −~r_1^T ~r_0
    = −(~b − A~x_1)^T ~r_0
    = −(~b − A(~x_0 + α_1 ~r_0))^T ~r_0
    = −(~r_0 − α_1 A ~r_0)^T ~r_0,

or

  α_1 = (~r_0^T ~r_0) / (~r_0^T A ~r_0).

This process is repeated to determine ~x2, ~x3, . . ., until a stopping criterion is

satisfied.

Figure 4.2.1: Steepest descent convergence pattern for matrix (4.2) with λ = 2 and κ_2(A) = 2, from initial guess ~x_0 = (−1, 0.5)^T. [Plot not reproduced; panel title: kappa = 2, 13 steps.]

Algorithm 4.2: Steepest Descent Method for A~x = ~b, A SPD

Input: matrix A ∈ R^{n×n}, SPD; initial guess ~x_0
Output: sequence of approximations ~x_1, ~x_2, …

~r_0 = ~b − A~x_0
k = 0
repeat
    k = k + 1
    α_k = (~r_{k−1}^T ~r_{k−1}) / (~r_{k−1}^T A ~r_{k−1})
    ~x_k = ~x_{k−1} + α_k ~r_{k−1}
    ~r_k = ~r_{k−1} − α_k A ~r_{k−1}
until convergence criterion is satisfied
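Algorithm 4.2 can be sketched in a few lines of plain Python (the 2×2 test problem matches the λ = 2 example of this section; the tolerance and iteration cap are illustrative):

```python
def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def steepest_descent(A, b, x0, tol=1e-6, maxit=10000):
    """Algorithm 4.2: repeated optimal steps along the residual direction."""
    x = x0[:]
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    r0norm = dot(r, r) ** 0.5
    k = 0
    while dot(r, r) ** 0.5 > tol * r0norm and k < maxit:
        k += 1
        Ar = matvec(A, r)
        alpha = dot(r, r) / dot(r, Ar)
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
        r = [ri - alpha * ari for ri, ari in zip(r, Ar)]
    return x, k

# The lambda = 2 example of Section 4.2: b = 0, so the solution is (0, 0)
A = [[1.0, 0.0], [0.0, 2.0]]
b = [0.0, 0.0]
x, k = steepest_descent(A, b, [-1.0, 0.5])
```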

The cost per iteration of the steepest descent algorithm consists of one matrix-

vector product, two scalar products of vectors in Rn, and two so-called axpy

operations, denoting operations of type a~x+ ~y with vectors in Rn. If A is sparse

with nnz(A) = O(n), then the cost of one steepest descent iteration is O(n).

It can be shown that, if A is SPD, convergence to ~x∗ is guaranteed from any

initial guess. However, convergence can take many iterations, as illustrated in

the following example.


Figure 4.2.2: Steepest descent convergence patterns for matrices (4.2) with λ = 20 and λ = 200, from initial guess ~x_0 = (−1, 0.05)^T and ~x_0 = (−1, 0.005)^T, respectively. [Plots not reproduced; panel title for the first: kappa = 20, 139 steps.]

Example 4.3

We consider solving A~x = ~b with SPD matrix

  A = [ 1  0
        0  λ ],  (4.2)

where λ > 1, and with ~b = ~0, i.e., the solution is ~x^* = (0, 0)^T. Since A is SPD, we have

  κ_2(A) = λ_max(A) / λ_min(A) = λ.

We first consider the case λ = 2, i.e., κ_2(A) = 2. Fig. 4.2.1 shows level curves of φ(~x), which are ellipses aligned with the coordinate axes. The figure shows the steepest descent convergence pattern starting from initial guess ~x_0 = (−1, 0.5)^T. The convergence criterion

  ‖~r_i‖ / ‖~r_0‖ ≤ 10^{−6}

is satisfied after 13 steps.

However, Fig. 4.2.2 shows that, when increasing λ and κ_2(A) to 20 and 200, the number of iterations grows strongly: as κ_2(A) increases, the level curves become more elongated and, depending on the choice of the initial condition, this may result in extreme zig-zag patterns. For example, when κ_2(A) = 200, the method requires more than 1,300 iterations. (Note, on the contrary, that the exact solution is obtained in one step if κ_2(A) = 1, in which case the level curves are circles and the normal from any point is directed exactly towards the origin, which is the solution of the problem.)

This example shows that, for the steepest descent method, the number of

iterations required for convergence may increase proportionally to the matrix

condition number, κ. Since in many examples the condition number grows as

a function of problem size, this behaviour is clearly undesirable for the large-

scale problems we target. Therefore, we seek iterative methods with improved

convergence behaviour. The conjugate gradient method of the next section offers

such an improvement.

4.3 The Conjugate Gradient Method

Let A ∈ Rn×n be SPD. The Conjugate Gradient (CG) method for

A~x = ~b

is given by:


Algorithm 4.4: Conjugate Gradient Method for A~x = ~b, A SPD

Input: matrix A ∈ R^{n×n}, SPD; initial guess ~x_0
Output: sequence of approximations ~x_1, ~x_2, …

1: ~r_0 = ~b − A~x_0
2: ~p_0 = ~r_0
3: k = 0
4: repeat
5:    k = k + 1
6:    α_k = (~r_{k−1}^T ~r_{k−1}) / (~p_{k−1}^T A ~p_{k−1})
7:    ~x_k = ~x_{k−1} + α_k ~p_{k−1}
8:    ~r_k = ~r_{k−1} − α_k A ~p_{k−1}
9:    β_k = (~r_k^T ~r_k) / (~r_{k−1}^T ~r_{k−1})
10:   ~p_k = ~r_k + β_k ~p_{k−1}
11: until convergence criterion is satisfied
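The same algorithm in plain Python (the small SPD tridiagonal test matrix is an arbitrary example; line numbers in the comments refer to Algorithm 4.4):

```python
def matvec(A, x):
    n = len(x)
    return [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg(A, b, x0, tol=1e-10):
    """Algorithm 4.4; in exact arithmetic it terminates in at most n steps."""
    n = len(b)
    x = x0[:]
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]          # (l1)
    p = r[:]                                                   # (l2)
    rho = dot(r, r)
    for k in range(n):                                         # at most n steps
        if rho ** 0.5 <= tol:
            break
        Ap = matvec(A, p)
        alpha = rho / dot(p, Ap)                               # (l6)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]          # (l7)
        r = [ri - alpha * api for ri, api in zip(r, Ap)]       # (l8)
        rho_new = dot(r, r)
        beta = rho_new / rho                                   # (l9)
        p = [ri + beta * pi for ri, pi in zip(r, p)]           # (l10)
        rho = rho_new
    return x

# SPD 1D Laplacian-type matrix (tridiagonal 2, -1) of size n = 8
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x = cg(A, b, [0.0] * n)
r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
```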

We first define residual and error, before explaining the context in which the

CG algorithm was derived.

Definition 4.5

Consider iterate ~x_k for solving A~x = ~b with exact solution ~x^*.

The residual of iterate ~x_k is given by

  ~r_k = ~b − A~x_k.

The error of iterate ~x_k is given by

  ~e_k = ~x^* − ~x_k.

Note that

  A~e_k = A~x^* − A~x_k = ~b − A~x_k = ~r_k.

Recall the iteration formula of the steepest descent method,

  ~x_k = ~x_{k−1} + α_k ~r_{k−1},

where

  ~r_{k−1} = −∇φ(~x_{k−1}) = ~b − A~x_{k−1}

is the direction of steepest descent. We have seen in an example that the steepest descent direction may not be a suitable direction when the linear system is ill-conditioned. The CG algorithm aims at making a step in a better direction. It considers the iteration formula

  ~x_k = ~x_{k−1} + α_k ~p_{k−1},  (4.3)

or

  ~x_k = ~x_{k−1} + ~q_k,  (4.4)


with ~q_k = α_k ~p_{k−1}, where ~p_{k−1} is the step direction and α_k is the step length, which are chosen optimally in the following sense.

Definition 4.6

Let A ∈ R^{n×n} and ~r_0 ∈ R^n. The Krylov space K_k(~r_0, A) generated by ~r_0 and A is the subspace of R^n defined by

  K_k(~r_0, A) = span{~r_0, A~r_0, A^2~r_0, …, A^{k−1}~r_0}.

Considering Eq. (4.4), the CG method determines the vector ~q_k in the Krylov space K_k(~r_0, A) such that the error ~e_k is minimised in the A-norm. From Eq. (4.4) we have

  ~x^* − ~x_k = ~x^* − ~x_{k−1} − ~q_k,  or  ~e_k = ~e_{k−1} − ~q_k,

so CG chooses ~q_k in K_k(~r_0, A) such that

  ‖~e_k‖_A = ‖~e_{k−1} − ~q_k‖_A

is minimal over all vectors ~q_k in K_k(~r_0, A). In the next section we show that Algorithm 4.4 achieves this goal. This optimality leads to desirable convergence properties for broad classes of problems, significantly improving over steepest descent.

Note also that the cost per iteration of the CG algorithm is not much larger

than the cost of steepest descent: CG requires one matrix-vector product, two

scalar products, and three axpy operations per iteration.

4.4 Properties of the Conjugate Gradient Method

In this section we will show and discuss some properties of the CG algorithm.

To make the proofs somewhat easier, we consider, without loss of generality,

the case where we solve

A~x = ~b

with initial guess

~x0 = 0.

This is no restriction, because when ~x_0 ≠ 0 it is equivalent to applying CG to

  A(~x − ~x_0) = ~b − A~x_0,  or  A~y = ~c.

Note that the residuals for A~y = ~c are the same as for A~x = ~b,

  ~r_k = ~c − A~y_k = (~b − A~x_0) − A(~x_k − ~x_0) = ~b − A~x_k,

which implies that the step directions and the α and β parameters in the CG algorithm also don't change.

4.4.1 Orthogonality Properties of Residuals and Step Directions

An important property of CG is that the step directions ~pi are mutually A-

orthogonal or A-conjugate, from which the method derives its name.


Definition 4.7

Let A ∈ R^{n×n} be an SPD matrix. Then vectors ~p_i and ~p_j ∈ R^n are called A-orthogonal or A-conjugate if

  ~p_i^T A ~p_j = 0.

A-orthogonality of the step directions in Algorithm 4.4 is proven as part of

the following theorem.

Theorem 4.8

Let ~x0 = 0 in the CG algorithm (Algorithm 4.4). As long as convergence has

not been reached before iteration k (~rk−1 6= 0), there are no divisions by 0, and

the following hold:

(A) Let

  X_k = span{~x_1, …, ~x_k},
  P_k = span{~p_0, …, ~p_{k−1}},
  R_k = span{~r_0, …, ~r_{k−1}},
  K_k = span{~r_0, A~r_0, A^2~r_0, …, A^{k−1}~r_0} = K_k(~r_0, A).

Then

  X_k = P_k = R_k = K_k.

(B) The residuals are mutually orthogonal:

  ~r_k^T ~r_j = 0   (j < k).

(C) The step directions are mutually A-orthogonal:

  ~p_k^T A ~p_j = 0   (j < k).

Proof. The proof is by induction on k. The details are quite involved; we

provide a sketch of the proof.

(A) Assume X_{k−1} = P_{k−1} = R_{k−1} = K_{k−1}.

Line 7 in Algorithm 4.4 (l7), ~x_k = ~x_{k−1} + α_k ~p_{k−1}, shows that X_k = P_k. And (l10), ~p_k = ~r_k + β_k ~p_{k−1}, shows that P_k = R_k. Finally, (l8), ~r_k = ~r_{k−1} − α_k A~p_{k−1}, shows that R_k = K_k.

(B) Multiplying (l8), ~r_k = ~r_{k−1} − α_k A~p_{k−1}, with ~r_j on the right, we get

  ~r_k^T ~r_j = ~r_{k−1}^T ~r_j − α_k ~p_{k−1}^T A ~r_j.

Case j < k − 1:

  ~r_k^T ~r_j = 0,

since, by the induction hypothesis, ~r_{k−1}^T ~r_j = 0, and ~p_{k−1}^T A ~r_j = 0 since ~r_j ∈ P_{k−1}.


Case j = k − 1:

  ~r_k^T ~r_{k−1} = 0,

if

  α_k = (~r_{k−1}^T ~r_{k−1}) / (~p_{k−1}^T A ~r_{k−1}).

However, this is equivalent to (l6):

  α_k = (~r_{k−1}^T ~r_{k−1}) / (~p_{k−1}^T A ~r_{k−1})
      = (~r_{k−1}^T ~r_{k−1}) / (~p_{k−1}^T A (~p_{k−1} − β_{k−1} ~p_{k−2}))   (by (l10))
      = (~r_{k−1}^T ~r_{k−1}) / (~p_{k−1}^T A ~p_{k−1})   (by A-orthogonality).

(C) Multiplying (l10), ~p_k = ~r_k + β_k ~p_{k−1}, with A~p_j on the right, we get

  ~p_k^T A ~p_j = ~r_k^T A ~p_j + β_k ~p_{k−1}^T A ~p_j.

Case j < k − 1:

  ~p_k^T A ~p_j = 0,

since, by the induction hypothesis, ~r_k^T A ~p_j = ~r_k^T (~r_j − ~r_{j+1}) / α_{j+1} = 0 (using (l8)), and ~p_{k−1}^T A ~p_j = 0.

Case j = k − 1:

  ~p_k^T A ~p_{k−1} = 0,

if

  β_k = −(~r_k^T A ~p_{k−1}) / (~p_{k−1}^T A ~p_{k−1}).

However, this is equivalent to (l9):

  β_k = −(~r_k^T A ~p_{k−1}) / (~p_{k−1}^T A ~p_{k−1}) · (α_k / α_k)
      = ~r_k^T (−α_k A ~p_{k−1}) / (~r_{k−1}^T ~r_{k−1})   (by (l6))
      = ~r_k^T (~r_k − ~r_{k−1}) / (~r_{k−1}^T ~r_{k−1})   (by (l8))
      = (~r_k^T ~r_k) / (~r_{k−1}^T ~r_{k−1})   (by residual orthogonality).


Some additional comments can be made about the residual orthogonality,

  ~r_k^T ~r_j = 0   (j < k).  (4.5)

• Condition (4.5) implies that, for consecutive residuals,

  ~r_k^T ~r_{k−1} = 0,

as in the steepest descent method. However, condition (4.5) also implies that ~r_k is orthogonal to all previous residuals, which is clearly a much stronger property than for steepest descent. In fact, this implies finite termination in at most n steps: since the ~r_i are mutually orthogonal, and there can be at most n nonzero mutually orthogonal vectors in R^n, we have ~r_n = 0. So we have proved the following theorem:

Theorem 4.9

The CG algorithm converges to the exact solution in at most n steps (in

exact arithmetic).

This property may suggest that we can consider CG as a direct method, but in practice it is used as an iterative method, because in many practical cases it attains an accurate approximation in far fewer than n steps. Figure 4.4.1 compares the performance of the CG and steepest descent methods for the 2D Laplacian matrix.

• It can be shown that, in the update

  ~x_k = ~x_{k−1} + α_k ~p_{k−1},

CG chooses the optimal step length along direction ~p_{k−1}, as in steepest descent:

  (d/dα_k) φ(~x_k(α_k)) = 0.

It is easy to show that this requires step length

  α_k = (~r_{k−1}^T ~p_{k−1}) / (~p_{k−1}^T A ~p_{k−1}),

which can be shown to be equivalent to (l6) in Algorithm 4.4.

4.4.2 Optimal Error Reduction in the A-Norm

Theorem 4.10

Let ~x0 = 0 in the CG algorithm (Algorithm 4.4). As long as convergence has

not been reached before iteration k (~rk−1 6= 0), the iterate ~xk minimises

‖~ek‖A = ‖~x∗ − ~xk‖A

over the Krylov space Kk(~r0, A).


Figure 4.4.1: Comparison of steepest descent and CG convergence histories (log10 of the residual versus iterations) for the 2D Laplacian with N = 32 and n = 1024, with RHS a vector of all-ones, and zero initial guess. The condition number κ_2(A) ≈ 440. [Plot not reproduced.]

Proof. We know that ~x_k ∈ K_k(~r_0, A). Consider any vector ~y ∈ K_k(~r_0, A) that is different from ~x_k, i.e.,

  ~y = ~x_k + ~z

for some ~z ∈ K_k(~r_0, A), ~z ≠ 0. Then

  ~e_~y = ~x^* − ~y = ~x^* − ~x_k − ~z = ~e_k − ~z.

We have

  ‖~e_~y‖_A^2 = (~e_k − ~z)^T A (~e_k − ~z)
             = ~e_k^T A ~e_k − ~e_k^T A ~z − ~z^T A ~e_k + ~z^T A ~z
             = ~e_k^T A ~e_k − 2~z^T A ~e_k + ~z^T A ~z   (since A = A^T)
             = ~e_k^T A ~e_k − 2~z^T ~r_k + ~z^T A ~z   (since A~e_k = ~r_k)
             = ‖~e_k‖_A^2 + ~z^T A ~z   (since ~z ∈ K_k(~r_0, A) = span{~r_0, …, ~r_{k−1}}, which is orthogonal to ~r_k),

so

  ‖~e_~y‖_A^2 > ‖~e_k‖_A^2,

since A is SPD and ~z ≠ 0.


Note: this theorem implies that

‖~ek‖A ≤ ‖~ek−1‖A,

since Kk−1 ⊂ Kk. We say that convergence in the A-norm is monotone.

4.4.3 Convergence Speed

The following theorems can be proved about the convergence speed of the steep-

est descent and CG methods.

Theorem 4.11

Let A ∈ Rn×n be SPD. Let κ be the 2-norm condition number of A, κ = κ2(A).

Then the errors of the iterates in the steepest descent method satisfy

  ‖~e_k‖_A / ‖~e_0‖_A ≤ ( (κ − 1) / (κ + 1) )^k.

Theorem 4.12

Let A ∈ Rn×n be SPD. Let κ be the 2-norm condition number of A, κ = κ2(A).

Then the errors of the iterates in the CG method satisfy

  ‖~e_k‖_A / ‖~e_0‖_A ≤ 2 ( (√κ − 1) / (√κ + 1) )^k.

It can be shown that, for large κ, this leads to the following estimates for the number of iterations k required to converge to

  ‖~e_k‖_A / ‖~e_0‖_A ≈ ε,

with a fixed small ε:

• steepest descent: k = O(κ).

• CG: k = O(√κ).

Example 4.13: Condition Number of 1D Laplacian

Consider the 1D Laplacian matrix

  A = [ −2   1
         1  −2   1
              ⋱   ⋱   ⋱
                   1  −2   1
                        1  −2 ] ∈ R^{n×n}.


It can be shown that the n eigenvalues of this matrix are given by

  λ_k = 2 (cos(kπh) − 1)   (k = 1, …, n)
      = −4 sin^2(kπh/2)
      = −4 sin^2( kπ / (2(n+1)) ),

where h = 1/(n+1). Since A is symmetric and all eigenvalues are strictly negative, this matrix is symmetric negative definite. (I.e., −A is SPD.)

Since A is symmetric,

  κ_2(A) = |λ|_max(A) / |λ|_min(A).

It is easy to show that

  κ_2(A) = O(1/h^2) = O(n^2).

This means that linear systems with A become increasingly harder to solve for

iterative methods when n grows:

• Steepest descent takes O(κ) = O(n^2) iterations, and since the cost per iteration is O(n), the total cost is O(n^3).

• CG takes O(√κ) = O(n) iterations, so the total cost is O(n^2).
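The O(n^2) growth of κ_2(A) can be checked directly from the eigenvalue formula (pure Python; the sizes n = 100 and n = 200 are arbitrary choices):

```python
import math

def cond_1d_laplacian(n):
    """kappa_2(A) from |lambda_k| = 4 sin^2(k pi / (2(n+1))), k = 1..n."""
    eig = [4 * math.sin(k * math.pi / (2 * (n + 1))) ** 2 for k in range(1, n + 1)]
    return max(eig) / min(eig)

k100 = cond_1d_laplacian(100)
k200 = cond_1d_laplacian(200)
ratio = k200 / k100   # doubling n should roughly quadruple kappa
```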

Example 4.14: Condition Number of 2D Laplacian

Consider the 2D Laplacian matrix

  A = [ T  I
        I  T  I
            ⋱  ⋱  ⋱
               I  T  I
                   I  T ] ∈ R^{n×n},

where n = N^2 and T and I are block matrices ∈ R^{N×N} (T is tridiagonal with elements 1, −4, 1 on the three diagonals). (Here, N is the number of interior points in the x and y directions, and h = 1/(N+1). Note that A does not include the 1/h^2 factor.)

It can be shown that the n = N^2 eigenvalues of this matrix are given by the N^2 possible sums of the N eigenvalues of the 1D Laplacian matrices in the x and y directions:

  λ_{k,l} = 2 (cos(kπh) − 1 + cos(lπh) − 1)   (k = 1, …, N; l = 1, …, N)
          = −4 (sin^2(kπh/2) + sin^2(lπh/2))
          = −4 ( sin^2( kπ / (2(N+1)) ) + sin^2( lπ / (2(N+1)) ) ),

where h = 1/(N+1). Since A is symmetric and all eigenvalues are strictly negative, this matrix is symmetric negative definite. (I.e., −A is SPD.)

It is easy to show that

  κ_2(A) = O(1/h^2) = O(N^2) = O(n).

This means that linear systems with A become increasingly harder to solve for

iterative methods when n grows:

• Steepest descent takes O(κ) = O(n) iterations, and since the cost per iteration is O(n), the total cost is O(n^2).

• CG takes O(√κ) = O(√n) iterations, so the total cost is O(n^{3/2}).

4.5 Preconditioning for the Conjugate Gradient Method

4.5.1 Preconditioning for Solving Linear Systems

We saw in the previous section that the number of iterations, k, required to reach a specific tolerance when solving a linear system

  A~x = ~b

using CG satisfies

  k = O(√κ(A)).

If A is ill-conditioned, this may lead to large numbers of iterations, which can be

especially undesirable when the condition number grows as a function of problem

size, like for our 2D model problem. The idea of preconditioning the linear

system aims at reducing the number of iterations an iterative method requires

for convergence by reformulating the linear system as an equivalent problem that

has the same solution, but features a matrix with a smaller condition number.

The first approach is the idea of left preconditioning: multiply A~x = ~b on the left with a nonsingular matrix P ∈ R^{n×n} to obtain the equivalent linear system

  PA~x = P~b,

where the preconditioning matrix (or preconditioner) P is chosen such that

  κ(PA) ≪ κ(A),

perhaps by choosing P such that

  P ≈ A^{−1}.

Such a choice may reduce the condition number and the number of iterations substantially, and will not increase the cost per iteration too much if P is a cheaply computable approximation of A^{−1}. More broadly, the convergence speed of iterative methods for general linear systems A~x = ~b, where A is not necessarily SPD, usually depends on the eigenvalue distribution of the matrix – e.g., the clustering of eigenvalues – and its condition number, and the goal of preconditioning is to improve the eigenvalue distribution of PA such that the iterative method converges faster than for A.


An alternative approach to left preconditioning is the idea of right preconditioning with a nonsingular matrix P ∈ R^{n×n}, which reformulates the system as

  A P P^{−1} ~x = ~b,

and solves

  A P ~y = ~b

using the iterative method, where ~x is obtained at the end from

  ~x = P~y.

4.5.2 Left Preconditioning for CG

The general matrix preconditioning strategies described above are, however, not

directly applicable to CG, because CG requires the system matrix to be SPD,

and, with the original A being SPD, PA or AP are generally not symmetric.

However, preconditioning can be applied to CG as follows.

When applying preconditioning to linear system

A~x = ~b

with A ∈ R^{n×n} an SPD matrix, we choose a preconditioning matrix P that is SPD. The matrix P can always be written as the product XX^T, where X is a nonsingular matrix in R^{n×n}:

  P = V Λ V^T
    = V √Λ √Λ V^T
    = (V √Λ)(V √Λ)^T
    = X X^T,

where V ∈ R^{n×n} contains n orthonormal eigenvectors of P, and Λ ∈ R^{n×n} is a diagonal matrix containing the corresponding eigenvalues, which are strictly positive.

Then we can, using a change of variables, reformulate the left-preconditioned linear system as an equivalent system with an SPD matrix as follows:

  PA~x = P~b
  XX^T A~x = XX^T ~b
  X^T A X X^{−1} ~x = X^T ~b
  (X^T A X) ~y = X^T ~b,
  B~y = ~c,

where

  B = X^T A X,  ~c = X^T ~b,  ~y = X^{−1} ~x.

The following result shows that B is SPD, so we can apply CG to B~y = ~c.


Theorem 4.15

Let A ∈ Rn×n be SPD and X ∈ Rn×n be nonsingular. Then B = XTAX is

SPD.

Proof. B is symmetric since A is symmetric. Moreover, for any ~x ≠ 0,

  ~x^T (X^T A X) ~x = (X~x)^T A (X~x) > 0,

since A is SPD and X~x ≠ 0 because X is nonsingular.

Also, B has the same eigenvalues as PA, so the eigenvalues of PA determine

the 2-condition number of B, and hence, the speed of convergence of CG applied

to B~y = ~c.

Theorem 4.16

Let A ∈ Rn×n be SPD and X ∈ Rn×n be nonsingular, with P = XXT . Then

B = XTAX has the same eigenvalues as PA.

Proof. This follows because B is similar to PA:

  B = X^T A X = (X^{−1} X) X^T A X = X^{−1} (PA) X,

which implies that B and PA have the same eigenvalues.

4.5.3 Preconditioned CG (PCG) Algorithm

Applying CG to

  B~y = ~c

results in the following algorithm, where we use notation with a hat for the residuals ~̂r_k and search directions ~̂p_k associated with formulating the CG algorithm for computing ~y, rather than ~x.


Algorithm 4.17: Preconditioned CG (PCG) Method – Version 1

Input: SPD matrix A, RHS ~b; initial guess ~x_0; SPD preconditioner P = XX^T
Output: approximation ~x_k after stopping criterion is satisfied

1: B = X^T A X
2: ~c = X^T ~b       (we will apply CG to B~y = ~c)
3: ~y_0 = X^{−1} ~x_0
4: ~̂r_0 = ~c − B~y_0
5: ~̂p_0 = ~̂r_0
6: k = 0
7: repeat
8:    k = k + 1
9:    α_k = (~̂r_{k−1}^T ~̂r_{k−1}) / (~̂p_{k−1}^T B ~̂p_{k−1})
10:   ~y_k = ~y_{k−1} + α_k ~̂p_{k−1}
11:   ~̂r_k = ~̂r_{k−1} − α_k B ~̂p_{k−1}
12:   β_k = (~̂r_k^T ~̂r_k) / (~̂r_{k−1}^T ~̂r_{k−1})
13:   ~̂p_k = ~̂r_k + β_k ~̂p_{k−1}
14: until stopping criterion is satisfied
15: ~x_k = X ~y_k

It turns out, however, that the PCG algorithm can be reformulated in terms

of the original ~x variable, in a way that involves only the P and A matrices,

without explicit need for the X and XT factors.

This proceeds as follows. We first multiply (l10) in Algorithm 4.17 by X from the left to convert from ~y to ~x = X~y:

  X~y_k = X~y_{k−1} + α_k X~̂p_{k−1}
  ~x_k = ~x_{k−1} + α_k ~p_{k−1},

where we have defined the search direction for ~x_k, ~p_{k−1}, by

  ~p_{k−1} = X ~̂p_{k−1}.

Next we observe that the residuals for ~y and ~x are related by

  ~̂r = ~c − B~y = X^T ~b − X^T A X ~y = X^T (~b − A~x) = X^T ~r,

which we use to transform (l11) to

  ~̂r_k = ~̂r_{k−1} − α_k B ~̂p_{k−1}
  X^T ~r_k = X^T ~r_{k−1} − α_k X^T A X ~̂p_{k−1}
  ~r_k = ~r_{k−1} − α_k A ~p_{k−1}.


Then we multiply (l13) by X from the left to convert from ~̂p to ~p:

  X~̂p_k = X~̂r_k + β_k X~̂p_{k−1}
  ~p_k = XX^T ~r_k + β_k ~p_{k−1}
  ~p_k = P~r_k + β_k ~p_{k−1},

where we have used that P = XX^T.

Finally, to convert the scalar products in α_k and β_k to use ~r_k and ~p_k, we write

  ~̂r_k^T ~̂r_k = (X^T ~r_k)^T (X^T ~r_k) = ~r_k^T X X^T ~r_k = ~r_k^T P ~r_k,

and

  ~̂p_{k−1}^T B ~̂p_{k−1} = (X^{−1} ~p_{k−1})^T X^T A X (X^{−1} ~p_{k−1})
                          = ~p_{k−1}^T X^{−T} X^T A X X^{−1} ~p_{k−1}
                          = ~p_{k−1}^T A ~p_{k−1},

resulting in

  α_k = (~r_{k−1}^T P ~r_{k−1}) / (~p_{k−1}^T A ~p_{k−1}),
  β_k = (~r_k^T P ~r_k) / (~r_{k−1}^T P ~r_{k−1}).

This gives the second version of the PCG algorithm:

Algorithm 4.18: PCG Method – Version 2

Input: SPD matrix A, RHS ~b; initial guess ~x_0; SPD preconditioner P
Output: sequence of approximations ~x_1, ~x_2, …

1: ~r_0 = ~b − A~x_0
2: ~p_0 = P~r_0
3: k = 0
4: repeat
5:    k = k + 1
6:    α_k = (~r_{k−1}^T P ~r_{k−1}) / (~p_{k−1}^T A ~p_{k−1})
7:    ~x_k = ~x_{k−1} + α_k ~p_{k−1}
8:    ~r_k = ~r_{k−1} − α_k A ~p_{k−1}
9:    β_k = (~r_k^T P ~r_k) / (~r_{k−1}^T P ~r_{k−1})
10:   ~p_k = P~r_k + β_k ~p_{k−1}
11: until stopping criterion is satisfied

In practice, multiplication of a residual ~r by P to obtain a preconditioned residual ~q = P~r usually involves solving a linear system: since P ≈ A^{−1}, we normally know the sparse matrix P^{−1} ≈ A, and we solve

  P^{−1} ~q = ~r

for ~q. This step needs to be performed only once per iteration, and it is worthwhile to rewrite the algorithm once more to indicate this explicitly:

Algorithm 4.19: PCG Method – Version 3

Input: SPD matrix A, RHS ~b; initial guess ~x0; SPD preconditioner P

Output: sequence of approximations ~x1, ~x2, . . .

1: ~r0 = ~b−A~x0

2: solve P−1~q0 = ~r0 for ~q0 (the preconditioned residual)

3: ~p0 = ~q0

4: k = 0

5: repeat

6: k = k + 1

7: αk = (~rk−1^T ~qk−1)/(~pk−1^T A~pk−1)

8: ~xk = ~xk−1 + αk~pk−1

9: ~rk = ~rk−1 − αkA~pk−1

10: solve P−1~qk = ~rk for ~qk (the preconditioned residual)

11: βk = (~rk^T ~qk)/(~rk−1^T ~qk−1)

12: ~pk = ~qk + βk~pk−1

13: until stopping criterion is satisfied
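Algorithm 4.19 translates almost line by line into code. The following NumPy sketch is only an illustration (not code from these notes): the preconditioner solve is passed in as a function argument `apply_Pinv_solve`, and the 1D Laplacian test problem and the Jacobi choice P = AD^−1 (so that the solve is a division by the diagonal) are assumptions made for the example.

```python
import numpy as np

def pcg(A, b, x0, apply_Pinv_solve, tol=1e-10, maxit=200):
    """PCG, following Algorithm 4.19: apply_Pinv_solve(r) returns the
    preconditioned residual q, i.e. it solves P^{-1} q = r for q."""
    x = x0.copy()
    r = b - A @ x
    q = apply_Pinv_solve(r)
    p = q.copy()
    rq = r @ q                       # ~r^T ~q
    for _ in range(maxit):
        Ap = A @ p
        alpha = rq / (p @ Ap)        # step 7
        x = x + alpha * p            # step 8
        r = r - alpha * Ap           # step 9
        if np.linalg.norm(r) < tol:  # stopping criterion (step 13)
            break
        q = apply_Pinv_solve(r)      # step 10 (preconditioner solve)
        rq_new = r @ q
        beta = rq_new / rq           # step 11
        rq = rq_new
        p = q + beta * p             # step 12
    return x

# Model problem (assumed for this example): 1D Laplacian, SPD
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
d = np.diag(A)                       # Jacobi: P = A_D^{-1}, so q = r / d
x = pcg(A, b, np.zeros(n), lambda r: r / d)
```

Only the preconditioner solve changes between preconditioners; the rest of the iteration is untouched.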

4.5.4 Preconditioners for PCG

We now briefly describe some standard preconditioners that are often used when

solving linear systems

A~x = ~b.

We begin by writing A as a sum of its diagonal part and its strictly lower and

upper triangular part,

A = AD −AL −AU ,

where the convention of using negative signs for the triangular parts stems from

SPD matrices with positive diagonal elements and negative off-diagonal elements

that arise in the context of certain PDE problems (e.g., −A for our 2D Lapla-

cian).

Example 4.20

The following standard preconditioning matrices are often used as cheaply

computable approximations of A−1, where we assume that the matrix inverses

in the expressions exist:

1. Jacobi:
   P = AD^−1

2. Gauss-Seidel (GS):
   P = (AD − AL)^−1

3. Symmetric Gauss-Seidel (SGS):
   P = (AD − AU)^−1 AD (AD − AL)^−1

4. Successive Over-Relaxation (SOR):
   P = ω(AD − ωAL)^−1    (ω ∈ (0, 2))

5. Symmetric Successive Over-Relaxation (SSOR):
   P = ω(2 − ω)(AD − ωAU)^−1 AD (AD − ωAL)^−1    (ω ∈ (0, 2))

A few notes:

• 1., 3. and 5. give symmetric preconditioners P when A is symmetric (i.e.,
AL^T = AU), and they are the only ones that can be used with CG.

• Preconditioners 2.-5. contain (a sequence of) triangular matrices, which

can be inverted inexpensively by forward or backward substitution. If A is

sparse with nnz(A) = O(n), the cost of applying these preconditioners is

O(n), so preconditioning does not increase the computational complexity

per iteration beyond O(n). It may substantially reduce the number of

iterations required for convergence, and hence, may lead to faster overall

solve times and better scalability for large problems.
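To make the triangular-solve remark concrete, here is a small NumPy sketch (hypothetical, not from the notes) that applies the SGS preconditioner P = (AD − AU)^−1 AD (AD − AL)^−1 to a residual. The two dense calls to `np.linalg.solve` on triangular matrices stand in for the O(n) forward and backward substitutions one would use for a sparse A; the small test matrix is an assumption.

```python
import numpy as np

def sgs_apply(A, r):
    """q = P r for the SGS preconditioner P = (A_D - A_U)^{-1} A_D (A_D - A_L)^{-1},
    with the splitting convention A = A_D - A_L - A_U."""
    D = np.diag(np.diag(A))
    L = -np.tril(A, -1)                   # strictly lower part, negated
    U = -np.triu(A, 1)                    # strictly upper part, negated
    t = np.linalg.solve(D - L, r)         # forward (lower triangular) sweep
    return np.linalg.solve(D - U, D @ t)  # backward (upper triangular) sweep

# Small symmetric test matrix (assumed for illustration)
n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
r = np.arange(1.0, n + 1)
q = sgs_apply(A, r)
```

For symmetric A the resulting P is symmetric, so this choice is admissible for PCG.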

4.5.5 Using Preconditioners as Stand-Alone Iterative Methods

The preconditioning matrices presented in the previous section can also be used

as iterative methods by themselves, as we now explain. When solving

A~x = ~b

with exact solution ~x∗ and residual and error

~r = ~b−A~x,

~e = ~x∗ − ~x,

satisfying

A~e = ~r,

we start from the identity

~x∗ = ~x+ ~e = ~x+A−1~r.

We obtain a stationary iterative method by considering an easily computable

approximate inverse P of A,

P ≈ A−1,

and writing

~xk+1 = ~xk + P~rk, (4.6)

where

~rk = ~b−A~xk.

We easily derive the error propagation equation

~x∗ − ~xk+1 = ~x∗ − ~xk − P~rk,

~ek+1 = ~ek − PA~ek,

~ek+1 = (I − PA)~ek.

It can be shown that the iteration converges for any initial guess ~x0 when

‖I − PA‖p < 1

in some p-norm.


Example 4.21

The Gauss-Seidel (GS) iterative method for A~x = ~b with A ∈ Rn×n computes

a new approximation ~xnew from a previous iterate ~xold by (considering a simple

3× 3 example)

a11 x1^new + a12 x2^old + a13 x3^old = b1
a21 x1^new + a22 x2^new + a23 x3^old = b2
a31 x1^new + a32 x2^new + a33 x3^new = b3.    (4.7)

Rearranging (for the general n × n case) yields the defining equation for the

Gauss-Seidel method:

xi^new = (1/aii) ( bi − Σ_{j=1}^{i−1} aij xj^new − Σ_{j=i+1}^{n} aij xj^old ).    (4.8)

Using

A = AD −AL −AU ,

we can derive a matrix expression for this method by

A~x = ~b

(AD −AL −AU )~x = ~b

(AD −AL)~xk+1 = AU~xk +~b

~xk+1 = (AD −AL)−1((AD −AL −A)~xk +~b)

= ~xk + (AD −AL)−1(~b−A~xk)

= ~xk + (AD −AL)−1~rk.

Comparing with the general update formula (4.6), we identify the preconditioning

matrix P for GS as

P = (AD −AL)−1.

A few notes:

• The Jacobi iteration is defined by

xi^new = (1/aii) ( bi − Σ_{j=1}^{i−1} aij xj^old − Σ_{j=i+1}^{n} aij xj^old ).    (4.9)

• The preconditioning matrix for Symmetric Gauss-Seidel (SGS) can be de-

rived by concatenating a forward and a backward Gauss-Seidel sweep:

~xk+1/2 = ~xk + (AD −AL)−1~rk,

~xk+1 = ~xk+1/2 + (AD −AU )−1~rk+1/2.

• The Successive Over-Relaxation (SOR) method for A~x = ~b is an iterative

method in which, for every component, a linear combination is taken of a
Gauss-Seidel-like update and the old value:

xi^new = (1 − ω) xi^old + (ω/aii) ( bi − Σ_{j=1}^{i−1} aij xj^new − Σ_{j=i+1}^{n} aij xj^old ),

with ω a fixed weight. Symmetric Successive Over-Relaxation (SSOR) is

obtained from combining a forward and a backward SOR sweep.
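As an illustration of these stand-alone methods, here is a minimal NumPy sketch (an assumption-laden example, not the authors' code) of Gauss-Seidel sweeps implementing Eq. (4.8); the 1D Laplacian test matrix is assumed.

```python
import numpy as np

def gauss_seidel(A, b, x0, sweeps=100):
    """Gauss-Seidel iteration, Eq. (4.8): each component is overwritten
    in place, so the newest values x_j^new are used for j < i."""
    x = x0.copy()
    n = len(b)
    for _ in range(sweeps):
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (b[i] - s) / A[i, i]
    return x

# 1D Laplacian model problem (assumed for illustration)
n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = gauss_seidel(A, b, np.zeros(n), sweeps=2000)
```

Replacing each component update by the weighted combination in the SOR formula above turns this sweep into SOR.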


Chapter 5

The GMRES Method for

Sparse Nonsymmetric

Systems

5.1 Minimising the Residual

In this chapter, we consider the generalised minimal residual (GMRES) iterative

method for solving linear systems

A~x = ~b

with A ∈ Rn×n a nonsingular matrix.

We recall that the CG method for linear systems with A SPD seeks the

optimal update in the Krylov space generated by A and the first residual, ~r0:

Ki+1(~r0, A) = span{~r0, A~r0, A2~r0, . . . , Ai~r0}.

CG considers the update formula

~xi+1 = ~xi + αi+1~pi, (5.1)

or

~xi+1 = ~xi + ~zi, (5.2)

and the update ~zi is determined in the Krylov space Ki+1(~r0, A) such that the

error ~ei+1 is minimised in the A-norm, i.e., with

~ei+1 = ~ei − ~zi,

each step of CG minimises the A-norm of the error:

min_{~zi ∈ Ki+1(~r0,A)} ‖~ei+1‖A.

Note that error minimisation in the A-norm is only possible when A is SPD.

GMRES, by contrast, is intended for linear systems with generic nonsingular
matrices A, not necessarily symmetric, and also considers optimal updates
in the same Krylov space as CG, Ki+1(~r0, A) = span{~r0, A~r0, A2~r0, . . . , Ai~r0}.

It seeks the ~zi in Ki+1(~r0, A) for which the residual

~ri+1 = ~r0 − A~zi

of ~xi+1 = ~x0 + ~zi is minimal in the 2-norm:

min_{~zi ∈ Ki+1(~r0,A)} ‖~ri+1‖.

This minimisation of the residual in the 2-norm is more general than minimi-

sation of the error in the A-norm, because it can be done for any matrix A.

The resulting formulas are somewhat less economical than CG for SPD A, but

GMRES is a very powerful approach for general linear systems.

The GMRES method proceeds as follows: GMRES computes an orthonormal

basis {~q0, . . . , ~qi} for Ki+1(~r0, A),

Qi+1 = [ ~q0 ~q1 . . . ~qi ],

and it does so in an incremental way, computing an additional orthonormal vector

~qi for every iteration. The matrix Qi+1, with the orthonormal basis vectors as

its columns, satisfies

QTi+1Qi+1 = Ii+1.

GMRES chooses the update ~zi ∈ Ki+1(~r0, A), which can be represented as

~zi = Qi+1~y

for some ~y ∈ Ri+1. GMRES finds the optimal ~y ∈ Ri+1 in the expression

~xi+1 = ~x0 +Qi+1~y,

that minimises ‖~ri+1‖ in the 2-norm. Note that all vector norms in this chapter
denote the vector 2-norm.

5.2 Arnoldi Orthogonalisation Procedure

GMRES generates an orthonormal basis for the Krylov space

Ki+1(~r0, A) = span{~r0, A~r0, A2~r0, . . . , Ai~r0}

by setting

~q0 = ~r0/‖~r0‖

and applying modified Gram-Schmidt to orthogonalise the vectors

{~q0, A~q0, A~q1, . . . , A~qi−1}.

Gram-Schmidt generates a new vector ~vm+1 orthogonal to the previous {~q0, ~q1, . . . , ~qm}

by subtracting from A~qm the components in the directions of the previous ~qj :

~vm+1 = A~qm − h0,m~q0 − h1,m~q1 − . . .− hm,m~qm,

where the projection coefficients hj,m are determined in the standard way. The

new orthonormal vector ~qm+1 is then determined by normalising ~vm+1:

~qm+1 = ~vm+1/hm+1,m


where hm+1,m = ‖~vm+1‖.

So the basis vectors {~q0, ~q1, . . . , ~qm, ~qm+1} satisfy

hm+1,m~qm+1 = A~qm − h0,m~q0 − h1,m~q1 − . . .− hm,m~qm.

This procedure to generate an orthonormal basis of the Krylov space is called

the Arnoldi procedure. It can easily be shown that the set of Arnoldi vectors

generated by the procedure is a basis for span{~r0, A~r0, A2~r0, . . . , Ai~r0}:

Theorem 5.1

Let {~q0, . . . , ~qi} be the vectors generated by the Arnoldi procedure. Then

span{~q0, . . . , ~qi} = span{~r0, A~r0, A2~r0, . . . , Ai~r0}.

Proof. (sketch) This follows from a simple induction argument based on

hm+1,m~qm+1 = A~qm − h0,m~q0 − h1,m~q1 − . . .− hm,m~qm.

The Arnoldi procedure is given by:

Algorithm 5.2: Arnoldi Procedure for an Orthonormal Basis of Ki+1(~r0, A)

Input: matrix A ∈ Rn×n; vector ~r0

Output: vectors ~q0, . . . , ~qi that form an orthonormal basis of Ki+1(~r0, A)

ρ = ‖~r0‖

~q0 = ~r0/ρ

for m = 0 : i− 1 do

~v = A~qm

for j = 0 : m do

hj,m = ~qj^T ~v

~v = ~v − hj,m~qj

end for

hm+1,m = ‖~v‖

~qm+1 = ~v/hm+1,m

end for
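A direct NumPy transcription of Algorithm 5.2 can serve as a sanity check (a sketch for illustration; the random test matrix and vector are assumptions). It returns the orthonormal basis vectors as the columns of Q, together with the rectangular Hessenberg matrix H of projection coefficients.

```python
import numpy as np

def arnoldi(A, r0, steps):
    """Arnoldi procedure (Algorithm 5.2) with modified Gram-Schmidt.
    Returns Q (steps+1 orthonormal columns) and the rectangular
    Hessenberg matrix H, satisfying A @ Q[:, :steps] == Q @ H."""
    n = len(r0)
    Q = np.zeros((n, steps + 1))
    H = np.zeros((steps + 1, steps))
    Q[:, 0] = r0 / np.linalg.norm(r0)
    for m in range(steps):
        v = A @ Q[:, m]
        for j in range(m + 1):           # orthogonalise against q_0, ..., q_m
            H[j, m] = Q[:, j] @ v
            v = v - H[j, m] * Q[:, j]
        H[m + 1, m] = np.linalg.norm(v)  # breakdown if this is zero
        Q[:, m + 1] = v / H[m + 1, m]
    return Q, H

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 30))        # assumed random test matrix
r0 = rng.standard_normal(30)
Q, H = arnoldi(A, r0, 8)
```

The two checks below verify orthonormality of the basis and the matrix relation derived in the next display.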

The vectors and coefficients computed during the Arnoldi procedure can


be written in matrix form as:

A [ ~q0 ~q1 . . . ~qi ] = [ ~q0 ~q1 . . . ~qi ~qi+1 ] H˜i+1,

where

H˜i+1 =
[ h0,0   h0,1   h0,2   . . .   h0,i
  h1,0   h1,1   h1,2   . . .   h1,i
         h2,1   h2,2   . . .   h2,i
                h3,2   . . .    :
   0                 hi,i−1   hi,i
                             hi+1,i ],

or

AQi+1 = Qi+2H˜i+1.

Note that Qi+1 ∈ Rn×(i+1), Qi+2 ∈ Rn×(i+2), and H˜i+1 ∈ R(i+2)×(i+1). GMRES

uses this relation to minimise ‖~ri+1‖ over the Krylov space in an efficient manner,

as explained in the next section.

Note also that, when i+ 1 = n, the process terminates with hn,n−1 = ‖~vn‖ =

0, because there cannot be more than n orthogonal vectors in Rn. At this point,

we obtain

AQ = QH,

where Q ∈ Rn×n is orthogonal and H ∈ Rn×n is a square matrix with zeros
below the first subdiagonal:

H =
[ h0,0   h0,1   h0,2   . . .    h0,n−1
  h1,0   h1,1   h1,2   . . .    h1,n−1
         h2,1   h2,2   . . .      :
                h3,2   . . .
   0                 hn−1,n−2   hn−1,n−1 ].

This type of matrix is called an (upper) Hessenberg matrix:

Definition 5.3

Let H ∈ Rn×n. Then H is called an (upper) Hessenberg matrix if

hij = 0 for j ≤ i− 2.

This provides an orthogonal decomposition of A into Hessenberg form:

QTAQ = H.

5.3 GMRES Algorithm

GMRES uses the relation

AQi+1 = Qi+2H˜i+1, (5.3)


as obtained from the Arnoldi procedure, to minimise ‖~ri+1‖ over the Krylov

space in an efficient manner.

Since the columns of Qi+1 form an orthonormal basis for Ki+1(~r0, A), GM-

RES chooses the optimal ~y ∈ Ri+1 in

~xi+1 = ~x0 +Qi+1~y,

that minimises ~ri+1 in the 2-norm.

Note that, in Eq. (5.3), Qi+1 ∈ Rn×(i+1) and Qi+2 ∈ Rn×(i+2). Since n is

typically large (millions or billions) and i is small (perhaps 20-30 or so), these

matrices have many rows, so we will seek to avoid computing with them directly.

By contrast, H˜i+1 ∈ R(i+2)×(i+1) is a small matrix, and we will exploit this

as follows.

Using Eq. (5.3), we write

‖~ri+1‖ = ‖~r0 −AQi+1~y‖

= ‖~r0 −Qi+2H˜i+1~y‖.

We know that ~q0 = ~r0/‖~r0‖ forms the first column of Qi+2, so we can write

~r0 = ‖~r0‖Qi+2~e1,

where ~e1 ∈ Ri+2 is the first canonical basis vector, ~e1 = (1, 0, . . . , 0)T . Therefore,

‖~ri+1‖2 = ‖Qi+2 (‖~r0‖~e1 − H˜i+1~y)‖2
         = (Qi+2 (‖~r0‖~e1 − H˜i+1~y))^T (Qi+2 (‖~r0‖~e1 − H˜i+1~y))
         = (‖~r0‖~e1 − H˜i+1~y)^T Qi+2^T Qi+2 (‖~r0‖~e1 − H˜i+1~y)
         = (‖~r0‖~e1 − H˜i+1~y)^T (‖~r0‖~e1 − H˜i+1~y)
         = ‖ ‖~r0‖~e1 − H˜i+1~y ‖2.

Minimising ‖~ri+1‖ over ~y ∈ Ri+1 then boils down to solving a small least-squares

problem with an overdetermined matrix H˜i+1 ∈ R(i+2)×(i+1). For example, the

normal equations for this problem are given by

H˜i+1^T H˜i+1 ~y = ‖~r0‖ H˜i+1^T ~e1.

We find ~xi+1 from

~xi+1 = ~x0 +Qi+1~y.

In practical implementations, the least-squares problem is solved using a QR
decomposition. As i grows, the QR decomposition does not need to be recomputed
for every new i, but can be updated cheaply, as explained in [Saad, 2003].

Also, ~xi+1, or even ~y, does not need to be computed in every iteration. Since

the least-squares problem grows, it is common to restart the algorithm every 20

or so iterations.

The GMRES method for

A~x = ~b


is given by:

Algorithm 5.4: GMRES Method for A~x = ~b

Input: matrix A ∈ Rn×n; initial guess ~x0

Output: sequence of approximations ~x1, ~x2, . . .

1: ~r0 = ~b−A~x0

2: ρ = ‖~r0‖

3: ~q0 = ~r0/ρ

4: m = 0

5: repeat

6: ~v = A~qm

7: for j = 0 : m do

8: hj,m = ~qj^T ~v

9: ~v = ~v − hj,m~qj

10: end for

11: hm+1,m = ‖~v‖

12: ~qm+1 = ~v/hm+1,m

13: find ~y that minimises ‖ρ~e1 − H˜m+1~y‖

14: ~xm+1 = ~x0 +Qm+1~y

15: ‖~rm+1‖ = ‖ρ~e1 − H˜m+1~y‖

16: m = m+ 1

17: until convergence criterion is satisfied
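Algorithm 5.4 can be prototyped directly in NumPy. In the sketch below (an illustration, not the notes' implementation) the small least-squares problem of step 13 is re-solved from scratch with `np.linalg.lstsq` rather than with the cheap QR update mentioned above, and the well-conditioned nonsymmetric test matrix is an assumption.

```python
import numpy as np

def gmres(A, b, x0, maxit=30, tol=1e-10):
    """GMRES (Algorithm 5.4): each iteration extends the Arnoldi basis and
    solves the small least-squares problem min || rho e1 - H y ||."""
    n = len(b)
    r0 = b - A @ x0
    rho = np.linalg.norm(r0)
    Q = np.zeros((n, maxit + 1))
    H = np.zeros((maxit + 1, maxit))
    Q[:, 0] = r0 / rho
    for m in range(maxit):
        v = A @ Q[:, m]                        # Arnoldi step (steps 6-12)
        for j in range(m + 1):
            H[j, m] = Q[:, j] @ v
            v = v - H[j, m] * Q[:, j]
        H[m + 1, m] = np.linalg.norm(v)
        Q[:, m + 1] = v / H[m + 1, m]
        e1 = np.zeros(m + 2)
        e1[0] = rho
        y, *_ = np.linalg.lstsq(H[:m + 2, :m + 1], e1, rcond=None)  # step 13
        if np.linalg.norm(e1 - H[:m + 2, :m + 1] @ y) < tol:        # step 15
            break
    return x0 + Q[:, :m + 1] @ y                                    # step 14

rng = np.random.default_rng(1)
A = np.diag(np.linspace(1.0, 2.0, 40)) + 0.01 * rng.standard_normal((40, 40))
b = rng.standard_normal(40)
x = gmres(A, b, np.zeros(40))
```

Because the projected residual ‖ρ~e1 − H˜~y‖ equals the true residual norm, the stopping test needs no extra matrix-vector product.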

5.4 Convergence Properties of GMRES

The following convergence result can be proved for the case that A is diagonal-

isable,

A = V ΛV −1.

(Note that eigenvalues of A ∈ Rn×n may be complex.)

Theorem 5.5

Let A ∈ Rn×n, nonsingular, be diagonalisable, A = V ΛV −1. Then the residu-

als generated in the GMRES method satisfy

‖~ri‖ / ‖~r0‖ ≤ κ2(V) min_{pi ∈ Pi} max_{λ ∈ Σ(A)} |pi(λ)|.

Here, Pi denotes the set of polynomials pi of degree at most i which satisfy
pi(0) = 1. Σ(A) is the eigenvalue spectrum of

A, i.e., the set of eigenvalues of A.

This theorem indicates that the convergence behaviour depends on the con-

dition number of the matrix of eigenvectors of A, and on the distribution of the

eigenvalues of A in the complex plane. E.g., clustered spectra tend to lead to

fast convergence, since a low-degree polynomial can then typically be found that

is small on the whole spectrum. Since GMRES updates can be written in terms

of polynomials of A multiplying ~r0, GMRES can be interpreted as seeking the

optimal polynomial in Pi, which is used in the proof of this theorem.


5.5 Preconditioned GMRES

Left preconditioning for GMRES proceeds by considering

PA~x = P~b

with, e.g.,

P ≈ A−1.

Alternatively, right preconditioning for GMRES proceeds by considering

APP−1~x = ~b

or

AP~z = ~b,

P−1~x = ~z.

The two variants perform similarly, but right preconditioning is sometimes pre-

ferred because it works with the original residual:

~r0 = ~b−AP~z0 = ~b−APP−1~x0 = ~b−A~x0.

This is right-preconditioned GMRES:

Algorithm 5.6: Right-Preconditioned GMRES Method for A~x = ~b

Input: matrix A ∈ Rn×n; initial guess ~x0; preconditioner P ≈ A−1

Output: sequence of approximations ~x1, ~x2, . . .

1: ~r0 = ~b−A~x0

2: ρ = ‖~r0‖

3: ~q0 = ~r0/ρ

4: m = 0

5: repeat

6: ~v = AP~qm

7: for j = 0 : m do

8: hj,m = ~qj^T ~v

9: ~v = ~v − hj,m~qj

10: end for

11: hm+1,m = ‖~v‖

12: ~qm+1 = ~v/hm+1,m

13: find ~y that minimises ‖ρ~e1 − H˜m+1~y‖

14: ~xm+1 = ~x0 + PQm+1~y

15: ‖~rm+1‖ = ‖ρ~e1 − H˜m+1~y‖

16: m = m+ 1

17: until convergence criterion is satisfied

5.6 Lanczos Orthogonalisation Procedure for Symmetric

Matrices

If A = AT , then the Hessenberg matrix obtained by the Arnoldi process satisfies

HT = (QTAQ)T = QTATQ = QTAQ = H,


so H is symmetric, which implies that it is tridiagonal.

Therefore, the Arnoldi update formula simplifies from

hm+1,m~qm+1 = A~qm − h0,m~q0 − h1,m~q1 − . . . − hm,m~qm

to a three-term recursion relation

hm+1,m~qm+1 = A~qm − hm−1,m~qm−1 − hm,m~qm,

with

A [ ~q0 ~q1 . . . ~qi ] = [ ~q0 ~q1 . . . ~qi ~qi+1 ]
[ h0,0   h0,1
  h1,0   h1,1   h1,2                 0
         h2,1   h2,2   h2,3
                . . .  . . .  . . .
   0                 hi,i−1   hi,i
                             hi+1,i ],

or, taking into account the symmetry further,

H˜i+1 =
[ α0   β0
  β0   α1   β1               0
       β1   α2   β2
            . . .  . . .  . . .
   0             βi−1   αi
                        βi ].

The simplification of the Arnoldi procedure to compute the orthonormal basis

{~q0, . . . , ~qi} of the Krylov space based on

A~qi = βi−1~qi−1 + αi~qi + βi~qi+1

is called the Lanczos procedure. It can be shown that the Lanczos procedure

is related to the CG algorithm (just like Arnoldi is used by GMRES).

The Lanczos procedure is given by:


Algorithm 5.7: Lanczos Procedure for an Orthonormal Basis of Ki+1(~r0, A)

Input: matrix A ∈ Rn×n, symmetric; vector ~r0

Output: vectors ~q0, . . . , ~qi that form an orthonormal basis of Ki+1(~r0, A)

ρ = ‖~r0‖

~q0 = ~r0/ρ

β−1 = 0

~q−1 = 0

for m = 0 : i− 1 do

~v = A~qm

αm = ~qm^T ~v

~v = ~v − αm~qm − βm−1~qm−1

βm = ‖~v‖

~qm+1 = ~v/βm

end for
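The three-term recursion makes Lanczos much cheaper per step than Arnoldi. The NumPy sketch below (illustrative only; the random symmetric test matrix is an assumption) follows Algorithm 5.7 and can be checked against the tridiagonal structure of Q^T AQ.

```python
import numpy as np

def lanczos(A, r0, steps):
    """Lanczos procedure (Algorithm 5.7) for symmetric A: only the two
    most recent basis vectors enter each orthogonalisation step."""
    n = len(r0)
    Q = np.zeros((n, steps + 1))
    alpha = np.zeros(steps)
    beta = np.zeros(steps)
    Q[:, 0] = r0 / np.linalg.norm(r0)
    q_prev, beta_prev = np.zeros(n), 0.0
    for m in range(steps):
        v = A @ Q[:, m]
        alpha[m] = Q[:, m] @ v
        v = v - alpha[m] * Q[:, m] - beta_prev * q_prev
        beta[m] = np.linalg.norm(v)
        Q[:, m + 1] = v / beta[m]
        q_prev, beta_prev = Q[:, m], beta[m]
    return Q, alpha, beta

rng = np.random.default_rng(2)
M = rng.standard_normal((25, 25))
A = M + M.T                         # symmetric test matrix (assumed)
Q, alpha, beta = lanczos(A, rng.standard_normal(25), 6)
```

In floating point the computed basis slowly loses orthogonality as the number of steps grows, which is why practical Lanczos codes often reorthogonalise.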

Finally, it can be shown that the eigenvalues of the (i + 1) × (i + 1) matrix

Ĥi+1 that is formed by the first i + 1 rows of H˜i+1 ∈ R(i+2)×(i+1), obtained by

Arnoldi or Lanczos, provide approximations for eigenvalues of A. Indeed, when

i+ 1 = n, we can consider the eigenvalue decomposition

H = V ΛV −1

of H, and then

AQ = QH = QV ΛV −1

implies

AQV = QV Λ,

i.e., the columns of QV are the eigenvectors of A, with associated eigenvalues

in Λ. When i + 1 = n this relation is exact. When i ≪ n, the eigenvalues of
Ĥi+1 = V ΛV^−1 approximate some of the eigenvalues of A, and the columns

of Qi+1V approximate the associated eigenvectors. Eigenvalue and eigenvector

computation is an important topic of the second part of this unit.


Part II

Eigenvalues and Singular

Values

Chapter 6

Basic Algorithms for

Eigenvalues

Eigenvalue problems and the singular value decomposition are particularly inter-
esting because they serve as the driving force behind many important practical
problems, ranging from structural dynamics and quantum chemistry to data
science, Markov chain techniques, control theory, and beyond. Numerically stable and

computationally fast algorithms for identifying eigenvalues and eigenvectors are

powerful and yet far from obvious to construct.

6.1 Example: Page Rank and Stochastic Matrix

Before diving into the details of these jewels of computational science, we will

first introduce the stochastic matrix of a Markov chain. This will later be used
as an example in tutorial and assignment questions.

Markov chains are widely used for studying cruise control systems in motor
vehicles, queues of customers arriving at an airport, exchange rates of



currencies, or even modelling internet search. Here we will use page rank as the

motivating example1.

Step 1: A directed graph consists of a non-empty set of nodes and a set of
directed edges. Nodes are indexed by natural numbers, 1, 2, · · · . If there is an
edge from node i to node j, then i is called the tail and j the head. Each
directed edge represents a possible transition from its tail to its head.
Given a collection of web sites, it is reasonable to view each web site i as a node;
a hyperlink from site i to another site j then forms a directed edge from i to j.
This creates a directed graph.

[Figure: a directed graph on the four web sites S1, S2, S3, S4; the edge weights
0.5, 0.5, 0.4, 0.6, 0.3, 0.3, 0.3, 0.1, 0.4, 0.5, 0.1 are the transition probabilities
collected in the stochastic matrix M of Example 6.1.]

Remark

Since the importance of a web site is measured by its popularity (how many

incoming links it has), we can view the importance of a site i as the probability
that a random surfer on the Internet enters that website by following
hyperlinks.

Step 2: We can weight the edges (hyperlinks) of the graph in a probabilistic
way: a web site i is linked to other web sites (including itself) by hyperlinks, so
we can count the number of hyperlinks that point to a web site j and normalise
these counts by the total number of hyperlinks contained in site i. This way,
the directed edges from i are weighted, and the weights can be interpreted as a
discrete probability distribution. Following this probability distribution, a
random surfer currently browsing web site i moves on to other sites.

This transition is described by a stochastic system: at each node i, the transition
from node i to node j occurs with a certain probability. This discrete probability
distribution is represented as a vector whose j-th entry is the probability of
moving to node j. It obeys several principles:

• If the transition probability from i to j is 0, there is no edge starting at i
and ending at j, and vice versa.

1The web pages shown above are downloaded from http://www.math.cornell.edu/mec/Winter2009/RalucaRemus/Lecture3/lecture3.html


• We can have a probability of staying at the current node i; this is
represented by an edge starting and ending at the same node i.

• At any given node, the sum of transition probabilities to other nodes
(including the current node) must be 1.

This way, we can define a stochastic matrix, also known as transition matrix or

Markov matrix, to describe the transitions of a Markov chain. If we assume that

there are n possible nodes, the stochastic matrix M ∈ Rn×n is a square matrix

and each of its entries is a nonnegative real number representing a probability

of moving from the node indexed by its row number to another node indexed

by its column number. The i-th row of the matrix M is the discrete probability

distribution moving from the current node i to other nodes.

Example 6.1

For example, the system in the figure has the following stochastic matrix:

M = [ 0.5  0.5  0    0
      0    0.4  0    0.6
      0.3  0.3  0.3  0.1
      0    0.4  0.5  0.1 ].

Note that each row of the stochastic matrix sums to 1.

Step 3: How to work out the popularity of a collection of web sites given the

stochastic matrix? We need to figure out the probability distribution of surfers

entering this collection of web sites. A site is considered popular if it has a
high probability of being visited.

At a current step k, we assume the probability of visiting the collection of

web sites can be represented by a vector ~x(k). This vector is called the state of

the system at time k. The probability that a web site j will be visited in the next
step k + 1 is the sum over i of the probability of currently visiting a site i and
then following an edge into site j. This is given as

~x(k+1)(j) = Σ_{i=1}^{n} ~x(k)(i) Mij ,    (6.1)

where ~x(k)(i) is the i-th element of the vector ~x(k). This way, the probability

~x(k+1) is given as

~x(k+1)> = ~x(k)>M. (6.2)

Example 6.2

Continuing with the transition matrix in the diagram, we assume all web
sites have equal probabilities of being visited at the beginning, i.e., ~x(0) =
[ 0.25 0.25 0.25 0.25 ]>. The probability of visiting the node 2 in the
next step is given as

Σ_{i=1}^{4} ~x(0)(i)M(i, 2) = ~x(0)>M(:, 2) = [ 0.25 0.25 0.25 0.25 ] [ 0.5 0.4 0.3 0.4 ]> = 0.4.


Starting with an initial probability distribution ~x(0), after k steps the probability
distribution of web sites being visited is

~x(k)> = ~x(0)>Mk. (6.3)

The vector ~x(k) for k → ∞ represents the probability distribution of web sites
being visited after a large number of steps.

Example 6.3

Continuing with the transition matrix in the diagram, we assume all web sites
have equal probabilities of being visited, i.e., ~x(0) = [ 0.25 0.25 0.25 0.25 ]>.
After one step of the transition, we obtain

~x(1)> = ~x(0)>M = [ 0.2 0.4 0.2 0.2 ].

After three steps of the transition,

~x(3)> = ~x(0)>M3 = [ 0.128 0.4 0.188 0.284 ].

After five steps of the transition,

~x(5)> = ~x(0)>M5 = [ 0.11972 0.3922 0.20312 0.28496 ].

After ten steps of the transition,

~x(10)> = ~x(0)>M10 = [ 0.12164 0.3919 0.20269 0.28378 ].

After one hundred steps of the transition,

~x(100)> = ~x(0)>M100 = [ 0.12162 0.39189 0.2027 0.28378 ].

After one thousand steps of the transition,

~x(1000)> = ~x(0)>M1000 = [ 0.12162 0.39189 0.2027 0.28378 ].

We can also randomly choose an initial distribution, e.g.,
~x(0) = [ 0.081295 0.54474 0.30791 0.066051 ]>. After one thousand steps of
the transition,

~x(1000)> = ~x(0)>M1000 = [ 0.12162 0.39189 0.2027 0.28378 ].

Regardless of the initial distribution, the iteration appears to converge to a
stationary distribution.
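The iterates of Example 6.3 can be reproduced with a few lines of NumPy (a sketch for illustration, using the stochastic matrix M of Example 6.1):

```python
import numpy as np

# Right stochastic matrix from Example 6.1 (each row sums to 1)
M = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.4, 0.0, 0.6],
              [0.3, 0.3, 0.3, 0.1],
              [0.0, 0.4, 0.5, 0.1]])

x = np.full(4, 0.25)          # uniform initial distribution x^(0)
for _ in range(1000):
    x = x @ M                 # x^(k+1)> = x^(k)> M, Eq. (6.2)
```

After 1000 steps, x agrees with the stationary distribution [ 0.12162 0.39189 0.2027 0.28378 ] of the example to five decimals, and applying M once more leaves it unchanged.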

Studying the eigenvalues and eigenvectors is crucial for identifying ~x(k) for k → ∞
and understanding its behaviour. In fact, we can show that the largest eigenvalue

of M is one. Under certain conditions, with large k, the vector ~x(k) converges

to a vector ~x(∞). The vector ~x(∞) is called the stationary distribution. The

stationary distribution is invariant under the transition defined by M , which

can be stated in the form of

~x(∞)> = ~x(∞)>M.

In fact, the stationary distribution is the eigenvector of M> associated with

eigenvalue 1.


Remark

Here, we assume that the next state of the system only depends on its current
state. This property is called the Markov property.

The stochastic matrix discussed so far is often called the right stochastic
matrix, as it appears on the right side of the multiplication defining a transition.

Definition 6.4

Given a right stochastic matrix Mr ∈ Rn×n, its entry Mr(i, j) defines the
transition probability from the node i to the node j. Each row of Mr sums to 1.

For computational convenience, in this note we are dealing with the transpose

of the right stochastic matrix, which is often referred to as the left stochastic

matrix.

Definition 6.5

A left stochastic matrix Ml ∈ Rn×n is a stochastic matrix each of whose
entries Ml(i, j) defines the transition probability from the node j to the node i.
Each column of Ml sums to 1. For the same transition diagram, Ml = Mr>.

Example 6.6

The left stochastic matrix of the given diagram is

Ml = [ 0.5  0    0.3  0
       0.5  0.4  0.3  0.4
       0    0    0.3  0.5
       0    0.6  0.1  0.1 ].

Given an initial distribution ~x(0) = [ 0.25 0.25 0.25 0.25 ]>, the probability of
visiting the node 2 in the next step is given as

Σ_{i=1}^{4} Ml(2, i)~x(0)(i) = Ml(2, :) ~x(0) = [ 0.5 0.4 0.3 0.4 ] [ 0.25 0.25 0.25 0.25 ]> = 0.4.

Given ~x(k), the probability ~x(k+1) is given as

~x(k+1) = Ml ~x(k). (6.4)

The stationary distribution has the property

~x(∞) = Ml ~x(∞). (6.5)


6.2 Fundamentals of Eigenvalue Problems

Let A ∈ Rn×n be a square matrix. A non-zero ~x ∈ Cn is called an eigenvector
and λ ∈ C is called its corresponding eigenvalue if

A~x = λ~x. (6.6)

Here we review the basic mathematics of eigenvalues and eigenvectors.

6.2.1 Notations

This note does not deal with the eigenvalue problems of matrices with complex
entries. However, the eigenvalues and eigenvectors of a matrix with real
entries may nevertheless be complex.

Example 6.7

The matrix

A = [  0  1
      −1  0 ]

has eigenvalues ±i and eigenvectors [1, ±i]>/√2.

We need to introduce some special matrix operations and special matrices

involving complex numbers.

Definition 6.8

The conjugate transpose or Hermitian transpose of a matrix A ∈ Cm×n with
complex entries is the n-by-m matrix A∗ obtained from A by taking the
transpose and then taking the complex conjugate of each entry (i.e. negating the
imaginary parts but not the real parts). This takes the form of

A∗ij = conj(Aji).

Definition 6.9

A unitary matrix is a complex square matrix Q ∈ Cn×n whose conjugate
transpose Q∗ is also its inverse. That is,

QQ∗ = Q∗Q = I.

An orthogonal matrix is unitary.

We also introduce some basic matrix operations that will be used in Part II.

1. Given a matrix A, the entry on the i-th row and j-th column is denoted as

Aij .

2. The k-th power of a square matrix A is denoted as Ak.

3. The superscript (k) is used to denote some variables at the k-th iteration

of an algorithm. For example, A(k) is a matrix A in the k-th iteration of

some algorithm. In general, A(k) is not Ak.

4. ~vi is used to denote the i-th vector in a sequence of vectors, and ~vi(j) is

used to denote the j-th entry of the vector ~vi.


5. To be consistent with 4, we also use A(i, j) to denote the entry on the i-th

row and j-th column of a matrix A, i.e., A(i, j) = Aij .

6. Similarly, we use A(k : l,m : n) to denote a submatrix of A—the submatrix

spans the k-th to l-th rows and m-th to n-th columns of A.

6.2.2 Eigenvalue and Eigenvector

Equation (6.6) can be equivalently stated as

(A− λI)~x = 0. (6.7)

For a given eigenvalue λ, there may exist a set of linearly independent
eigenvectors Eλ such that

(A− λI)~v = 0, ∀~v ∈ Eλ.

The subspace spanned by the set Eλ is called an eigenspace. The eigenspace is

the nullspace of the matrix (A− λI).

Equation (6.7) has a non-zero solution ~x if and only if the determinant of the

matrix A− λI is zero. The determinant of the matrix A− λI can be expressed

as a polynomial.

Definition 6.10

The characteristic polynomial of A is a degree n polynomial in the form of

pA(λ) = det(A− λI). (6.8)

The eigenvalues of a matrix A are the roots of the characteristic polynomial.

The fundamental theorem of algebra implies that the characteristic polynomial
of A ∈ Rn×n, being a degree-n polynomial, can be factored (up to a sign) as

pA(λ) = (λ − λ1)(λ − λ2) · · · (λ − λn) = ∏_{i=1}^{n} (λ − λi).

We note that each of the roots of the characteristic polynomial, λi, can be a

complex number. The roots, λ1, λ2, . . . , λn, may not all have distinct values.

This leads to the concept of algebraic multiplicity of an eigenvalue.

Definition 6.11

The algebraic multiplicity of an eigenvalue λi, denoted µA(λi), is the
multiplicity of λi as a root of pA(λ). An eigenvalue is simple if it has multiplicity 1.

Another multiplicity of an eigenvalue λi, the geometric multiplicity, is defined

by the dimension of the nullspace of (A− λiI).

Definition 6.12

The geometric multiplicity of λi, µG(λi), is the number of linearly inde-

pendent eigenvectors associated with λi, or the dimension of the nullspace of

(A− λiI).

The algebraic multiplicity of any eigenvalue of a matrix A ∈ Rn×n is always
greater than or equal to its geometric multiplicity. We will prove this later.


Example 6.13

Consider the two matrices

A = [ a 0 0          B = [ a 1 0
      0 a 0                0 a 1
      0 0 a ],             0 0 a ],    (6.9)

where a > 0. Both A and B have the same characteristic polynomial, (a − λ)^3.
A has three linearly independent eigenvectors, whereas B only has one, namely
the scalar multiples of ~e1.
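The two multiplicities in this example can be checked numerically: the geometric multiplicity is the dimension of the nullspace of A − λI, computable as n minus the matrix rank. (The concrete value a = 2 is an assumption made for the sketch.)

```python
import numpy as np

a = 2.0
A = a * np.eye(3)                    # the diagonal matrix of Example 6.13
B = a * np.eye(3) + np.eye(3, k=1)   # the matrix with ones on the superdiagonal

# geometric multiplicity of the eigenvalue a: dim null(A - a I) = 3 - rank
gm_A = 3 - np.linalg.matrix_rank(A - a * np.eye(3))
gm_B = 3 - np.linalg.matrix_rank(B - a * np.eye(3))
```

Both matrices have algebraic multiplicity 3 for the eigenvalue a, but gm_A = 3 while gm_B = 1, so B is defective in the sense of Definition 6.14.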

Definition 6.14

A defective eigenvalue is an eigenvalue whose algebraic multiplicity exceeds

its geometric multiplicity. A matrix A ∈ Rn×n is called a defective matrix

if it has one or more defective eigenvalues.

6.2.3 Similarity Transformation

Definition 6.15

If a matrix X ∈ is nonsingular, then the map

A→ X−1AX,

is called a similarity transformation. Two matrices A ∈ and B ∈ are called

similar if there exist a nonsingular matrix X ∈ such that

B = X−1AX.

Similar matrices A and X−1AX share many important properties.

Theorem 6.16

Given a matrix A ∈ Cn×n and a nonsingular matrix X ∈ Cn×n, A and X−1AX
have the same characteristic polynomial, eigenvalues, and algebraic and
geometric multiplicities.

Proof. By the definition of the characteristic polynomial, we have

pX−1AX(λ) = det(X−1AX − λI)
          = det(X−1(A − λI)X)
          = det(X−1) det(A − λI) det(X)
          = det(A − λI)
          = pA(λ).

Since A and X−1AX have the same characteristic polynomial, the agreement of
eigenvalues and algebraic multiplicities follows. The dimensions of the nullspaces
of (A − λI) and X−1(A − λI)X are identical because X is nonsingular, and thus
the agreement of geometric multiplicities follows.
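Theorem 6.16 is easy to observe numerically. The sketch below (with an assumed random test pair A, X) compares the characteristic polynomial coefficients and the sorted eigenvalues of A and X−1AX:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
X = rng.standard_normal((5, 5))    # a random matrix is almost surely nonsingular
B = np.linalg.inv(X) @ A @ X       # similarity transformation

coeffs_A = np.poly(A)              # characteristic polynomial coefficients
coeffs_B = np.poly(B)
eigs_A = np.sort_complex(np.linalg.eigvals(A))
eigs_B = np.sort_complex(np.linalg.eigvals(B))
```

Up to rounding error, both the polynomial coefficients and the (possibly complex) eigenvalues agree.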


Similarity transformations can be used to show connections between algebraic
multiplicity and geometric multiplicity.

Theorem 6.17

The algebraic multiplicity of any eigenvalue of a matrix A ∈ Rn×n is always

greater than or equal to the its geometric multiplicity.

Proof. Suppose an eigenvalue λ with geometric multiplicity r has r linearly
independent eigenvectors ~v1, . . . , ~vr; orthonormalising them, we form a matrix

Vr = [ ~v1 · · · ~vr ]

with orthonormal columns such that AVr = λVr. We can extend Vr to a unitary
matrix V = [Vr | V⊥].

Applying the similarity transformation V ∗AV , we obtain

B = V ∗AV = [ λIr  C
              0    D ],

where Ir ∈ Rr×r is an identity matrix, C ∈ Cr×(n−r), and D ∈ C(n−r)×(n−r).
Since A and B have the same characteristic polynomial, we can then express
the characteristic polynomial of A as

pA(z) = det(B − zI) = det((λ − z)Ir) det(D − zIn−r) = (λ − z)^r det(D − zIn−r).

Thus the algebraic multiplicity of λ is greater than or equal to r.

6.2.4 Eigendecomposition, Diagonalisation, and Schur Factorisation

Consider a matrix A ∈ Rn×n that is non-defective, i.e., one for which the algebraic

multiplicity and the geometric multiplicity of each eigenvalue are the same. We have

AV = V Λ, (6.10)

where

V = [~v1 | ~v2 | · · · | ~vn],    Λ = diag(λ1, λ2, . . . , λn)

collect the eigenvectors and the corresponding eigenvalues. This effectively

factorises the matrix A in the form

A = V ΛV −1. (6.11)

This similarity transformation effectively diagonalises the matrix A. In fact, it is

easy to verify that a diagonal matrix is non-defective.

108 Chapter 6. Basic Algorithms for Eigenvalues

Theorem 6.18

A matrix A ∈ Rn×n is non-defective if and only if it has an eigenvalue decom-

position A = V ΛV −1.

Proof. Given an eigenvalue decomposition A = V ΛV −1, we know that A and

Λ are similar. Since the diagonal matrix Λ is non-defective, A is non-defective

by Theorem 6.16.

Conversely, a non-defective matrix must have n linearly independent eigenvectors, because

(1) the number of linearly independent eigenvectors associated with each eigenvalue

equals its geometric multiplicity, which by non-defectiveness equals its algebraic

multiplicity; and (2) eigenvectors associated with different eigenvalues are linearly

independent. Thus, the matrix V formed by all the eigenvectors is nonsingular and

A = V ΛV −1.
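The factorisation A = V ΛV −1 can be checked numerically with numpy's eig (an added sketch; a random matrix is non-defective with probability one):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))          # generically non-defective

lam, V = np.linalg.eig(A)                # columns of V are eigenvectors
# Reassemble A = V diag(lam) V^{-1}; V is nonsingular for non-defective A.
A_rebuilt = V @ np.diag(lam) @ np.linalg.inv(V)
print(np.allclose(A_rebuilt, A))
```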

A normal matrix A (i.e., one satisfying A∗A = AA∗) is unitarily diagonalisable;

that is, there exists a unitary matrix Q such that

A = QΛQ∗.

Real symmetric matrices are special matrices that are orthogonally diagonalis-

able. This leads to many computational advantages in finding their eigenvalues.

Remark 6.19

A real symmetric matrix is orthogonally diagonalisable and its eigenvalues are

real. That is, both Q and Λ are real for a symmetric A.

Not every matrix is unitarily diagonalisable. Furthermore, defective matrices

are not diagonalisable at all. A more general matrix decomposition is the Schur

factorisation.

Definition 6.20

A Schur factorisation of a matrix A ∈ Rn×n takes the form

A = QTQ∗,

where T is upper-triangular and Q is unitary.

Theorem 6.21

Every square matrix A ∈ Rn×n has a Schur factorisation.

Proof.

The case n = 1 is trivial as A is a scalar. Suppose n ≥ 2. Let ~x be any

eigenvector of A with corresponding eigenvalue λ. Take ~x to be normalised and

let it be the first column of a unitary matrix U in the form

U = [~x | U2],

where U2 ∈ Rn×(n−1).

The product U∗AU has the form

U∗AU = [ ~x∗A~x   ~x∗AU2
         U∗2A~x   U∗2AU2 ].


Since ~x is a normalised eigenvector, ~x∗A~x = λ, and U∗2A~x = λU∗2~x = ~0 because

the columns of U2 are orthogonal to ~x. Letting C = ~x∗AU2 and D = U∗2AU2,

the product can be simplified to

U∗AU = [ λ  C
         0  D ].

By induction, there exists a Schur factorisation V TV ∗ of the lower-dimensional

matrix D. Then write the unitary matrix

Q = [ 1  0
      0  V ],

and we have

(Q∗U∗)A(UQ) = [ λ  CV
                0  T ].

Since UQ is unitary and (Q∗U∗)A(UQ) is upper triangular, we obtain the

Schur factorisation.
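Numerically, a Schur factorisation can be computed with scipy.linalg.schur (an added illustration; the notes do not prescribe a routine):

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

# Complex Schur form: A = Q T Q*, with T upper triangular and Q unitary.
# The diagonal of T contains the eigenvalues of A.
T, Q = schur(A, output='complex')
print(np.allclose(Q @ T @ Q.conj().T, A))   # factorisation holds
print(np.allclose(np.tril(T, -1), 0))       # T is upper triangular
```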

Theorem 6.22

The eigenvalues of a triangular matrix are the entries on its main diagonal.

Proof. Let T ∈ Rn×n be upper triangular. The characteristic

polynomial of T can be written as

pT(λ) = det(T − λI).

We can partition T − λIn in the form

T − λIn = [ T2 − λIn−1   ~t1
            ~0>           Tnn − λ ],

where T2 ∈ R(n−1)×(n−1) is the leading principal submatrix of T, which is also upper

triangular. Using the property of the determinant of block matrices, we have

det(T − λI) = det(T2 − λIn−1) det(Tnn − λ − ~0>(T2 − λIn−1)−1~t1)

            = det(T2 − λIn−1)(Tnn − λ).

Since T2 − λIn−1 is also upper triangular, repeatedly applying this procedure leads

to the characteristic polynomial

pT(λ) = ∏_{i=1}^{n} (Tii − λ).

Therefore, the eigenvalues of a triangular matrix are the entries on its main

diagonal.


Remark 6.23

In summary, we have the following important results for identifying eigenvalues

of a matrix.

1. A matrix A is nondefective if and only if there exists an eigenvalue de-

composition A = V ΛV −1.

2. For a symmetric matrix A, there exists an orthogonal diagonalisation

A = QΛQ∗.

3. A unitary triangularisation (Schur factorisation) A = QTQ∗ always

exists.

Theorem 6.24

A real square matrix is symmetric if and only if it has the eigendecomposition

A = QΛQ>, where Q is a real orthogonal matrix and Λ is a real diagonal

matrix whose entries are the eigenvalues of A.

Proof. (The “only if” part =⇒ ): From Theorem 6.21 we know that a

general square matrix has the Schur factorisation A = QTQ>, where T is

upper triangular, so that T = Q>AQ. For a symmetric matrix A,

the matrix T = Q>AQ must also be symmetric. A symmetric upper triangular matrix

must be diagonal. This leads to the decomposition A = QTQ> where T is

diagonal, which is an eigendecomposition. Furthermore, all the eigenvalues

and eigenvectors of a real symmetric matrix are real, so Q and Λ can be taken to be real.

(The “if” part ⇐= ): Given the eigendecomposition A = QΛQ> with a real

orthogonal Q and real diagonal Λ, we have A> = (QΛQ>)> = QΛQ> = A, so A is a

real symmetric matrix.

Therefore the result follows.
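For real symmetric matrices, numpy's eigh returns this orthogonal diagonalisation directly (an added sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                         # a real symmetric matrix

lam, Q = np.linalg.eigh(A)                # real eigenvalues, orthogonal Q
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))  # A = Q Lambda Q^T
print(np.allclose(Q.T @ Q, np.eye(4)))         # Q is orthogonal
```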

6.2.5 Extending Orthogonal Vectors to a Unitary Matrix

In the proofs in the previous subsection, one important step is extending a rect-

angular matrix

Vr = [~v1 | · · · | ~vr],

where Vr ∈ Rn×r, to a unitary matrix

V = [Vr | V⊥],


where V⊥ ∈ Rn×(n−r). Here we explain the details of this operation.

For a given matrix A ∈ Rn×n, suppose it has an eigenvalue λ with geometric

multiplicity r. This way, the eigenvalue λ has r linearly independent eigenvectors,

i.e., A~ui = λ~ui, i = 1, . . . , r. Furthermore, we can show that a sequence of

orthonormal eigenvectors {~v1, ~v2, · · · , ~vr} can be obtained by orthogonalising and

normalising this set of eigenvectors {~u1, ~u2, · · · , ~ur}—using either Gram-Schmidt

or Householder reflection. The vectors {~v1, ~v2, · · · , ~vr} are still in the null space

of A−λI (the eigenspace of λ) as they are linear combinations of {~u1, ~u2, · · · , ~ur},

and thus are eigenvectors. This forms the matrix Vr.

As in the QR factorisation, we can always construct another n-by-(n-r) or-

thonormal matrix

V⊥ = [~vr+1 | · · · | ~vn],

such that each column of V⊥ is orthogonal to all the columns of Vr. Since both

Vr and V⊥ are orthonormal, the matrix V = [Vr|V⊥] is a unitary matrix.
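In practice this extension can be obtained from a complete QR factorisation: the trailing columns of the full Q are orthonormal and orthogonal to range(Vr). A small numpy sketch (our own construction, with hypothetical sizes n = 5, r = 2):

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 5, 2
Vr, _ = np.linalg.qr(rng.standard_normal((n, r)))   # n-by-r, orthonormal columns

# The complete QR of Vr embeds it into an n-by-n orthogonal matrix;
# the last n - r columns form V_perp, orthogonal to range(Vr).
Q_full, _ = np.linalg.qr(Vr, mode='complete')
V_perp = Q_full[:, r:]
V = np.hstack([Vr, V_perp])
print(np.allclose(V.T @ V, np.eye(n)))  # V = [Vr | V_perp] is orthogonal
```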

Now we have

V ∗AV = [ V ∗r
          V ∗⊥ ] A [ Vr | V⊥ ] = [ V ∗r AVr   V ∗r AV⊥
                                   V ∗⊥AVr   V ∗⊥AV⊥ ].

Since AVr = λVr and V⊥ is orthogonal to Vr, the above equation can be written

as

V ∗AV = [ λIr  C
          0    D ],

where C = V ∗r AV⊥ and D = V ∗⊥AV⊥. The resulting matrix V ∗AV is upper

block triangular.

Similarly, we can construct another matrix

U = [V⊥ | Vr],

and repeat the above process. This leads to

U∗AU = [ V ∗⊥
         V ∗r ] A [ V⊥ | Vr ] = [ V ∗⊥AV⊥   V ∗⊥AVr
                                  V ∗r AV⊥   V ∗r AVr ] = [ D  0
                                                           C  λIr ],

with the same C and D defined above. The resulting matrix U∗AU is lower

block triangular.


6.3 Power Iteration and Inverse Iteration

Given a matrix A ∈ Rn×n, we recall that the eigenvalues are the roots of the char-

acteristic polynomial pA(λ) = det(A − λI). In general, this characteristic

polynomial has degree n. For a polynomial of degree up to 4, well-established

formulas can be used to find its roots. However, as shown by Abel, Galois, and

others in the nineteenth century, for a polynomial of degree n ≥ 5 of the form

p(λ) = a0 + ∑_{i=1}^{n} ai λ^i,

where each coefficient ai is a rational number, the roots cannot in general be obtained

by a finite sequence of algebraic operations—addition, subtraction, multiplication,

division, and taking roots. This suggests that we cannot have direct solvers for

finding the eigenvalues of general matrices.

Remark 6.25

Like many root-finding algorithms, eigenvalue solvers must be iterative.

6.3.1 Power Iteration

A straightforward idea is that the sequence

~b/‖~b‖,  A~b/‖A~b‖,  A^2~b/‖A^2~b‖,  · · · ,  A^k~b/‖A^k~b‖,  · · ·

converges to an eigenvector corresponding to the largest eigenvalue (in absolute

value) of the matrix A. This is called the power iteration. It can be formalised

as the following:

Algorithm 6.26: Power Iteration

Input: Matrix A ∈ Rn×n and an initial vector ~b(0) = ~x ∈ Rn, where ‖~x‖ = 1

Output: An eigenvalue λ(m) and its eigenvector ~b(m)

1: for k = 1, 2, . . . , m do

2:     ~t(k) = A~b(k−1)                . Apply A

3:     ~b(k) = ~t(k)/‖~t(k)‖           . Normalise

4:     λ(k) = (~b(k))∗(A~b(k))         . Estimate eigenvalue

5: end for
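A direct transcription of Algorithm 6.26 in numpy (an added illustration; the 2×2 test matrix is our own example, with eigenvalues (5 ± √5)/2):

```python
import numpy as np

def power_iteration(A, x, m):
    """m steps of power iteration (Algorithm 6.26) from a vector x."""
    b = x / np.linalg.norm(x)
    lam = b @ (A @ b)
    for _ in range(m):
        t = A @ b                       # apply A
        b = t / np.linalg.norm(t)       # normalise
        lam = b @ (A @ b)               # eigenvalue estimate, cf. (6.12)
    return lam, b

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, b = power_iteration(A, np.array([1.0, 0.0]), 50)
print(lam)  # approx (5 + sqrt(5))/2 = 3.6180..., the dominant eigenvalue
```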

Repeatedly applying Steps 2 and 3, the vectors ~b(k), k = 0, 1, . . . , m follow the

sequence

~x/‖~x‖,  A~x/‖A~x‖,  A^2~x/‖A^2~x‖,  · · · ,  A^m~x/‖A^m~x‖.

Suppose ~b(k) is an eigenvector of A; then we have

A~b(k) = λ(k)~b(k).

As ~b(k) is normalised, multiplying both sides by (~b(k))∗ leads to the ratio

λ(k) = (~b(k))∗(A~b(k)) / ((~b(k))∗~b(k)) = (~b(k))∗(A~b(k)). (6.12)


Definition 6.27

The ratio

r(~b) = (~b∗A~b)/(~b∗~b), (6.13)

can be understood as follows: given a direction ~b, what scalar λ acts most like an

eigenvalue for ~b, in the sense of minimising f(λ) = ‖A~b − λ~b‖^2? Differentiating

this term with respect to λ, we have

∂f/∂λ = ∂‖A~b − λ~b‖^2/∂λ = −2~b∗(A~b − λ~b).

At the λ such that ∂f/∂λ = 0, f(λ) attains its minimum (as the second derivative is

2~b∗~b = 2‖~b‖^2 > 0), and thus we have λ = r(~b) as defined above. For a symmetric

matrix A ∈ Rn×n, this ratio is called the Rayleigh quotient.

6.3.2 Convergence of Power Iteration

We want to show the convergence of the power iteration in two respects. We first

show that the sequence ~b(k) converges linearly to an eigenvector corresponding

to the largest eigenvalue. Then we prove that, for an estimated eigenvector, the

estimated eigenvalue given by the ratio (6.13) converges linearly to the

corresponding eigenvalue.

Theorem 6.28

Assume a matrix A ∈ Rn×n is non-defective. Suppose its eigenvalues are

ordered so that

|λ1| > |λ2| ≥ · · · ≥ |λn|.

Let ~v1, . . . , ~vn denote (normalised) eigenvectors corresponding to each of the

eigenvalues. Suppose further we have an initial vector ~b(0) = ~x such that

~x∗~v1 ≠ 0. Then the vector ~b(k) in the power iteration satisfies

‖~b(k) − (±~v1)‖ = O(|λ2/λ1|^k),

as k → ∞. The ± indicates that one or the other choice of sign is to be taken.

Proof. We represent ~x as a linear combination of all the (normalised) eigen-

vectors ~v1, . . . , ~vn, which takes the form

~x = ∑_{i=1}^{n} ai ~vi.


Let

V = [~v1 | ~v2 | · · · | ~vn],    Λ = diag(λ1, λ2, . . . , λn),    and ~a = (a1, a2, . . . , an)>;

then A = V ΛV −1 and ~x = V~a, and hence

~b(k) = c(k)A^k~x = c(k)V Λ^kV −1V~a = c(k)V Λ^k~a = c(k) ∑_{i=1}^{n} λi^k ai ~vi,

where c(k) is the scalar that normalises ~b(k).

Now we bring λ1^k outside the summation:

~b(k) = c(k)λ1^k ( ∑_{i=1}^{n} (λi/λ1)^k ai ~vi ) = c(k)λ1^k a1 ~v1 + c(k)λ1^k ∑_{i=2}^{n} (λi/λ1)^k ai ~vi.

Therefore, the convergence of ~b(k) to ~v1 is dominated by the rate at which each

(λi/λ1)^k vanishes, which is of the order of |λ2/λ1|^k.

Theorem 6.29

Assume a non-symmetric matrix A ∈ Rn×n is non-defective. Suppose λK is

an eigenvalue of A with an eigenvector ~vK. The ratio

r(~b) = (~b∗A~b)/(~b∗~b)

is a linearly accurate estimate of the eigenvalue λK:

|r(~b) − λK| = O(‖~b − ~vK‖), as ~b → ~vK.

Proof. We represent ~b as a linear combination of all the eigenvectors ~v1, . . . , ~vn,

which takes the form

~b = ∑_{i=1}^{n} ai ~vi.

As defined in the previous proof, we have A = V ΛV −1 and ~b = V~a, and hence

A~b = V ΛV −1V~a = V Λ~a = ∑_{i=1}^{n} λi ai ~vi.


This way the ratio r(~b) can be written as

r(~b) = (∑_{i=1}^{n} λi ai ~vi)∗~b / (~b∗~b).

Thus, the error in the eigenvalue estimate takes the form

r(~b) − λK = (∑_{i=1}^{n} λi ai ~vi)∗~b / (~b∗~b) − λK (~b∗~b)/(~b∗~b)

           = ( (∑_{i=1}^{n} λi ai ~vi)∗~b − λK (∑_{i=1}^{n} ai ~v∗i)~b ) / (~b∗~b)

           = ( ∑_{i≠K} (λi − λK) ai ~v∗i~b ) / (~b∗~b).

Now, we can express the error as a weighted sum of the ai for i ≠ K:

r(~b) − λK = ∑_{i≠K} ai wi,   where wi = (λi − λK)~v∗i~b / (~b∗~b).

Given ~b = aK ~vK + ∑_{i≠K} ai ~vi, if ~b is close to ~vK, each ai for i ≠ K is of

the order of ‖~b − ~vK‖. Therefore, r(~b) converges linearly to the eigenvalue λK as

~b → ~vK.

Power iteration by itself can be slow. For example, it does not converge

if |λ1| = |λ2|. Nevertheless, it serves as a basis for many powerful eigenvalue

algorithms we will explore in later sections. It also reveals the iterative nature of

eigenvalue solvers.

6.3.3 Shifted Power Method

We have observed that if the first and second largest eigenvalues (in absolute

value) are close, the power iteration suffers from slow convergence. One simple

yet powerful idea to handle this situation is to use a shifted matrix A + µI.

Theorem 6.30

If λ is an eigenvalue of A, then λ+µ is an eigenvalue of A+µI. Furthermore,

if ~v is an eigenvector of A associated with λ, ~v is also an eigenvector of A+µI

associated with λ+ µ.

Using the shifted matrix A + µI, we can enhance the ratio between the first

and second largest eigenvalues.

6.3.4 Inverse Iteration

There also exist alternative ways to enhance the ratio between eigenvalues.


Theorem 6.31

Suppose µ is not an eigenvalue of A ∈ Rn×n. Then the eigenvectors of (A − µI)−1 are

the same as those of A, and the corresponding eigenvalues are (λi − µ)−1, i = 1, . . . , n,

where λi, i = 1, . . . , n are the eigenvalues of A.

This theorem suggests choosing a µ that is close to an eigenvalue

λK. Then the eigenvalue (λK − µ)−1 may be much larger than the other eigenvalues,

(λi − µ)−1, i ≠ K, of the matrix (A − µI)−1. This leads to the inverse iteration.

Algorithm 6.32: Inverse Iteration

Input: Matrix A ∈ Rn×n, an initial vector ~b(0) = ~x ∈ Rn where ‖~x‖ = 1,

and a shift scalar µ ∈ R.

Output: An eigenvalue λ(m) and its eigenvector ~b(m)

1: for k = 1, 2, . . . , m do

2:     Solve (A − µI)~w(k) = ~b(k−1) for ~w(k)     . Apply (A − µI)−1

3:     ~b(k) = ~w(k)/‖~w(k)‖                       . Normalise

4:     λ(k) = (~b(k))∗(A~b(k))                     . Estimate eigenvalue

5: end for
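Algorithm 6.32 in numpy (our own sketch, reusing the 2×2 example with eigenvalues (5 ± √5)/2; the shift µ = 1.3 sits near the smaller eigenvalue):

```python
import numpy as np

def inverse_iteration(A, x, mu, m):
    """m steps of inverse iteration (Algorithm 6.32) with shift mu."""
    n = A.shape[0]
    b = x / np.linalg.norm(x)
    lam = b @ (A @ b)
    for _ in range(m):
        w = np.linalg.solve(A - mu * np.eye(n), b)   # apply (A - mu I)^{-1}
        b = w / np.linalg.norm(w)                    # normalise
        lam = b @ (A @ b)                            # eigenvalue estimate
    return lam, b

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, b = inverse_iteration(A, np.array([1.0, 0.0]), 1.3, 20)
print(lam)  # approx (5 - sqrt(5))/2 = 1.3819..., the eigenvalue closest to mu
```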

6.3.5 Convergence of Inverse Iteration

Theorem 6.33

Given a non-defective matrix A ∈ Rn×n, suppose λK is the closest eigenvalue

to µ and λL is the second closest, that is,

|λK − µ| < |λL − µ| ≤ |λi − µ|, for each i ≠ K.

Let ~v1, . . . , ~vn denote eigenvectors corresponding to each of the eigenvalues of

A. Suppose further we have an initial vector ~x such that ~x∗~vK ≠ 0. Then the

vector ~b(k) in the inverse iteration satisfies

‖~b(k) − (±~vK)‖ = O(|(λK − µ)/(λL − µ)|^k),

and the estimated eigenvalue λ(k) satisfies

|λ(k) − λK| = O(|(λK − µ)/(λL − µ)|^k).

Proof. Using Theorem 6.31, we can show that the matrix B = (A − µI)−1

has eigenvalues zi = (λi − µ)−1, i = 1, . . . , n, associated with (normalised)

eigenvectors ~v1, . . . , ~vn. Note that the eigenvalues are ordered as

|zK| > |zL| ≥ |zi| for each i ≠ K.


Using the same argument as in the proof of Theorem 6.28, we can show that

‖~b(k) − (±~vK)‖ = O(|zL/zK|^k) = O(|(λK − µ)/(λL − µ)|^k).

The estimated eigenvector ~b(k) thus converges to ±~vK at the rate O(|zL/zK|^k).

Applying Theorem 6.29, we can show that λ(k) = r(~b(k)) satisfies

|λ(k) − λK| = O(|(λK − µ)/(λL − µ)|^k).

Remark 6.34

Step 2 of the inverse iteration relies on solving a linear system that is exceed-

ingly ill-conditioned. Will this create a fatal flaw in the algorithm?

Fortunately, this does not introduce a fatal flaw as long as the linear system is

solved by a stable method. Step 2 of the inverse iteration solves

(A− µI)~w(k) = ~b(k−1)

for ~w(k). Suppose µ is close to an eigenvalue λJ with eigenvector ~vJ.

Using Theorem 6.30, we can show that the matrix C = A − µI has eigenvalues

σi = λi − µ, i = 1, . . . , n, associated with (normalised) eigenvectors ~v1, . . . , ~vn.

Given a diagonal matrix D where Dii = σi, the matrix C has the similarity

transformation C = V DV −1. We can express the right-hand-side vector ~b(k−1)

as a linear combination of eigenvectors, ~b(k−1) = V~a. This way, ~w(k) = C−1~b(k−1)

can be written as

~w(k) = V D−1~a = (~a(J)/(λJ − µ)) ~vJ + ∑_{i≠J} (~a(i)/(λi − µ)) ~vi. (6.14)

If µ is close to λJ, this is dominated by the desired eigenvector we want to

approximate.

Now we deal with the ill-conditioning. We want to examine the stability

of ~w(k) under small perturbations to C and ~b(k−1):

(C + δC)(~w(k) + δ~w) = ~b(k−1) + δ~b.

The left-hand side takes the form

(C + δC)(~w(k) + δ~w) = C~w(k) + Cδ~w + δC~w(k) + δCδ~w.

Since the second-order perturbation term δCδ~w can be neglected and C~w(k) = ~b(k−1),

we have

δ~w = C−1(δ~b − δC~w(k)).

Without loss of generality, we can express (δ~b − δC~w(k)) as a linear combination

of eigenvectors, (δ~b − δC~w(k)) = V~d. Using the eigendecomposition of C, we

have

δ~w = V D−1~d = (~d(J)/(λJ − µ)) ~vJ + ∑_{i≠J} (~d(i)/(λi − µ)) ~vi. (6.15)


If µ is close to λJ, the perturbation δ~w to the solution also lies mostly along the

desired eigenvector we want to approximate.

Therefore, as long as the linear system is solved by a stable method (for exam-

ple, LU with pivoting) that produces a solution ~w + δ~w, both ~w and ~w + δ~w

lie close to the same direction ~vJ. One step of normalisation resolves

the difference in size.


6.4 Symmetric Matrices and Rayleigh Quotient Iteration

In this section, we focus on applying the power iteration and the inverse iteration

to symmetric matrices. The eigenvalue estimates for symmetric matrices exhibit

a higher speed of convergence compared with those for unsymmetric matrices.

We will also present a new algorithm that combines eigenvalue estimation using

the Rayleigh quotient with the inverse iteration to further enhance the

convergence speed.

6.4.1 Rate of Convergence

Definition 6.35

Suppose we have a sequence y(1), y(2), . . . converging to a number y. We

say the sequence converges linearly to y if

lim_{k→∞} |y(k+1) − y| / |y(k) − y| = σ,

for some σ ∈ (0, 1).

More generally, suppose the sequence converges with an iteration-dependent ratio

σk ∈ (0, 1),

|y(k+1) − y| / |y(k) − y| = σk.

We say the sequence converges superlinearly to y if σk → 0 as k → ∞, and

sublinearly to y if σk → 1 as k → ∞.

An alternative way of viewing this is to look at the error on a logarithmic

scale:

log(|y(k+1) − y|) − log(|y(k) − y|) = log(σk).

If log(σk) < 0 is a constant, then the logarithm of the error decreases linearly. If

log(σk) → −∞ as k → ∞, then the error decreases superlinearly. If log(σk) → 0

as k → ∞, then the error decreases sublinearly.

Definition 6.36

Suppose we have a sequence y(1), y(2), . . . converging to a number y. We

say the sequence converges with order q to y if

lim_{k→∞} |y(k+1) − y| / |y(k) − y|^q = γ,

for some γ > 0. For example, q = 2 gives quadratic convergence.

On a logarithmic scale:

lim_{k→∞} log(|y(k+1) − y|) − q log(|y(k) − y|) = log(γ).

6.4.2 Power Iteration and Inverse Iteration for Symmetric Matrices

Recall the Rayleigh quotient

r(~b) = (~b∗A~b)/(~b∗~b), (6.16)


for estimating eigenvalues given a vector ~b. Now we want to assess the accuracy

of this eigenvalue estimate for symmetric matrices.

Theorem 6.37

Given a symmetric matrix A ∈ Rn×n, suppose λK is an eigenvalue of A

with eigenvector ~qK. The ratio

r(~b) = (~b∗A~b)/(~b∗~b)

is a quadratically accurate estimate of the eigenvalue λK:

|r(~b) − λK| = O(‖~b − ~qK‖^2), as ~b → ~qK.

Proof. A symmetric matrix A has an eigendecomposition A = QΛQ∗,

where Q is an orthogonal matrix and Λ is a diagonal matrix. Each diagonal

entry λi = Λii is an eigenvalue of A, and the corresponding i-th column of

Q, ~qi = Q(:,i), is an eigenvector associated with λi. We represent ~b as a linear

combination of all the eigenvectors ~q1, . . . , ~qn, which takes the form

~b = ∑_{i=1}^{n} ai ~qi,    or    ~b = Q~a.

Now we have

~b∗A~b = ~a∗Q∗QΛQ∗Q~a = ~a∗Λ~a = ∑_{i=1}^{n} λi ai^2,

since Q is orthogonal.

This way the ratio r(~b) can be written as

r(~b) = (∑_{i=1}^{n} λi ai^2) / (~b∗~b).

Thus, the error in the eigenvalue estimate takes the form

r(~b) − λK = (∑_{i=1}^{n} λi ai^2)/(~b∗~b) − λK (∑_{i=1}^{n} ai^2)/(~b∗~b)

           = (∑_{i≠K} (λi − λK) ai^2)/(~b∗~b).

Now, we can express the error as a weighted sum of the ai^2 for i ≠ K:

r(~b) − λK = ∑_{i≠K} ai^2 wi,   where wi = (λi − λK)/(~b∗~b).

Given ~b = aK ~qK + ∑_{i≠K} ai ~qi, if ~b is close to ~qK, each ai for i ≠ K is of

the order of ‖~b − ~qK‖, and hence ai^2 = O(‖~b − ~qK‖^2) for i ≠ K. Therefore, r(~b)

converges quadratically to the eigenvalue λK as ~b → ~qK.


Not surprisingly, applying the power iteration (Algorithm 6.26) to a symmetric matrix

yields linear convergence in the eigenvector estimate and quadratic conver-

gence in the eigenvalue estimate, provided the ratio between the first and second

largest eigenvalues is not 1. A similar result holds for the inverse iteration as well.

Theorem 6.38

Given a symmetric matrix A ∈ Rn×n, suppose its eigenvalues are ordered so

that

|λ1| > |λ2| ≥ · · · ≥ |λn|.

Let ~q1, . . . , ~qn denote (normalised) eigenvectors corresponding to each of the

eigenvalues. Suppose further we have an initial vector ~b(0) = ~x such that

~x∗~q1 ≠ 0. Then the vector ~b(k) in the power iteration converges as

‖~b(k) − (±~q1)‖ = O(|λ2/λ1|^k),

and the estimated eigenvalue λ(k) converges as

|λ(k) − λ1| = O(|λ2/λ1|^{2k}).

Theorem 6.39

Given a symmetric matrix A ∈ Rn×n, suppose λK is the closest eigenvalue to

µ and λL is the second closest, that is,

|λK − µ| < |λL − µ| ≤ |λi − µ|, for each i ≠ K.

Let ~q1, . . . , ~qn denote eigenvectors corresponding to each of the eigenvalues of

A. Suppose further we have an initial vector ~x such that ~x∗~qK ≠ 0. Then the

vector ~b(k) in the inverse iteration converges as

‖~b(k) − (±~qK)‖ = O(|(λK − µ)/(λL − µ)|^k),

and the estimated eigenvalue λ(k) converges as

|λ(k) − λK| = O(|(λK − µ)/(λL − µ)|^{2k}).

6.4.3 Rayleigh Quotient Iteration

Given a good estimate of an eigenvalue, the inverse iteration finds the corresponding

eigenvector quickly, while the Rayleigh quotient estimates the

eigenvalue for a given vector. It is natural to combine both ideas. This leads to

the Rayleigh quotient iteration.


Algorithm 6.40: Rayleigh Quotient Iteration

Input: Matrix A ∈ Rn×n and an initial vector ~b(0) = ~x ∈ Rn where ‖~x‖ = 1.

Output: An eigenvalue λ(m) and its eigenvector ~b(m)

1: λ(0) = (~b(0))∗(A~b(0))

2: for k = 1, 2, . . . , m do

3:     Solve (A − λ(k−1)I)~w(k) = ~b(k−1) for ~w(k)     . Apply (A − λ(k−1)I)−1

4:     ~b(k) = ~w(k)/‖~w(k)‖                            . Normalise

5:     λ(k) = (~b(k))∗(A~b(k))                          . Estimate eigenvalue

6: end for
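Algorithm 6.40 in numpy (our own sketch on the same 2×2 symmetric example; only a handful of steps are needed thanks to the cubic convergence):

```python
import numpy as np

def rayleigh_quotient_iteration(A, x, m):
    """m steps of Rayleigh quotient iteration (Algorithm 6.40)."""
    n = A.shape[0]
    b = x / np.linalg.norm(x)
    lam = b @ (A @ b)                                # initial Rayleigh quotient
    for _ in range(m):
        w = np.linalg.solve(A - lam * np.eye(n), b)  # shifted inverse step
        b = w / np.linalg.norm(w)                    # normalise
        lam = b @ (A @ b)                            # updated estimate
    return lam, b

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, b = rayleigh_quotient_iteration(A, np.array([1.0, 0.3]), 5)
print(lam)  # converges to an eigenvalue of A in a handful of steps
```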

In the Rayleigh quotient iteration, we first compute an eigenvalue estimate

for the initial vector. Then, in each iteration, we feed the estimated eigenvalue

from the previous step into the shifted inverse iteration (for estimating the eigen-

vector). This leads to spectacular convergence.

Theorem 6.41

Given a symmetric matrix A ∈ Rn×n, suppose the initial vector is close to an

eigenvector ~qK corresponding to an eigenvalue λK. Then the vector ~b(k) in the

Rayleigh quotient iteration converges cubically as

‖~b(k+1) − (±~qK)‖ = O(‖~b(k) − (±~qK)‖^3),

and the estimated eigenvalue λ(k) converges cubically as

|λ(k+1) − λK| = O(|λ(k) − λK|^3).

Note that the ± signs on the two sides are not necessarily the same in the above

equations.

Proof. Here we employ the rather restrictive assumption that the eigenvalue λK

is simple. Let ‖~b(k) − (±~qK)‖ = ε. For sufficiently small ε, using Theorem

6.37 we can show that

|λ(k) − λK| = O(ε^2).

Now consider taking one step of the inverse iteration; the errors of the eigenvector

estimates in adjacent steps satisfy

‖~b(k+1) − (±~qK)‖ / ‖~b(k) − (±~qK)‖ = O(|(λK − λ(k))/(λL − λ(k))|).

Since |λ(k) − λK| = O(ε^2) and the right-hand side of the above equation is of

the order of λK − λ(k), we have

‖~b(k+1) − (±~qK)‖ = O(‖~b(k) − (±~qK)‖ ε^2) = O(ε^3).

This completes the proof of the first equation (convergence of the eigenvector

estimate is cubic). For the eigenvalue estimate at step k + 1, since the Rayleigh

quotient is quadratically accurate, we have

|λ(k+1) − λK| = O(‖~b(k+1) − (±~qK)‖^2) = O(ε^6).


Compared with the accuracy of the eigenvalue estimate at step k, which is O(ε^2),

we conclude that the second equation (convergence of the eigenvalue es-

timate is cubic) also holds.

By a similar reasoning, we can show that the Rayleigh quotient iteration

converges quadratically on non-symmetric matrices.

6.4.4 Summary of Power, Inverse, and Rayleigh Quotient Iterations

The convergence of the power, inverse, and Rayleigh quotient iterations is

summarised in Table 6.1. We note that the Rayleigh quotient iteration may

not always converge for non-symmetric matrices; the quadratic convergence can

be obtained only in limited cases.

Table 6.1: Let a = |λ2/λ1| and b = |(λK − µ)/(λL − µ)| as defined in the power iteration and

the inverse iteration, respectively.

            Symmetric matrices                Non-symmetric matrices

            Eigenvector     Eigenvalue        Eigenvector     Eigenvalue

Power       Linear O(a^k)   Linear O(a^2k)    Linear O(a^k)   Linear O(a^k)

Inverse     Linear O(b^k)   Linear O(b^2k)    Linear O(b^k)   Linear O(b^k)

Rayleigh    Cubic           Cubic             Quadratic †     Quadratic †

In terms of operation counts, the power iteration requires O(n^2) flops per

iteration for the matrix-vector products. The inverse and Rayleigh quo-

tient iterations require solving a linear system for the eigenvector estimate and

an additional matrix-vector product for the eigenvalue estimate. For a general

dense matrix, these two operations require O(n^3) and O(n^2) flops, respectively.

If we first transform the input matrix into a reduced form, namely

a tridiagonal matrix (in the symmetric case) or a Hessenberg matrix (in the

general case), the operation counts can be greatly reduced.


Chapter 7

QR Algorithm for

Eigenvalues

Many general-purpose eigenvalue solvers are based on the Schur factorisation.

Recall that the Schur factorisation of a matrix A ∈ Rn×n takes the form

A = QTQ∗,

where T is upper triangular and Q is unitary. The eigenvalues of T, and hence

the eigenvalues of A, are the entries on the main diagonal of T.

We aim to construct a sequence of elementary unitary similarity transformations Q∗kAQk,

so that the product

Q∗k · · ·Q∗2Q∗1 A Q1Q2 · · ·Qk (7.1)

converges to an upper triangular matrix T as k → ∞. Effectively, we construct

a unitary matrix

Q = Q1Q2 · · ·Qk

in this process. For a real symmetric matrix A ∈ Rn×n, let each Qk ∈ Rn×n be

an orthogonal (real) matrix; then Q∗k · · ·Q∗2Q∗1 A Q1Q2 · · ·Qk is also

symmetric and real. Therefore, the same algorithm produces an upper-

triangular and symmetric matrix T, which is diagonal.

7.1 Two Phases of Eigenvalue Computation

Definition 7.1

A Hessenberg matrix is a nearly triangular square matrix. An upper Hessen-

berg matrix has zero entries below the first subdiagonal, and a lower Hessenberg

matrix has zero entries above the first superdiagonal, as shown below.

× × × × ×
× × × × ×
  × × × ×
    × × ×
      × ×
Upper Hessenberg

× ×
× × ×
× × × ×
× × × × ×
× × × × ×
Lower Hessenberg



The sequence (7.1) is usually split into two phases. In the first phase, a matrix

is transformed to an upper Hessenberg matrix by a direct method. In the second

phase, an iterative process (as described earlier on) is applied to transform the

Hessenberg matrix to an upper triangular matrix. The process looks like the

following:

× × × × ×          × × × × ×          × × × × ×
× × × × ×          × × × × ×            × × × ×
× × × × ×   -->      × × × ×   -->        × × ×
× × × × ×              × × ×                × ×
× × × × ×                × ×                  ×
 A ≠ A∗       Phase 1: Q∗0AQ0      Phase 2: Q∗AQ = T
 (full)         (Hessenberg)          (triangular)

For a real symmetric matrix, Phase 1 will produce an upper Hessenberg and

symmetric matrix, which is tridiagonal. Phase 2 will produce a diagonal matrix

as previously discussed.

× × × × ×          × ×                ×
× × × × ×          × × ×                ×
× × × × ×   -->      × × ×   -->          ×
× × × × ×              × × ×                ×
× × × × ×                × ×                  ×
 A = A∗       Phase 1: Q∗0AQ0      Phase 2: Q∗AQ = T
 (full)        (tridiagonal)          (diagonal)

Phase 1 uses a direct method with an operation count comparable to a QR

or LU factorisation. By transforming the matrix to an upper Hessenberg or

tridiagonal form, the operation count of the matrix factorisations in each iteration

can be reduced by exploiting the Hessenberg or tridiagonal structure. This

greatly reduces the operation count of the iterative process in Phase 2.
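Phase 1 is available directly as scipy.linalg.hessenberg (an added illustration; the routine performs exactly this kind of orthogonal reduction):

```python
import numpy as np
from scipy.linalg import hessenberg

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5))

# Orthogonal reduction to upper Hessenberg form: A = Q H Q^T.
H, Q = hessenberg(A, calc_q=True)
print(np.allclose(Q @ H @ Q.T, A))       # the similarity transformation holds
print(np.allclose(np.tril(H, -2), 0))    # zeros below the first subdiagonal
```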


7.2 Hessenberg Form and Tridiagonal Form

To compute the Schur factorisation A = QTQ∗, we would like to apply unitary

similarity transformations to A so that zeros below the diagonal can be introduced.

× × × × ×            × × × × ×
× × × × ×              × × × ×
× × × × ×    -->         × × ×
× × × × ×                  × ×
× × × × ×                    ×
A = QTQ∗     Q∗AQ     T = Q∗AQ

A first thought could be to apply the Householder reflection to create a unitary Q

that triangularises the matrix A, as in the QR factorisation:

× × × × ×            × × × × ×
× × × × ×              × × × ×
× × × × ×    -->         × × ×
× × × × ×                  × ×
× × × × ×                    ×
 A = QR       Q∗A      R = Q∗A

However, this does not work in general.

Example 7.2

Consider the following symmetric matrix A,

A = [ 34 47  5 18 26
      47 10 13 26 34
       5 13 26 39 47
      18 26 39 42  5
      26 34 47  5 18 ].

It has a QR factorisation A = QR, where

Q = [ −0.51315   0.50931   0.68177  −0.097865 −0.053739
      −0.70936  −0.69517  −0.032877 −0.097865 −0.053739
      −0.075464  0.22987  −0.36671  −0.59133  −0.67625
      −0.27167   0.29808  −0.42781  −0.39282   0.70711
      −0.39241   0.34006  −0.46541   0.69056  −0.19208 ],

and

R = [ −66.2571 −52.5982 −42.7879 −43.9953 −49.4287
         0      39.2866  27.0942  14.2779   8.0216
         0       0      −45.1121 −23.1797 −11.1436
         0       0        0      −40.4136 −23.1985
         0       0        0        0      −34.9301 ].

However, the resulting unitary similarity transformation defined by Q is

Q∗AQ = [ −14.9985  91.9334  68.0421   2.957    7.1403
          36.6527 −30.0301 −14.7725   0.18308  9.0688
         −27.8888   4.3505  37.7858  20.5252   7.1294
           5.2017   5.2017  39.5858  −0.52871 −23.4519
           1.8771   1.8771  23.6216 −24.6996   6.7092 ],


which clearly does not lead to a triangular matrix.

One step of the Householder reflection changes all the rows of A:

× × × × ×          × × × × ×
× × × × ×          0 × × × ×
× × × × ×   -->    0 × × × ×
× × × × ×          0 × × × ×
× × × × ×          0 × × × ×
    A                 Q∗1A

Now we multiply Q∗1A with Q1 to complete the unitary transformation. Since

the Householder reflector is Hermitian (Q1 = Q∗1),

Q∗1AQ1 = (Q∗1(Q∗1A)∗)∗,

so we effectively apply the same Householder reflector to (Q∗1A)∗. This changes all

the rows of (Q∗1A)∗, i.e., all the columns of Q∗1A, so it may destroy the zeros

introduced previously.

× 0 0 0 0          × × × × ×          × × × × ×
× × × × ×          × × × × ×          × × × × ×
× × × × ×   -->    × × × × ×   -->    × × × × ×
× × × × ×          × × × × ×          × × × × ×
× × × × ×          × × × × ×          × × × × ×
 (Q∗1A)∗         Q∗1(Q∗1A)∗        (·)∗ gives Q∗1AQ1

Example 7.3

Consider the same symmetric matrix A; one step of the Householder reflection

(aiming at creating zeros below A(1,1)),

A = [ 34 47  5 18 26
      47 10 13 26 34
       5 13 26 39 47
      18 26 39 42  5
      26 34 47  5 18 ],

leads to the matrix

Q1A = [ −66.2571 −52.5982 −42.7879 −43.9953 −49.4287
           0     −36.6911  −9.4027  −3.0631  −1.3606
           0       8.0329  23.6167  35.9082  43.2382
           0       8.1183  30.4202  30.8695  −8.5423
           0       8.1709  34.607  −11.0774  −1.5612 ],

where

Q1 = I − 2~u1~u∗1,    ~u1 = (0.86981, 0.40776, 0.043379, 0.15617, 0.22557)>.


Now, multiplying $Q_1A$ with $Q_1^*$ on the right, we have
\[
Q_1AQ_1^* = \begin{bmatrix}
105.8884 & 28.1027 & -34.2027 & -13.0886 & -4.7856 \\
28.1027 & -23.5167 & -8.0012 & 1.9824 & 5.9274 \\
-34.2027 & -8.0012 & 21.911 & 29.7675 & 34.3683 \\
-13.0886 & 1.9824 & 29.7675 & 28.5196 & -11.9367 \\
-4.7856 & 5.9274 & 34.3683 & -11.9367 & -2.8022
\end{bmatrix},
\]
which no longer has those zeros introduced by $Q_1A$.

7.2.1 Householder Reduction to Hessenberg Form

Instead of directly transforming a matrix $A$ to a triangular form, we can transform it to a Hessenberg form (Phase 1 of the eigenvalue solvers), and then find other ways to obtain the Schur factorisation of the Hessenberg matrix.

This can be achieved by applying the Householder reflector starting from the second row of the matrix $A$. A square matrix $A \in \mathbb{R}^{n\times n}$ can be partitioned as follows:

\[
A = \begin{bmatrix} A_{11} & \vec a_1^\top \\ \vec b_1 & A_2 \end{bmatrix}.
\]

We first want to find a Householder reflector that transforms $\vec b_1$ to $-\mathrm{sign}(\vec b_1(1))\,\|\vec b_1\|\,\vec e_1$, which effectively creates zeros below the first entry of $\vec b_1$. The Householder transformation is defined by the unit vector
\[
\vec u_1 = \frac{\vec v_1}{\|\vec v_1\|}, \quad\text{where}\quad \vec v_1 = \vec b_1 + \mathrm{sign}(\vec b_1(1))\,\|\vec b_1\|\,\vec e_1,
\]
that determines the reflection hyperplane. This way, we can create a unitary matrix $Q_1 \in \mathbb{R}^{n\times n}$ that leaves the first row of $A$ unchanged,
\[
Q_1 = \begin{bmatrix} 1 & \vec 0^\top \\ \vec 0 & U_1 \end{bmatrix}, \tag{7.2}
\]
where $U_1 = I - 2\vec u_1\vec u_1^*$ is the Householder transformation matrix constructed with respect to $\vec b_1$.

After multiplying $Q_1^*$ on the left of $A$, which gives a matrix of the form
\[
Q_1^*A = \begin{bmatrix} A_{11} & \vec a_1^\top \\[2pt] \pm\|\vec b_1\|\,\vec e_1 & U_1^*A_2 \end{bmatrix}, \tag{7.3}
\]
we multiply $Q_1$ on the right of $Q_1^*A$. This time, the matrix $Q_1$ leaves the first column of $Q_1^*A$ unchanged, and we have
\[
Q_1^*AQ_1 = \begin{bmatrix} A_{11} & \vec a_1^\top U_1 \\[2pt] \pm\|\vec b_1\|\,\vec e_1 & U_1^*A_2U_1 \end{bmatrix}. \tag{7.4}
\]

Let $\tilde A_2 = U_1^*A_2U_1$. We can then take $\vec b_2 = \tilde A_2(:,1)$ and repeat the above process. Here the unitary matrix $Q_2$ should take the form
\[
Q_2 = \begin{bmatrix} I_2 & 0 \\ 0 & U_2 \end{bmatrix}, \tag{7.5}
\]
where $U_2 = I - 2\vec u_2\vec u_2^*$ is the Householder transformation matrix defined by a unit vector $\vec u_2$. The matrix $Q_2$ leaves the first two rows of $Q_1^*AQ_1$ unchanged when multiplied on the left, and the first two columns unchanged when multiplied on the right. This process is called Householder reduction.

Example 7.4

Consider the following symmetric matrix $A$,
\[
A = \begin{bmatrix}
34 & 47 & 5 & 18 & 26 \\
47 & 10 & 13 & 26 & 34 \\
5 & 13 & 26 & 39 & 47 \\
18 & 26 & 39 & 42 & 5 \\
26 & 34 & 47 & 5 & 18
\end{bmatrix}.
\]
One step of the Householder reduction (using a transformation aiming at creating zeros below $A(2,1)$) leads to the matrix
\[
Q_1^*AQ_1 = \begin{bmatrix}
34 & -56.8683 & 0 & 0 & 0 \\
-56.8683 & 63.585 & -42.2045 & -23.7277 & -17.8221 \\
0 & -42.2045 & 20.561 & 26.5925 & 30.0411 \\
0 & -23.7277 & 26.5925 & 23.1555 & -18.7528 \\
0 & -17.8221 & 30.0411 & -18.7528 & -11.3015
\end{bmatrix}.
\]
After two steps we have
\[
Q_2^*Q_1^*AQ_1Q_2 = \begin{bmatrix}
34 & -56.8683 & 0 & 0 & 0 \\
-56.8683 & 63.585 & 51.5932 & 0 & 0 \\
0 & 51.5932 & 48.3358 & -3.7237 & 4.6293 \\
0 & 0 & -3.7237 & 6.0401 & -32.2763 \\
0 & 0 & 4.6293 & -32.2763 & -21.961
\end{bmatrix}.
\]


7.2.2 Implementation and Computational Cost

Remark 7.5

In this section, since we are dealing with real square matrices, each Householder transformation matrix and the resulting $Q_k^*\cdots Q_1^*AQ_1\cdots Q_k$ are real. Thus, the conjugate transpose is equivalent to the transpose here.

To set all the entries below the first subdiagonal of a matrix to zero, i.e., to obtain the Hessenberg form, the Householder reduction has to be applied for $n-2$ steps. The algorithm is formulated below.

Algorithm 7.6: Householder Reduction to Hessenberg Form

Input: A matrix $A \in \mathbb{R}^{n\times n}$
Output: A Hessenberg matrix $A \in \mathbb{R}^{n\times n}$ and a sequence of vectors $\vec u_k$, $k = 1,\ldots,n-2$ that defines the sequence of unitary similarity transformations.
1: for $k = 1,\ldots,n-2$ do
2: $\quad \vec b = A(k{+}1{:}n,\, k)$
3: $\quad \vec v = \vec b + \mathrm{sign}(\vec b(1))\,\|\vec b\|\,\vec e_1$
4: $\quad \vec u_k = \vec v/\|\vec v\|$
5: $\quad A(k{+}1, k) = -\mathrm{sign}(\vec b(1))\,\|\vec b\|$
6: $\quad A(k{+}2{:}n,\, k) = 0$
7: $\quad A(k{+}1{:}n, k{+}1{:}n) = A(k{+}1{:}n, k{+}1{:}n) - (2\vec u_k)\left(\vec u_k^\top A(k{+}1{:}n, k{+}1{:}n)\right)$
8: $\quad A(1{:}n, k{+}1{:}n) = A(1{:}n, k{+}1{:}n) - \left(A(1{:}n, k{+}1{:}n)\,\vec u_k\right)(2\vec u_k^\top)$
9: end for
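Algorithm 7.6 can be transcribed almost line-for-line into NumPy. The sketch below is our own (the function name, the sign(0) = 1 convention, and the zero-column guard are our choices, not part of the unit's code):

```python
import numpy as np

def hessenberg_reduce(A):
    """Householder reduction of a square matrix to Hessenberg form
    (a transcription of Algorithm 7.6). Returns the reduced matrix
    and the unit vectors u_k defining the similarity transformations."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    us = []
    for k in range(n - 2):
        b = H[k+1:, k].copy()
        nb = np.linalg.norm(b)
        if nb == 0.0:                      # column already zero below the subdiagonal
            us.append(np.zeros_like(b))
            continue
        s = np.sign(b[0]) if b[0] != 0 else 1.0
        v = b.copy()
        v[0] += s * nb                     # v = b + sign(b(1)) ||b|| e1
        u = v / np.linalg.norm(v)
        us.append(u)
        H[k+1, k] = -s * nb                # A(k+1, k) = -sign(b(1)) ||b||
        H[k+2:, k] = 0.0
        # Step 7: apply U_k = I - 2 u u^T from the left to the trailing block
        H[k+1:, k+1:] -= 2.0 * np.outer(u, u @ H[k+1:, k+1:])
        # Step 8: apply U_k from the right to columns k+1:n of all rows
        H[:, k+1:] -= 2.0 * np.outer(H[:, k+1:] @ u, u)
    return H, us
```

Applied to the symmetric matrix of Example 7.4, the first subdiagonal entry of the result is $-56.8683$, matching the example, and the eigenvalues are preserved by the similarity transformations.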

Remark 7.7

As in the case of applying Householder reflections for computing the QR factorisation, the sequence of matrices $Q_k$, $k = 1,\ldots,n-2$ are not formed explicitly and can be reconstructed from $\vec u_k$, $k = 1,\ldots,n-2$ if necessary.

At the $k$-th iteration of the above algorithm, the work required to compute the unit vector $\vec u_k$ is proportional to $n-k$ (Steps 2--4). Similarly, the work required to apply the Householder reflection to $\vec b$ is about $n-k$ flops (Steps 5 and 6). The dominating cost lies in the last two lines inside the for loop.

In Step 7, the operations $A(k{+}1{:}n, k{+}1{:}n) - \cdots$ and $(2\vec u_k)(\cdots)$ require $(n-k)^2$ flops each, whereas $\vec u_k^\top A(k{+}1{:}n, k{+}1{:}n)$ requires $2(n-k)^2$ flops (multiplication and addition). Thus, the work of Step 7 is about $4(n-k)^2$ flops. Step 8 needs more work, as the operations $\cdots(2\vec u_k^\top)$ and $A(1{:}n, k{+}1{:}n) - \cdots$ require $n(n-k)$ flops each, whereas $A(1{:}n, k{+}1{:}n)\,\vec u_k$ requires $2n(n-k)$ flops. Thus, the work of Step 8 is about $4n(n-k)$ flops.

This way, the total work of applying the Householder reduction to transform a matrix to the Hessenberg form is about
\[
W = \sum_{k=1}^{n-2}\left[4n(n-k) + 4(n-k)^2 + O(n-k)\right]
= 4n\sum_{k=1}^{n-2}(n-k) + 4\sum_{k=1}^{n-2}(n-k)^2 + O\!\left(\sum_{k=1}^{n-2}(n-k)\right)
= 2n^3 + \frac{4}{3}n^3 + O(n^2)
= \frac{10}{3}n^3 + O(n^2). \tag{7.6}
\]

As expected, the dominant term in the expression for the computational work is proportional to $n^3$. We say that the computational complexity of the transformation to the Hessenberg form is cubic in the size of the square matrix, $n$.

7.2.3 The Symmetric Case: Reduction to Tridiagonal Form

If the matrix is symmetric, the above algorithm produces a tridiagonal matrix.

Theorem 7.8

The Householder reduction of a symmetric matrix produces a symmetric tridiagonal matrix.

Proof. Since $A$ is symmetric, $Q^\top AQ$ is also symmetric. A symmetric Hessenberg matrix $T$ has zero entries below the first subdiagonal (by the definition of a Hessenberg matrix) and zero entries above the first superdiagonal (by symmetry), and thus is tridiagonal.

By using the symmetry, the cost of applying the left and right Householder reflections (Steps 5--8) can be further reduced. The resulting algorithm is formulated below.

Algorithm 7.9: Householder Reduction to Tridiagonal Form

Input: A matrix $A \in \mathbb{R}^{n\times n}$
Output: A tridiagonal matrix $A \in \mathbb{R}^{n\times n}$ and a sequence of vectors $\vec u_k$, $k = 1,\ldots,n-2$ that defines the sequence of unitary similarity transformations.
1: for $k = 1,\ldots,n-2$ do
2: $\quad \vec b = A(k{+}1{:}n,\, k)$
3: $\quad \vec v = \vec b + \mathrm{sign}(\vec b(1))\,\|\vec b\|\,\vec e_1$
4: $\quad \vec u_k = \vec v/\|\vec v\|$
5: $\quad A(k{+}1, k) = -\mathrm{sign}(\vec b(1))\,\|\vec b\|$
6: $\quad A(k, k{+}1) = A(k{+}1, k)$
7: $\quad A(k{+}2{:}n,\, k) = 0$
8: $\quad A(k,\, k{+}2{:}n) = 0$
9: $\quad \vec t = A(k{+}1{:}n, k{+}1{:}n)\,\vec u_k$
10: $\quad \sigma = 2\vec u_k^*\vec t$
11: $\quad \vec p = 2\vec t - \sigma\vec u_k$
12: $\quad A(k{+}1{:}n, k{+}1{:}n) = A(k{+}1{:}n, k{+}1{:}n) - \vec p\,\vec u_k^* - \vec u_k\,\vec p^{\,*}$
13: end for

At iteration $k$, the matrix $Q_{k-1}^*\cdots Q_1^*AQ_1\cdots Q_{k-1}$ is symmetric and is tridiagonal in the submatrix $A(1{:}k{-}1,\, 1{:}k{-}1)$. This way, the left and right multiplication with $Q_k$ effectively creates zeros below $A(k{+}1, k)$ and to the right of $A(k, k{+}1)$, and then multiplies $U_k$ on the left and right of the submatrix $A(k{+}1{:}n, k{+}1{:}n)$. The key to reducing the computational cost is to reformulate the following operation:

\[
\begin{aligned}
U_k^*\,A(k{+}1{:}n, k{+}1{:}n)\,U_k ={}& A(k{+}1{:}n, k{+}1{:}n) + 4\vec u_k\left(\vec u_k^*\,A(k{+}1{:}n, k{+}1{:}n)\,\vec u_k\right)\vec u_k^* \\
&- 2\vec u_k\left(\vec u_k^*\,A(k{+}1{:}n, k{+}1{:}n)\right) - 2\left(A(k{+}1{:}n, k{+}1{:}n)\,\vec u_k\right)\vec u_k^*
\end{aligned}
\]
by introducing
\[
\vec t = A(k{+}1{:}n, k{+}1{:}n)\,\vec u_k, \tag{7.7}
\]
\[
\sigma = 2\,\vec u_k^*\,\vec t, \tag{7.8}
\]
\[
\vec p = 2\vec t - \sigma\vec u_k. \tag{7.9}
\]
This way, we can rewrite $U_k^*\,A(k{+}1{:}n, k{+}1{:}n)\,U_k$ as a rank-2 update in the form
\[
U_k^*\,A(k{+}1{:}n, k{+}1{:}n)\,U_k = A(k{+}1{:}n, k{+}1{:}n) - \vec p\,\vec u_k^* - \vec u_k\,\vec p^{\,*}.
\]

Since we only need to store and operate on half the entries of a symmetric matrix, the work of the above operation is about $2(n-k)^2$ flops, together with the $2(n-k)^2$ flops required to compute $\vec t$. The dominating work in each iteration is about $4(n-k)^2$, which brings the total work estimate to $\sim \frac{4}{3}n^3$.

Remark 7.10

Algorithm 7.9 is provided as background information for interested readers. The key message here is that symmetry can reduce the total work load to $\sim \frac{4}{3}n^3$ by 1) avoiding unnecessary operations on zero entries, and 2) only operating on either the lower or the upper triangular part of the matrix.
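For interested readers, the rank-2 update of Algorithm 7.9 can be sketched in NumPy as follows (our own transcription; for brevity it stores the full matrix, so it shows the flop saving of the rank-2 update but not the halved storage):

```python
import numpy as np

def tridiagonalize(A):
    """Householder reduction of a symmetric matrix to tridiagonal form using
    the rank-2 update T <- T - p u* - u p* (a sketch of Algorithm 7.9)."""
    T = np.array(A, dtype=float)
    n = T.shape[0]
    for k in range(n - 2):
        b = T[k+1:, k].copy()
        nb = np.linalg.norm(b)
        if nb == 0.0:                      # nothing to eliminate in this column
            continue
        s = np.sign(b[0]) if b[0] != 0 else 1.0
        v = b.copy()
        v[0] += s * nb
        u = v / np.linalg.norm(v)
        T[k+1, k] = T[k, k+1] = -s * nb    # Steps 5 and 6
        T[k+2:, k] = 0.0                   # Step 7
        T[k, k+2:] = 0.0                   # Step 8
        t = T[k+1:, k+1:] @ u              # t = A(k+1:n, k+1:n) u_k   (7.7)
        sigma = 2.0 * (u @ t)              # sigma = 2 u_k^* t         (7.8)
        p = 2.0 * t - sigma * u            # p = 2 t - sigma u_k       (7.9)
        T[k+1:, k+1:] -= np.outer(p, u) + np.outer(u, p)
    return T
```

The result is symmetric tridiagonal and shares the eigenvalues of the input, since only unitary similarity transformations are applied.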


7.2.4 QR Factorisation of Hessenberg Matrices

Hessenberg and tridiagonal matrices provide substantial computational advantages in computing matrix factorisations such as LU and QR, compared with applying such factorisations to general square matrices. Here we use the QR factorisation to demonstrate, in terms of operation counts, the computational reduction obtained from Hessenberg and tridiagonal matrices.

Recall that the QR factorisation transforms a matrix $A \in \mathbb{R}^{n\times n}$ into the product of an orthogonal matrix $Q \in \mathbb{R}^{n\times n}$ and an upper-triangular matrix $R \in \mathbb{R}^{n\times n}$. The Householder reflection finds a sequence of matrices $Q_1, Q_2, \ldots$, and hence $Q = Q_1Q_2\cdots$, to achieve this.

Given a Hessenberg matrix $H \in \mathbb{R}^{n\times n}$, we can partition $H$ as
\[
H = \begin{bmatrix}
H_{11} & H_{12} & H_{13} & \times & \cdots & \times \\
H_{21} & H_{22} & H_{23} & \times & \cdots & \times \\
 & H_{32} & \times & \times & \cdots & \times \\
 & & \times & \times & \cdots & \times \\
 & & & \ddots & \ddots & \vdots \\
 & & & & \times & \times
\end{bmatrix}
= \begin{bmatrix} \vec h_1 & \vec a_1^\top \\ \vec 0_1 & H_2 \end{bmatrix}, \tag{7.10}
\]
where $\vec h_1 = H(1{:}2, 1) \in \mathbb{R}^2$, $\vec a_1^\top = H(1, 2{:}\mathrm{end}) \in \mathbb{R}^{n-1}$, $\vec 0_1 \in \mathbb{R}^{n-2}$ and $H_2 = H(2{:}\mathrm{end}, 2{:}\mathrm{end}) \in \mathbb{R}^{(n-1)\times(n-1)}$. Note that $H_2$ is also a Hessenberg matrix.

Applying the first step of the Householder reflection, we aim to find $Q_1$ to create zeros below the first row of $H(:,1)$. We need to have
\[
Q_1H(:,1) = Q_1\begin{bmatrix} H_{11} \\ H_{21} \\ \vec 0_1 \end{bmatrix} = \begin{bmatrix} \pm\|\vec h_1\| \\ 0 \\ \vec 0_1 \end{bmatrix}.
\]
Fortunately, the first column of $H$ is already zero below the second row. Thus we only need to apply a 2-dimensional Householder reflection to $\vec h_1 \in \mathbb{R}^2$. This way, we want to find a 2-by-2 orthogonal matrix $U_1$ such that
\[
U_1\vec h_1 = U_1\begin{bmatrix} H_{11} \\ H_{21} \end{bmatrix} = \begin{bmatrix} \pm\|\vec h_1\| \\ 0 \end{bmatrix}.
\]
Using the procedure introduced in the Householder reflection, we have
\[
\vec t = \vec h_1 - U_1\vec h_1 = \vec h_1 + \mathrm{sign}(\vec h_1(1))\,\|\vec h_1\|\,\vec e_1, \tag{7.11}
\]
\[
\vec s = \vec t/\|\vec t\|, \tag{7.12}
\]
\[
U_1 = I_2 - 2\,\vec s\,\vec s^{\,*}. \tag{7.13}
\]

All the above operations are carried out in a 2-dimensional space. Then the matrix $Q_1$ takes the form
\[
Q_1 = \begin{bmatrix} U_1 & 0 \\ 0 & I \end{bmatrix}.
\]
This way, the first full Householder transformation can be written as
\[
Q_1H = Q_1\begin{bmatrix}
H_{11} & H_{12} & H_{13} & \times & \cdots & \times \\
H_{21} & H_{22} & H_{23} & \times & \cdots & \times \\
 & H_{32} & \times & \times & \cdots & \times \\
 & & \times & \times & \cdots & \times \\
 & & & \ddots & \ddots & \vdots \\
 & & & & \times & \times
\end{bmatrix}
= \begin{bmatrix} r_{11} & \vec r_1^\top \\ \vec 0 & \tilde H_2 \end{bmatrix}. \tag{7.14}
\]
Note that only the first two rows of the matrix $H$ are modified by $Q_1$. The resulting $\tilde H_2$ is also a Hessenberg matrix. In fact, $\tilde H_2$ in (7.14) and $H_2$ in (7.10) only differ in the first row. Then we can repeatedly carry out this operation for $n-1$ steps as in the QR factorisation.

At each step $k$, the dimension of the Hessenberg matrix to be transformed is $n-k+1$; thus the amount of work required to apply $Q_k$ is $\sim 7(n-k+1)+O(1)$ flops. Overall, applying the $n-1$ steps of Householder transformations to an $n$-by-$n$ Hessenberg matrix requires $\sim \frac{7}{2}n^2$ flops. We say that the computational complexity of the QR factorisation of a Hessenberg matrix is quadratic in the size of the matrix.
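The $O(n^2)$ procedure can be sketched in NumPy using the 2-by-2 reflectors described above (our own transcription; `hessenberg_qr` is a name we introduce):

```python
import numpy as np

def hessenberg_qr(H):
    """QR factorisation of an upper Hessenberg matrix using n-1 two-dimensional
    Householder reflectors, so the total work is O(n^2) (our own sketch)."""
    R = np.array(H, dtype=float)
    n = R.shape[0]
    Q = np.eye(n)
    for k in range(n - 1):
        h = R[k:k+2, k].copy()             # only H(k:k+1, k) can be nonzero
        nh = np.linalg.norm(h)
        if nh == 0.0:
            continue
        s = np.sign(h[0]) if h[0] != 0 else 1.0
        v = h.copy()
        v[0] += s * nh
        u = v / np.linalg.norm(v)
        U = np.eye(2) - 2.0 * np.outer(u, u)   # the 2-by-2 reflector U_k
        R[k:k+2, k:] = U @ R[k:k+2, k:]        # only two rows are modified
        Q[:, k:k+2] = Q[:, k:k+2] @ U          # accumulate Q = Q_1...Q_{n-1} (U symmetric)
    return Q, R
```

Each step touches only two rows of the working matrix, which is where the quadratic (rather than cubic) total cost comes from.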

If the matrix $H$ is tridiagonal, only three columns are involved in each multiplication with the Householder matrix $Q_k$, as shown below:
\[
Q_1H = Q_1\begin{bmatrix}
\times & \times & & & & \\
\times & \times & \times & & & \\
 & \times & \times & \times & & \\
 & & \times & \times & \times & \\
 & & & \times & \times & \times \\
 & & & & \times & \times
\end{bmatrix}
= \begin{bmatrix} r_{11} & \vec r_1^\top \\ \vec 0 & \tilde H_2 \end{bmatrix}. \tag{7.15}
\]
Therefore, the work of applying the $n-1$ steps of Householder transformations to an $n$-by-$n$ tridiagonal matrix is linearly proportional to the size of the matrix, $n$. We say that the computational complexity of the QR factorisation of a tridiagonal matrix is linear in the size of the matrix.


7.3 QR algorithm without shifts

The QR algorithm, which iteratively carries out the QR factorisation at its core, is one of the most celebrated algorithms in scientific computing. Here we show its simplest form and look into several fundamental aspects of this algorithm.

Algorithm 7.11: QR Algorithm Without Shifts

Input: Matrix $A \in \mathbb{R}^{n\times n}$.
Output: A unitary matrix $Q^{(k)}$ and a matrix $A^{(k)}$.
1: $A^{(0)} = A$
2: $Q^{(0)} = I$
3: for $k = 1, 2, \ldots$ do
4: $\quad U^{(k)}R^{(k)} = A^{(k-1)}$ $\quad\triangleright$ Apply the QR factorisation to $A^{(k-1)}$
5: $\quad A^{(k)} = R^{(k)}U^{(k)}$ $\quad\triangleright$ Recombine factors in reverse order
6: $\quad Q^{(k)} = Q^{(k-1)}U^{(k)}$
7: end for

At its core, all we do is compute the QR factorisation, multiply $R$ and $U$ in the reverse order $RU$, and repeat. Using the identity $R^{(k)} = (U^{(k)})^*A^{(k-1)}$, it can be shown that this algorithm applies a sequence of unitary similarity transformations to the input matrix $A$, in the form
\[
A^{(k)} = (U^{(k)})^*A^{(k-1)}U^{(k)}
= \underbrace{(U^{(k)})^*(U^{(k-1)})^*\cdots(U^{(1)})^*}_{(Q^{(k)})^*}\,A\,\underbrace{U^{(1)}\cdots U^{(k-1)}U^{(k)}}_{Q^{(k)}}. \tag{7.16}
\]
Under certain assumptions, this simple algorithm converges to the Schur factorisation. That is, $A^{(k)}$ will be upper triangular if $A$ is arbitrary, and diagonal if $A$ is symmetric.
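Algorithm 7.11 is only a few lines of NumPy (our own sketch; `np.linalg.qr` plays the role of the factorisation in Line 4):

```python
import numpy as np

def qr_algorithm(A, iters=200):
    """QR algorithm without shifts (Algorithm 7.11): factorise, recombine
    in reverse order, and accumulate Q^(k) (our own sketch)."""
    Ak = np.array(A, dtype=float)
    Q = np.eye(Ak.shape[0])
    for _ in range(iters):
        U, R = np.linalg.qr(Ak)   # U^(k) R^(k) = A^(k-1)
        Ak = R @ U                # A^(k) = R^(k) U^(k)
        Q = Q @ U                 # Q^(k) = Q^(k-1) U^(k)
    return Ak, Q
```

For a symmetric matrix with distinct eigenvalue magnitudes, $A^{(k)}$ converges to a diagonal matrix of the eigenvalues, and $Q^{(k)}A^{(k)}(Q^{(k)})^\top$ recovers $A$.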

7.3.1 Connection with Simultaneous Iteration

One way to understand the QR algorithm is to relate it to the power iteration. Here we consider applying the power iteration to several vectors simultaneously; this is also often referred to as block power iteration. Suppose we have a set of orthonormal initial vectors $\{\vec p_1, \ldots, \vec p_s\}$. We apply the power iteration to the matrix $P$ collecting this set of vectors (such that $P(:,j) = \vec p_j$), and orthonormalise the new set of vectors $AP^{(k-1)}$ in each iteration using the QR factorisation. This leads to the following algorithm.

Algorithm 7.12: Simultaneous Iteration

Input: Matrix $A \in \mathbb{R}^{n\times n}$ and a set of orthonormal initial vectors $P^{(0)}$.
Output: A matrix $P^{(k)}$
1: $P^{(0)} = I$
2: for $k = 1, 2, \ldots$ do
3: $\quad Z^{(k)} = AP^{(k-1)}$ $\quad\triangleright$ Apply the matrix $A$
4: $\quad P^{(k)}T^{(k)} = Z^{(k)}$ $\quad\triangleright$ QR factorisation
5: end for

As a result of $P^{(k)} = AP^{(k-1)}(T^{(k)})^{-1}$, we have
\[
A^kP^{(0)} = P^{(k)}\,T^{(k)}T^{(k-1)}\cdots T^{(1)}.
\]
Using the following property of triangular matrices, we can show that the product $T^{(k)}T^{(k-1)}\cdots T^{(1)}$ is upper triangular. Therefore the simultaneous iteration effectively computes (in exact arithmetic) the QR factorisation of $A^kP^{(0)}$.
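The iteration above is straightforward to sketch in NumPy (our own transcription of Algorithm 7.12, with the iteration count as a parameter we chose):

```python
import numpy as np

def simultaneous_iteration(A, iters=300):
    """Simultaneous (block power) iteration, Algorithm 7.12, with P^(0) = I
    (our own sketch)."""
    P = np.eye(A.shape[0])
    for _ in range(iters):
        Z = A @ P                # apply the matrix A
        P, T = np.linalg.qr(Z)   # orthonormalise via a QR factorisation
    return P
```

For a symmetric matrix with distinct eigenvalue magnitudes, the columns of $P^{(k)}$ converge (up to signs) to an orthogonal eigenvector basis, so $P^\top AP$ becomes nearly diagonal.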

Remark 7.13: Properties of Triangular Matrices

The product of two upper triangular matrices is upper triangular, and the inverse of a nonsingular upper triangular matrix is upper triangular.

Theorem 7.14

Given the initial matrix $P^{(0)} = I$, the simultaneous iteration is equivalent to the QR algorithm without shifts.

Proof. This can be shown by induction. Throughout the proof, we assume the upper triangular matrices of the QR factorisations used by both the QR algorithm and the simultaneous iteration have positive diagonal entries. We carry out the QR algorithm without shifts and the simultaneous iteration for the first step. This leads to:

QR algorithm:
\[
A^{(0)} = A, \qquad Q^{(0)} = I,
\]
\[
U^{(1)}R^{(1)} = A^{(0)} = A, \tag{7.17}
\]
\[
Q^{(1)} = Q^{(0)}U^{(1)} = U^{(1)}, \tag{7.18}
\]
\[
A^{(1)} = R^{(1)}U^{(1)} = (Q^{(1)})^*AQ^{(1)}. \tag{7.19}
\]

Simultaneous iteration:
\[
P^{(0)} = I,
\]
\[
Z^{(1)} = AP^{(0)} = A, \tag{7.20}
\]
\[
P^{(1)}T^{(1)} = Z^{(1)} = A. \tag{7.21}
\]

After the first iteration, we can verify that $Q^{(1)} = P^{(1)}$ and $R^{(1)} = T^{(1)}$, and thus these two algorithms are equivalent after the first iteration.

In the second iteration, the two algorithms are carried forward as follows:

QR algorithm:
\[
U^{(2)}R^{(2)} = A^{(1)} = (U^{(1)})^*AU^{(1)}, \tag{7.22}
\]
\[
Q^{(2)} = Q^{(1)}U^{(2)} = U^{(1)}U^{(2)}, \tag{7.23}
\]
\[
A^{(2)} = R^{(2)}U^{(2)} = (Q^{(2)})^*AQ^{(2)}. \tag{7.24}
\]

Simultaneous iteration:
\[
Z^{(2)} = AP^{(1)} = AU^{(1)}, \tag{7.25}
\]
\[
P^{(2)}T^{(2)} = Z^{(2)} = AU^{(1)}. \tag{7.26}
\]

Since $A^{(1)} = (Q^{(1)})^*AQ^{(1)}$, we have $A = Q^{(1)}A^{(1)}(Q^{(1)})^*$, and hence Equation (7.26) can be written as
\[
P^{(2)}T^{(2)} = Q^{(1)}A^{(1)}.
\]
Multiplying both sides of the above equation by $(Q^{(1)})^*$ leads to
\[
\left((Q^{(1)})^*P^{(2)}\right)T^{(2)} = A^{(1)}.
\]
From the QR algorithm, we have $U^{(2)}R^{(2)} = A^{(1)}$. This leads to
\[
T^{(2)} = R^{(2)} \quad\text{and}\quad (Q^{(1)})^*P^{(2)} = U^{(2)},
\]
and hence
\[
P^{(2)} = Q^{(1)}U^{(2)} = Q^{(2)}.
\]
Thus, these two algorithms are equivalent after two iterations.

Suppose $P^{(k-1)} = Q^{(k-1)}$ and $T^{(k-1)} = R^{(k-1)}$ hold. At the $k$-th iteration, the two algorithms satisfy the following:

QR algorithm:
\[
U^{(k)}R^{(k)} = A^{(k-1)} = (Q^{(k-1)})^*AQ^{(k-1)}, \tag{7.27}
\]
\[
Q^{(k)} = Q^{(k-1)}U^{(k)}, \tag{7.28}
\]
\[
A^{(k)} = (Q^{(k)})^*AQ^{(k)}. \tag{7.29}
\]

Simultaneous iteration:
\[
Z^{(k)} = AP^{(k-1)} = AQ^{(k-1)}, \tag{7.30}
\]
\[
P^{(k)}T^{(k)} = Z^{(k)} = AQ^{(k-1)}. \tag{7.31}
\]

Since $A^{(k-1)} = (Q^{(k-1)})^*AQ^{(k-1)}$, we have $A = Q^{(k-1)}A^{(k-1)}(Q^{(k-1)})^*$, and hence Equation (7.31) can be written as
\[
P^{(k)}T^{(k)} = Q^{(k-1)}A^{(k-1)}.
\]
Multiplying both sides of the above equation by $(Q^{(k-1)})^*$ leads to
\[
\left((Q^{(k-1)})^*P^{(k)}\right)T^{(k)} = A^{(k-1)}.
\]
Thus we can show that $T^{(k)} = R^{(k)}$ and $(Q^{(k-1)})^*P^{(k)} = U^{(k)}$. The latter leads to $P^{(k)} = Q^{(k-1)}U^{(k)} = Q^{(k)}$.

The above proof employs a property of the QR factorisation:

Theorem 7.15

For any nonsingular matrix $A$, there exists a unique pair of unitary matrix $Q$ and upper triangular matrix $R$ with positive diagonal entries such that $A = QR$.

Remark

The product of two upper triangular matrices with positive diagonal entries is also an upper triangular matrix with positive diagonal entries. The inverse of an upper triangular matrix with positive diagonal entries is also an upper triangular matrix with positive diagonal entries.

Remark 7.16

At this point, we are able to show that the sequence of unitary similarity transformations in the QR algorithm,
\[
A^{(k)} = (Q^{(k)})^*AQ^{(k)}, \tag{7.32}
\]
\[
Q^{(k)} = U^{(1)}\cdots U^{(k-1)}U^{(k)}, \tag{7.33}
\]
can be defined by the QR factorisation of $A^k$ in the form
\[
A^k = Q^{(k)}T^{(k)}, \tag{7.34}
\]
\[
T^{(k)} = R^{(k)}R^{(k-1)}\cdots R^{(1)}. \tag{7.35}
\]
This relation is the key to understanding the QR algorithm and to analysing its convergence.
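This relation is easy to verify numerically (a small check of our own; the identity holds for any QR sign convention):

```python
import numpy as np

# Numerical check of Remark 7.16: the accumulated factors of the QR algorithm
# form a QR factorisation of the matrix power, A^k = Q^(k) T^(k).
A = np.array([[2.0, 1.0], [1.0, 3.0]])
k = 5
Ak, Q, T = A.copy(), np.eye(2), np.eye(2)
for _ in range(k):
    U, R = np.linalg.qr(Ak)   # U^(j) R^(j) = A^(j-1)
    Ak = R @ U                # A^(j) = R^(j) U^(j)
    Q = Q @ U                 # Q^(j) = Q^(j-1) U^(j)
    T = R @ T                 # T^(j) = R^(j) T^(j-1)
assert np.allclose(Q @ T, np.linalg.matrix_power(A, k))
```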

7.3.2 Convergence to Schur Form

The remaining question is why the sequence of transformations $A^{(k)} = (Q^{(k)})^*AQ^{(k)}$ is able to construct a Schur form.

This is not very surprising: if the sequence of orthogonal matrices $Q^{(k)} = U^{(1)}\cdots U^{(k-1)}U^{(k)}$ converges, then $Q^{(k+1)} = Q^{(k)}U^{(k+1)}$ must be arbitrarily close to $Q^{(k)}$ for sufficiently large $k$. This way, we have $U^{(k)} \approx I$ for sufficiently large $k$. Recalling that in each iteration of the QR algorithm we have the QR factorisation $U^{(k)}R^{(k)} = A^{(k-1)}$, we can see that $A^{(k-1)}$ is (approximately) upper triangular if $U^{(k)} \approx I$. We formalise this intuition below.

Theorem 7.17

Let $A \in \mathbb{R}^{n\times n}$ be a real matrix with distinct eigenvalues, all greater than zero,
\[
\lambda_1 > \lambda_2 > \cdots > \lambda_n > 0.
\]
Suppose $A$ has the eigendecomposition $A = V\Lambda V^{-1}$, and the matrix $V$ has the QR factorisation $V = QR$ where $R$ is upper triangular with positive diagonal entries. Then $Q^{(k)}$ converges to the orthogonal factor of the QR factorisation of $V$ as
\[
\|Q^{(k)}D - Q\| = O(\sigma^k),
\]
for some diagonal matrix $D$ with $D_{ii} = \pm 1$, where $\sigma < 1$ is the constant
\[
\sigma = \max\left\{\left|\frac{\lambda_2}{\lambda_1}\right|, \ldots, \left|\frac{\lambda_n}{\lambda_{n-1}}\right|\right\}.
\]

Proof. Given the eigendecomposition $A = V\Lambda V^{-1}$, we have $A^k = V\Lambda^kV^{-1}$. After $k$ steps of the simultaneous iteration, $A^k$ has the QR factorisation $A^k = Q^{(k)}T^{(k)}$. Thus the following relation holds:
\[
V\Lambda^kV^{-1} = Q^{(k)}T^{(k)}.
\]
Considering the LU factorisation of $V^{-1}$ and substituting $V^{-1} = LU$ into the above equation leads to
\[
V\Lambda^kLU = Q^{(k)}T^{(k)},
\]
and then, by multiplying $U^{-1}\Lambda^{-k}$ on the right of both sides of the equation, we have
\[
V\Lambda^kL\Lambda^{-k} = Q^{(k)}T^{(k)}U^{-1}\Lambda^{-k}. \tag{7.36}
\]
Without loss of generality, we can assume that the diagonal entries of the matrix $L$ take the values $\pm 1$, and the diagonal entries of the matrix $U$ are positive. We have
\[
\left(\Lambda^kL\Lambda^{-k}\right)_{ij} =
\begin{cases}
\pm 1, & i = j, \\
0, & i < j, \\
L_{ij}\left(\dfrac{\lambda_i}{\lambda_j}\right)^k, & i > j.
\end{cases}
\]
Thus, $\Lambda^kL\Lambda^{-k}$ converges to a diagonal matrix $D$ with $D_{ii} = L_{ii}$, as $\left(\frac{\lambda_i}{\lambda_j}\right)^k \to 0$ for $i > j$. Since the eigenvalues are ordered, each ratio $\frac{\lambda_i}{\lambda_j}$, $i > j$, is bounded from above by the largest ratio of consecutive eigenvalues, $\max_i \frac{\lambda_{i+1}}{\lambda_i}$. This convergence is therefore of order $O(\sigma^k)$, where $\sigma$ is the largest ratio $\left|\frac{\lambda_i}{\lambda_j}\right|$, $i > j$, between a pair of distinct eigenvalues.

Since the left-hand side of Equation (7.36) converges to $VD$ as $k \to \infty$ and $D^2 = I$, it can be expressed as
\[
V = \left(Q^{(k)}D\right)\left(D\,T^{(k)}U^{-1}\Lambda^{-k}D\right), \quad k \to \infty,
\]
where $T^{(k)}U^{-1}\Lambda^{-k}$ is upper triangular with positive diagonal entries and $D\,T^{(k)}U^{-1}\Lambda^{-k}D$ is also upper triangular with positive diagonal entries. Thus, this determines a unique QR factorisation of $V$ as $k \to \infty$. Therefore $Q^{(k)}D$ converges to the orthogonal matrix of the QR factorisation of the eigenvectors $V$.

The assumption that all eigenvalues of $A$ must be positive can be removed by using the absolute values of the eigenvalues instead of the eigenvalues in constructing $\Lambda^{-k}$. We also do not have to assume that the eigenvalues are non-repeating, as we can specify orthogonal eigenvectors (basis vectors of the eigenspace) for an eigenvalue with geometric multiplicity larger than one.

7.3.3 The Role of Hessenberg Form

As we discussed earlier, transforming a matrix to the Hessenberg form allows for a significant reduction in the cost of computing the QR factorisation: $O(n^2)$ for a general matrix and $O(n)$ for a symmetric matrix. We can use this fact to reduce the operation count in each iteration of the QR algorithm, provided that the Hessenberg form is retained from one iteration to the next. That is, if $A^{(0)}$ is a Hessenberg matrix, then each $A^{(k)}$ is a Hessenberg matrix. Given a Hessenberg matrix $H \in \mathbb{R}^{n\times n}$ and its QR factorisation $H = QR$, we want to verify that $RQ$ retains the Hessenberg form.


QR

Recall that we can partition the matrix $H$ as
\[
H = \begin{bmatrix}
H_{11} & H_{12} & H_{13} & \times & \cdots & \times \\
H_{21} & H_{22} & H_{23} & \times & \cdots & \times \\
 & H_{32} & \times & \times & \cdots & \times \\
 & & \times & \times & \cdots & \times \\
 & & & \ddots & \ddots & \vdots \\
 & & & & \times & \times
\end{bmatrix}
= \begin{bmatrix} \vec h_1 & \vec a_1^\top \\ \vec 0_1 & H_2 \end{bmatrix},
\]
where $\vec h_1 = H(1{:}2, 1) \in \mathbb{R}^2$, $\vec a_1^\top = H(1, 2{:}\mathrm{end}) \in \mathbb{R}^{n-1}$, $\vec 0_1 \in \mathbb{R}^{n-2}$ and $H_2 = H(2{:}\mathrm{end}, 2{:}\mathrm{end}) \in \mathbb{R}^{(n-1)\times(n-1)}$. Note that $H_2$ is also a Hessenberg matrix.

To create zeros below $H(1,1)$, we need to find a Householder matrix $Q_1$ such that
\[
Q_1H(:,1) = Q_1\begin{bmatrix} H_{11} \\ H_{21} \\ \vec 0_1 \end{bmatrix} = \begin{bmatrix} \pm\|\vec h_1\| \\ 0 \\ \vec 0_1 \end{bmatrix}.
\]
Effectively, we only need to apply a 2-dimensional Householder reflection matrix $U_1$ to $\vec h_1 \in \mathbb{R}^2$ such that
\[
U_1\vec h_1 = U_1\begin{bmatrix} H_{11} \\ H_{21} \end{bmatrix} = \begin{bmatrix} \pm\|\vec h_1\| \\ 0 \end{bmatrix}.
\]
Then the matrix $Q_1$ takes the form
\[
Q_1 = \begin{bmatrix} U_1 & 0 \\ 0 & I \end{bmatrix}.
\]

Only the top two rows of the matrix $H$ will be modified in $Q_1H$.

Every iteration of the QR factorisation picks the $k$-th column of the matrix and aims to create zeros below the $(k,k)$ entry of the matrix $H_{k-1} = Q_{k-1}\cdots Q_1H$ (the transformed matrix from the previous iteration), as shown below for $k = 4$ in a 7-by-7 example:
\[
H_{k-1} = \begin{bmatrix}
\times & \times & \times & \times & \times & \times & \times \\
 & \times & \times & \times & \times & \times & \times \\
 & & \times & \times & \times & \times & \times \\
 & & & \times & \times & \times & \times \\
 & & & \times & \times & \times & \times \\
 & & & & \times & \times & \times \\
 & & & & & \times & \times
\end{bmatrix},
\]
where rows $k$ and $k+1$ both start at column $k$. This can be achieved by constructing a Householder matrix $U_k$ with respect to the vector $H_{k-1}(k{:}k{+}1, k)$, since all the entries of $H_{k-1}$ below $(k{+}1, k)$ are zero. This leads to


a Householder matrix that can be applied to the original matrix,
\[
Q_k = \begin{bmatrix} I_1 & & \\ & U_k & \\ & & I_2 \end{bmatrix},
\qquad
U_k = \begin{bmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{bmatrix},
\]
where $I_1$ is a $(k-1)$-dimensional identity matrix and $I_2$ is an $(n-k-1)$-dimensional identity matrix. The following equation demonstrates the multiplication $Q_kH_{k-1}$, in which only rows $k$ and $k+1$ are modified:
\[
Q_kH_{k-1} = \begin{bmatrix} I_1 & & \\ & U_k & \\ & & I_2 \end{bmatrix}
\begin{bmatrix}
\times & \times & \times & \times & \times & \times & \times \\
 & \times & \times & \times & \times & \times & \times \\
 & & \times & \times & \times & \times & \times \\
 & & & \times & \times & \times & \times \\
 & & & \times & \times & \times & \times \\
 & & & & \times & \times & \times \\
 & & & & & \times & \times
\end{bmatrix}
= \begin{bmatrix}
\times & \times & \times & \times & \times & \times & \times \\
 & \times & \times & \times & \times & \times & \times \\
 & & \times & \times & \times & \times & \times \\
 & & & \times & \times & \times & \times \\
 & & & 0 & \times & \times & \times \\
 & & & & \times & \times & \times \\
 & & & & & \times & \times
\end{bmatrix}.
\]
This way, we have the QR factorisation of the matrix $H$ defined by
\[
R = \underbrace{Q_{n-1}\cdots Q_1}_{Q^*}\,H.
\]
Note that we leave $Q_n$ out of the standard Householder reflection process, as it would only flip the sign of the bottom-right entry of the matrix $H_{n-1}$.

RQ

Using this identity, we can express the matrix $RQ$ as
\[
RQ = RQ_1Q_2\cdots Q_{n-1}.
\]
Note that we drop the $(\cdot)^*$ here, as each Householder reflection matrix $Q_j$ is symmetric. Denote $R_k = RQ_1Q_2\cdots Q_k$ and set $R_0 = R$. In each multiplication $R_{k-1}Q_k$, only the two columns $R_{k-1}(:, k{:}k{+}1)$ are modified by the matrix $U_k$. This is summarised in the following equations.


In the first step, the entries of $R_0(:, 1{:}2)$ below the second row are zero, as $R$ is upper triangular. The resulting matrix $R_1$ has a Hessenberg form with the submatrix $R_1(2{:}n, 2{:}n)$ upper triangular, as shown in Equation (7.37):
\[
R_1 = R_0Q_1 =
\begin{bmatrix}
\times & \times & \times & \times & \times & \times & \times \\
 & \times & \times & \times & \times & \times & \times \\
 & & \times & \times & \times & \times & \times \\
 & & & \times & \times & \times & \times \\
 & & & & \times & \times & \times \\
 & & & & & \times & \times \\
 & & & & & & \times
\end{bmatrix}
\begin{bmatrix} U_1 & \\ & I \end{bmatrix}
=
\begin{bmatrix}
\times & \times & \times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times & \times & \times \\
 & & \times & \times & \times & \times & \times \\
 & & & \times & \times & \times & \times \\
 & & & & \times & \times & \times \\
 & & & & & \times & \times \\
 & & & & & & \times
\end{bmatrix}. \tag{7.37}
\]

If the matrix $R_{k-1}$ has a Hessenberg form and the submatrix $R_{k-1}(k{:}n, k{:}n)$ is upper triangular, multiplying with $Q_k$ produces a Hessenberg matrix $R_k$ with an upper triangular submatrix $R_k(k{+}1{:}n, k{+}1{:}n)$: only the two columns $R_{k-1}(:, k{:}k{+}1)$ are modified by the matrix $U_k$ in this step, and $R_k(k{+}2{:}n, k{:}k{+}1)$ remain zero because the corresponding entries of $R_{k-1}$ are zero.
\[
R_k = R_{k-1}Q_k =
\begin{bmatrix}
\times & \times & \times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times & \times & \times \\
 & \times & \times & \times & \times & \times & \times \\
 & & \times & \times & \times & \times & \times \\
 & & & 0 & \times & \times & \times \\
 & & & & & \times & \times \\
 & & & & & & \times
\end{bmatrix}
\begin{bmatrix} I_1 & & \\ & U_k & \\ & & I_2 \end{bmatrix}
=
\begin{bmatrix}
\times & \times & \times & \times & \times & \times & \times \\
\times & \times & \times & \times & \times & \times & \times \\
 & \times & \times & \times & \times & \times & \times \\
 & & \times & \times & \times & \times & \times \\
 & & & \times & \times & \times & \times \\
 & & & & & \times & \times \\
 & & & & & & \times
\end{bmatrix}. \tag{7.38}
\]

Also from this process, we can conclude that computing $QR = H$ and combining the factors in the reverse order $RQ$ have the same total work. In computing $QR = H$, the number of flops is about $O(n-k)$ in iteration $k$, and hence a total of $O(n^2)$. Therefore, each step of the QR algorithm applied to a Hessenberg matrix requires $O(n^2)$ operations.
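A quick numerical check of the Hessenberg-preservation property (our own sketch, on a random example):

```python
import numpy as np

# If H = QR is upper Hessenberg, the recombined RQ is again upper Hessenberg,
# so every iterate of the QR algorithm stays in Hessenberg form.
rng = np.random.default_rng(0)
H = np.triu(rng.standard_normal((6, 6)), -1)   # a random Hessenberg matrix
Q, R = np.linalg.qr(H)
H_next = R @ Q
assert np.allclose(np.tril(H_next, -2), 0, atol=1e-12)
```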


7.4 Shifted QR algorithm

The QR algorithm without shifts is able to iteratively decompose a matrix into a Schur factorisation. Using its equivalence with the simultaneous iteration, we can deduce its convergence property: they are equally slow. Like the Rayleigh quotient iteration, this algorithm can be modified to incorporate shifted inverse iteration and eigenvalue estimates. The new algorithm is outlined as follows.

Algorithm 7.18: Shifted QR Algorithm

Input: Matrix $A \in \mathbb{R}^{n\times n}$.
Output: A unitary matrix $Q^{(k)}$ and a matrix $A^{(k)}$
1: $A^{(0)} = (Q^{(0)})^*AQ^{(0)}$ $\quad\triangleright$ Transform $A$ to Hessenberg form
2: for $k = 1, 2, \ldots$ do
3: $\quad$ Pick a shift $\mu^{(k)}$ $\quad\triangleright$ E.g., $\mu^{(k)} = A^{(k-1)}(n, n)$
4: $\quad U^{(k)}R^{(k)} = A^{(k-1)} - \mu^{(k)}I$ $\quad\triangleright$ QR factorisation of $A^{(k-1)} - \mu^{(k)}I$
5: $\quad A^{(k)} = R^{(k)}U^{(k)} + \mu^{(k)}I$ $\quad\triangleright$ Recombine factors in reverse order
6: $\quad$ if any subdiagonal entry $A^{(k)}(j{+}1, j)$ is sufficiently close to 0 then
7: $\quad\quad$ Set $A^{(k)}(j{+}1, j) = 0$, partition $A^{(k)}$ as
\[
A^{(k)} = \begin{bmatrix} A_1 & A_3 \\ 0 & A_2 \end{bmatrix},
\]
$\quad\quad$ and apply the same QR algorithm to $A_1$ and $A_2$ separately.
8: $\quad$ end if
9: $\quad Q^{(k)} = Q^{(k-1)}U^{(k)}$
10: end for

Here Line 3 picks the shift value, Lines 4 and 5 perform one step of inverse iteration, and Lines 6--8 perform an operation called deflation. These steps will be explained in the rest of this section. To keep the concepts simple, we assume in the rest of this section that the matrix $A \in \mathbb{R}^{n\times n}$ is symmetric (and tridiagonal) and invertible. We will also focus only on the eigenvalues. The material in this section is based on [Trefethen and Bau III, 1997].
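Putting the pieces together, a minimal NumPy sketch of Algorithm 7.18 for a symmetric matrix might look as follows. This is our own simplification (the function name, tolerances, and deflating only the last row and column are our choices), using the Rayleigh quotient shift $\mu^{(k)} = A^{(k-1)}(n,n)$:

```python
import numpy as np

def shifted_qr_eigvals(A, tol=1e-12, max_iter=1000):
    """Eigenvalues of a symmetric matrix via the shifted QR algorithm with the
    Rayleigh quotient shift mu = A(n,n), deflating the last row and column once
    its off-diagonal entry is small (a simplified sketch of Algorithm 7.18;
    as discussed later, this shift can stall on specially structured matrices)."""
    T = np.array(A, dtype=float)
    eigs = []
    while T.shape[0] > 1:
        n = T.shape[0]
        for _ in range(max_iter):
            mu = T[-1, -1]                           # Rayleigh quotient shift
            U, R = np.linalg.qr(T - mu * np.eye(n))  # U^(k) R^(k) = A^(k-1) - mu I
            T = R @ U + mu * np.eye(n)               # A^(k) = R^(k) U^(k) + mu I
            if abs(T[-1, -2]) < tol:                 # deflation condition
                break
        eigs.append(T[-1, -1])
        T = T[:-1, :-1]                              # deflate the converged eigenvalue
    eigs.append(T[0, 0])
    return np.sort(np.array(eigs))
```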

7.4.1 Connection with Inverse Iteration

To understand this algorithm, we will first find its connection with the power iteration applied to the inverse matrix $A^{-1}$, i.e., inverse iteration without shift. Recalling the results from the last section, we have
\[
A^k = Q^{(k)}T^{(k)}, \quad\text{where}\quad T^{(k)} = R^{(k)}R^{(k-1)}\cdots R^{(1)},
\]
as the result of the QR algorithm without shifts. Inverting the above equation and taking the transpose, we have
\[
\left(A^{-k}\right)^\top = \left(\left(T^{(k)}\right)^{-1}\left(Q^{(k)}\right)^\top\right)^\top.
\]
Using the fact that $A$ is symmetric, this leads to
\[
A^{-k} = Q^{(k)}\left(T^{(k)}\right)^{-\top}, \tag{7.39}
\]
where the term $\left(T^{(k)}\right)^{-\top}$ is lower triangular.

Consider a permutation matrix $P$ that reverses the row or column order,
\[
P = \begin{bmatrix}
0 & \cdots & 0 & 1 \\
0 & \cdots & 1 & 0 \\
\vdots & & & \vdots \\
1 & 0 & \cdots & 0
\end{bmatrix}.
\]

Remark 7.19

Multiplying $P$ on the right of a matrix reverses its column order, and multiplying $P$ on the left of a matrix reverses its row order. This takes the form
\[
A = \begin{bmatrix} \vec a_1 & \vec a_2 & \cdots & \vec a_{n-1} & \vec a_n \end{bmatrix},
\qquad
AP = \begin{bmatrix} \vec a_n & \vec a_{n-1} & \cdots & \vec a_2 & \vec a_1 \end{bmatrix},
\]
and
\[
B = \begin{bmatrix} \vec b_1^\top \\ \vec b_2^\top \\ \vdots \\ \vec b_{n-1}^\top \\ \vec b_n^\top \end{bmatrix},
\qquad
PB = \begin{bmatrix} \vec b_n^\top \\ \vec b_{n-1}^\top \\ \vdots \\ \vec b_2^\top \\ \vec b_1^\top \end{bmatrix}.
\]
We also have $P^2 = I$; together with the symmetry of $P$, this shows that $P$ is orthogonal.
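These reversal identities are one-liners to check in NumPy (our own check):

```python
import numpy as np

# Remark 7.19 in NumPy: P with ones on the anti-diagonal reverses columns when
# applied on the right, rows when applied on the left, and squares to I.
n = 4
P = np.eye(n)[::-1]                      # rows of the identity in reverse order
A = np.arange(n * n, dtype=float).reshape(n, n)
assert np.allclose(A @ P, A[:, ::-1])    # column order reversed
assert np.allclose(P @ A, A[::-1, :])    # row order reversed
assert np.allclose(P @ P, np.eye(n))     # P^2 = I, so P is orthogonal
```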

Multiplying both sides of Equation (7.39) by the permutation matrix $P$ on the right, we have
\[
A^{-k}P = \left(Q^{(k)}P\right)\left(P\left(T^{(k)}\right)^{-\top}P\right). \tag{7.40}
\]
The first factor $Q^{(k)}P$ is orthogonal, and the second factor $P\left(T^{(k)}\right)^{-\top}P$ is upper triangular (obtained by reversing the column and row orders of a lower triangular matrix). Thus, Equation (7.40) can be interpreted as the QR factorisation of $A^{-k}P$. The QR algorithm without shifts therefore also effectively carries out the simultaneous iteration on $A^{-1}$ with the initial matrix $P$. This can be expressed as
\[
A^{-k}P = \underbrace{\begin{bmatrix} \vec q_n^{(k)} & \vec q_{n-1}^{(k)} & \cdots & \vec q_2^{(k)} & \vec q_1^{(k)} \end{bmatrix}}_{Q^{(k)}P}\;\underbrace{\left(P\left(T^{(k)}\right)^{-\top}P\right)}_{\text{upper triangular}}.
\]
The last column of $Q^{(k)}$ is the result of applying the inverse iteration to $\vec e_n$.
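Equation (7.40) can be verified numerically for a small symmetric matrix (our own check; the identity holds for any QR sign convention, since $A^k = Q^{(k)}T^{(k)}$ does):

```python
import numpy as np

# Check A^{-k} P = (Q^(k) P)(P (T^(k))^{-T} P): the unshifted QR algorithm
# implicitly runs simultaneous inverse iteration started from P.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
n, k = A.shape[0], 6
P = np.eye(n)[::-1]                       # the reversal permutation
Ak, Q, T = A.copy(), np.eye(n), np.eye(n)
for _ in range(k):
    U, R = np.linalg.qr(Ak)
    Ak = R @ U
    Q = Q @ U
    T = R @ T                             # T^(k) = R^(k) ... R^(1)
lhs = np.linalg.matrix_power(np.linalg.inv(A), k) @ P
rhs = (Q @ P) @ (P @ np.linalg.inv(T).T @ P)
assert np.allclose(lhs, rhs)
```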


7.4.2 Connection with Shifted Inverse Iteration

The significance of the inverse iteration is that it can be shifted to amplify the differences between eigenvalues. Since the QR algorithm is both a simultaneous iteration $A^k = Q^{(k)}T^{(k)}$ and a simultaneous inverse iteration $A^{-k}P = \left(Q^{(k)}P\right)\left(P(T^{(k)})^{-\top}P\right)$, we are able to incorporate shifts into the QR algorithm by simply carrying out the QR factorisation of the shifted matrix $A - \mu I$.

Let $\mu^{(k)}$ denote the shift used in the $k$-th step; one step of the shifted QR algorithm proceeds as follows:
\[
U^{(k)}R^{(k)} = A^{(k-1)} - \mu^{(k)}I, \tag{7.41}
\]
\[
A^{(k)} = R^{(k)}U^{(k)} + \mu^{(k)}I. \tag{7.42}
\]

This implies
\[
A^{(k)} = \left(U^{(k)}\right)^\top A^{(k-1)}U^{(k)},
\]
and by induction
\[
A^{(k)} = \left(Q^{(k)}\right)^\top AQ^{(k)}, \tag{7.43}
\]
\[
Q^{(k)} = U^{(1)}U^{(2)}\cdots U^{(k)}. \tag{7.44}
\]
Note that here each pair $U^{(k)}$ and $R^{(k)}$ is different from that of the QR algorithm without shifts. Using a similar proof as in Theorem 7.14, we can show that the shifted QR algorithm also has the following factorisation:
\[
(A - \mu^{(k)}I)(A - \mu^{(k-1)}I)\cdots(A - \mu^{(1)}I) = Q^{(k)}T^{(k)}, \tag{7.45}
\]
\[
T^{(k)} = R^{(k)}R^{(k-1)}\cdots R^{(1)}. \tag{7.46}
\]

Using the connection between the QR algorithm and the simultaneous shifted inverse iteration, we can show that
\[
\prod_{j=1}^{k}\left(A - \mu^{(j)}I\right)^{-1}P = \underbrace{\begin{bmatrix} \vec q_n^{(k)} & \vec q_{n-1}^{(k)} & \cdots & \vec q_2^{(k)} & \vec q_1^{(k)} \end{bmatrix}}_{Q^{(k)}P}\;\underbrace{\left(P\left(T^{(k)}\right)^{-\top}P\right)}_{\text{upper triangular}}.
\]
$Q^{(k)}$ is the orthogonalisation of $\prod_{j=k}^{1}(A - \mu^{(j)}I)$, while $Q^{(k)}P$ is the orthogonalisation of $\prod_{j=1}^{k}(A - \mu^{(j)}I)^{-1}$. That is, the last column of $Q^{(k)}$ is the result of applying inverse iteration (using the shifts $\mu^{(k)}$ to $\mu^{(1)}$) to the vector $\vec e_n$. Generally speaking, the last column of $Q^{(k)}$ converges fast to an eigenvector.

7.4.3 Connection with Rayleigh Quotient Iteration

To complete the loop, we need to pick a shift value that achieves fast convergence in the last column of $Q^{(k)}$. A natural choice is the Rayleigh quotient of the last column of $Q^{(k-1)}$,
\[
\mu^{(k)} = \frac{(\vec q_n^{(k-1)})^*A\,\vec q_n^{(k-1)}}{(\vec q_n^{(k-1)})^*\vec q_n^{(k-1)}} = (\vec q_n^{(k-1)})^*A\,\vec q_n^{(k-1)},
\]
where the second equality holds as $Q^{(k-1)}$ is orthogonal. Furthermore, since $\vec q_n^{(k-1)} = Q^{(k-1)}\vec e_n$ and
\[
A = Q^{(k-1)}A^{(k-1)}\left(Q^{(k-1)}\right)^\top,
\]
we have
\[
(\vec q_n^{(k-1)})^*A\,\vec q_n^{(k-1)} = \vec e_n^\top A^{(k-1)}\vec e_n = A^{(k-1)}(n, n).
\]
Thus the $(n,n)$ entry of the matrix $A^{(k-1)}$ gives an eigenvalue estimate associated with the last column of $Q^{(k-1)}$ without any additional work. This is usually referred to as the Rayleigh quotient shift.

7.4.4 Wilkinson Shift

The Rayleigh quotient shift does not guarantee convergence; it may stall for certain types of matrices, for example, the matrix
\[
A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.
\]
Applying the QR algorithm without shifts to this matrix does not converge, as
\[
A = QR = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\quad\text{and}\quad
RQ = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = A.
\]
The Rayleigh quotient shift is $A(2,2) = 0$, and hence it does not shift the matrix either. The problem is that the matrix $A$ has two eigenvalues, $1$ and $-1$, and the eigenvalue estimate $0$ lies exactly between them: it has an equal tendency towards both eigenvalues.

One particular method that can break the symmetry is call Wilkinson shift.

Instead of using the lower-rightmost entry of A, it uses the lower-rightmost 2-by-

2 submatrix of A, denoted by B = A(n-1:n, n-1:n). Suppose B takes the form

of

B =

[

a1 b1

b2 a2

]

The Wilkinson shift is the eigenvalue of B that is closer to a2. If there is a tie, we pick one of the two eigenvalues arbitrarily. A numerically stable formula for it is

µ = a2 − sign(δ) b1b2 / (|δ| + √(δ² + b1b2)), where δ = (a1 − a2)/2,

where sign(δ) is set arbitrarily to either 1 or −1 if δ = 0. The Wilkinson shift provides the same convergence rate as the Rayleigh quotient shift, cubic for symmetric matrices and quadratic for general matrices, and its convergence is guaranteed.
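The stable formula above can be sketched directly in NumPy (the function name is ours; the notes do not prescribe an implementation):

```python
import numpy as np

def wilkinson_shift(B):
    """Wilkinson shift from the trailing 2-by-2 submatrix B = [a1 b1; b2 a2].

    mu = a2 - sign(delta) * b1*b2 / (|delta| + sqrt(delta^2 + b1*b2)),
    with delta = (a1 - a2)/2 and sign(0) taken as +1 (arbitrary tie-break).
    """
    (a1, b1), (b2, a2) = B
    delta = (a1 - a2) / 2.0
    sgn = 1.0 if delta >= 0 else -1.0
    return a2 - sgn * b1 * b2 / (abs(delta) + np.sqrt(delta**2 + b1 * b2))

# For A = [0 1; 1 0] the Rayleigh quotient shift A(2,2) = 0 stalls, whereas the
# Wilkinson shift returns one of the two eigenvalues (here -1), breaking the tie.
mu = wilkinson_shift(np.array([[0.0, 1.0], [1.0, 0.0]]))
print(mu)  # -1.0, an eigenvalue of the 2x2 block
```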

7.4.5 Deflation

In Lines 6-8 of Algorithm 7.18, if any off-diagonal entry A(j+1, j) is sufficiently close to 0, then we can set A(j+1, j) = 0 and partition the matrix as follows:

A(k) = [A1 A3; 0 A2].


This technique is called deflation. It divides the problem into sub-problems and tackles them individually. Here we briefly explain the concept for general matrices. Since det(A(k) − λI) = det(A1 − λI) det(A2 − λI), finding the eigenvalues (or computing the Schur form) of A(k) reduces to computing the Schur forms of A1 and A2

separately. Suppose we have computed the Schur factorisation of A1 and A2 in

the form of

A1 = U1 T1 U1∗, (7.47)

A2 = U2 T2 U2∗, (7.48)

respectively. We can construct an n-by-n unitary matrix

U(k+1) = [U1 0; 0 U2],

so that

(U(k+1))∗ A(k) U(k+1) = [U1∗ 0; 0 U2∗] [A1 A3; 0 A2] [U1 0; 0 U2] = [T1 Ã3; 0 T2],

where Ã3 = U1∗ A3 U2.
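The deflation test itself is simple to sketch (an illustrative helper; the names and the relative tolerance choice are ours, not from Algorithm 7.18):

```python
import numpy as np

def deflate(A, tol=1e-12):
    """Zero negligible subdiagonal entries of A and return its diagonal blocks.

    A is assumed upper triangular apart from its subdiagonal, as produced by
    QR iteration on a Hessenberg matrix. Zeroing A[j+1, j] splits A into
    independent eigenvalue sub-problems.
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    splits = [0]
    for j in range(n - 1):
        # A common criterion: compare against the neighbouring diagonal entries.
        if abs(A[j + 1, j]) <= tol * (abs(A[j, j]) + abs(A[j + 1, j + 1])):
            A[j + 1, j] = 0.0
            splits.append(j + 1)
    splits.append(n)
    blocks = [A[i:k, i:k] for i, k in zip(splits[:-1], splits[1:])]
    return A, blocks

A = np.array([[2.0, 1.0, 3.0],
              [1e-15, 4.0, 1.0],
              [0.0, 0.5, 5.0]])
A_def, blocks = deflate(A)
print([b.shape for b in blocks])   # [(1, 1), (2, 2)]
```

The eigenvalues of the blocks, taken together, are exactly the eigenvalues of the deflated matrix, which is the point of the block-triangular argument above.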


Chapter 8

Singular Value Decomposition

8.1 Singular Value Decomposition

The singular value decomposition of a matrix is often referred to as the SVD.

The SVD factorizes a matrix A ∈ Rm×n into the product of three matrices

A = UΣV >

where U and V are orthogonal, and Σ is diagonal. Here A can be any matrix,

e.g., non-symmetric or rectangular.

8.1.1 Understanding SVD

A matrix A ∈ Rm×n is a linear transformation taking a vector ~x ∈ Rn in its row

space (or preimage), row(A), to a vector ~y = A~x in its column space (or range),

col(A). The SVD is motivated by the following geometric fact: the image of the

unit sphere under any m-by-n matrix is a hyper-ellipse.

Remark 8.1

The hyper-ellipse is a generalisation of an ellipse. In the space Rm, a

hyper-ellipse can be viewed as the surface obtained by stretching a unit

sphere in Rm by some factors σ1, σ2, . . . , σm, along some orthogonal direc-

tions ~u1, ~u2, . . . , ~um. Here each of the ~ui, i = 1, . . . ,m is a unit vector. The

vectors {σi~ui} are the principal semiaxes of the hyper-ellipse.

Figure 8.1.1: Geometrical interpretation of a linear transformation.


Figure 8.1.1 shows a unit sphere and the hyper-ellipse that is the image of

the unit sphere transformed by a matrix A ∈ Rm×n. Assume that m > n and that the matrix A has rank r ≤ min(m, n). Three key components of the SVD can then be defined:

• The singular values of the matrix A are the lengths of the principal semiaxes, σ1, σ2, . . . , σr. We often assume that the singular values are non-negative and ordered as σ1 ≥ σ2 ≥ . . . ≥ σr > 0.

• The left singular vectors of A are orthogonal unit vectors ~u1, ~u2, . . . , ~ur that are in the column space of A and oriented in the directions of the principal semiaxes.

• We also have the right singular vectors ~v1, ~v2, . . . , ~vr that are orthogonal

unit vectors in the row space of the matrix A such that

A~vi = ~uiσi, i = 1, . . . , r. (8.1)

The relationship between the right singular vectors, left singular vectors, and

singular values can be understood as the following: the first right singular vector

is a unit vector ~v such that the 2-norm of the vector A~v is maximised. This way,

we have

~v1 = argmax_{‖~v‖=1} ‖A~v‖.

The corresponding first singular value is defined as σ1 = ‖A~v1‖ and the first left singular vector is ~u1 = A~v1/σ1. Then, the second right singular vector is defined as

the next unit vector ~v that is orthogonal to ~v1 and maximises the 2-norm of the

vector A~v. We have

~v2 = argmax_{~v>~v1 = 0, ‖~v‖=1} ‖A~v‖.

The corresponding second singular value is σ2 = ‖A~v2‖ and the second left singular vector is ~u2 = A~v2/σ2. Repeating this process we can define all the singular

values and singular vectors.

In summary, transforming a right singular vector ~vi using the matrix A leads to the left singular vector ~ui multiplied by σi. Thus, right singular vectors and left singular vectors characterise the principal directions (in row space and column space) of the linear transformation defined by A. Singular values characterise

the “stretching” effect of this linear transformation.

Remark 8.2

The right and left singular vectors also satisfy the following duality:

A~vi = ~uiσi, A>~ui = ~viσi,

for i = 1, . . . , r.
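This duality is easy to check numerically with NumPy's SVD routine (a small verification sketch on a random matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Reduced SVD: numpy returns U (m x r), the singular values s, and V^T (r x n).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

for i in range(len(s)):
    u_i, v_i, sigma_i = U[:, i], Vt[i, :], s[i]
    assert np.allclose(A @ v_i, sigma_i * u_i)      # A v_i = sigma_i u_i
    assert np.allclose(A.T @ u_i, sigma_i * v_i)    # A^T u_i = sigma_i v_i
```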


8.1.2 Full SVD and Reduced SVD

Assuming m ≥ n, the collection of the equations (8.1) for all i = 1, . . . , r can be

expressed as a matrix equation

A [~v1 ~v2 . . . ~vr] = [~u1 ~u2 . . . ~ur] diag(σ1, σ2, . . . , σr), (8.2)

or

AVˆ = Uˆ Σˆ, (8.3)

in matrix form. Here, Vˆ ∈ Rn×r and Uˆ ∈ Rm×r are matrices

with orthonormal columns, and Σˆ ∈ Rr×r is a diagonal matrix.

Columns of the matrices Vˆ ∈ Rn×r and Uˆ ∈ Rm×r are orthonormal vectors,

however, they do not form complete bases of Rn and Rm unless m = n = r. By

adding m− r unit vectors that are orthogonal to columns of Uˆ and adding n− r

unit vectors that are orthogonal to columns of Vˆ , we can extend the matrix Uˆ

to an orthogonal matrix U ∈ Rm×m and the matrix Vˆ to an orthogonal matrix

V ∈ Rn×n.

If Uˆ and Vˆ are replaced by U and V in Equation (8.3), then Σˆ will have to

change too. We can add an (m − r)× r block of zeros under the matrix Σˆ and

an m× (n− r) block of zeros on the right of Σˆ to form a new matrix Σ. This is

demonstrated as the following:

This way, we have

AV = UΣ, (8.4)

where both U and V are orthogonal matrices. This is exactly the same as

Equation (8.3) as those additional columns in U are multiplied with zeros, and

those additional columns of V are in the null space of A. Multiplying both sides of Equation (8.4) by V> on the right, we obtain the full SVD.


Definition 8.3

For a matrix A ∈ Rm×n, where m > n, the full singular value decompo-

sition is defined by an orthogonal matrix U ∈ Rm×m, an orthogonal matrix

V ∈ Rn×n, and a diagonal matrix Σ ∈ Rm×n with non-negative diagonal en-

tries in the form of

A = UΣV >. (8.5)

Definition 8.4

By eliminating those columns in U and V that are multiplied with zeros in Σ in

the full SVD, we can also define the reduced singular value decomposition

as

A = Uˆ ΣˆVˆ> = ∑_{i=1}^{r} σi ~ui ~vi>. (8.6)

The full SVD is often useful in deriving properties of a matrix, whereas the

reduced SVD is often very valuable for computational tasks. The full SVD and

the reduced SVD can be summarised by the following figure:
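In NumPy, np.linalg.svd returns either variant depending on the full_matrices flag, with min(m, n) playing the role of r (a quick shape check on a small m > n example):

```python
import numpy as np

A = np.arange(12.0).reshape(4, 3)            # m = 4, n = 3

# Full SVD: U is m x m, V is n x n, and Sigma must be padded to m x n.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((4, 3))
Sigma[:3, :3] = np.diag(s)
assert U.shape == (4, 4) and Vt.shape == (3, 3)
assert np.allclose(A, U @ Sigma @ Vt)

# Reduced SVD: Uhat is m x min(m, n) with orthonormal columns, Sigmahat square.
Uh, sh, Vth = np.linalg.svd(A, full_matrices=False)
assert Uh.shape == (4, 3)
assert np.allclose(A, Uh @ np.diag(sh) @ Vth)
```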

Remark 8.5

Considering the full SVD factorization, A = UΣV >, the linear transformation

defined by A can be decomposed into several steps (as shown in Figure 8.1.2):

1. Start with a unit sphere in the row space of the matrix A.

2. Multiplication with V >. This is a rotation, since V is an orthogonal

matrix.

3. Multiplication with Σ. The diagonal matrix Σ stretches the new unit

sphere along its canonical basis vectors (grey lines) with singular values

σ1, σ2, . . ..

4. Multiplication with U . This is another rotation, since U is also an or-

thogonal matrix.

Thus, SVD connects the four fundamental subspaces of a linear transformation:

1. ~v1, ~v2, . . . , ~vr: an orthonormal basis for the row space of A, row(A)

2. ~u1, ~u2, . . . , ~ur: an orthonormal basis for the column space of A, col(A)

3. ~vr+1, . . . , ~vn: an orthonormal basis for the null space of A, null(A)

4. ~ur+1, . . . , ~um: an orthonormal basis for the left null space of A, null(A>)


Figure 8.1.2: Geometrical interpretation of the SVD.

Remark 8.6: The m < n case

For a matrix A ∈ Rm×n, where m < n, both reduced SVD and full SVD can

also be defined—a quick way of doing so is to apply the above process to the

matrix A>.

8.1.3 Properties of SVD

It is important to know that SVD exists for any general matrix A ∈ Rm×n.

Theorem 8.7

Every matrix A ∈ Rm×n has a singular value decomposition.

Proof. This can be shown using induction; we omit the proof here.

As stated in Remark 8.5, SVD can characterise all four fundamental sub-

spaces of a matrix. Here we use the full SVD of a matrix to explore some

important properties of a matrix.

Theorem 8.8

The rank of a matrix A ∈ Rm×n is equal to the number of its nonzero singular

values.

Proof. Consider the full SVD of A = UΣV >. Suppose that there are r

nonzero singular values, and hence rank(Σ) = r as the rank of a diagonal

matrix is equal to the number of nonzero entries. Since U and V are full rank,

we have rank(A) = rank(Σ) = r.


Theorem 8.9

The Frobenius norm of a matrix A ∈ Rm×n is equal to the square root of the sum of the squares of its nonzero singular values, i.e.,

‖A‖F = √(∑_{i=1}^{r} σi²).

Proof. Consider the full SVD of A = UΣV >. Since the Frobenius norm

is preserved under multiplication with orthogonal matrices, we have ‖A‖F =

‖Σ‖F . Given that ‖Σ‖F =

√∑r

i=1 σ

2

i , we have ‖A‖F =

√∑r

i=1 σ

2

i .

Theorem 8.10

The 2-norm of a matrix A ∈ Rm×n is equal to the largest singular value of the

matrix A, i.e.,

‖A‖2 = σ1.

Proof. Consider the full SVD of A = UΣV >. Since the 2-norm is preserved

under multiplication with orthogonal matrices, we have ‖A‖2 = ‖Σ‖2. Given

that ‖Σ‖2 = σ1, we have ‖A‖2 = σ1.
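Theorems 8.8-8.10 can be verified numerically in a few lines (a sketch on a random matrix built to have rank 2; the tolerance for counting nonzero singular values is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))  # rank 2 by construction
s = np.linalg.svd(A, compute_uv=False)

rank = int(np.sum(s > 1e-10))                                  # Theorem 8.8
assert rank == np.linalg.matrix_rank(A) == 2
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))  # Theorem 8.9
assert np.isclose(np.linalg.norm(A, 2), s[0])                       # Theorem 8.10
```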

8.1.4 Comparing SVD to Eigendecomposition

The theme of diagonalising a matrix by expressing it in terms of a new basis is

not new—it has already been discussed in eigendecomposition. A nondefective

square matrix can be transformed to a diagonal matrix of eigenvalues using a

similarity transformation defined by its eigenvectors. For a general nondefective

square matrix A ∈ Rn×n, its eigendecomposition takes the form of

A = WΛW−1,

where W is the matrix whose columns are n linearly independent eigenvectors of A and Λ is the diagonal matrix consisting of the eigenvalues of A.

SVD is fundamentally different from the eigendecomposition in several aspects:

1. The SVD uses two bases U and V , whereas the eigendecomposition only

uses one.

2. The matrix W in the eigendecomposition may not be orthogonal, but the

matrices U and V in the SVD are always orthogonal.

3. The SVD does not require the matrix A to be square; it exists for any matrix.

In applications, the eigendecomposition is usually more relevant to matrix functions, e.g., A^k and exp(tA). The SVD is usually more relevant to the matrix

itself and its inverse.


Real and symmetric matrices have a special eigendecomposition. We know

that (by Theorem 6.24) if A ∈ Rn×n is symmetric and real-valued, it has orthog-

onal eigenvectors and the eigendecomposition

A = QΛQ>

where Q is the orthogonal matrix whose columns are n orthonormal eigenvectors of A and Λ is the diagonal matrix consisting of the eigenvalues of A. In this case, the singular values of A are just the absolute values of the eigenvalues of A. Using the eigendecomposition of A, we

can express the SVD as

A = Q|Λ|sign(Λ)Q> = Q|Λ| (Q sign(Λ))> .

The left singular vectors are the same as eigenvectors and the right singular

vectors are eigenvectors flipped by the sign of the eigenvalues—if an eigenvalue

is negative, we set the singular value to be the absolute value of the eigenvalue,

and multiply the corresponding eigenvector(s) by -1 to obtain the right singular

vectors.
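A quick numerical check of this relationship for a random symmetric matrix (a sketch; with probability one no eigenvalue is zero here, so sign(Λ) is well defined):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                      # a real symmetric matrix

lam, Q = np.linalg.eigh(A)             # A = Q diag(lam) Q^T
s = np.linalg.svd(A, compute_uv=False)

# Singular values are the absolute eigenvalues, sorted decreasingly.
assert np.allclose(np.sort(np.abs(lam))[::-1], s)

# U = Q, Sigma = |Lambda|, V = Q sign(Lambda) gives a valid SVD of A.
U, Sigma, V = Q, np.diag(np.abs(lam)), Q @ np.diag(np.sign(lam))
assert np.allclose(A, U @ Sigma @ V.T)
```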


8.2 Computing SVD

8.2.1 Connection with Eigenvalue Solvers

Computing orthonormal bases {ui}ri=1 and {vi}ri=1 for the column space and the row space of a matrix A ∈ Rm×n is easy, e.g., the Gram-Schmidt process can be used for this purpose. However, in general, there is no reason to expect the matrix A to transform an arbitrarily chosen basis {vi}ri=1 into another orthogonal basis. For a general rank-r matrix A with m rows and n columns, the SVD aims at finding an orthonormal basis {vi}ri=1 for the row space of A that is transformed into an orthonormal basis {ui}ri=1 for the column space of A, stretched by some factors {σi}ri=1, i.e.,

A~vi = ~uiσi, σi > 0, i = 1, . . . , r.

The key step towards finding the orthonormal matrices U and V is to use

the full SVD

A = UΣV >.

Rather than solving for U , V and Σ simultaneously, we can take the following

steps to obtain the SVD of a matrix A (assuming m > n):

1. Multiply both sides on the left by A> = V ΣU> to get

A>A = V ΣU>UΣV> = V Σ²V> = [~v1 ~v2 . . . ~vn] diag(σ1², σ2², . . . , σn²) [~v1 ~v2 . . . ~vn]>.

This problem can be solved by the eigendecomposition of the symmetric,

n×n matrix A>A, where {~vi}ni=1 are the eigenvectors and {σ2i }ni=1 are the

eigenvalues.

2. Compute the eigendecomposition of A>A = V ΛV>. Then set the columns of V as the right singular vectors and Σ = √Λ as the diagonal matrix of singular values.

3. We can solve the linear system UΣ = AV to obtain the left singular vectors

U . In the absence of numerical error, this is equivalent to solving the

eigendecomposition of AA> = UΣ2U>.

Note that we have at most min{m,n} nonzero eigenvalues.
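The three steps can be sketched directly in NumPy (an illustration on a well-conditioned random matrix; the function name is ours, and the method assumes full column rank so that the division by the singular values is safe):

```python
import numpy as np

def svd_via_normal_equations(A):
    """SVD via the eigendecomposition of A^T A (assumes full column rank).

    Fine for dominant singular values, but not numerically stable for
    singular values much smaller than ||A||.
    """
    lam, V = np.linalg.eigh(A.T @ A)     # step 2: eigendecomposition of A^T A
    idx = np.argsort(lam)[::-1]          # order eigenvalues decreasingly
    lam, V = lam[idx], V[:, idx]
    s = np.sqrt(np.maximum(lam, 0.0))    # Sigma = sqrt(Lambda)
    U = (A @ V) / s                      # step 3: solve U Sigma = A V columnwise
    return U, s, V

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
U, s, V = svd_via_normal_equations(A)
assert np.allclose(A, U @ np.diag(s) @ V.T)
assert np.allclose(s, np.linalg.svd(A, compute_uv=False))
```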

Remark 8.11

The above method is widely used in many areas for computing the SVD of a matrix, for example, in principal component analysis. However, a major shortfall of this method is that it is not numerically stable for computing singular values σi ≪ ‖A‖.

Suppose we have an input matrix A. The floating point representation of A has an error on the order of ε_machine ‖A‖. A numerically stable algorithm requires that the error of an estimated singular value σ̃i is on the order of ε_machine ‖A‖. That is,

|σ̃i − σi| = O(ε_machine ‖A‖).

Consider the above process: the error in estimating the eigenvalues of A>A (singular values squared) using a numerically stable eigenvalue solver is about

|σ̃i² − σi²| = O(ε_machine ‖A>A‖).

The error of computing the square root to find σ̃i is on the order of |σ̃i² − σi²| / σi.

Thus, the error of an estimated singular value σ̃i using the above process is

|σ̃i − σi| = O(ε_machine ‖A>A‖ / σi) = O(ε_machine ‖A‖² / σi).

An intuitive way to understand this is the following: the product A>A am-

plifies the numerical error quadratically in the eigenvalue estimation step, and

then the absolute error in computing a singular value (by solving the square root

of an eigenvalue) is on the order of the error of estimated eigenvalue divided by

σi. This way, the above method is usually fine for computing dominant singular values, i.e., σi ≫ 0. However, for computing those singular values σi ≪ ‖A‖, the resulting singular value estimate will be dominated by the error.

8.2.2 A Different Connection with Eigenvalue Solvers

An alternative way to compute the SVD of A ∈ Rm×n using the eigendecomposition is to consider the following (n+m)-by-(n+m) matrix

S = [0 A>; A 0].

The eigenvectors and eigenvalues of the matrix S satisfy

[0 A>; A 0] [~v; ~u] = λ [~v; ~u],

where ~v ∈ Rn and ~u ∈ Rm. This equation leads to

A>~u = λ~v, A~v = λ~u,

which implies A>A~v = λ²~v and AA>~u = λ²~u. Thus, if the matrix S has an eigenvalue λ ≥ 0, then the corresponding eigenvector [~v; ~u] defines a pair of right and left singular vectors, given that both ~v and ~u are unit vectors. The eigenvalue λ ≥ 0 defines the corresponding singular value.

We note that if λ is an eigenvalue of S, then −λ is also an eigenvalue, associated with the eigenvector [~v; −~u]. This can be easily verified by

[0 A>; A 0] [~v; −~u] = −λ [~v; −~u].


Thus, both the singular values of a matrix A and their negatives are eigenvalues of S.

Now we can express the eigendecomposition of the matrix S in terms of the SVD of A = UΣV>, and vice versa. We consider the case where the matrix A is square (i.e., m = n)—in fact, computing the SVD of a general matrix with m ≠ n can be effectively reduced to computing the SVD of a square matrix, as will be shown later in this section. This way, we have

[0 A>; A 0] [V V; U −U] = [V V; U −U] [Σ 0; 0 −Σ].

Since singular vectors are unit vectors, we can normalise an eigenvector [~v; ~u] or [~v; −~u] by scaling it by a factor of 1/√2. Thus, using the orthogonal matrix

Q = (1/√2) [V V; U −U],

we can express the eigendecomposition of S in the form of

S = Q [Σ 0; 0 −Σ] Q>.

Therefore, the SVD can be obtained by computing the eigendecomposition of

the matrix S. In contrast to the method using the eigendecomposition A>A,

the new method is numerically stable as it does not involve the square root of

eigenvalues.

Remark 8.12

In practice, the matrix S is never formed explicitly. Factorisations of S, such

as the QR factorisation and the eigendecomposition, can be obtained by using

the matrix A and the symmetry.
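The correspondence between the eigenvalues of S and the singular values of A is easy to check numerically (purely for illustration we form S explicitly here, which, as the remark above notes, one would never do in practice):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n))                 # square case, m = n

S = np.block([[np.zeros((n, n)), A.T],
              [A, np.zeros((n, n))]])

lam = np.linalg.eigvalsh(S)                     # S is symmetric
s = np.linalg.svd(A, compute_uv=False)

# The eigenvalues of S are exactly {+sigma_i} together with {-sigma_i}.
assert np.allclose(np.sort(lam), np.sort(np.concatenate([s, -s])))
```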

8.2.3 Bidiagonalisation

As with eigenvalue solvers, algorithms for computing the SVD often have two phases. In phase 1, the matrix A is reduced to a bidiagonal form, in order to save floating point operations in computing the eigendecomposition of S or A>A. In phase 2, eigenvalue solvers such as the shifted QR algorithm can be used to diagonalise S or A>A, and hence A, to find the singular values. This process is shown as the following:

A (full) −−Phase 1−→ U0>AV0 (bidiagonal B) −−Phase 2−→ U>BV (diagonal Σ)

We will focus on the phase 1 of this process and omit details of the phase 2.


Remark 8.13

Suppose the matrix A ∈ Rm×n and m > n. In the bidiagonalisation step, both

U0 ∈ Rm×m and V0 ∈ Rn×n are orthogonal matrices, and the last m− n rows

of B have zero values, which can be shown as the following:

Consider the nonzero block of the matrix B, denoted by Bˆ, and its SVD, Bˆ = UB ΣˆVB>, where UB, Σˆ, VB ∈ Rn×n. Constructing the orthogonal matrix

QB = [UB 0; 0 I],

and the zero-padded matrix

Σ = [Σˆ; 0],

we can define the SVD of the matrix B as

B = QB Σ VB>.

This is demonstrated as the following:

Since B = U0>AV0, we have A = U0BV0>, and thus

A = U0 QB Σ VB> V0> = (U0QB) Σ (V0VB)> = UΣV>,

with U = U0QB and V = V0VB.

This way, computing the SVD of the original matrix A can be effectively reduced to computing the SVD of an n-by-n matrix Bˆ.

Golub-Kahan Bidiagonalisation

The goal of bidiagonalisation is to multiply the matrix A by a sequence of uni-

tary/orthogonal matrices on the left, and another sequence of unitary/orthogo-

nal matrices on the right to obtain a bidiagonal matrix that has zeros below its

diagonal and zeros above its first superdiagonal.


This process is significantly different from the reduction of a matrix to the

tridiagonal form. In the reduction to the tridiagonal form, the input matrix

should be square and the same sequence of unitary/orthogonal matrices are ap-

plied on both sides of the matrix. In the bidiagonalisation, the input matrix does

not need to be a square matrix, and two different sequences of unitary/orthogo-

nal matrices are applied on the left and on the right of the matrix—the numbers

of matrices applied in the two sequences are not necessarily the same.

The simplest method for accomplishing this is the Golub-Kahan bidiagonalisation. It applies Householder reflections alternately on the left and on the right

of a matrix. The left Householder reflection aims to introduce zeros below the

diagonal, whereas the right Householder reflection aims to introduce zeros to the

right of the first superdiagonal. This way, zeros introduced by the left House-

holder reflection will not be modified by the right Householder reflection, and

previously introduced zeros will not be modified by later Householder reflections.

This process can be demonstrated by the following example.

Example 8.14

Consider a matrix A ∈ R7×4, applying Householder reflection alternately on

the left and on the right of A produces a bidiagonal form. This Golub-Kahan

bidiagonalisation can be shown as:

A −−U1>(·)−→ U1>A −−(·)V1−→ U1>AV1 −−U2>(·)−→ U2>U1>AV1 −−(·)V2−→ U2>U1>AV1V2 −−U3>(·)−→ U3>U2>U1>AV1V2 −−U4>(·)−→ U4>U3>U2>U1>AV1V2,

at which point the matrix is in bidiagonal form.

The four left multiplications introduce zeros below the diagonal, and the two right

multiplications introduce zeros above the first superdiagonal.

For a matrix A ∈ Rm×n, n Householder reflections have to be applied on the

left and n − 2 Householder reflections have to be applied on the right. The total work of the Golub-Kahan bidiagonalisation is about double the work of the QR factorisation—the left Householder reflections have the same cost as computing the QR factorisation of A, and the right Householder reflections have the same cost as computing the QR factorisation of A> excluding the first row. Thus the total work of the Golub-Kahan bidiagonalisation is ∼ 4mn² − (4/3)n³ flops.
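A compact, unoptimised NumPy sketch of the Golub-Kahan bidiagonalisation, accumulating the orthogonal factors explicitly. The helper names are ours, and for clarity the code applies full reflectors rather than the flop-efficient form counted above:

```python
import numpy as np

def householder(x):
    """Unit Householder vector v such that (I - 2 v v^T) x is a multiple of e1."""
    v = x.astype(float).copy()
    v[0] += (1.0 if v[0] >= 0 else -1.0) * np.linalg.norm(x)
    nrm = np.linalg.norm(v)
    return v / nrm if nrm > 0 else v

def golub_kahan_bidiag(A):
    """Reduce A (m x n, m >= n) to bidiagonal form B = U0^T A V0."""
    B = A.astype(float).copy()
    m, n = B.shape
    U0, V0 = np.eye(m), np.eye(n)
    for k in range(n):
        # Left reflection: zero out B[k+1:, k], below the diagonal.
        v = householder(B[k:, k])
        B[k:, :] -= 2.0 * np.outer(v, v @ B[k:, :])
        U0[:, k:] -= 2.0 * np.outer(U0[:, k:] @ v, v)
        if k < n - 2:
            # Right reflection: zero out B[k, k+2:], right of the superdiagonal.
            v = householder(B[k, k + 1:])
            B[:, k + 1:] -= 2.0 * np.outer(B[:, k + 1:] @ v, v)
            V0[:, k + 1:] -= 2.0 * np.outer(V0[:, k + 1:] @ v, v)
    return U0, B, V0

rng = np.random.default_rng(5)
A = rng.standard_normal((7, 4))                    # same shape as Example 8.14
U0, B, V0 = golub_kahan_bidiag(A)
assert np.allclose(U0 @ B @ V0.T, A)               # A = U0 B V0^T
assert np.allclose(np.tril(B, -1), 0) and np.allclose(np.triu(B, 2), 0)
```

This performs n left and n − 2 right reflections, as stated above; the zeros introduced earlier are untouched by the later reflections.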


Lawson-Hanson-Chan Bidiagonalisation

For the case where m ≫ n, the total work of the Golub-Kahan bidiagonalisation is unnecessarily high. If we know the matrix in bidiagonal form has zeros below its n-th row, then the right Householder reflections in the bidiagonalisation process should avoid modifying those entries. This can be accomplished by first applying a QR factorisation to the input matrix, and then applying the Golub-Kahan bidiagonalisation to the upper triangular matrix to reduce it to the bidiagonal form. This procedure is called the Lawson-Hanson-Chan (LHC) bidiagonalisation. It can be demonstrated as the following:

In the LHC bidiagonalisation, the work of the QR step is ∼ 2mn² − (2/3)n³ flops, and the work of the subsequent bidiagonalisation of the upper triangular matrix is ∼ 4n³ − (4/3)n³ = (8/3)n³ flops. Thus, the total work of the LHC bidiagonalisation is ∼ 2mn² + 2n³ flops. This requires fewer operations than the Golub-Kahan bidiagonalisation if m > (5/3)n.

From Bidiagonal Form of A to Tridiagonal Form of A>A and S

We have seen that in phase 1 of an eigenvalue solver, a symmetric matrix can be reduced to a tridiagonal matrix. In computing the SVD, reducing a matrix to a bidiagonal form is the analogue of phase 1 of an eigenvalue solver. In fact,

reducing a matrix A to a bidiagonal form is equivalent to reducing the matrices

S and A>A to a tridiagonal form.

As shown in Remark 8.13, computing SVD of a general matrix A ∈ Rm×n

with m > n can be effectively reduced to computing SVD of a square bidiagonal

matrix B ∈ Rn×n.

This way, computing SVD using the eigendecomposition of A>A is reduced

to finding the eigendecomposition of B>B. It is easy to verify that the matrix

B>B is a symmetric tridiagonal matrix.

For computing the SVD using the eigendecomposition of the matrix S, we are effectively solving the eigendecomposition of the matrix

SB = [0 B>; B 0].

This matrix SB can be brought to a tridiagonal form by swapping rows and columns using an orthogonal similarity transformation defined by some permutation matrix. Modified shifted QR algorithms (which can adapt to the structure of SB) have been developed to solve the eigendecomposition of SB. We leave it at this.


8.3 Low Rank Matrix Approximation using SVD

Recall the reduced singular value decomposition of a rank-r matrix A ∈ Rm×n,

A = Uˆ ΣˆVˆ> = ∑_{i=1}^{r} σi ~ui ~vi>.

This decomposition into a summation of rank-one matrices, σi ~ui ~vi>, has a celebrated property: the k-th partial sum captures as much of the energy of the matrix A as possible. Here the “energy” is defined by either the 2-norm or the Frobenius norm.

Definition 8.15

Given the SVD of the matrix A ∈ Rm×n, the truncated singular value decomposition is defined by retaining only the first k singular values and the first k left and right singular vectors. Let A = UΣV>, and then the truncated SVD takes the form of

A ≈ Ak := Uk Σk Vk> = ∑_{i=1}^{k} σi ~ui ~vi>, (8.7)

where Uk = U(:, 1:k), Σk = Σ(1:k, 1:k), and Vk = V(:, 1:k), for k < r. The matrix Ak = ∑_{i=1}^{k} σi ~ui ~vi> is a rank-k approximation to A.

Theorem 8.16

Given a matrix A and its SVD, the rank-k approximation Ak where k < r

defined by the truncated SVD provides the best approximation to A in either

the 2-norm or the Frobenius norm. That is,

‖A−Ak‖2 ≤ ‖A−B‖2 for all B ∈ Rm×n of rank k,

and

‖A−Ak‖F ≤ ‖A−B‖F for all B ∈ Rm×n of rank k.

Proof. This can be shown by contradiction. We omit the proof here.

Example 8.17

A natural application of this theorem is that we can compress a data set or a

picture using the truncated SVD. A matrix A ∈ Rm×n requires mn floating-

point numbers of memory to store, whereas its truncated SVD only requires

mk+nk+k = (m+n+1)k floating-point numbers. Following Theorem 8.9 and

8.10, the compression error, in terms of the Frobenius norm and the 2-norm

can be given by the residual singular values after the truncation, i.e.,

‖A−Ak‖F = √(∑_{i=k+1}^{r} σi²), ‖A−Ak‖2 = σk+1.


Considering the following grey scale picture (on the left) that consists of 900×

703 pixels,

we can treat it as a matrix, and hence the truncated SVD can be applied to

compress this image. The picture on the right shows the compressed image

created by the truncated SVD with k = 30.
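The compression-error formulas above are easy to verify numerically (a small random matrix stands in for the image here; the sizes and the choice k = 10 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((90, 70))      # a random matrix standing in for the image
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 10
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k truncated SVD

# The compression errors match the residual singular values exactly.
assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])
assert np.isclose(np.linalg.norm(A - Ak, 'fro'), np.sqrt(np.sum(s[k:]**2)))
# Storage: 90*70 = 6300 numbers versus (90 + 70 + 1)*10 = 1610 for A_10.
```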


8.4 Pseudo Inverse and Least Square Problems using SVD

Recall the linear least-squares (LS) problem:

Definition 8.18: Least-Squares Problem

Let A ∈ Rm×n with m > n. Find ~x that minimizes f(~x) = ‖~b−A~x‖22.

Example 8.19: Polynomial Least Square Fitting

Suppose we have m distinct points, s1, s2, . . . , sm ∈ R and data b1, b2, . . . , bm ∈

R observed at these points. We aim to find a polynomial of degree n − 1,

p(s) = x1 + x2 s + · · · + xn s^(n−1) = ∑_{i=1}^{n} xi s^(i−1),

defined by coefficients {xi}ni=1, that best fits the data in the least square sense.

The relationship of the data {si}mi=1, {bi}mi=1 to the coefficients {xi}ni=1 can be

expressed by the Vandermonde system as:

[ 1  s1  s1²  · · ·  s1^(n−1) ] [ x1 ]   [ b1 ]
[ 1  s2  s2²  · · ·  s2^(n−1) ] [ x2 ]   [ b2 ]
[ 1  s3  s3²  · · ·  s3^(n−1) ] [ x3 ] = [ b3 ]
[ ⋮                           ] [ ⋮  ]   [ ⋮  ]
[ 1  sm  sm²  · · ·  sm^(n−1) ] [ xn ]   [ bm ]

where the m-by-n Vandermonde matrix is denoted by A, the coefficient vector by ~x, and the data vector by ~b.

To determine the coefficients {xi}ni=1 from data, we can solve a least square system A~x = ~b. The following figure presents an example of this process. We have 51 data points, which are the function sin(10s) observed at the discrete points 0, 0.02, 0.04, . . . , 1, represented by crosses. We construct a polynomial of degree 11 to fit this data set.
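This fit can be reproduced with a few lines of NumPy; np.vander with increasing=True builds exactly the matrix above, and np.linalg.lstsq solves the least square system:

```python
import numpy as np

s = np.linspace(0, 1, 51)               # 51 points 0, 0.02, ..., 1
b = np.sin(10 * s)                      # data observed at these points

n = 12                                  # n coefficients: a polynomial of degree 11
A = np.vander(s, n, increasing=True)    # Vandermonde matrix, A[i, j] = s_i**j

x, *_ = np.linalg.lstsq(A, b, rcond=None)
residual = np.max(np.abs(A @ x - b))    # a degree-11 polynomial fits sin(10s) well
```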


Pseudoinverse

One way to solve the least square problem is to solve the normal equation

A>A~x = A>~b. (8.8)

This leads to the definition of the pseudoinverse of a matrix.

Definition 8.20

For a full rank matrix A ∈ Rm×n, the matrix (A>A)−1A> is called the pseudoinverse of A, denoted by A+,

A+ = (A>A)−1A> ∈ Rn×m.

Using the pseudoinverse, the solution of the normal equation can be expressed

as

~x = A+~b.

Defining the projector P = AA+, which is an orthogonal projector onto range(A), the solution ~x minimising the least square problem satisfies

A~x = P~b,

where the right hand side is the data projected onto the range of A.

Theorem 8.21

Given the pseudoinverse of matrix A, denoted by A+, the matrix P = AA+ is

an orthogonal projector onto range(A).

QR

Solving Equation (8.8) is computationally fast but can be numerically unstable.

The practical method for solving the least square problem uses the reduced QR

factorisation A = QˆRˆ. This way, the projection onto the range of A is defined

by P = QˆQˆ>. Then the equation A~x = P~b can be expressed as

QˆRˆ~x = QˆQˆ>~b,

and left-multiplication by Qˆ> leads to

Rˆ~x = Qˆ>~b. (8.9)

Remark 8.22

Multiplying by Rˆ−1 leads to an alternative definition of pseudoinverse in the

form of

A+ = Rˆ−1Qˆ>. (8.10)

SVD

Alternatively, SVD provides a geometrically intuitive way to understand and

solve the least square problem. This is particularly useful for rank-deficient systems and the case m < n (e.g., X-ray imaging). Suppose the matrix

A ∈ Rm×n has a rank-r reduced SVD

A = Uˆ ΣˆVˆ> = ∑_{i=1}^{r} σi ~ui ~vi>.

Recall that the columns of Vˆ span the row space of A, the columns of Uˆ span the column space (range) of A, and Σˆ represents the stretching effect of the linear transformation.

The left singular vectors define an orthogonal projector P = Uˆ Uˆ>. The data

~b can be projected onto the range of A, spanned by the columns of Uˆ . The

projected data, P~b, can be expressed as a linear combination of the columns of

Uˆ—the associated coefficients are given by the vector Uˆ>~b ∈ Rr.

Then the equation A~x = P~b can be expressed as

Uˆ ΣˆVˆ >~x = Uˆ Uˆ>~b,

and left-multiplication by Uˆ> leads to

ΣˆVˆ >~x = Uˆ>~b. (8.11)

Solving this equation we obtain the least square solution

~x = Vˆ Σˆ−1Uˆ>~b. (8.12)

This way, we know that the least square solution ~x is a linear combination of the columns of Vˆ—the associated coefficients are given by the vector Vˆ>~x ∈ Rr—and hence it is in the row space of A.

The least square system can be understood as the following: projecting the

data to the range of the matrix A (defining ~q = Uˆ>~b ∈ Rr), we seek a solution

~x to the least square problem in the row space of A. Expressing the solution ~x

as a linear combination of the columns of Vˆ ,

~x = Vˆ ~p, where ~p = Vˆ >~x ∈ Rr,

the least square problem reduces to an r-dimensional linear system

Σˆ~p = ~q.

Recall the geometric interpretation of SVD (Figure 8.1.2), solving the least

square problem effectively inverts the stretching effect of a linear transform

within the rank-r row space and column space of A. This will be the key to

understanding X-ray imaging in the next section.

Remark 8.23

SVD also defines the pseudoinverse of A in the form of

A+ = Vˆ Σˆ−1Uˆ>. (8.13)


Algorithm 8.24: Least Squares via SVD

Given a matrix A ∈ Rm×n and the data ~b ∈ Rm, the solution ~x of the least

square problem f(~x) = ‖~b−A~x‖22 can be obtained as follows:

1. Compute the reduced SVD, A = Uˆ ΣˆVˆ >.

2. Compute the vector ~q = Uˆ>~b ∈ Rr.

3. Solve the linear system Σˆ~p = ~q.

4. Set ~x = Vˆ ~p.
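Algorithm 8.24 translates almost line by line into NumPy (a sketch; the rank threshold used to determine r is an illustrative choice):

```python
import numpy as np

def lstsq_via_svd(A, b):
    """Least squares via the reduced SVD, following the four steps above."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # step 1: reduced SVD
    r = int(np.sum(s > 1e-12 * s[0]))                 # numerical rank r
    q = U[:, :r].T @ b                                # step 2: q = Uhat^T b
    p = q / s[:r]                                     # step 3: solve Sigma p = q
    return Vt[:r, :].T @ p                            # step 4: x = Vhat p

rng = np.random.default_rng(7)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
x = lstsq_via_svd(A, b)
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_ref)
```

Because only the first r singular values are inverted, this also handles rank-deficient A, which is exactly the situation in the X-ray imaging example of the next section.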


8.5 X-Ray Imaging using SVD

In this section, we use an industrial process imaging problem as the example to demonstrate X-ray imaging. The setup of the problem is shown

in Figure 8.5.1. The true object consists of three circular inclusions, each of

uniform density, inside an annulus. Ten X-ray sources are positioned on one

side of a circle, and each source sends a fan of 100 X-rays that are measured by

detectors on the opposite side of the object. Here, the 10 sources are distributed

evenly so that they form a total illumination angle of 90 degrees, resulting in a

limited-angle X-ray problem. The goal is to reconstruct the density of the object

(as an image) from measured X-ray signals.

Figure 8.5.1: Left: discretised domain, true object, sources (red dots), and de-

tectors corresponding to one source (black dots). The fan transmitted by one

source is illustrated in gray. The density of the object is 0.006 in the outer ring

and 0.004 in the three inclusions; the background density is zero. Right: the

noise free measurements (black line) and the noisy measurements (red dots) for

one source.

8.5.1 Mathematical Model

When an X-ray travels through a physical object along a straight line l(s), where

s is the spatial coordinate, interaction between radiation and matter lowers the

intensity of the ray. Suppose that an X-ray has initial intensity I0 at the radiation

source. The intensity measured at the detector I1 is smaller than I0, as the

intensity of the X-ray decreases proportionally to the relative intensity loss of

the matter along the line l. We can represent the relative intensity loss of the

matter by an attenuation coefficient function f(s), whose value gives the relative

intensity loss of the X-ray within a small distance ds,

dI/I = −f(s) ds.


Density of material is often correlated with the relative intensity loss. Material

with a higher density (e.g., metal) often has a higher attenuation coefficient than

material with a lower density (e.g., wood). Thus, recovering the unknown at-

tenuation coefficient function f(s) from X-ray signals is used as a surrogate for

reconstructing the actual material density.

Integration from the initial state to the final state along a line l(s) gives

∫l(s) I′(s)/I(s) ds = −∫l(s) f(s) ds,

where the left hand side gives log(I1)− log(I0) = log(I1/I0). Thus we have

log(I0)− log(I1) = ∫l(s) f(s) ds.

Now the left hand side of the above equation is known from measurements (I0 by

the equipment setup and I1 from detector), whereas the right hand side consists

of integrals of the unknown function f(s) over straight lines.

8.5.2 Computational Model

Figure 8.5.2: Left: discretised object and an X-ray travelling through it. Right:

four pixels from the left side picture and the distances (in these pixels) travelled

by the X-ray corresponding to the measurement d7. Distance ai,j corresponds

to the element on the i-th row and j-th column of matrix F .

Computationally we can represent the continuous function f(s) by n pix-

els (or voxels in 3D), as shown in Figure 8.5.2. Now each component of ~x =

[x1, x2, . . . , xn]

> represents the value of the unknown attenuation coefficient func-

tion f(s) in the corresponding pixel. Assuming we have a measurement di of the

line integral of f(s) over line li(s), we can approximate

di = ∫li(s) f(s) ds = ∑_{j=1}^{n} ai,j xj ,

where ai,j is the distance that the line li(s) “travels” in the j-th pixel correspond-

ing to xj . If we have m measurements (m X-rays travel through the object),
then we have the linear system

~d = F~x,

where Fij = ai,j and ~d = [d1, d2, . . . , dm]>.


8.5.3 Image Reconstruction

We move from the problem of computing the observables ~d for a given attenuation coefficient function to the image reconstruction. The measurement process can be expressed as

~d = F~x+ ~e,

where ~e represents possible measurement noise of the instrument (as all real-world measurements are noisy) and other sources of error in the modelling process.

Remark 8.25

The error in the measurement process is not negligible.

The process of determining ~d given a known ~x is called the forward problem.

In contrast, image reconstruction is an inverse problem where we aim to recover

~x from measured data ~d.

In many cases, especially in industrial imaging, the X-rays travel through the physical object only from a restricted angle of view and we often have m < n. As a result, the reconstruction process is very sensitive to measurement error. To understand the reconstruction process and the role of measurement error, we generate noise-free data ~dt = F~xt and its noise-corrupted version ~dn for a

given “true” test image ~xt, as shown in Figure 8.5.1. Furthermore, the reduced

SVD of F , F = Uˆ ΣˆVˆ >, will also be used.

Inverse Crime

Figure 8.5.3: Left: reconstruction from noise free data. Right: reconstruction

from noisy data.

Given a measured data set ~dn, a natural thing to try is to recover ~x by using

the pseudoinverse of F (as discussed in the previous section) as F ∈ Rm×n may

not be invertible. This way, we have the reconstructed image,

~x+ = F+~dn = Vˆ Σˆ−1Uˆ>~dn.

To demonstrate the impact of measurement error, we consider the following

experiments:

1. Reconstruct ~x using the noise-free data—this is not realistic in practice.

2. Reconstruct ~x using the noisy data—the realistic case.


Figure 8.5.3 shows the reconstructed image for both experiments. Experiment 1 is often referred to as the inverse crime, or a too-good-to-be-true reconstruction. It is a reconstruction given perfect knowledge of the measurement process and noise-free data. In practice, a small error in the data can lead to a rather large error in the reconstruction, as shown in Experiment 2. Thus, we aim to find reconstructions that are robust to error.

Reconstruction using truncated SVD

Consider the noisy data generated from a true image ~xt, ~dn = F~xt + ~e. The reconstruction using the pseudoinverse can be expressed as

~x+ = F+~dn = (Vˆ Σˆ−1Uˆ>)(Uˆ ΣˆVˆ>)~xt + (Vˆ Σˆ−1Uˆ>)~e,

where Vˆ Σˆ−1Uˆ> = F+ and Uˆ ΣˆVˆ> = F . Thus, we have

~x+ = Vˆ Vˆ>~xt + F+~e.

The reconstructed image ~x+ consists of the true image ~xt projected onto the row space of F , plus the noise multiplied by the pseudoinverse, F+~e. This way the reconstruction error ‖~x+ − ~xt‖ can be bounded as

‖~x+ − ~xt‖ = ‖(Vˆ Vˆ> − I)~xt + F+~e‖ ≤ ‖(I − Vˆ Vˆ>)~xt‖+ ‖F+‖‖~e‖, (8.14)

by the triangle inequality.

We know that the 2-norm of the pseudoinverse, ‖F+‖, is given by 1/σr. Then

the error bound can be expressed as

‖~x+ − ~xt‖ ≤ ‖(I − Vˆ Vˆ>)~xt‖+ (1/σr)‖~e‖. (8.15)

Thus, the reconstruction error is governed by the smallest nonzero singular value σr. The singular values of the example used here are shown in Figure 8.5.4.

Figure 8.5.4: Singular values of F .


To control the reconstruction error, one can use the truncated SVD to define

the approximated pseudoinverse of F . Given the truncated SVD

F ≈ ∑_{i=1}^{k} σi ~ui ~v>i ,

for k < r, the rank-k approximated pseudoinverse can be defined as

F+k = VkΣ−1k U>k = ∑_{i=1}^{k} (1/σi) ~vi ~u>i .

This way, the corresponding reconstruction error bound of the reconstructed image

~x+k = VkΣ−1k U>k ~d

takes the form

‖~x+k − ~xt‖ ≤ ‖(I − VkV>k )~xt‖+ (1/σk)‖~e‖, (8.16)

where the first term is the representation error.

Figure 8.5.5 shows the reconstructed image using k = 50, k = 500, and k = 940.

The left reconstructed image in Figure 8.5.5 has a rather large representation

Figure 8.5.5: Left: k = 50. Middle: k = 500. Right: k = 940.

error as we truncated the SVD too aggressively. The right reconstructed image in

Figure 8.5.5 is not robust to noise as the last singular value σk in the truncated

SVD is too small. The middle reconstructed image in Figure 8.5.5 seems to achieve a suitable balance.
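The effect of the truncation level can be reproduced in a small numpy experiment. The operator below is a synthetic stand-in for F with geometrically decaying singular values; all sizes, decay rates, and noise levels are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ill-conditioned forward operator (stand-in for the X-ray matrix F):
# singular values decay geometrically, so F+ amplifies noise strongly.
m, n = 80, 100
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, m)))
s = 10.0 ** np.linspace(0, -6, m)          # sigma_1 = 1 down to 1e-6
F = U @ np.diag(s) @ V.T

x_true = rng.standard_normal(n)
d_noisy = F @ x_true + 1e-4 * rng.standard_normal(m)

def tsvd_reconstruct(k):
    """Rank-k truncated-SVD reconstruction x_k^+ = V_k diag(1/s_k) U_k^T d."""
    return V[:, :k] @ ((U[:, :k].T @ d_noisy) / s[:k])

# A moderate k beats using all m singular values: the small sigma_i
# multiply the noise by 1/sigma_i, exactly as in the bound (8.16).
err_mid  = np.linalg.norm(tsvd_reconstruct(40) - x_true)
err_full = np.linalg.norm(tsvd_reconstruct(m)  - x_true)
```

On this example the moderately truncated reconstruction has a smaller error than the full pseudoinverse reconstruction, mirroring the middle versus right panels of Figure 8.5.5.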

L-curve

We aim to find a k such that the reconstruction is robust with respect to the noise ~e, while keeping the representation error minimal. If the true image ~xt and the noise ~e were known, we could compute the reconstruction error exactly and pick the best k. However, both ~xt and ~e are unknown, so we have to derive heuristics for choosing the best k.

We can measure how well the reconstructed image fits the noisy data in the form of

‖F~x+k − ~dn‖,

and measure the robustness of the reconstruction by ‖~x+k ‖, which is bounded as

‖~x+k ‖ ≤ ‖VkV>k ~xt‖+ (1/σk)‖~e‖.


The smaller the former is, the better the reconstructed image explains the data; the smaller the latter is, the more robust the reconstruction. For a suitably chosen k, the norm ‖F~x+k − ~dn‖ should be close to the norm of the measurement noise ‖~e‖. For a rather small k, we expect the norm ‖F~x+k − ~dn‖ to be rather large. If we increase k, then the norm ‖F~x+k − ~dn‖ should decrease until it reaches the order of the measurement noise ‖~e‖. However, at the same time, the robustness of the reconstruction decreases if k is too large. Thus, we expect the norm of ~x+k to increase drastically if k is chosen such that σk is too small.

We often plot the norm ‖~x+k ‖ (on the horizontal axes) versus the norm ‖F~x+k −

~dn‖ (on the vertical axes) with different k values. This leads to the so-called L-

curve. Figure 8.5.6 shows the L-curve computed for the example used here.

The corner (represented by the black dot) represents a reasonable k value that

balances the norm ‖F~x+k −~dn‖ (fit to the data) and the norm ‖~x+k ‖ (reconstruction robustness).
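Both norms entering the L-curve can be computed for all k at once from the SVD. The sketch below uses a synthetic stand-in for the imaging operator; all sizes and noise levels are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic ill-posed problem: geometrically decaying singular values + noise.
m, n = 60, 80
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, m)))
s = 10.0 ** np.linspace(0, -5, m)
F = U @ np.diag(s) @ V.T
d = F @ rng.standard_normal(n) + 1e-3 * rng.standard_normal(m)

q = U.T @ d
ks = range(1, m + 1)
x_norms   = [np.linalg.norm(q[:k] / s[:k]) for k in ks]          # ||x_k^+||
residuals = [np.linalg.norm(d - U[:, :k] @ q[:k]) for k in ks]   # ||F x_k^+ - d||

# Plotting x_norms (horizontal) against residuals (vertical) for all k
# traces out the L-curve; the corner balances the two quantities.
```

As k grows, the data misfit ‖F~x+k − ~dn‖ decreases monotonically while ‖~x+k ‖ grows, which is exactly what produces the two arms of the L.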

Figure 8.5.6: Top left: the L-curve. Top right: the reconstructed image using k = 840, which is the corner on the L-curve. Bottom left: reconstruction robustness

‖~x+k ‖ versus the rank k. Bottom right: fit to the data ‖F~x+k − ~dn‖ versus the

rank k.


Chapter 9

Krylov Subspace Methods

for Eigenvalues

In this chapter we will consider solving eigenvalue problems for very large matrices A ∈ Rn×n. For example, in the X-ray imaging case, a 3-D image discretised into 100 intervals in each dimension (not even a very fine resolution) has a million voxels to reconstruct, and we may use the same order of number of X-rays in the reconstruction. This leads to an eigenvalue problem with a million-dimensional matrix. In this scenario, it is no longer computationally feasible to directly apply eigenvalue solvers such as the QR algorithm that operate on the full matrix (with operation counts O(n3) in Phase 1 and O(n2) in each iteration of Phase 2).

Instead of solving the original eigenvalue problem in Rn, we seek to project

the original problem onto a lower dimensional subspace, the Krylov subspace, and

then solve a reduced dimensional eigenvalue problem. In this chapter, we will

discuss two algorithms for computing eigenvalues using the Krylov subspace, the

Arnoldi method and the Lanczos method, which are designed for general square

matrices and symmetric matrices, respectively.

9.1 The Arnoldi Method for Eigenvalue Problems

Objective

We recall that the CG method and the GMRES method for solving linear systems A~x = ~b minimise the residual A~x−~b projected onto the Krylov subspace

generated by the matrix A and the vector ~b:

Kk+1(~b,A) = span{~b,A~b,A2~b, . . . , Ak~b}.

Given a general square matrix A ∈ Rn×n, the goal of the Arnoldi method is to

construct an orthonormal basis Qk+1 of the Krylov subspace Kk+1(~b,A) for some

k > 0 such that the projection of the matrix A onto Kk+1(~b,A) with respect to

the basis of columns of Qk+1,

Hk+1 = Q∗k+1AQk+1, Hk+1 ∈ R(k+1)×(k+1),

is a Hessenberg matrix. Under certain technical conditions, the eigenvalues of

the Hessenberg matrix Hk+1 (the so-called Arnoldi eigenvalue estimates) can be

good approximations of the eigenvalues of A.


In the rest of this section, we will show the Arnoldi procedure for constructing

such a matrix Qk+1 and some of its important properties for solving eigenvalue

problems and linear systems.

Arnoldi Procedure

Recall that a complete reduction of A ∈ Rn×n to a Hessenberg form by a unitary

similarity transformation can be written as

H = Q∗AQ, or AQ = QH.

In Phase 1 of the eigenvalue solvers we learned in Chapters 6 and 7, the matrix

Q is constructed by a sequence of n−2 Householder reflections. For large n, it is not feasible to apply this process, which requires O(n3) operations. Instead, we focus only on the first k + 1 columns of AQ = QH.

Furthermore, recall that for computing the QR factorisation of A, QR = A,

we have discussed two methods: Householder reflection and (modified) Gram-

Schmidt. While the former is more numerically stable, the (modified) Gram-

Schmidt has the advantage that it can be stopped part-way, leaving one with

a reduced QR factorisation. The process of using the Arnoldi procedure to construct the first k + 1 columns of AQ = QH is analogous to this.

Arnoldi generates an orthonormal basis for the Krylov space Kk+1(~b,A) by

setting

~q0 = ~b/‖~b‖,

and applying modified Gram-Schmidt to orthogonalise the vectors

{~q0, A~q0, A~q1, . . . , A~qk}.

In every iteration, the Arnoldi method computes a vector A~qk and orthogonalises this vector against the previous {~q0, ~q1, . . . , ~qk} using the modified Gram-

Schmidt process to generate a new vector ~qk+1. This is essentially subtracting

from A~qk the components in the directions of the previous ~qj :

~vk+1 = A~qk − h0,k~q0 − h1,k~q1 − . . .− hk,k~qk,

where the projection coefficients hj,k are determined as hj,k = (A~qk)∗~qj . The

new orthonormal vector ~qk+1 is then determined by normalising ~vk+1:

~qk+1 = ~vk+1/hk+1,k

where hk+1,k = ‖~vk+1‖. So the basis vectors {~q0, ~q1, . . . , ~qk, ~qk+1} satisfy

hk+1,k~qk+1 = A~qk − h0,k~q0 − h1,k~q1 − · · · − hk,k~qk,

or

A~qk = h0,k~q0 + h1,k~q1 + · · ·+ hk,k~qk + hk+1,k~qk+1.

This procedure to generate an orthonormal basis of the Krylov space is called the Arnoldi procedure. It can easily be shown that the resulting set of Arnoldi vectors, {~q0, ~q1, . . . , ~qk}, is a basis for Kk+1(~b,A) = span{~b,A~b,A2~b, . . . , Ak~b}.

Theorem 9.1

Let {~q0, . . . , ~qk} be the vectors generated by the Arnoldi procedure. Then

span{~q0, . . . , ~qk} = span{~b,A~b,A2~b, . . . , Ak~b}.


Proof. This can be shown by induction. The case for k = 0 is trivial. For

k > 0, suppose that

span{~q0, . . . , ~qk} = span{~b,A~b,A2~b, . . . , Ak~b},

holds. Given the relationship between A~qk and the Arnoldi vectors:

A~qk = h0,k~q0 + h1,k~q1 + · · ·+ hk,k~qk + hk+1,k~qk+1,

we know that the vector A~qk is a linear combination of {~q0, . . . , ~qk, ~qk+1}. Thus,

we have

span{~q0, . . . , ~qk, ~qk+1} = span{~b,A~b,A2~b, . . . , Ak~b,Ak+1~b}.

The Arnoldi procedure is given by:

Algorithm 9.2: Arnoldi Procedure for an Orthonormal Basis of Kk+1(~b0, A)

Input: matrix A ∈ Rn×n; vector ~b0

Output: vectors ~q0, . . . , ~qk that form an orthonormal basis of Kk+1(~b0, A)

1: ~q0 = ~b0/‖~b0‖

2: for i = 0 : (k − 1) do

3: ~v = A~qi

4: for j = 0 : i do

5: hj,i = ~q∗j ~v

6: ~v = ~v − hj,i~qj

7: end for

8: hi+1,i = ‖~v‖

9: if hi+1,i < tol then

10: Stop

11: end if

12: ~qi+1 = ~v/hi+1,i

13: end for
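Algorithm 9.2 translates almost line by line into numpy. The following is a sketch (function name and random test matrix are our own choices); the usage checks the relation AQk+1 = Qk+2H˜k+1 derived in the next paragraphs:

```python
import numpy as np

def arnoldi(A, b, k, tol=1e-12):
    """Arnoldi procedure (Algorithm 9.2): returns Q with orthonormal columns
    spanning the Krylov subspace and the rectangular Hessenberg matrix H~."""
    n = len(b)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for i in range(k):
        v = A @ Q[:, i]
        for j in range(i + 1):                 # modified Gram-Schmidt
            H[j, i] = Q[:, j] @ v
            v -= H[j, i] * Q[:, j]
        H[i + 1, i] = np.linalg.norm(v)
        if H[i + 1, i] < tol:                  # breakdown: stop early,
            return Q[:, :i + 1], H[:i + 1, :i] # drop the incomplete column
        Q[:, i + 1] = v / H[i + 1, i]
    return Q, H

# Usage: verify A Q_{k+1} = Q_{k+2} H~_{k+1} on a random matrix.
rng = np.random.default_rng(2)
A = rng.standard_normal((30, 30))
b = rng.standard_normal(30)
Q, H = arnoldi(A, b, 10)    # Q is 30-by-11, H is 11-by-10
```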

We can express the vectors and coefficients computed during the Arnoldi procedure in a matrix form:

A [~q0 ~q1 · · · ~qk] = [~q0 ~q1 · · · ~qk ~qk+1] H˜k+1,

where the first matrix of Arnoldi vectors is n-by-(k+1), the second is n-by-(k+2), and H˜k+1 is the (k+2)-by-(k+1) matrix

H˜k+1 =
[ h0,0  h0,1  h0,2  . . .   h0,k  ]
[ h1,0  h1,1  h1,2  . . .   h1,k  ]
[       h2,1  h2,2  . . .   h2,k  ]
[             h3,2  . . .    ...  ]
[                  hk,k−1   hk,k  ]
[   0                      hk+1,k ] ,


or

AQk+1 = Qk+2H˜k+1.

Projection onto Krylov Subspaces

We can partition the matrix H˜k+1 as:

H˜k+1 = [     Hk+1
          hk+1,k ~e>k+1 ] ,

where Hk+1 is the (k + 1)-by-(k + 1) square Hessenberg matrix formed by the first k + 1 rows, and the last row is hk+1,k ~e>k+1, with ~ek+1 the last standard basis vector of Rk+1.

Note that the product Q∗k+1Qk+2 = [ I ~0 ], which is a (k + 1)-by-(k + 2) identity-like matrix, i.e., a matrix with 1 on its main diagonal and zero elsewhere. Then, we have

Q∗k+1AQk+1 = Q∗k+1Qk+2H˜k+1 = Hk+1.

The matrix Hk+1 can be interpreted as the representation in the basis of columns

of Qk+1 of the matrix A projected onto the Krylov subspace Kk+1.

Since the Hessenberg matrix Hk+1 is a projection of A, one might imagine that the eigenvalues of Hk+1 can be related to the eigenvalues of A. In fact, under certain conditions, the eigenvalues of Hk+1 (the so-called Arnoldi eigenvalue

estimates) can be very accurate approximations of the eigenvalues of A. This

will be shown in later sections.
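As a preview, the sketch below (with a compressed copy of Algorithm 9.2 and an arbitrary symmetric test matrix having one extreme eigenvalue) computes the Arnoldi eigenvalue estimates as the eigenvalues of the projected matrix:

```python
import numpy as np

def arnoldi(A, b, k):
    # (Same procedure as Algorithm 9.2, compressed; no breakdown check.)
    n = len(b)
    Q = np.zeros((n, k + 1)); H = np.zeros((k + 1, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for i in range(k):
        v = A @ Q[:, i]
        for j in range(i + 1):
            H[j, i] = Q[:, j] @ v
            v -= H[j, i] * Q[:, j]
        H[i + 1, i] = np.linalg.norm(v)
        Q[:, i + 1] = v / H[i + 1, i]
    return Q, H

rng = np.random.default_rng(3)
# Symmetric test matrix with one extreme eigenvalue (10) far from the rest.
eigs = np.concatenate([[10.0], np.linspace(0.0, 1.0, 49)])
W, _ = np.linalg.qr(rng.standard_normal((50, 50)))
A = W @ np.diag(eigs) @ W.T

Q, H = arnoldi(A, rng.standard_normal(50), 20)
Hk = Q.T @ A @ Q                  # projection H_{k+1} = Q*_{k+1} A Q_{k+1}
ritz = np.linalg.eigvals(Hk)      # Arnoldi eigenvalue estimates
```

After 20 iterations the estimate of the well-separated eigenvalue 10 is accurate to high precision, while the clustered eigenvalues in [0, 1] are only roughly approximated.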

Breakdown

Note also that, when k+ 1 = n, the process terminates with hn,n−1 = ‖~vn‖ = 0,

because there cannot be more than n orthogonal vectors in Rn. At this point,

we obtain

AQ = QH,

where Q ∈ Rn×n is orthogonal and H ∈ Rn×n is a square Hessenberg matrix

with zeros below the first subdiagonal:

H =
[ h0,0  h0,1  h0,2   . . .  h0,n−1 ]
[ h1,0  h1,1  h1,2   . . .    ...  ]
[       h2,1  h2,2   . . .         ]
[             h3,2   . . .         ]
[  0       hn−1,n−2  hn−1,n−1      ] .


In practice, the Arnoldi procedure is terminated if the value hk+1,k = ‖~vk+1‖ is close to zero, say, below a certain threshold (Lines 9-11 in Algorithm 9.2). This is called a breakdown of the Arnoldi procedure. Ideally, a breakdown occurs well before k + 1 = n. A breakdown means that exact eigenvalues of A (up to some numerical error) can be obtained from the matrix Hk+1, and that exact solutions of the linear system A~x = ~b can be obtained.

Remark 9.3

Once a breakdown occurs, we have hk+1,k = 0, and then

H˜k+1 = [ Hk+1
           0>  ] .

It then follows that

AQk+1 = Qk+2H˜k+1 = [Qk+1 | ~qk+1] [ Hk+1
                                      0>  ] = Qk+1Hk+1.

Remark 9.4

Consider that Qk+1 is the first k + 1 columns of a unitary matrix Q that can

reduce the matrix A to a Hessenberg form, i.e., AQ = QH or Q∗AQ = H.

In this case the matrix H for the full Hessenberg reduction has the following

structure

H = [ Hk+1  H12
        0   H22 ] ,

where H12 is a potentially full (k + 1) × (n − k − 1) matrix and H22 is an

(n − k − 1) × (n − k − 1) upper Hessenberg matrix. Thus, H is block upper

triangular. Then the union of the eigenvalues of Hk+1 and the eigenvalues of

H22 are the eigenvalues of A.

Remark 9.5

It is easy to verify that if Hk+1 has an eigenvalue λ with an eigenvector ~v, then λ is an eigenvalue of A and A has a corresponding eigenvector Qk+1~v.

Proof. Let λ be an eigenvalue of Hk+1 with corresponding eigenvector ~v,

i.e., Hk+1~v = λ~v. Let ~y = Qk+1~v, then

A~y = AQk+1~v = Qk+1Hk+1~v, by Remark 9.3. Since Hk+1~v = λ~v, we have

A~y = λQk+1~v = λ~y,

Since ~v ≠ 0, and since the columns of Qk+1 are linearly independent, it follows that ~y ≠ 0, and hence λ is an eigenvalue of A with eigenvector ~y.


Theorem 9.6

Once a breakdown occurs at an iteration k, the Krylov subspace Kk+1(~b,A) =

span{~b,A~b,A2~b, . . . , Ak~b} is an invariant subspace of A, i.e., AKk+1 ⊆ Kk+1.

Proof. Let ~y be an arbitrary vector in AKk+1, then there exists a vector

~z ∈ Kk+1 such that ~y = A~z. Since Kk+1 = span{~q0, · · · , ~qk}, we can express

~z as a linear combination of {~q0, · · · , ~qk} in the form of ~z = Qk+1 ~w for some

~w ∈ Rk+1. It follows that ~y = AQk+1 ~w = Qk+1Hk+1 ~w. This implies that

~y ∈ span{~q0, · · · , ~qk}. Since ~y is arbitrary it follows that AKk+1 ⊆ Kk+1.

Theorem 9.7

Once a breakdown occurs at an iteration k, the Krylov subspaces of A gener-

ated by b, Kk+1(~b,A) = span{~b,A~b,A2~b, . . . , Ak~b}, have the following property:

Kk+1 = Kk+2 = Kk+3 = · · · .

Proof. First we have that Kk+1 ⊆ Kk+2 by the definition of Krylov subspace.

The Krylov subspace Kk+2 is the union of span{~q0} and the subspace AKk+1,

i.e.,

Kk+2 = span{~q0} ∪AKk+1.

After the breakdown, we have AKk+1 ⊆ Kk+1 as the result of Theorem 9.6.

Since span{~q0} ⊆ Kk+1 by definition, we have Kk+2 ⊆ Kk+1. Thus, Kk+1 =

Kk+2. Then, we can prove this theorem by induction.

Theorem 9.8

Suppose that the matrix A is nonsingular. Once a breakdown occurs, the solu-

tion to the linear system A~x = ~b lies in Kk+1.

Proof. If A is nonsingular, then by the result of Remark 9.5, zero cannot be an eigenvalue of Hk+1. Therefore, Hk+1 is an invertible matrix and AQk+1H−1k+1 = Qk+1. Since

~b = Qk+1(~e1‖~b‖), it follows that

AQk+1H−1k+1(~e1‖~b‖) = Qk+1(~e1‖~b‖) = ~b.

Multiplying both sides on the left by A−1 we obtain

Qk+1H−1k+1(~e1‖~b‖) = A−1~b = ~x.

Thus, we have ~x ∈ Kk+1.


9.2 Lanczos Method for Eigenvalue Problems

The Lanczos method is the Arnoldi method specialised to the case where the

matrix A is symmetric. If A = AT , then the Hessenberg matrix obtained by the

Arnoldi process (for the case k + 1 = n) satisfies

HT = (QTAQ)T = QTATQ = QTAQ = H,

so H is symmetric, which implies that it is tridiagonal.

Therefore, the Arnoldi update formula simplifies from

hk+1,k~qk+1 = A~qk − h0,k~q0 − h1,k~q1 − . . .− hk,k~qk.

to a three-term recursion relation

hk+1,k~qk+1 = A~qk − hk−1,k~qk−1 − hk,k~qk,

with

A [~q0 ~q1 · · · ~qk] = [~q0 ~q1 · · · ~qk ~qk+1] H˜k+1,

where the first matrix of Lanczos vectors is n-by-(k+1), the second is n-by-(k+2), and the (k+2)-by-(k+1) matrix H˜k+1 is now tridiagonal apart from its last row:

H˜k+1 =
[ h0,0  h0,1                      ]
[ h1,0  h1,1  h1,2                ]
[       h2,1  h2,2  h2,3          ]
[             h3,2  h3,3  . . .   ]
[                 hk,k−1   hk,k   ]
[   0                    hk+1,k   ] .

Taking the symmetry further into account and using the notation T˜ instead of H˜ to denote a tridiagonal matrix, we have

T˜k+1 =
[ α0  β0                       ]
[ β0  α1  β1                   ]
[     β1  α2  β2               ]
[         . . .  . . .  . . .  ]
[         0    βk−1    αk      ]
[                      βk      ] .

In a matrix form, we have

AQk+1 = Qk+2T˜k+1.

The Arnoldi procedure thus simplifies to computing the orthonormal basis {~q0, . . . , ~qk} of the Krylov space based on

A~qk = βk−1~qk−1 + αk~qk + βk~qk+1,


where αk = (A~qk)∗~qk, βk−1 = (A~qk)∗~qk−1, and βk is the 2-norm of A~qk − βk−1~qk−1 − αk~qk. Note that βk−1 was obtained in the previous iteration. This

procedure is called the Lanczos procedure. It can be shown that the Lanczos

procedure is related to the CG algorithm (just like Arnoldi is used by GMRES).

Properties of the Arnoldi procedure, for example, Remark 9.3 - Theorem 9.7 still

hold for the Lanczos procedure.

The Lanczos procedure is given by:

Algorithm 9.9: Lanczos Procedure for an Orthonormal Basis of Kk+1(~b0, A)

Input: a symmetric matrix A ∈ Rn×n; vector ~b0

Output: vectors ~q0, . . . , ~qk that form an orthonormal basis of Kk+1(~b0, A)

1: β−1 = 0, ~q−1 = 0, ~q0 = ~b0/‖~b0‖

2: for i = 0 : (k − 1) do

3: ~v = A~qi

4: αi = ~q∗i ~v

5: ~v = ~v − αi~qi − βi−1~qi−1

6: βi = ‖~v‖

7: if βi < tol then

8: Stop

9: end if

10: ~qi+1 = ~v/βi

11: end for
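Algorithm 9.9 in numpy (note that Line 10 divides by βi, the norm just computed in Line 6; the symmetric test matrix below is arbitrary). The usage verifies that the projected matrix is tridiagonal with the αi and βi as its entries:

```python
import numpy as np

def lanczos(A, b, k):
    """Lanczos procedure (Algorithm 9.9) for symmetric A: a three-term
    recursion producing orthonormal Q and tridiagonal coefficients."""
    n = len(b)
    Q = np.zeros((n, k + 1))
    alpha = np.zeros(k); beta = np.zeros(k)
    Q[:, 0] = b / np.linalg.norm(b)
    for i in range(k):
        v = A @ Q[:, i]
        alpha[i] = Q[:, i] @ v
        v -= alpha[i] * Q[:, i]
        if i > 0:
            v -= beta[i - 1] * Q[:, i - 1]
        beta[i] = np.linalg.norm(v)
        Q[:, i + 1] = v / beta[i]       # divide by beta_i, the norm above
    return Q, alpha, beta

rng = np.random.default_rng(5)
M = rng.standard_normal((40, 40))
A = M + M.T                             # symmetric test matrix
Q, alpha, beta = lanczos(A, rng.standard_normal(40), 10)

# Q^T A Q is tridiagonal: alpha on the diagonal, beta on the off-diagonals.
T = Q[:, :10].T @ A @ Q[:, :10]
```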

Remark 9.10

Note that each iteration of the Lanczos procedure only operates with three vectors, as opposed to the i + 1 vectors used in the Arnoldi iteration.


9.3 How Arnoldi/Lanczos Locates Eigenvalues

The eigenvalues of a matrix are defined by its characteristic polynomial. The Arnoldi/Lanczos procedure implicitly constructs a sequence of polynomials that approximate the characteristic polynomial of the matrix.

The use of Arnoldi/Lanczos procedure for computing eigenvalues proceeds as

follows. For a matrix A ∈ Rn×n, after k Arnoldi/Lanczos iterations, the eigenvalues and eigenvectors of the resulting Hessenberg matrix Hk+1 are computed by standard eigenvalue solvers such as the shifted QR algorithm. These are the Arnoldi estimates of the eigenvalues.

For a large matrix, we can often only perform k ≪ n iterations of the Arnoldi/Lanczos procedure. In this case, we can only obtain estimates of at most k + 1 eigenvalues. Some of these eigenvalue estimates converge faster to an eigenvalue of A and some converge slower. Typically, estimates of the “extreme” eigenvalues converge faster, that is, eigenvalues near the edge of the spectrum, or eigenvalues that have a big gap to adjacent eigenvalues.

Here we want to illustrate the idea behind the Arnoldi/Lanczos procedure and why it tends to find those extreme eigenvalues first.

Arnoldi and Polynomial Approximation

Let ~x be a vector in the Krylov subspace Kk(~b,A) generated by the matrix A

and the vector ~b:

Kk(~b,A) = span{~b,A~b,A2~b, . . . , Ak−1~b}.

Such an ~x can be expressed as a linear combination of powers of A times ~b,

~x = c0~b+ c1A~b+ · · ·+ ck−1Ak−1~b = ∑_{j=0}^{k−1} cj Aj~b.

This expression can also be viewed as a polynomial of A multiplied by ~b. If p(z) is the polynomial c0 + c1z + · · ·+ ck−1zk−1, then we have the matrix polynomial of A in the form

p(A) = c0I + c1A+ · · ·+ ck−1Ak−1 = ∑_{j=0}^{k−1} cj Aj.

This way, we have ~x = p(A)~b. Krylov subspace methods can be analysed in

terms of matrix polynomials.

Definition 9.11

A monic polynomial of degree k is defined as a polynomial

pk(z) = c0 + c1z + · · ·+ ck−1zk−1 + zk.

That is, the coefficient associated with degree k is 1.

Remark 9.12

The characteristic polynomial of a matrix A, pA(λ), is a monic polynomial.


Theorem 9.13

Consider the characteristic polynomial of a matrix A, pA(λ). The Cayley-Hamilton Theorem asserts that the matrix polynomial pA(A) = 0.

Proof. This can be easily verified for the case where the matrix A has an eigendecomposition, A = V ΛV −1. We omit the general proof here.

Remark 9.14

The Arnoldi/Lanczos procedure finds a monic polynomial pk(·) such that

‖pk(A)~b‖ is minimised. (9.1)

Once a breakdown occurs, it is not hard to show that the Arnoldi procedure

obtains a monic polynomial such that ‖pk(A)~b‖ = 0. Here we want to look into

this problem before a breakdown.

Theorem 9.15

As long as the Arnoldi procedure does not break down (i.e., the Krylov subspace Kk(~b,A) has dimension k), the characteristic polynomial of Hk defines the polynomial solving the problem (9.1).

Proof. We first note that if pk is a monic polynomial, then the vector pk(A)~b can be written as

pk(A)~b = ∑_{j=0}^{k−1} cj Aj~b + Ak~b = Ak~b−Qk~y,

where the sum ∑_{j=0}^{k−1} cj Aj~b = −Qk~y lies in Kk(~b,A),

for some ~y ∈ Rk. Since Qk is full rank (of rank k), the problem (9.1) becomes

a least-squares problem of finding ~y such that

‖Ak~b−Qk~y‖

is minimised. The solution can be obtained at Q∗k(Ak~b−Qk~y) = 0, or equivalently

Q∗kpk(A)~b = 0.

Now the problem boils down to find the monic polynomial that solves the

above equation.

Consider the following unitary matrix

Q = [ Qk  U ] ,

where Qk is the matrix consisting of the Arnoldi vectors, the first column of U is the next Arnoldi vector ~qk, and the other columns of U are orthonormal


vectors. This way, we have the unitary similarity transformation of the matrix

A, which takes the form of

Q∗AQ = [ Q∗kAQk  Q∗kAU
         U∗AQk   U∗AU ] .

Since AQk = Qk+1H˜k, we have Q∗kAQk = Hk, and X1 = U∗AQk = U∗Qk+1H˜k is a matrix of dimension (n− k)-by-k, with all but the upper-right entry equal to 0. Let X2 = Q∗kAU and X3 = U∗AU ; we have

Q∗AQ = [ Hk  X2
         X1  X3 ] = H,

which is block Hessenberg.

Since A = QHQ∗, we can show that pk(A) = Qpk(H)Q∗. Thus, we have

Q∗kpk(A)~b = Q∗kQpk(H)Q∗~b.

Given ~b = Q~e1‖~b‖ and Q∗kQ = [ Ik 0 ], the above equation can be written as

Q∗kpk(A)~b = [ Ik 0 ] pk(H)~e1‖~b‖,

which is essentially the first k entries of the first column of pk(H).

Because of the block Hessenberg structure of H, the first k entries of the first column of pk(H) are given by pk(Hk). If pk(·) is the characteristic polynomial of Hk, i.e., pk(λ) = pHk(λ), then by the Cayley-Hamilton Theorem, the matrix polynomial pk(Hk) equals 0. This way, the characteristic polynomial of Hk defines a polynomial solving the problem (9.1).

How Arnoldi/Lanczos Locates Eigenvalues

By projecting the matrix A onto the Krylov subspace Kk(~b,A) represented by Qk, we obtain a matrix Hk. The characteristic polynomial of Hk effectively solves a polynomial approximation problem, or equivalently, a least-squares problem involving the Krylov subspace.

What does the characteristic polynomial of Hk have to do with the eigen-

values of A, or equivalently, the characteristic polynomial of A? There is a

connection between these. If a polynomial pk(·) has the property that pk(A)~b is small, then effectively the roots of pk(·) are close to the roots of pA(·).

Remark 9.16

We can express the vector ~b as a linear combination of eigenvectors ~v1, ~v2, . . . with associated coefficients a1, a2, . . ., in the form

~b = ∑_{i=1}^{n} ai~vi.


Since

p(A)~vi = ∑_{j=0}^{k−1} cj Aj~vi = ∑_{j=0}^{k−1} cj λ^j_i ~vi = p(λi)~vi,

the vector p(A)~b can be written as

p(A)~b = ∑_{i=1}^{n} ai p(λi)~vi.

Thus, the eigenvalue estimates obtained from the Arnoldi procedure depend on the quality of the approximation to p(λi), weighted by ai.

Remark 9.17

Suppose the vector ~b is a linear combination of a limited number of eigenvectors. Then Arnoldi will find a monic polynomial such that ‖p(A)~b‖ = 0 as soon as p(A)~b is contained in the Krylov subspace, which is exactly the Krylov subspace after the breakdown, or equivalently, the subspace spanned by all the eigenvectors used for constructing ~b.

Example

In general, the shape of the characteristic polynomial is dominated by “extreme”

eigenvalues. Here we illustrate this idea using the following example. Let A be

a 19-dimensional matrix

A = diag([0.1, 0.5, 0.6, 0.7, . . . , 1.9, 2.0, 2.5, 3.0]).

The spectrum of A consists of a dense collection of eigenvalues in the interval

[0.5, 2.0] and some outliers 0.1, 2.5, and 3.0, as shown below.

The crosses are the eigenvalues and the blue line is the characteristic polynomial.

We carry out the Lanczos procedure with a random starting vector ~b0.

Figure 9.3.1 plots the monic polynomials obtained in selected iterations of the

Lanczos procedure and their roots. We can observe that the outlier eigenval-

ues are identified first, followed by the eigenvalues on the edge of the interval

[0.5, 2.0]. Those eigenvalues in the middle of the cluster are identified last. In summary, eigenvalue estimates in regions where the characteristic polynomial changes more rapidly converge faster than those in regions where the characteristic polynomial is flat.
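The experiment can be reproduced in outline with a short numpy sketch; the starting vector and the number of iterations below are arbitrary choices:

```python
import numpy as np

# The 19 eigenvalues of the example: a dense cluster plus three outliers.
eigs = np.concatenate([[0.1], np.arange(0.5, 2.01, 0.1), [2.5, 3.0]])
A = np.diag(eigs)
b = np.random.default_rng(6).standard_normal(19)

def lanczos_T(A, b, k):
    # Lanczos procedure (Algorithm 9.9), returning the square tridiagonal T_k.
    Q = np.zeros((len(b), k + 1)); a = np.zeros(k); be = np.zeros(k)
    Q[:, 0] = b / np.linalg.norm(b)
    for i in range(k):
        v = A @ Q[:, i] - (be[i - 1] * Q[:, i - 1] if i > 0 else 0.0)
        a[i] = Q[:, i] @ v
        v -= a[i] * Q[:, i]
        be[i] = np.linalg.norm(v)
        Q[:, i + 1] = v / be[i]
    return np.diag(a) + np.diag(be[:k - 1], 1) + np.diag(be[:k - 1], -1)

# After 10 of a possible 19 iterations, the outliers 0.1 and 3.0 are
# typically already located accurately, while the cluster [0.5, 2.0]
# is still poorly resolved.
ritz = np.sort(np.linalg.eigvalsh(lanczos_T(A, b, 10)))
```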


Figure 9.3.1: Estimated monic polynomials obtained by the Lanczos procedure.


Chapter 10

Other Eigenvalue Solvers

So far, all the eigenvalue solvers we have learned involve some polynomial of a

matrix A. For example, the power iteration, inverse iteration, or more advanced

QR algorithms raise the matrix to some power, and Krylov subspace methods

implicitly construct a complicated matrix polynomial. There is more to the

computation of eigenvalues than using matrix polynomials. Here we introduce

some alternatives for computing eigenvalues.

10.1 Jacobi Method

One of the oldest ideas for computing the eigenvalues of a matrix is the Jacobi method, introduced by Jacobi in 1845. Consider a symmetric matrix of dimension 5 or larger. We know that we must apply an iterative method to approximate the eigenvalues. We also know that a real-valued symmetric matrix A has an eigendecomposition A = QΛQ>, where Q is orthogonal and Λ is diagonal. Now the question is: can we create a sequence of orthogonal similarity transformations such that each transformation brings the matrix to a “more diagonal”

form? This way, the sequence of transformations will eventually produce a diag-

onal matrix.

The Jacobi method uses a sequence of 2-by-2 rotation matrices, called Jacobi rotations, which are chosen to eliminate off-diagonal elements while preserving the eigenvalues. Whilst successive rotations will undo previously introduced zeros, the

off-diagonal elements get smaller until eventually we are left with a diagonal

matrix. By accumulating products of the transformations as we proceed we

obtain the eigenvectors of the matrix.

Consider a 2-by-2 symmetric matrix,

A = [ a  d
      d  b ] ,

we aim to find a rotation matrix J such that

J>AJ = [ ≠ 0   0
          0   ≠ 0 ] .


Definition 10.1

A 2-by-2 rotation matrix is an orthogonal matrix

J = [  cos(θ)  sin(θ)
      − sin(θ)  cos(θ) ] = [  c  s
                             −s  c ] ,

for some θ.

It can be shown that for

θ = 0.5 tan−1( 2d/(b− a) ),

the resulting rotation matrix J can diagonalise the 2-by-2 matrix A.
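This formula is easy to verify numerically (the entries a, b, d below are arbitrary; arctan2 is used so that the case b = a is also handled):

```python
import numpy as np

# Arbitrary symmetric 2x2 example (values are made up).
a, b, d = 2.0, 1.0, 0.5
A = np.array([[a, d],
              [d, b]])

# Rotation angle theta = 0.5 * arctan(2d / (b - a)).
theta = 0.5 * np.arctan2(2 * d, b - a)
c, s = np.cos(theta), np.sin(theta)
J = np.array([[c, s],
              [-s, c]])

D = J.T @ A @ J   # the off-diagonal entries of D vanish
```

The similarity transformation also leaves the eigenvalues of A on the diagonal of D.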

For a large matrix A ∈ Rn×n where n > 4, we cannot directly diagonalise the matrix. However, we can diagonalise a 2-by-2 submatrix at a time using the abovementioned Jacobi rotation. Suppose we want to rotate the 2-by-2 submatrix of A formed by rows and columns p and q. We first create a 2-by-2 Jacobi rotation matrix based on the angle θ evaluated using that submatrix. Then, we embed this Jacobi matrix in an n-dimensional identity matrix to obtain

Qp,q,θ = [ 1
             ⋱
               c  · · ·  s
               ⋮    ⋱    ⋮
              −s  · · ·  c
                          ⋱
                            1 ],

where all diagonal elements are 1 apart from the two elements c in rows p and q, and all off-diagonal elements are zero apart from the elements s and −s in rows and columns p and q. The orthogonal similarity transformation Ã = Q>p,q,θAQp,q,θ then modifies only the p-th and q-th rows and columns of the matrix A. This transformation has several important properties:

1. Eigenvalues are preserved as it is a similarity transformation.

2. Frobenius norm is preserved as it is an orthogonal transformation.

3. From 2, applied to the 2-by-2 submatrices of Ã and A formed by rows and columns p and q, the sum of squares of the four affected entries is preserved:

Ãpp² + Ãpq² + Ãqp² + Ãqq² = App² + Apq² + Aqp² + Aqq².

Thus, as Ãpq = Ãqp = 0, the p-th and q-th diagonal elements of Ã and A satisfy

Ãpp² + Ãqq² ≥ App² + Aqq².


Each orthogonal similarity transformation preserves the eigenvalues and the Frobenius norm, while moving weight from the off-diagonal entries onto the diagonal, so the transformed matrix is "more diagonal" than the previous one. If we repeatedly apply Jacobi rotations to the matrix, it will eventually be diagonalised.
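The whole procedure can be sketched as follows (a plain cyclic-sweep variant in NumPy; the function name, sweep count, and tolerance are our own choices, not the classical threshold strategy):

```python
import numpy as np

def jacobi_eig(A, sweeps=10):
    """Cyclic Jacobi: zero each off-diagonal pair (p, q) in turn with a
    Jacobi rotation; Ak stays similar to A, V accumulates the rotations."""
    Ak = np.array(A, dtype=float)
    n = Ak.shape[0]
    V = np.eye(n)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(Ak[p, q]) < 1e-14:
                    continue
                theta = 0.5 * np.arctan2(2.0 * Ak[p, q], Ak[q, q] - Ak[p, p])
                c, s = np.cos(theta), np.sin(theta)
                J = np.eye(n)                  # Q_{p,q,theta}
                J[p, p] = J[q, q] = c
                J[p, q], J[q, p] = s, -s
                Ak = J.T @ Ak @ J              # orthogonal similarity
                V = V @ J                      # product of the rotations
    return np.sort(np.diag(Ak)), V

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
S = M + M.T                                    # random symmetric test matrix
w, V = jacobi_eig(S)
```

Forming the full n-by-n rotation and multiplying costs O(n³) per rotation here; an efficient implementation updates only rows and columns p and q.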

One benefit of the Jacobi method is that it usually achieves better accuracy than QR algorithms. The Jacobi method is also very easy to parallelise, as each rotation only modifies two rows and two columns. However, matrix reductions such as tridiagonalisation cannot be used, as a Jacobi rotation would destroy the tridiagonal structure. In general, the Jacobi method is computationally less efficient than the QR algorithm with tridiagonal reduction.


10.2 Divide-and-Conquer

The divide-and-conquer algorithm, based on a recursive subdivision of a symmetric tridiagonal eigenvalue problem into problems of smaller dimension, represents one of the most important advances in eigenvalue computations since the 1960s. For symmetric matrices, the divide-and-conquer algorithm outperforms the shifted QR algorithm, particularly when both eigenvalues and eigenvectors are desired, and it became the industry standard in the late 1990s. Here we illustrate the idea behind this powerful method.

Consider an n-by-n symmetric tridiagonal matrix,

T = [ a1   b1
      b1   a2   b2
           b2   a3   ⋱
                ⋱    ⋱     bn−1
                     bn−1  an ],

where all the entries on the subdiagonal and superdiagonal are nonzero, so that the eigenvalue problem cannot be deflated. The matrix T can be split and partitioned as

T = [ T̂1   0
       0   T̂2 ] + β ~y ~y>.

Here T1 = T(1:k, 1:k) and T2 = T(k+1:n, k+1:n) are the upper-left principal submatrix and lower-right principal submatrix of T, respectively, and β = T(k+1, k) = T(k, k+1). The only difference between T̂1 and T1 is that the lower-right entry of T1 is replaced by T1(k, k) − β. A similar modification of the upper-left entry is applied to T2 to obtain T̂2.

Now we can write the tridiagonal matrix T as the sum of a 2-by-2 block-diagonal matrix with tridiagonal blocks and a rank-one update. Since the eigenvalue problems for T̂1 and T̂2 can be solved separately, we can first find the eigenvectors and eigenvalues of the two reduced-dimension matrices, and then express the eigenvalues of T as a function of the eigenvalues of T̂1 and T̂2 and the rank-one update. Since the submatrices T̂1 and T̂2 are also symmetric and tridiagonal, we can recursively apply this procedure to divide the problem into eigenvalue problems for small matrices, to which we can apply either analytical formulas or other computational methods that are efficient for small matrices.
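The divide step can be verified numerically on a small example (a NumPy sketch with our own 4-by-4 tridiagonal matrix; the split point k and all names are ours):

```python
import numpy as np

# 4x4 symmetric tridiagonal T with diagonal a and off-diagonal b
a = np.array([2.0, 3.0, 4.0, 5.0])
b = np.array([1.0, 1.0, 1.0])
T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)

k = 2                                          # split: rows 0..k-1 and k..n-1 (0-based)
beta = T[k, k - 1]                             # the coupling entry
T1h = T[:k, :k].copy(); T1h[-1, -1] -= beta    # T-hat-1: corner entry adjusted
T2h = T[k:, k:].copy(); T2h[0, 0] -= beta      # T-hat-2: corner entry adjusted

y = np.zeros(4); y[k - 1] = 1.0; y[k] = 1.0    # y = [e_k; e_1]
B = np.block([[T1h, np.zeros((k, 4 - k))],
              [np.zeros((4 - k, k)), T2h]]) + beta * np.outer(y, y)
```

B reproduces T exactly: the rank-one term restores the two adjusted diagonal entries and the two coupling entries.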

The key step in this recursive process is to identify the eigendecomposition of T given the eigendecompositions of T̂1 and T̂2. Suppose we have computed the eigendecompositions

T̂1 = Q1D1Q>1,   and   T̂2 = Q2D2Q>2.

We can express the matrix T as

T = [ T̂1   0
       0   T̂2 ] + β ~y ~y>,


where ~y = [~e>k  ~e>1 ]> is a vector whose elements are all zero except for the value 1 in the k-th and (k + 1)-th entries. Introducing the orthogonal matrix

Q = [ Q1   0
       0   Q2 ],

the matrix Q>TQ can be written as

Q>TQ = [ Q>1   0  ] ( [ T̂1   0  ] + β [ ~ek ] [ ~ek ]> ) [ Q1   0  ]
       [ 0    Q>2 ] ( [ 0    T̂2 ]     [ ~e1 ] [ ~e1 ]  ) [ 0    Q2 ]

     = [ D1   0  ] + β [ ~z1 ] [ ~z1 ]>
       [ 0    D2 ]     [ ~z2 ] [ ~z2 ]

     = D + β ~z ~z>,

where ~z1 is the last row of Q1 and ~z2 is the first row of Q2 (written as column vectors). Now the problem is reduced to finding the eigenvalues and eigenvectors of

D + β~z~z>,

which is a diagonal matrix plus a rank-one update.

Suppose all the entries of the vector ~z are nonzero; otherwise the eigenvalue problem can be deflated. Let dj = D(j, j) and zj = ~z(j). The eigenvalues of this matrix are then the roots of the secular function

f(λ) = 1 + β Σ_{j=1}^{n} zj² / (dj − λ).

Assuming d1 < d2 < · · · < dn, each interval (dj, dj+1) contains exactly one root (for β > 0 there is one further root to the right of dn). The roots can be rapidly identified using methods such as Newton's method, since we know exactly the interval in which each eigenvalue lies, and the function f(λ) is monotone on each interval.
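Even plain bisection makes this concrete, since for β > 0 the function f increases from −∞ to +∞ between consecutive poles. The following NumPy sketch (our own small example; d, z, and β are hypothetical values) recovers the eigenvalues of D + β~z~z> from the secular function and compares them with a dense eigensolver:

```python
import numpy as np

d = np.array([1.0, 2.0, 3.0])            # diagonal of D, strictly increasing
z = np.array([0.5, 0.4, 0.3])            # all entries nonzero
beta = 0.7                               # beta > 0

def f(lam):
    """Secular function f(lambda) = 1 + beta * sum z_j^2 / (d_j - lambda)."""
    return 1.0 + beta * np.sum(z**2 / (d - lam))

def root_in(lo, hi, iters=200):
    """Bisection on (lo, hi), where f increases from -inf to +inf."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

eps = 1e-9
ub = d[-1] + beta * (z @ z)              # upper bound for the largest eigenvalue
roots = np.array([root_in(d[0] + eps, d[1] - eps),
                  root_in(d[1] + eps, d[2] - eps),
                  root_in(d[2] + eps, ub)])
```

The last eigenvalue lies to the right of dn, in (dn, dn + β~z>~z]; a production code would use a faster, specially safeguarded root finder instead of plain bisection.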

The above assertion can be justified by considering an eigenpair (λ, ~q) of D + β~z~z>:

(D + β~z~z>)~q = λ~q,

which leads to

(D − λI)~q + β~z(~z>~q) = 0.

Remark 10.2

Here ~z>~q cannot be zero. We can show this by contradiction. If ~z>~q = 0, then we have (D − λI)~q = 0, so the vector ~q is an eigenvector of D; as D is diagonal (with distinct diagonal entries, repeated entries being removed by deflation), ~q has only one nonzero element. But then ~z>~q ≠ 0, as all entries of ~z are nonzero, which contradicts the assumption ~z>~q = 0.

Remark 10.3

We also note that λ cannot be an eigenvalue of D. We can show this by contradiction. If λ is an eigenvalue of D, then D − λI has a zero entry on the diagonal, say in position j, as the eigenvalues of D are its diagonal entries. Then the j-th entry of the vector (D − λI)~q is zero. Since ~z>~q ≠ 0 (Remark 10.2) and all entries of ~z are nonzero, all entries of the vector β~z(~z>~q) are nonzero, in particular the j-th. Thus, the j-th entry of (D − λI)~q + β~z(~z>~q) is nonzero, which contradicts the assumption that λ and ~q are an eigenvalue and eigenvector of D + β~z~z>.

Using Remark 10.3, we know that D − λI is invertible, so multiplying both sides of the above equation by (D − λI)−1 (on the left) gives

~q + β(D − λI)−1~z(~z>~q) = 0.

Multiplying both sides by ~z> (on the left) then leads to

~z>~q + β~z>(D − λI)−1~z(~z>~q) = (~z>~q) (1 + β~z>(D − λI)−1~z) = 0,

where the second factor is exactly f(λ). Since ~z>~q ≠ 0 by Remark 10.2, we have f(λ) = 0 for every eigenvalue λ of D + β~z~z>.

Appendix A

Appendices



A.1 Notation

A.1.1 Vectors and Matrices

• ~x is a column vector in Rn; xi is the ith component of ~x.

• We may also write ~x = (x1, . . . , xn)T , where ~xT = (x1, . . . , xn) is a row

vector.

• A is a matrix in Rm×n. The element of A in row i and column j is referred

to by aij .

• The jth column of matrix A is referred to by ~aj . So

A = [~a1| . . . |~an ] .

• Sometimes we use the notation (A)ij for the element of A in position ij.

For example, we can say (AT )ij = aji. We can also use this to refer to a

row of A (as a row vector): the ith row of A can be indicated by (A)i∗,

where the ∗ means all columns j.

Something to remember . . .

In these notes, all vectors ~x are column vectors.

A.1.2 Inner Products

We express the standard Euclidean inner product of vectors ~x, ~y ∈ Rn, in

one of the following equivalent ways:

~xT~y = < ~x, ~y > .

Of course, we also have

~xT~y = ~yT~x =< ~x, ~y >=< ~y, ~x > .

Similarly,

< ~x,A~y > = ~xTA~y

= (AT~x)T~y

=< AT~x, ~y >,

since (AB)T = BTAT .

A.1.3 Block Matrices

Example A.1: Matrix-Matrix Product in Block Form

Let E ∈ R5×7 and F ∈ R7×6. When performing the matrix product E F ,

we can divide E and F in blocks with compatible dimensions, and write the

matrix-matrix product in block form as

E F = [ E11  E12 ] [ F11  F12 ]  =  [ E11F11 + E12F21   E11F12 + E12F22 ]
      [ E21  E22 ] [ F21  F22 ]     [ E21F11 + E22F21   E21F12 + E22F22 ].

A.2 Vector Norms

A.2.1 Vector Norms

Definition A.2: Norm on a Vector Space

Let V be a vector space over R. The function ‖ · ‖ : V → R is a norm on V

if ∀ ~x, ~y ∈ V and ∀ a ∈ R, the following hold:

1. ‖~x‖ ≥ 0, and ‖~x‖ = 0 iff ~x = 0

2. ‖a~x‖ = |a|‖~x‖

3. ‖~x+ ~y‖ ≤ ‖~x‖+ ‖~y‖

Definition A.3: p-Norms on Rn

Let ~x ∈ Rn. We consider the following vector norms ‖~x‖p, for p = 1, 2,∞:

‖~x‖2 = ( Σ_{i=1}^{n} xi² )^{1/2} = √(~xT~x)

‖~x‖1 = Σ_{i=1}^{n} |xi|

‖~x‖∞ = max_{1≤i≤n} |xi|

Theorem A.4: Cauchy-Schwarz Inequality

Let ~x, ~y ∈ Rn. Then

|~xT~y| ≤ ‖~x‖2‖~y‖2.

A.2.2 A-Norm

The vector 2-norm is induced by the Euclidean inner product:

‖~x‖2 = √(~xT~x) = √(< ~x, ~x >).

More generally, if A ∈ Rn×n is symmetric positive definite, it can be used to

define an A-inner product, which induces the A-norm.

Definition A.5: A-Inner Product

Let A ∈ Rn×n be symmetric positive definite, and ~x, ~y ∈ Rn. Then

< ~x, ~y >A =< ~x,A~y >

= ~xTA~y

is called the A-inner product of ~x and ~y.


Definition A.6: A-Norm on Rn

Let A ∈ Rn×n be symmetric positive definite. Then

‖~x‖A = √(< ~x, ~x >A) = √(~xTA~x)

is a norm on Rn, called the A-norm.

Note that we recover the 2-norm for A = I.
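A minimal numerical sketch (Python; the SPD matrix and vector are our own examples, not from the notes):

```python
import numpy as np

def a_norm(x, A):
    """||x||_A = sqrt(x^T A x), assuming A is symmetric positive definite."""
    return np.sqrt(x @ A @ x)

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # SPD
x = np.array([1.0, 1.0])
```

Here x^T A x = 6, so ||x||_A = sqrt(6), while ||x||_2 = sqrt(2); with A = I the two norms coincide.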


A.3 Orthogonality

Definition A.7: Orthogonal Vectors

~x, ~y ∈ Rn are orthogonal if

~xT~y = 0.

We may also write ~xT~y = 0 as < ~x, ~y >= 0.

Theorem A.8: Pythagorean Law

If ~x and ~y are orthogonal, then

‖~x+ ~y‖22 = ‖~x‖22 + ‖~y‖22.

Proof.

‖~x+ ~y‖2² = < ~x+ ~y, ~x+ ~y >
           = < ~x, ~x > + 2 < ~x, ~y > + < ~y, ~y >
           = ‖~x‖2² + ‖~y‖2²,

where the middle term vanishes since < ~x, ~y > = 0.

Definition A.9: Orthogonal Matrices

A ∈ Rn×n is called an orthogonal matrix if

ATA = I.

This means that the columns of A are of length 1 and mutually orthogo-

nal. (So the term ‘orthogonal matrix’ is really a misnomer; in a perfect world

these matrices would be called ‘orthonormal matrices’.) ATA = I implies that

det(A)2 = 1, so A−1 exists and AT = A−1. Also, then, AAT = I, meaning that

the rows of an orthogonal matrix are also orthogonal.


A.4 Matrix Rank and Fundamental Subspaces

Definition A.10: Range and Nullspace

Let A ∈ Rm×n.

The range or column space of A is defined as

range(A) = { ~y ∈ Rm | ~y = A~x = Σ_{i=1}^{n} ~ai xi for some ~x ∈ Rn }.

The kernel or null space of A is defined as

null(A) = {~x ∈ Rn|A~x = 0}.

Similarly, the row space of A (the space spanned by the rows of A) is, in

fact, the column space of AT , i.e., range(AT ).

The rank r of a matrix A is the dimension of the column space:

Definition A.11: Rank

rank(A) = dim(range(A))

This is the number of linearly independent columns of A. It can be shown that this equals the number of linearly independent rows of A, i.e., r = dim(range(A)) = dim(range(AT)).

Theorem A.12: Dimensions of Fundamental Subspaces

Let A ∈ Rm×n. Then

1. dim(range(A)) + dim(null(AT )) = m

2. dim(range(AT )) + dim(null(A)) = n

Theorem A.13: Orthogonality of Fundamental Subspaces

range(A) and null(AT ) are orthogonal subspaces of Rm

Proof. If ~yr ∈ range(A), then ~yr = A~x for some ~x. If ~yn ∈ null(AT ), then

AT~yn = 0. Then

(~yr)T~yn = (A~x)T~yn = ~xT(AT~yn) = 0.


A.5 Matrix Determinants

Definition A.14

The determinant of a matrix A ∈ Rn×n is given by

det(A) = Σ_{j=1}^{n} (−1)^{i+j} aij det(Aij),   for fixed i,

where the matrix Aij is the (n − 1) × (n − 1) matrix obtained by removing row i and column j from the original matrix A.

Theorem A.15

If A ∈ Rn×n is a triangular matrix, then

det(A) = ∏_{i=1}^{n} aii.


A.6 Eigenvalues

We consider square real matrices A ∈ Rn×n.

A.6.1 Eigenvalues and Eigenvectors

Definition A.16: Eigenvalues and Eigenvectors

Let A ∈ Rn×n. λ is called an eigenvalue of A if there is a vector ~x ≠ 0 such

that

A~x = λ~x,

where ~x is called an eigenvector associated with λ.

Notes:

• The eigenvalue may equal zero, but the eigenvector is required to be

nonzero.

• If ~x is an eigenvector of A with associated eigenvalue λ, then a~x, for any a ∈ R \ {0}, is also an eigenvector of A, associated with the same eigenvalue.

Definition A.17: Characteristic Polynomial

Let A ∈ Rn×n. The degree-n polynomial

p(λ) = det(A− λI)

is called the characteristic polynomial of A.

The characteristic polynomial can be factored as p(λ) = (λ1−λ) . . . (λn−λ),

where λ1, . . . , λn are the n eigenvalues of A, which we order as

|λ1| ≤ |λ2| ≤ . . . ≤ |λn|.

Note that some eigenvalues may occur multiple times, and some may be complex

(in which case they occur in complex conjugate pairs).

Definition A.18: Algebraic and Geometric Multiplicity

Let A ∈ Rn×n. The algebraic multiplicity of an eigenvalue λi of A, µA(λi),

is the multiplicity of λi as a root of p(λ).

The geometric multiplicity of λi, µG(λi), is the number of linearly indepen-

dent eigenvectors associated with λi.

In other words, the geometric multiplicity µG(λi) = dim(E), where E =

{~x | (A− λiI)~x = 0} is the eigenspace associated with λi.

Theorem A.19: Relation of Algebraic and Geometric Multiplicities

Let A ∈ Rn×n. The algebraic and geometric multiplicities of the eigenvalues

satisfy the following properties.

1. µA(λi) ≥ µG(λi) ≥ 1 for all i = 1, . . . , n

2. A has n linearly independent eigenvectors iff µA(λi) = µG(λi) for all

i = 1, . . . , n.

If A has n linearly independent eigenvectors, it can be diagonalised.


A.6.2 Similarity Transformations

Definition A.20: Similarity Transformation

Let A,B ∈ Rn×n with B nonsingular. Then the transformation from A to

B−1AB is called a similarity transformation of A. A and B−1AB are

called similar.

Theorem A.21: Eigenvalues of Similar Matrices

Let A,B ∈ Rn×n with B nonsingular. Then A and B−1AB have the same

eigenvalues (with the same algebraic and geometric multiplicities).

This can be shown using the fact that

A~x = λ~x, ~x ≠ 0

is equivalent to

AB~y = λB~y, ~y ≠ 0,

for ~y given by ~y = B−1~x. This is equivalent to

(B−1AB)~y = λ~y, ~y ≠ 0,

so any eigenvalue of A is also an eigenvalue of B−1AB, and vice versa.

A.6.3 Diagonalisation

Definition A.22: Diagonalisable and Defective Matrices

Let A ∈ Rn×n. A is called diagonalisable if it has n linearly independent

eigenvectors; otherwise, it is called defective.

Suppose A ∈ Rn×n has n linearly independent eigenvectors ~xi. Let X be the

matrix with the eigenvectors as its columns:

X = [~x1| . . . |~xn] .

Then

AX = X Λ,

with

Λ = [ λ1   0    · · ·   0
      0    λ2   · · ·   0
      ⋮     ⋮      ⋱     ⋮
      0    0    · · ·   λn ],

or

X−1AX = Λ,

i.e., the similarity transformation with X diagonalises A.

If A is defective, it can be transformed into the so-called Jordan form (which, in

some sense, is almost diagonal), using its n generalised eigenvectors. We won’t

need to consider the Jordan form in these notes.
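The diagonalisation X−1AX = Λ is easy to check numerically (a NumPy sketch; the matrix is our own example, chosen with distinct eigenvalues so that it is diagonalisable):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])               # eigenvalues 2 and 3 (distinct)
lam, X = np.linalg.eig(A)                # columns of X are eigenvectors
Lam = np.diag(lam)                       # the diagonal matrix of eigenvalues
```

Both identities AX = XΛ and X−1AX = Λ then hold up to roundoff.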


A.6.4 Singular Values of a Square Matrix

Let A ∈ Rn×n. Let λi(ATA) and λi(AAT ), i = 1, . . . , n, be the eigenvalues of

ATA and AAT , respectively, numbered in order of decreasing magnitude. Note

that ATA and AAT are symmetric, so their eigenvalues are real, and they are

positive semi-definite, so their eigenvalues are nonnegative. It can be shown they

have the same eigenvalues.

Definition A.23: Singular Values of a Square Matrix

Let A ∈ Rn×n. Then

σi(A) = √(λi(ATA)) = √(λi(AAT)),   i = 1, . . . , n,

are called the singular values of A.
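For a concrete matrix, these characterisations agree with a direct singular value computation (NumPy sketch; the matrix is our own example):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
ev = np.linalg.eigvalsh(A.T @ A)         # eigenvalues of A^T A, ascending, >= 0
sigma = np.sqrt(ev[::-1])                # singular values, decreasing order
```

The same values come from the eigenvalues of AAT, and from np.linalg.svd directly.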


A.7 Symmetric Matrices

We consider square matrices A ∈ Rn×n.

Definition A.24

A ∈ Rn×n is called symmetric if

A = AT .

Theorem A.25: Eigenvalues and Eigenvectors of a Symmetric Matrix

Let A ∈ Rn×n. If A is symmetric, then the eigenvalues of A are real, and A has n linearly independent eigenvectors that can be chosen mutually orthogonal.

Definition A.26

A ∈ Rn×n is called symmetric positive definite (SPD) if

A is symmetric and ~xTA~x > 0 for all ~x ≠ 0.

Theorem A.27: Eigenvalues of an SPD Matrix

A symmetric matrix A ∈ Rn×n is SPD iff

λi > 0 for all i = 1, . . . , n.

Proof.

⇒ Assume A is SPD. Then ~xTA~x > 0 for all ~x ≠ 0. Thus, ~xiTA~xi = λi‖~xi‖2² > 0 for any eigenvalue λi with associated eigenvector ~xi, since ~xi ≠ 0. This implies that λi > 0.

⇐ Assume λi > 0 for all i. A has n mutually orthogonal eigenvectors ~xi since it is symmetric, and any ~x ≠ 0 can be expressed in the basis of the orthogonal eigenvectors. So ~x = Σ_{i=1}^{n} ci~xi, where at least one of the ci ≠ 0. Thus, for any ~x ≠ 0,

~xTA~x = ( Σ_{i=1}^{n} ci~xiT )( Σ_{j=1}^{n} cjA~xj )
       = ( Σ_{i=1}^{n} ci~xiT )( Σ_{j=1}^{n} cjλj~xj )
       = Σ_{i=1}^{n} Σ_{j=1}^{n} cicjλj ~xiT~xj
       = Σ_{i=1}^{n} ci²λi ~xiT~xi        (due to orthogonality)
       = Σ_{i=1}^{n} ci²λi ‖~xi‖2² > 0,

so A is SPD.


Note that an SPD matrix A is nonsingular (it does not have a zero eigenvalue).
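In practice one can check positive definiteness via the eigenvalues (Theorem A.27) or, more cheaply, via an attempted Cholesky factorisation, which succeeds exactly for SPD matrices. A NumPy sketch with our own example matrices:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])               # symmetric; eigenvalues (7 +- sqrt(5))/2 > 0
B = np.array([[1.0, 2.0],
              [2.0, 1.0]])               # symmetric; eigenvalues 3 and -1

is_A_spd = bool(np.all(np.linalg.eigvalsh(A) > 0))   # True: A is SPD
is_B_spd = bool(np.all(np.linalg.eigvalsh(B) > 0))   # False: B is indefinite

L = np.linalg.cholesky(A)                # succeeds because A is SPD: A = L L^T
```

np.linalg.cholesky raises an exception for a matrix that is not positive definite, so it doubles as a practical SPD test.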

Definition A.28

A ∈ Rn×n is called symmetric positive semi-definite (SPSD) if

A is symmetric and ~xTA~x ≥ 0 for all ~x ≠ 0.

Theorem A.29: Eigenvalues of an SPSD Matrix

A symmetric matrix A ∈ Rn×n is SPSD iff

λi ≥ 0 for all i = 1, . . . , n.


A.8 Matrices with Special Structure or Properties

Some matrices have a special structure, which may imply special properties.

A.8.1 Diagonal Matrices

Definition A.30

Let A ∈ Rn×n. Then

1. A is called a diagonal matrix if aij = 0 for all i ≠ j. With ~a the

diagonal of a diagonal matrix A, we also write A = diag(~a). For any

matrix A (also nondiagonal), we indicate its diagonal by ~a = diag(A).

2. A is called a tridiagonal matrix if aij = 0 for all i, j satisfying |i−j| >

1.

A.8.2 Triangular Matrices

Definition A.31

1. U ∈ Rn×n is called an upper triangular matrix if uij = 0 for all i > j.

2. L ∈ Rn×n is called a unit lower triangular matrix if lij = 0 for all

i < j, and lii = 1 for all i.

Note that det(U) = ∏_{i=1}^{n} uii and det(L) = 1.

A.8.3 Permutation Matrices

Definition A.32

P ∈ Rn×n is called a permutation matrix if P can be obtained from the

n× n identity matrix I by exchanging rows.

Note that P has exactly one 1 in each row and column, and is otherwise 0.

Note also that permutation matrices are orthogonal, i.e., PPT = I, or P−1 =

PT , and det(P ) = ±1, depending on the parity of the permutation.

A.8.4 Projectors

Definition A.33

P ∈ Rn×n is called a projector if

P 2 = P.

I − P is also a projector, called the complementary projector to P .

Note: P separates Rn into two subspaces, S1 = range(P ) and S2 = null(P ).

We have ~x = P~x+(I−P )~x, where P~x ∈ S1, and (I−P )~x ∈ S2 since P (I−P )~x =

(P − P 2)~x = 0. P projects ~x into S1 along S2. For example, P (~x + ~y) = P~x if

~y ∈ S2 = null(P ).


Definition A.34

P ∈ Rn×n is called an orthogonal projector if

P 2 = P and PT = P.

If P is an orthogonal projector, S1 = range(P ) and S2 = null(P ) are orthog-

onal: (P~x)T~y = ~xTPT~y = ~xTP~y = 0 if ~y ∈ null(P ). So P projects ~x into S1

along S2, where S2 is orthogonal to S1.
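A standard construction of an orthogonal projector is P = QQT, where Q has orthonormal columns; the sketch below (our own example) checks the defining properties and the orthogonality of range(P) and null(P):

```python
import numpy as np

M = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
Q, _ = np.linalg.qr(M)                   # orthonormal basis for range(M)
P = Q @ Q.T                              # orthogonal projector onto range(M)
x = np.array([1.0, 2.0, 3.0])
```

P @ x and (I − P) @ x split x into orthogonal pieces in range(P) and null(P).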


A.9 Big O Notation

A.9.1 Big O as h→ 0

Consider scalar functions f(x) and g(x) of a real variable x.

Definition A.35

f(h) = O(g(h)) as h→ 0+ if

∃ c > 0, ∃h0 > 0: |f(h)| ≤ c |g(h)| ∀h with 0 ≤ h ≤ h0

Example A.36

Let

f(h) = 3h² + 4h³.

Then

f(h) = O(h²) as h → 0,
f(h) ≠ O(h³),
f(h) = O(h).

In words: f(h) approaches 0 at least as fast as h² (up to a multiplicative constant), but not as fast as h³, and, clearly, also at least as fast as h. Note that 3h² is the dominant term in f(h) as h → 0.

A.9.2 Big O as n→∞

Consider scalar functions f(n) and g(n) of an integer variable n.

Definition A.37

f(n) = O(g(n)) as n→∞ if

∃ c > 0, ∃N0 ≥ 0: |f(n)| ≤ c |g(n)| ∀n ≥ N0

Example A.38

Let

f(n) = 3n² + 4n³.

Then

f(n) = O(n³) as n → ∞,
f(n) ≠ O(n²),
f(n) = O(n⁴).

In words: f(n) grows no faster than n³ (up to a multiplicative constant), but faster than n², and, clearly, also no faster than n⁴. Note that 4n³ is the dominant term in f(n) as n → ∞.


A.10 Sparse Matrix Formats

When matrices are sparse, it is often advantageous to store them in computer

memory using sparse matrix formats. This can save large amounts of memory

space, and it can also make computations faster if one implements methods

that eliminate multiplications or additions with 0 (e.g., when computing matrix-

vector or matrix-matrix products).

Consider, for example, the following sparse matrix, of which we will only

store the nonzero elements and their locations:

A = [ 16   0  −18   0
       0  12    0   0
       0   0   14  18
       0  12   11  10 ].        (A.1)

In all that follows, i refers to rows, and j refers to columns.

A.10.1 Simple List Storage

A simple sparse storage format is to store the (i, j, value) triplets in a list, e.g.,

ordered by row starting from row 1 and from left to right:

val 16 -18 12 14 18 12 11 10

i 1 1 2 3 3 4 4 4

j 1 3 2 3 4 2 3 4

A.10.2 Compressed Sparse Column Format

An alternative with some advantages is the Compressed Sparse Column (CSC)

format, which Matlab uses internally.

In this format, the val array stores the nonzero values, ordered by column,

starting from column 1, and from top to bottom within a column. The i val

array stores the row index for each nonzero value.

The j ptr array saves on storage versus the j array in the simple list storage,

as follows: j ptr has one entry per column, and the entry indicates for each

column where it starts in the val and i val arrays. The j ptr array has one

additional entry at the end, which contains nnz(A) + 1.

val 16 12 12 -18 14 11 18 10

i val 1 2 4 1 3 4 3 4

j ptr 1 2 4 7 9

As such, j ptr(k) indicates where column k starts in the val and i val arrays, and j ptr(k+1) − j ptr(k) indicates how many nonzero elements there are in column k.

Some advantages of the Compressed Sparse Column format:

• saves on storage space versus dense format, and, in many practical cases,

versus simple list storage

• finding all nonzeros in a given column of A is very fast

Note, however, that finding all nonzero elements in a row of a sparse Matlab

matrix can be very time-consuming! (Because the elements are stored per col-

umn.) So if one needs to access rows of a sparse A repeatedly, it can be much

faster to store AT as a sparse matrix instead and access its columns.
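The CSC arrays for the matrix (A.1) can be built directly (a pure-Python sketch; here indices are 0-based, whereas the tables above use Matlab-style 1-based indexing, so this j_ptr ends with nnz(A) rather than nnz(A) + 1):

```python
# Dense form of the example matrix (A.1)
A = [[16,  0, -18,  0],
     [ 0, 12,   0,  0],
     [ 0,  0,  14, 18],
     [ 0, 12,  11, 10]]

val, i_val, j_ptr = [], [], [0]
for j in range(4):                        # column by column
    for i in range(4):                    # top to bottom within the column
        if A[i][j] != 0:
            val.append(A[i][j])           # nonzero value
            i_val.append(i)               # its row index
    j_ptr.append(len(val))                # start of the next column
```

Adding 1 to every index entry reproduces the 1-based tables above.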

