CMDA 3606 · MATHEMATICAL MODELING II
Problem Set 8 (for Week 12)
Posted 30 March 2021. Upload submissions before 11:59pm on Friday 9 April 2021.
Canvas site. Be sure to include your matlab code for exercises where it is relevant.
Basic guidelines: Students may discuss the problems on this assignment with other students currently
enrolled in CMDA 3606, but each student must submit his or her individual writeup and code. (In particular,
you must write up your own individual matlab code.) Cite your sources and list those with whom you have
discussed the problems. Students may consult the class notes and other online resources, but any use of
solutions generated outside of the current course context (including consultation with 3606 students from
previous semesters) is expressly forbidden and will be regarded as a violation of the Honor Code.
1. [30 points: 10 points for each part]
The Kaczmarz row projection method. Any iterative method for solving a linear system of
equations, Ax = b, will generate a sequence of approximate solutions, {x1, x2, ..., xk, ... }, in such a
way that when we are at the kth approximate solution, xk, the next approximation, xk+1, is cheap to
generate and is better in some sense than xk. Suppose A ∈ Rm×n and b ∈ Rm are written as
A =

− aT1 −
− aT2 −
...
− aTm −
 and b =

b1
b2
...
bm

Starting from a current approximation xk, the Kaczmarz row projection method generates the next
approximation, xk+1, in the following way: If the current approximation, xk, satisfies all the equations
in Ax = b then xk is already a solution and we stop. Otherwise there will be at least one equation
in the system that is not satisfied — let’s say equation iˆ is not satisfied: aT

xk 6= biˆ. Then we define
the next approximate solution, xk+1, to be the closest vector to the current approximation xk, that
satisfies the iˆ equation:
xk+1 solves min
x
‖x− xk‖ such that aTiˆ x = biˆ. (?)
(a) By using the pseudoinverse of the 1× n “matrix”, aT

, show that the solution to ? is given by
xk+1 = xk + aiˆ
(
aT

xk − biˆ
‖aiˆ‖2
)
(b) Suppose x∗ is a solution to Ax = b. The error in the current approximation is xk − x∗.
(i) Using the fact that aT

x∗ = biˆ, show that the error in consecutive approximations satisfies:
xk+1 − x∗ = (xk − x∗)−
(
aiˆa
T

aT

aiˆ
)
(xk − x∗)
Note that since
(
aiˆa
T

aT

aiˆ
)
is an orthogonal projection and orthogonal projections can never increase
the length of vectors, ‖xk+1−x∗‖ ≤ ‖xk−x∗‖, with strict inequality holding whenever aTiˆ xk 6= biˆ.
Thus, the error steadily gets smaller, and the sequence of iterates steadily draws closer to the true
solution, x∗.
To elaborate on this, suppose A has only two rows, a1 and a2, and we apply the Kaczmarz update
repeatedly, alternating between the two rows. The first five steps are represented in the graphic
below with an arbitrary initial error e0 = x0 − x∗.
(ii) Explain (geometrically) the sequence of errors shown, ek = xk − x∗ with k = 1, 2, ...5.
(iii) Why does ek approach zero as k → ∞ ? How fast ? (e.g. show that ‖ek+1‖ ≤ | cos(θ)|‖ek‖
where θ is the angle between a1 and a2. As a consequence, ‖ek‖ ≤ | cos(θ)|k−1‖e0‖ → 0 as
k →∞ and the closer to orthogonal a1 and a2 are, the faster the convergence.)

(c) Perform some numerical experiments using TestKaczmarz.m from the course Canvas site. The
first three approaches in TestKaczmarz.m use the standard Kaczmarz update described above
and differ only with respect to how rows are selected for the next update:
• systematic row sweep (rows selected top to bottom, then repeated)
• random row selection with rows selected with uniform probability.
• random row selection with rows selected with probability weighted by row norm (larger norm
means greater probablility).
One additional approach is included in TestKaczmarz.m that is often used in machine learning -
rows are selected at random but instead of the standard Kaczmarz update, a diagonal scaling of
(i) Comment on the effect of the three row selection schemes. Include plots to support your
discussion.
(ii) By selectively commenting/uncommenting statements in TestKaczmarz.m, consider both ran-
domly generated matrices and a (nonrandom) graded matrix (i.e., a matrix having row norms
that span many orders of magnitude)
(iii) What happens if noise contaminates b so the linear system is no longer consistent ?
2. [35 points: 5 points each for (a), (d), and (e); 12 points for (b), 8 points for (c)]
At its kth step, an iterative method for solving Ax = b constructs an approximate solution xk. With
this iterate we associate the residual vector
rk = b−Axk.
The method described in Problem 1 focussed on driving the iterate xk toward the true solution x∗.
An alternative approach for developing an iterative solution strategy for solving Ax = b could instead
focus on driving the residual rk → 0 as quickly as possible.
Lewis Fry Richardson proposed that xk be constructed as follows Let x0 = 0 so that r0 = b. Then for
a fixed constant τ , define
xk+1 = xk + τ rk.
(a) Show that you can write rk = (I− τA)kb.
(b) Suppose that A is diagonalizable, A = VΛV−1 =
∑n
j=1 λjvjv̂

j .
(i) Show that I− τA has the same eigenvectors as A.
(ii) What are the eigenvalues of I− τA?
(iii) What are the eigenvalues of (I− τA)k?
(iv) Explain why all eigenvalues of I − τA must have absolute value less than one in order to
ensure that rk → 0.
(c) Suppose further that all eigenvalues λ of A satisfy 1 ≤ λ ≤ 2.
(i) For what real numbers τ will rk → 0?
(ii) What optimal choice of τ will drive rk → 0 most rapidly, making the largest absolute value
of the eigenvalues of I− τA as small as possible (based only on knowing that 1 ≤ λ ≤ 2)?
For the rest of the problem, consider this 500× 500 matrix A and accompanying right-hand side:
A =
1
4

6 1
1 6
. . .
. . .
. . . 1
1 6
 , b =

1
1
...
1
 ,
where the unspecified entries of A are zero. The eigenvalues of this matrix satisfy 1 ≤ λ ≤ 2.
(You actually could calculate the eigenvalues of A explicitly after recognizing that this matrix is a
modification of one you examined on Problem Set 2. You don’t need to do this here.)
(d) Produce a semilogy plot showing ‖rk‖ (vertical axis) versus k (horizontal axis) for the optimal
τ you determined in part (c).
(e) The polynomial ψk(z) = (1 − τ z)k should be small on the interval 1 ≤ z ≤ 2, and satisfy
ψk(0) = 1. Confirm these properties by producing a plot showing ψk(z) (vertical axis) versus z
(horizontal axis) for −1 ≤ z ≤ 3 for k = 1, 2, 3, 4, 5. (Show all five graphs on the same plot.) Use
axis([-3 3 -.5 1.5]) to crop the axes to the most relevant region.
3. [35 points: 5 each for (a), (b), and (d) 10 points for (c) and (e)]
One can extend the basic Richardson Iteration described in Problem 2 in a number of ways. One
approach starts with a matrix splitting A = M − N (See CMDA3606_Week11_LectC) where M is
invertible and an associated stationary iteration is defined as
xk+1 = M
−1Nxk + M−1b.
(a) Defining the residual associated with an approximate solution xk as rk = b−Axk, show that the
stationary iteration associated with the above splitting can be rewritten as
xk+1 = xk + M
−1rk.
The matrix M is often a “preconditioner” called the “preconditioning matrix”. One generally
seeks an M such that My = c is very cheap to solve for any c, in order to make each iteration
cheap. Note that if M = A then the iteration yields the exact solution in a single step, but almost
certainly M = A will not be a choice for which M is easy to invert. More typical would taking
M to be the diagonal (or diagonal blocks) of A (“Jacobi” or “block Jacobi” preconditioning) or
taking M to be the lower triangular part of A (“Gauss-Seidel” preconditioning).
(b) Show that the iteration in 2(a) can be rewritten as a combined iteration:
Solve Msk = rk
xk+1 = xk + sk
with x0 = 0 and r0 = b.
(c) Now we try to accelerate convergence using the mechanism of Richardson Iteration; we insert a
parameter “τ” and then choose a good value for τ . Consider then
xk+1 = xk + τM
−1rk.
Following the reasoning of Problem 1, and assuming that the eigenvalues of AM−1 are real and all
within the interval (0, 2), find the optimum value of τ that produces the fastest rate of convergence
in the above iteration. Express τ in terms of the largest and smallest eigenvalues of AM−1.
(d) Try the iteration using the optimal τ determined in (c), on the 500×500 linear system considered
in Problem 1, taking for M the diagonal of A: M = 64I. Produce a semilogy plot showing ‖rk‖
(vertical axis) versus k (horizontal axis) for the optimal τ you determined here and compare your
results with what you found in part 1(c).
(e) We extend the idea of Richardson Iteration further by using a different parameter at each iteration
step:
xk+1 = xk + τkM
−1rk
This is a nonstationary iteration since the transformation that takes xk to xk+1 changes from
step to step. This is often called “polynomial acceleration” because if we define the kth degree
polynomial, ψk(z) =
∏k
i=1(1− τiz), then
rk =
k∏
i=1
(I− τiAM−1)r0 = ψk(AM−1)b.
which extends a similar fact that was described in Problem 1(a) above.
i. If we are given τ0, τ1, ..., τk, show that indeed rk = ψk(AM
−1)b.
ii. Show that the eigenvalues of ψk(AM
−1) are exactly ψk(γj) for each eigenvalue γj of AM−1.

Email:51zuoyejun

@gmail.com