CMDA 3606 · MATHEMATICAL MODELING II

Problem Set 8 (for Week 12)

Posted 30 March 2021. Upload submissions before 11:59pm on Friday 9 April 2021.

Please submit your problem solutions as a single PDF document via upload on the course

Canvas site. Be sure to include your MATLAB code for exercises where it is relevant.

Basic guidelines: Students may discuss the problems on this assignment with other students currently

enrolled in CMDA 3606, but each student must submit his or her individual writeup and code. (In particular,

you must write up your own individual MATLAB code.) Cite your sources and list those with whom you have

discussed the problems. Students may consult the class notes and other online resources, but any use of

solutions generated outside of the current course context (including consultation with 3606 students from

previous semesters) is expressly forbidden and will be regarded as a violation of the Honor Code.

1. [30 points: 10 points for each part]

The Kaczmarz row projection method. Any iterative method for solving a linear system of

equations, Ax = b, will generate a sequence of approximate solutions, {x1, x2, ..., xk, ... }, in such a

way that when we are at the kth approximate solution, xk, the next approximation, xk+1, is cheap to

generate and is better in some sense than x_k. Suppose A ∈ R^{m×n} and b ∈ R^m are written as

A = [ − a_1^T − ]             [ b_1 ]
    [ − a_2^T − ]             [ b_2 ]
    [     ⋮     ]   and  b =  [  ⋮  ]
    [ − a_m^T − ]             [ b_m ]

Starting from a current approximation xk, the Kaczmarz row projection method generates the next

approximation, xk+1, in the following way: If the current approximation, xk, satisfies all the equations

in Ax = b then xk is already a solution and we stop. Otherwise there will be at least one equation

in the system that is not satisfied — let’s say equation î is not satisfied: a_î^T x_k ≠ b_î. Then we define

the next approximate solution, x_{k+1}, to be the closest vector to the current approximation x_k that

satisfies the î-th equation:

x_{k+1} solves   min_x ‖x − x_k‖   such that   a_î^T x = b_î.    (⋆)

(a) By using the pseudoinverse of the 1 × n “matrix” a_î^T, show that the solution to (⋆) is given by

x_{k+1} = x_k − a_î (a_î^T x_k − b_î) / ‖a_î‖².
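The course code is in MATLAB; purely as an illustration of the row-projection update above (a hypothetical NumPy sketch, not TestKaczmarz.m), one Kaczmarz step and a cyclic sweep over a tiny consistent system can be written as:

```python
import numpy as np

def kaczmarz_step(x, a, b_i):
    """Project x onto the hyperplane {y : a^T y = b_i} (one Kaczmarz update)."""
    return x - a * (a @ x - b_i) / (a @ a)

# Tiny demo: a consistent 2x2 system, sweeping the rows cyclically.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 4.0])
x = np.zeros(2)
for k in range(200):
    i = k % 2                     # systematic row sweep
    x = kaczmarz_step(x, A[i], b[i])
# x converges toward the true solution of Ax = b
```

Each step touches only one row of A, which is what makes the method cheap per iteration.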

(b) Suppose x∗ is a solution to Ax = b. The error in the current approximation is xk − x∗.

(i) Using the fact that a_î^T x∗ = b_î, show that the error in consecutive approximations satisfies:

x_{k+1} − x∗ = (x_k − x∗) − ( a_î a_î^T / (a_î^T a_î) ) (x_k − x∗).

Note that since a_î a_î^T / (a_î^T a_î) is an orthogonal projection, and orthogonal projections can never increase

the length of vectors, ‖x_{k+1} − x∗‖ ≤ ‖x_k − x∗‖, with strict inequality holding whenever a_î^T x_k ≠ b_î.

Thus, the error steadily gets smaller, and the sequence of iterates steadily draws closer to the true

solution, x∗.

To elaborate on this, suppose A has only two rows, a1 and a2, and we apply the Kaczmarz update

repeatedly, alternating between the two rows. The first five steps are represented in the graphic

below with an arbitrary initial error e0 = x0 − x∗.

(ii) Explain (geometrically) the sequence of errors shown, e_k = x_k − x∗, for k = 1, 2, …, 5.

(iii) Why does e_k approach zero as k → ∞? How fast? (E.g., show that ‖e_{k+1}‖ ≤ |cos(θ)| ‖e_k‖,

where θ is the angle between a_1 and a_2. As a consequence, ‖e_k‖ ≤ |cos(θ)|^{k−1} ‖e_0‖ → 0 as

k → ∞, and the closer to orthogonal a_1 and a_2 are, the faster the convergence.)
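The |cos(θ)| contraction can be checked numerically. The sketch below is a hypothetical NumPy illustration with an arbitrarily chosen pair of rows and "true" solution (not part of the required MATLAB writeup); in 2D, every update after the first shrinks the error by exactly |cos(θ)|, because each projected error is orthogonal to the row just used.

```python
import numpy as np

# Two rows at a 45-degree angle.
a1 = np.array([1.0, 0.0])
a2 = np.array([1.0, 1.0]) / np.sqrt(2.0)
cos_theta = abs(a1 @ a2)              # |cos(theta)| between the rows

x_star = np.array([0.3, -0.7])        # pick a "true" solution
b1, b2 = a1 @ x_star, a2 @ x_star     # consistent right-hand side

x = np.zeros(2)                       # so e_0 = x_0 - x_star
errs = [np.linalg.norm(x - x_star)]
for k in range(10):                   # alternate between the two rows
    a, bi = (a1, b1) if k % 2 == 0 else (a2, b2)
    x = x - a * (a @ x - bi) / (a @ a)
    errs.append(np.linalg.norm(x - x_star))

# Ratios of consecutive error norms after the first step.
ratios = [errs[k + 1] / errs[k] for k in range(1, 10)]
```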

(c) Perform some numerical experiments using TestKaczmarz.m from the course Canvas site. The

first three approaches in TestKaczmarz.m use the standard Kaczmarz update described above

and differ only with respect to how rows are selected for the next update:

• systematic row sweep (rows selected top to bottom, then repeated)

• random row selection with rows selected with uniform probability.

• random row selection with rows selected with probability weighted by row norm (larger norm

means greater probability).

One additional approach is included in TestKaczmarz.m that is often used in machine learning:

rows are selected at random, but instead of the standard Kaczmarz update, a diagonal scaling of

the update direction is used (this is called an adaptive gradient method, or ADAGRAD).

(i) Comment on the effect of the three row selection schemes. Include plots to support your

discussion.

(ii) By selectively commenting/uncommenting statements in TestKaczmarz.m, consider both randomly

generated matrices and a (nonrandom) graded matrix (i.e., a matrix having row norms

that span many orders of magnitude).

(iii) What happens if noise contaminates b so the linear system is no longer consistent?

2. [35 points: 5 points each for (a), (d), and (e); 12 points for (b), 8 points for (c)]

At its kth step, an iterative method for solving Ax = b constructs an approximate solution xk. With

this iterate we associate the residual vector

r_k = b − A x_k.

The method described in Problem 1 focused on driving the iterate xk toward the true solution x∗.

An alternative approach for developing an iterative solution strategy for solving Ax = b could instead

focus on driving the residual rk → 0 as quickly as possible.

Lewis Fry Richardson proposed that xk be constructed as follows. Let x0 = 0 so that r0 = b. Then for

a fixed constant τ , define

xk+1 = xk + τ rk.
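For concreteness, the iteration can be sketched as follows. This is a hypothetical NumPy illustration on a small diagonal test matrix with an arbitrary admissible step size (the function name `richardson` is mine, and τ = 0.5 is not claimed to be the best choice; finding the optimal τ is part (c)):

```python
import numpy as np

def richardson(A, b, tau, num_steps):
    """Richardson iteration: x_{k+1} = x_k + tau * r_k, with x_0 = 0, r_0 = b.

    Returns the final iterate and the residual norms ||r_k|| for k = 0..num_steps.
    """
    x = np.zeros_like(b)
    r = b.copy()
    norms = [np.linalg.norm(r)]
    for _ in range(num_steps):
        x = x + tau * r
        r = b - A @ x
        norms.append(np.linalg.norm(r))
    return x, norms

# Small demo: a diagonal matrix with eigenvalues in [1, 2]; tau = 0.5
# keeps every |1 - tau*lambda| below one, so the residual shrinks.
A = np.diag([1.0, 1.5, 2.0])
b = np.ones(3)
x, norms = richardson(A, b, tau=0.5, num_steps=50)
```

The recorded norms are exactly the kind of data a semilogy residual plot (as in part (d)) displays.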

(a) Show that you can write r_k = (I − τA)^k b.

(b) Suppose that A is diagonalizable, A = VΛV⁻¹ = Σ_{j=1}^n λ_j v_j v̂_j^*.

(i) Show that I − τA has the same eigenvectors as A.

(ii) What are the eigenvalues of I − τA?

(iii) What are the eigenvalues of (I − τA)^k?

(iv) Explain why all eigenvalues of I − τA must have absolute value less than one in order to

ensure that r_k → 0.

(c) Suppose further that all eigenvalues λ of A satisfy 1 ≤ λ ≤ 2.

(i) For what real numbers τ will rk → 0?

(ii) What optimal choice of τ will drive r_k → 0 most rapidly, making the largest absolute value

of the eigenvalues of I − τA as small as possible (based only on knowing that 1 ≤ λ ≤ 2)?

For the rest of the problem, consider this 500× 500 matrix A and accompanying right-hand side:

A = (1/4) ·
    [ 6  1            ]         [ 1 ]
    [ 1  6  1         ]         [ 1 ]
    [    ⋱  ⋱  ⋱      ] ,  b =  [ ⋮ ] ,
    [       1  6  1   ]         [ 1 ]
    [          1  6   ]         [ 1 ]

where the unspecified entries of A are zero. The eigenvalues of this matrix satisfy 1 ≤ λ ≤ 2.

(You actually could calculate the eigenvalues of A explicitly after recognizing that this matrix is a

modification of one you examined on Problem Set 2. You don’t need to do this here.)
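If it helps to experiment with this system outside MATLAB, the matrix and right-hand side can be built as in the NumPy sketch below (an illustration only; the Gershgorin comment is one quick way to see the stated eigenvalue bounds, not a required step):

```python
import numpy as np

n = 500
# A = (1/4) * tridiag(1, 6, 1): 6/4 on the main diagonal, 1/4 on the
# sub- and superdiagonals; all other entries are zero.
A = (6.0 * np.eye(n)
     + np.diag(np.ones(n - 1), k=1)
     + np.diag(np.ones(n - 1), k=-1)) / 4.0
b = np.ones(n)
# Gershgorin's theorem places every eigenvalue within 2/4 of 6/4,
# i.e. inside the interval [1, 2].
```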

(d) Produce a semilogy plot showing ‖rk‖ (vertical axis) versus k (horizontal axis) for the optimal

τ you determined in part (c).

(e) The polynomial ψ_k(z) = (1 − τz)^k should be small on the interval 1 ≤ z ≤ 2, and satisfy

ψ_k(0) = 1. Confirm these properties by producing a plot showing ψ_k(z) (vertical axis) versus z

(horizontal axis) for −1 ≤ z ≤ 3 for k = 1, 2, 3, 4, 5. (Show all five graphs on the same plot.) Use

axis([-3 3 -.5 1.5]) to crop the axes to the most relevant region.

3. [35 points: 5 points each for (a), (b), and (d); 10 points each for (c) and (e)]

One can extend the basic Richardson Iteration described in Problem 2 in a number of ways. One

approach starts with a matrix splitting A = M − N (See CMDA3606_Week11_LectC) where M is

invertible and an associated stationary iteration is defined as

x_{k+1} = M⁻¹N x_k + M⁻¹b.

(a) Defining the residual associated with an approximate solution xk as rk = b−Axk, show that the

stationary iteration associated with the above splitting can be rewritten as

x_{k+1} = x_k + M⁻¹r_k.

The matrix M is often called a “preconditioner” or “preconditioning matrix”. One generally

seeks an M such that My = c is very cheap to solve for any c, in order to make each iteration

cheap. Note that if M = A then the iteration yields the exact solution in a single step, but almost

certainly M = A will not be a choice for which M is easy to invert. More typical would be taking

M to be the diagonal (or diagonal blocks) of A (“Jacobi” or “block Jacobi” preconditioning) or

taking M to be the lower triangular part of A (“Gauss-Seidel” preconditioning).

(b) Show that the iteration in part (a) can be rewritten as a combined iteration:

Solve M s_k = r_k

x_{k+1} = x_k + s_k

r_{k+1} = r_k − A s_k

with x0 = 0 and r0 = b.
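The combined iteration above can be sketched as follows. This is a hypothetical NumPy illustration with Jacobi preconditioning (M = diagonal of A) on a tiny diagonally dominant system; the function name is mine, and this is not the required MATLAB implementation:

```python
import numpy as np

def stationary_iteration(A, b, M, num_steps):
    """Combined form of the stationary iteration:
    solve M s_k = r_k, then x_{k+1} = x_k + s_k and r_{k+1} = r_k - A s_k."""
    x = np.zeros_like(b)
    r = b.copy()
    for _ in range(num_steps):
        s = np.linalg.solve(M, r)   # cheap when M is diagonal or triangular
        x = x + s
        r = r - A @ s
    return x, r

# Demo: Jacobi preconditioning on a small diagonally dominant system,
# for which the iteration converges.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
M = np.diag(np.diag(A))
x, r = stationary_iteration(A, b, M, num_steps=100)
```

Updating the residual as r_{k+1} = r_k − A s_k avoids recomputing b − A x_{k+1} from scratch, though the two are algebraically identical.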

(c) Now we try to accelerate convergence using the mechanism of Richardson Iteration; we insert a

parameter “τ” and then choose a good value for τ . Consider then

x_{k+1} = x_k + τ M⁻¹r_k.

Following the reasoning of Problem 2, and assuming that the eigenvalues of AM⁻¹ are real and all

within the interval (0, 2), find the optimum value of τ that produces the fastest rate of convergence

in the above iteration. Express τ in terms of the largest and smallest eigenvalues of AM⁻¹.

(d) Try the iteration using the optimal τ determined in (c), on the 500×500 linear system considered

in Problem 2, taking for M the diagonal of A: M = (6/4)I. Produce a semilogy plot showing ‖r_k‖

(vertical axis) versus k (horizontal axis) for the optimal τ you determined here and compare your

results with what you found in Problem 2(d).

(e) We extend the idea of Richardson Iteration further by using a different parameter at each iteration

step:

x_{k+1} = x_k + τ_k M⁻¹r_k.

This is a nonstationary iteration since the transformation that takes xk to xk+1 changes from

step to step. This is often called “polynomial acceleration” because if we define the kth-degree

polynomial ψ_k(z) = ∏_{i=1}^k (1 − τ_i z), then

r_k = ∏_{i=1}^k (I − τ_i AM⁻¹) r_0 = ψ_k(AM⁻¹) b,

which extends the similar fact described in Problem 2(a) above.

i. If we are given τ_1, τ_2, ..., τ_k, show that indeed r_k = ψ_k(AM⁻¹) b.

ii. Show that the eigenvalues of ψ_k(AM⁻¹) are exactly ψ_k(γ_j) for each eigenvalue γ_j of AM⁻¹.
