MAT1841 Continuous Mathematics for Computer Science
Lecture notes, Semester 1 2017

These notes have been adapted from previous lecture notes written by several members of the School of Mathematical Sciences. The notes were compiled by Anne Eastaugh and Jennifer Flegg (2016). Edited by Daniel McInnes for 2017.

Contents

1 Vectors, Lines and Planes
  1.1 Introduction to Vectors
    1.1.1 Notation and definition
    1.1.2 Linear independence
    1.1.3 Algebraic properties
  1.2 Vector Dot Product
    1.2.1 Length of a vector
    1.2.2 Unit Vectors
    1.2.3 Scalar projections
    1.2.4 Vector projection
  1.3 Vector Cross Product
    1.3.1 Interpreting the cross product
    1.3.2 Right hand thumb rule
  1.4 Lines in 3-dimensional space
    1.4.1 Vector equation of a line
  1.5 Planes in 3-dimensional space
    1.5.1 Constructing the equation of a plane
    1.5.2 Parametric equations for a plane
    1.5.3 Vector equation of a plane
  1.6 Systems of Linear Equations
    1.6.1 Examples of Linear Systems
    1.6.2 A standard strategy
    1.6.3 Points, lines and planes - intersections
    1.6.4 Points, lines and planes - distances
    1.6.5 Summary
2 Matrices
  2.1 Introduction - notation and operations
    2.1.1 Operations on matrices
    2.1.2 Some special matrices
    2.1.3 Properties of matrices
    2.1.4 Inverses of square matrices
  2.2 Gaussian Elimination
    2.2.1 Gaussian elimination strategy
    2.2.2 Exceptions
  2.3 Systems of equations using matrices
    2.3.1 The augmented matrix
  2.4 Row echelon form
    2.4.1 Rank
    2.4.2 Homogeneous systems
    2.4.3 Summary
  2.5 Matrix Inverse
  2.6 Matrix Transpose
  2.7 Determinants
    2.7.1 Properties of determinants
    2.7.2 Vector cross product using determinants
    2.7.3 Cramer's rule
  2.8 Obtaining inverses using Gauss-Jordan elimination
    2.8.1 Inverse - another method

3 Calculus
  3.1 Differentiation
    3.1.1 Rate of change
    3.1.2 Definition of the derivative f′(x) and the slope of a tangent line
    3.1.3 Techniques of differentiation - rules
  3.2 Maximum and minimum of functions
  3.3 Differentiating inverse, circular and exponential functions
    3.3.1 Inverse functions and their derivatives
    3.3.2 Exponential and logarithmic functions: e^x and ln x
    3.3.3 Derivatives of circular functions
  3.4 Higher order derivatives
  3.5 Parametric curves and differentiation
    3.5.1 Parametric curves
    3.5.2 Parametric differentiation
  3.6 Function approximations
    3.6.1 Introduction to power series
    3.6.2 Power series
    3.6.3 Taylor series
    3.6.4 Derivation of Taylor polynomials from first principles
    3.6.5 Cubic splines interpolation

4 Integration
  4.1 Fundamental theorem of calculus
    4.1.1 Revision
    4.1.2 Fundamental theorem
  4.2 Area under the curve
  4.3 Trapezoidal rule

5 Multivariable Calculus
  5.1 Functions of several variables
    5.1.1 Definition
    5.1.2 Notation
    5.1.3 Surfaces
    5.1.4 Alternative forms
  5.2 Partial derivatives
    5.2.1 First partial derivatives
  5.3 The tangent plane
    5.3.1 Geometric interpretation
    5.3.2 Linear approximations
  5.4 Chain rule
  5.5 Gradient and Directional Derivative
  5.6 Second order partial derivatives
    5.6.1 Taylor polynomials of higher degree
    5.6.2 Exceptions: when derivatives do not exist
  5.7 Stationary points
    5.7.1 Finding stationary points
    5.7.2 Notation
    5.7.3 Maxima, Minima or Saddle point
    5.7.4 Application of extrema

Reference books

1. TBC

===================

Chapter 1
Vectors, Lines and Planes

1.1 Introduction to Vectors

1.1.1 Notation and definition

Common forms of vector notation are bold symbols (v), arrow notation (~v) and tilde notation (v˜). Throughout this Study Guide we will use the tilde notation. This notation compares suitably with the handwritten notation for vectors. Points in space are represented by a capital letter (for example the point P). Note that capital letters are also used for matrices, but in context this does not lead to ambiguity.

Vectors can be defined in (at least) two ways - algebraically as objects like

v˜ = (1, 7, 3)    u˜ = (2, −1, 4)

or geometrically as arrows in space.

Note that vectors have both magnitude and direction. A quantity specified only by a number (but no direction) is known as a scalar.

How can we be sure that these two definitions actually describe the same object? Equally, how do we convert from one form to the other? That is, given v˜ = (1, 2, 7) how do we draw the arrow and likewise, given the arrow how do we extract the numbers (1, 2, 7)?

Suppose we are given two points P and Q.
Suppose also that we find the change in coordinates from P to Q is (say) (1, 2, 7). We could also draw an arrow from P to Q. Thus we have two ways of recording the path from P to Q, either as the numbers (1, 2, 7) or the arrow.

Suppose now that we have another pair of points R and S and further that we find the change in coordinates to be (1, 2, 7). Again, we can join the points with an arrow. This arrow will have the same direction and length as that for P to Q. In both cases, the displacement, from start to finish, is represented by either the numbers (1, 2, 7) or the arrow – thus we can use either form to represent the vector. Note that this means that a vector does not live at any one place in space – it can be moved anywhere provided its length and direction are unchanged.

To extract the numbers (1, 2, 7) given just the arrow, simply place the arrow somewhere in the x, y, z space, and then measure the change in coordinates from tail to tip of the vector. Equally, to draw the vector given the numbers (1, 2, 7), choose (0, 0, 0) as the tail; then the point (1, 2, 7) is the tip.

The components of a vector are just the numbers we use to describe the vector. In the above, the components of v˜ are 1, 2 and 7.

Another very common way to write a vector, such as v˜ = (1, 7, 3) for example, is v˜ = 1i˜ + 7j˜ + 3k˜. The three vectors i˜, j˜, k˜ are a simple way to remind us that the three numbers in v˜ = (1, 7, 3) refer to directions parallel to the three coordinate axes (with i˜ parallel to the x-axis, j˜ parallel to the y-axis and k˜ parallel to the z-axis).

In this way we can always write down any 3-dimensional vector as a linear combination of the vectors i˜, j˜, k˜ and thus these vectors are also known as basis vectors.

1.1.2 Linear independence

Two or more vectors are linearly independent if we cannot take any one of the vectors and write it as a linear combination of the others. We cannot write i˜ as a linear combination of j˜ and k˜.
In other words there are no non-zero scalars α and β such that i˜ = αj˜ + βk˜. Thus the basis vectors i˜, j˜, k˜ are linearly independent.

If we can take a vector and write it as a linear combination of other vectors, then those vectors are known as linearly dependent. For example the vectors u˜ = (7, 17, −3), v˜ = (1, 2, 3) and w˜ = (3, 7, 1) are linearly dependent as u˜ = 3w˜ − 2v˜.

1.1.3 Algebraic properties

What rules must we observe when we are working with vectors?

• Equality
  v˜ = w˜ only when the arrows for v˜ and w˜ are identical.

• Stretching (scalar multiple)
  The vector λv˜ is parallel to v˜ but is stretched by a factor λ. The magnitude of λv˜ is |λ| times the magnitude of v˜.

• Addition
  To add two vectors v˜ and w˜ arrange the two so that they are tip to tail. Then v˜ + w˜ is the vector that starts at the first tail and ends at the second tip. Thus the sum of two vectors v˜ and w˜ is the displacement vector resulting from first applying v˜ then w˜.

• Subtraction
  The difference v˜ − w˜ of two vectors v˜ and w˜ is the displacement vector resulting from first applying v˜ then −w˜. Note that −w˜ is simply the vector w˜ now pointing in the opposite direction to w˜.

Example 1.1. Express each of the above rules in terms of the components of vectors (i.e. in terms of numbers like (1, 2, 7) and (a, b, c)).

Example 1.2. Given v˜ = (3, 4, 2) and w˜ = (1, 2, 3) compute v˜ + w˜ and 2v˜ + 7w˜.

Example 1.3. Given v˜ = (1, 2, 7) draw v˜, 2v˜ and −v˜.

Example 1.4. Given v˜ = (1, 2, 7) and w˜ = (3, 4, 5) draw and compute v˜ − w˜.

1.2 Vector Dot Product

How do we multiply vectors? We have already seen one form, where we stretch v˜ by a scalar λ, i.e. v˜ → λv˜. This is called scalar multiplication. Another form is the vector dot product. Let v˜ = (vx, vy, vz) and w˜ = (wx, wy, wz) be a pair of vectors; then we define the dot product v˜ · w˜ by

v˜ · w˜ = vx wx + vy wy + vz wz .

Example 1.5. Let v˜ = (1, 2, 7) and w˜ = (−1, 3, 4). Compute v˜ · v˜, w˜ · w˜ and v˜ · w˜. What do we observe?
• v˜ · w˜ is a single number, not a vector (i.e. it is a scalar)
• v˜ · w˜ = w˜ · v˜
• (λv˜) · w˜ = λ(v˜ · w˜)
• (a˜ + b˜) · v˜ = a˜ · v˜ + b˜ · v˜

The last two cases display what we call linearity.

1.2.1 Length of a vector

The length of a vector v˜ is defined by

|v˜| = √(v˜ · v˜).

The notation |v˜| should be distinguished from the absolute value for a scalar (for example |−5| = 5). The length of a vector is one example of a norm, which is a quantity used in higher level mathematics.

Example 1.6. Let v˜ = (1, 2, 7). Compute the distance from (0, 0, 0) to (1, 2, 7). Compare this with √(v˜ · v˜).

We can now show that

v˜ · w˜ = |v˜||w˜| cos θ

where

|v˜| = the length of v˜ = (vx² + vy² + vz²)^(1/2)
|w˜| = the length of w˜ = (wx² + wy² + wz²)^(1/2)

and θ is the angle between the two vectors. How do we prove this? Simply start with v˜ − w˜ and compute its length,

|v˜ − w˜|² = (v˜ − w˜) · (v˜ − w˜)
          = v˜ · v˜ − v˜ · w˜ − w˜ · v˜ + w˜ · w˜
          = |v˜|² + |w˜|² − 2 v˜ · w˜

and from the Cosine Rule for triangles we know

|v˜ − w˜|² = |v˜|² + |w˜|² − 2|v˜||w˜| cos θ

Thus we have

v˜ · w˜ = |v˜||w˜| cos θ

This gives us a convenient way to compute the angle between any pair of vectors. If we find cos θ = 0 then we can say that v˜ and w˜ are orthogonal (perpendicular).

• Vectors v˜ and w˜ are orthogonal when v˜ · w˜ = 0 (provided neither v˜ nor w˜ is zero).

Example 1.7. Find the angle between the vectors v˜ = (2, 7, 1) and w˜ = (3, 4, −2).

1.2.2 Unit Vectors

A vector is said to be a unit vector if its length is one. That is, v˜ is a unit vector when v˜ · v˜ = 1. The notation for a unit vector is vˆ˜ (called 'v˜ hat'). Unit vectors are calculated by:

vˆ˜ = v˜ / |v˜|

1.2.3 Scalar projections

This is simply the length of the shadow cast by one vector onto another. The scalar projection, vw, of v˜ in the direction of w˜ is given by

vw = (v˜ · w˜) / |w˜|

Example 1.8. What is the length (i.e. scalar projection) of v˜ = (1, 2, 7) in the direction of the vector w˜ = (2, 3, 4)?
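The dot product, length, angle and scalar projection formulas above translate directly into code. Here is a minimal Python sketch (the helper names `dot`, `length`, `angle` and `scalar_projection` are our own, not from the notes):

```python
import math

def dot(v, w):
    # v . w = vx*wx + vy*wy + vz*wz
    return sum(a * b for a, b in zip(v, w))

def length(v):
    # |v| = sqrt(v . v)
    return math.sqrt(dot(v, v))

def angle(v, w):
    # cos(theta) = (v . w) / (|v| |w|), with 0 <= theta <= pi
    return math.acos(dot(v, w) / (length(v) * length(w)))

def scalar_projection(v, w):
    # length of the shadow cast by v in the direction of w
    return dot(v, w) / length(w)

# Example 1.7: angle between (2, 7, 1) and (3, 4, -2)
v, w = (2, 7, 1), (3, 4, -2)
print(dot(v, w))                        # 32
print(math.degrees(angle(v, w)))

# Example 1.8: scalar projection of (1, 2, 7) onto (2, 3, 4)
print(scalar_projection((1, 2, 7), (2, 3, 4)))
```

Running this gives numerical answers to Examples 1.7 and 1.8 that can be checked against a hand calculation.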
1.2.4 Vector projection

This time we produce a vector shadow with length equal to the scalar projection. The vector projection, v˜w, of v˜ in the direction of w˜ is given by

v˜w = ( (v˜ · w˜) / |w˜|² ) w˜

Example 1.9. Find the vector projection of v˜ = (1, 2, 7) in the direction of w˜ = (2, 3, 4).

Example 1.10. This example shows how a vector may be resolved into its parts parallel and perpendicular to another vector. Given v˜ = (1, 2, 7) and w˜ = (2, 3, 4) express v˜ in terms of w˜ and a vector perpendicular to w˜.

Vector Dot Product - Summary

Let v˜ = (vx, vy, vz) and w˜ = (wx, wy, wz). Then the Dot Product of v˜ and w˜ is the scalar defined by

v˜ · w˜ = vx wx + vy wy + vz wz

Consider the angle θ between the two vectors such that 0 ≤ θ ≤ π. Then

cos θ = (v˜ · w˜) / (|v˜||w˜|)

Two vectors are orthogonal if and only if v˜ · w˜ = 0.

The scalar projection, vw, of v˜ in the direction of w˜ is given by

vw = (v˜ · w˜) / |w˜|

The vector projection, v˜w, of v˜ in the direction of w˜ is given by

v˜w = ( (v˜ · w˜) / |w˜|² ) w˜

1.3 Vector Cross Product

The vector cross product is another way to multiply vectors. We start with vectors v˜ = (vx, vy, vz) and w˜ = (wx, wy, wz). Then we define the cross product v˜ × w˜ by

v˜ × w˜ = (vy wz − vz wy, vz wx − vx wz, vx wy − vy wx)

From this definition we can observe

• v˜ × w˜ is a vector
• v˜ × w˜ = −w˜ × v˜
• v˜ × v˜ = 0˜ = (0, 0, 0) (the zero vector)
• (λv˜) × w˜ = λ(v˜ × w˜)
• (a˜ + b˜) × v˜ = a˜ × v˜ + b˜ × v˜
• (v˜ × w˜) · v˜ = (v˜ × w˜) · w˜ = 0

Example 1.11. Verify all of the above.

Example 1.12. Given v˜ = (1, 2, 7) and w˜ = (−2, 3, 5) compute v˜ × w˜, and its dot product with each of v˜ and w˜.

1.3.1 Interpreting the cross product

We know that v˜ × w˜ is a vector and we know how to compute it. But can we describe this vector? First we need a vector, so let's assume that v˜ × w˜ ≠ 0˜. Then what can we say about the direction and length of v˜ × w˜? The first thing we should note is that the cross product is a vector which is orthogonal to both of the original vectors.
Thus v˜ × w˜ is a vector that is orthogonal to v˜ and to w˜. This fact follows from the definition of the cross product. Thus we must have

v˜ × w˜ = λn˜

where n˜ is a unit vector orthogonal to both v˜ and w˜ and λ is some unknown number (at this stage). How do we construct n˜ and λ? Let's do it!

1.3.2 Right hand thumb rule

For any choice of v˜ and w˜ you can see that there are two choices for n˜ – one points in the opposite direction to the other. Which one do we choose? It's up to us to make a hard rule. This is it. Place your right hand palm so that your fingers curl over from v˜ to w˜. Your thumb then points in the direction of v˜ × w˜.

Now for λ, we will show that

|v˜ × w˜| = λ = |v˜||w˜| sin θ

How? First we build a triangle from v˜ and w˜ and then compute the cross product for each pair of vectors

v˜ × w˜ = λθ n˜
(v˜ − w˜) × v˜ = λφ n˜
(v˜ − w˜) × w˜ = λρ n˜

(one λ for each of the three vertices). We need to compute each λ. Now since (βv˜) × w˜ = β(v˜ × w˜) for any number β we must have λθ in v˜ × w˜ = λθ n˜ proportional to |v˜||w˜|, and likewise for the other λ's. Thus

λθ = |v˜||w˜| αθ
λφ = |v˜||v˜ − w˜| αφ
λρ = |w˜||v˜ − w˜| αρ

where each α depends only on the angle between the two vectors on which it was built (i.e. αφ depends only on the angle φ between v˜ and v˜ − w˜). But we also have v˜ × w˜ = (v˜ − w˜) × v˜ = (v˜ − w˜) × w˜ which implies that λθ = λφ = λρ, which in turn gives us

αθ / |v˜ − w˜| = αφ / |w˜| = αρ / |v˜|

But we also have the Sine Rule for triangles

sin θ / |v˜ − w˜| = sin φ / |w˜| = sin ρ / |v˜|

and so

αθ = k sin θ ,  αφ = k sin φ ,  αρ = k sin ρ

where k is a number that does not depend on any of the angles nor on any of the lengths of the edges – the value of k is the same for every triangle. We can choose a trivial case to compute k: simply put v˜ = (1, 0, 0) and w˜ = (0, 1, 0). Then we find k = 1. We have now found that

|v˜ × w˜| = |v˜||w˜| sin θ

Example 1.13. Show that |v˜ × w˜| also equals the area of the parallelogram formed by v˜ and w˜.
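The componentwise definition of the cross product, its orthogonality to both factors, and the area interpretation of Example 1.13 can all be checked numerically. A small Python sketch (the function names are our own):

```python
import math

def cross(v, w):
    # v x w = (vy*wz - vz*wy, vz*wx - vx*wz, vx*wy - vy*wx)
    vx, vy, vz = v
    wx, wy, wz = w
    return (vy * wz - vz * wy, vz * wx - vx * wz, vx * wy - vy * wx)

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

# Example 1.12: v = (1, 2, 7), w = (-2, 3, 5)
v, w = (1, 2, 7), (-2, 3, 5)
c = cross(v, w)
print(c)                        # (-11, -19, 7)
print(dot(c, v), dot(c, w))     # both 0: c is orthogonal to v and w

# |v x w| is the area of the parallelogram spanned by v and w
area = math.sqrt(dot(c, c))
print(area)
```

Note that `cross((1, 0, 0), (0, 1, 0))` returns `(0, 0, 1)`, which is consistent with the right hand thumb rule and with the trivial case k = 1 used in the derivation above.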
Vector Cross Product - Summary

Let v˜ = (vx, vy, vz) and w˜ = (wx, wy, wz). Then the Cross Product is defined by

v˜ × w˜ = (vy wz − vz wy, vz wx − vx wz, vx wy − vy wx)

v˜ × w˜ = −w˜ × v˜ gives a vector orthogonal to both v˜ and w˜, and defined by the right-hand rule. If θ is the angle between v˜ and w˜, such that 0 ≤ θ ≤ π:

sin θ = |v˜ × w˜| / (|v˜||w˜|) .

Two vectors are parallel if and only if v˜ × w˜ = 0˜.

The area of the parallelogram spanned by vectors v˜ and w˜ is A = |v˜ × w˜|.

1.4 Lines in 3-dimensional space

Through any pair of distinct points we can always construct a straight line. These lines are normally drawn to be infinitely long in both directions.

Example 1.14. Find all points on the line joining (2, 4, 0) and (2, 4, 7).

Example 1.15. Find all points on the line joining (2, 0, 0) and (2, 4, 7).

These equations for the line are all of the form

x(t) = a + pt ,  y(t) = b + qt ,  z(t) = c + rt

where t is a parameter (it selects each point on the line) and the numbers a, b, c, p, q, r are computed from the coordinates of two points on the line. (There are other ways to write an equation for a line.)

How do we compute a, b, c, p, q, r? It is a simple recipe.

• First put t = 0, then x = a, y = b, z = c. That is, (a, b, c) are the coordinates of one point (such as P) on the line and so a, b, c are known.
• Next, put t = 1, then x = a + p, y = b + q, z = c + r. Take this to be the second point (such as Q) on the line, and thus solve for p, q, r.

A common interpretation is that (a, b, c) are the coordinates of one (any) point on the line and (p, q, r) are the components of a (any) vector parallel to the line.

Example 1.16. Find the equation of the line joining the two points (1, 7, 3) and (2, 0, −3).

Example 1.17. Show that a line may also be expressed as

(x − a)/p = (y − b)/q = (z − c)/r

provided p ≠ 0, q ≠ 0 and r ≠ 0. This is known as the Symmetric Form of the equation for a straight line.

Example 1.18.
In some cases you may find a small problem with the form suggested in the previous example. What is that problem and how would you deal with it?

Example 1.19. Determine if the line defined by the points (1, 0, 1) and (1, 2, 0) intersects with the line defined by the points (3, −1, 0) and (1, 2, 5).

Example 1.20. Is the line defined by the points (3, 7, −1) and (2, −2, 1) parallel to the line defined by the points (1, 4, −1) and (0, −5, 1)?

Example 1.21. Is the line defined by the points (3, 7, −1) and (2, −2, 1) parallel to the line defined by the points (1, 4, −1) and (−2, −23, 5)?

1.4.1 Vector equation of a line

The parametric equations of a line are

x(t) = a + pt
y(t) = b + qt
z(t) = c + rt

Note that

(a, b, c) = the vector to one point (P) on the line
(p, q, r) = the vector from the first point to the second point on the line (P to Q) = a vector parallel to the line

Let's relabel these and put d˜ = (a, b, c), v˜ = (p, q, r) and r˜(t) = (x(t), y(t), z(t)); then

r˜(t) = d˜ + tv˜

This is known as the vector equation of a line.

Example 1.22. Write down the vector equation of the line that passes through the points (1, 2, 7) and (2, 3, 4).

Example 1.23. Write down the vector equation of the line that passes through the points (2, 3, 7) and (4, 1, 2).

Lines in R3

The vector equation of a line L is determined using a point P on the line and a vector v˜ in the direction of the line. Let d˜ = (a, b, c) be the position vector of P, and v˜ = (p, q, r) be a vector parallel to the line; then the line is defined as all vectors which pass through the point P and are parallel to the vector v˜. Thus the vector (or parametric) equation of the line L is given by

r˜(t) = d˜ + tv˜

where t is a parameter. As t is varied all the points on L are traced out.

1.5 Planes in 3-dimensional space

A plane in 3-dimensional space is a flat 2-dimensional surface.
The standard equation for a plane in 3-d is

ax + by + cz = d

where a, b, c and d are numbers that identify this plane from all other planes. (There are other ways to write an equation for a plane, as we shall see.)

Example 1.24. Sketch each of the planes z = 1, y = 3 and x = 1.

1.5.1 Constructing the equation of a plane

A plane is uniquely determined by any three points (provided not all three points are contained on a line). Recall that a line is fully determined by any pair of points on the line.

We can find the equation of the plane that passes through the three points (1, 0, 0), (0, 3, 0) and (0, 0, 2). To do this we need to compute a, b, c and d. We do this by substituting each point into the above equation,

1st point: a · 1 + b · 0 + c · 0 = d
2nd point: a · 0 + b · 3 + c · 0 = d
3rd point: a · 0 + b · 0 + c · 2 = d

Now we have a slight problem: we are trying to compute four numbers a, b, c, d but we only have three equations. We have to make an arbitrary choice for one of the four numbers a, b, c, d. Let's set d = 6. Then we find from the above that a = 6, b = 2 and c = 3. Thus the equation of the plane is

6x + 2y + 3z = 6

Example 1.25. What equation do you get if you choose d = 1 in the previous example? What happens if you choose d = 0?

Example 1.26. Find an equation of the plane that passes through the three points (−1, 0, 0), (1, 2, 0) and (2, −1, 5).

1.5.2 Parametric equations for a plane

Recall that a line could be written in the parametric form

x(t) = a + pt
y(t) = b + qt
z(t) = c + rt

A line is one-dimensional so its points can be selected by a single parameter t. However, a plane is two-dimensional and so we need two parameters (say u and v) to select each point. Thus it's no surprise that every plane can also be described by the following equations

x(u, v) = a + pu + lv
y(u, v) = b + qu + mv
z(u, v) = c + ru + nv

Now we have nine parameters a, b, c, p, q, r, l, m and n.
These can be computed from the coordinates of three (distinct) points on the plane. For the first point put (u, v) = (0, 0), for the second put (u, v) = (1, 0) and for the final point put (u, v) = (0, 1). Then solve for a through to n.

Example 1.27. Find the parametric equations of the plane that passes through the three points (−1, 0, 0), (1, 2, 0) and (2, −1, 5).

Example 1.28. Show that the parametric equations found in the previous example describe exactly the same plane as found in Example 1.26. (Hint: substitute the answers from Example 1.27 into the equation found in Example 1.26.)

Example 1.29. Find the parametric equations of the plane that passes through the three points (−1, 2, 1), (1, 2, 3) and (2, −1, 5).

Example 1.30. Repeat the previous example but with the points re-arranged as (−1, 2, 1), (2, −1, 5) and (1, 2, 3). You will find that the parametric equations look different yet you know they describe the same plane. If you did not know this last fact, how would you prove that the two sets of parametric equations describe the same plane?

1.5.3 Vector equation of a plane

The Cartesian equation for a plane is

ax + by + cz = d

for some numbers a, b, c and d. We will now re-express this in a vector form. Suppose we know one point on the plane, say (x0, y0, z0); then

ax0 + by0 + cz0 = d  ⇒  a(x − x0) + b(y − y0) + c(z − z0) = 0

This is an equivalent form of the above equation. Now suppose we have two more points on the plane, (x1, y1, z1) and (x2, y2, z2). Then

a(x1 − x0) + b(y1 − y0) + c(z1 − z0) = 0
a(x2 − x0) + b(y2 − y0) + c(z2 − z0) = 0

Put Δx˜10 = (x1 − x0, y1 − y0, z1 − z0) and Δx˜20 = (x2 − x0, y2 − y0, z2 − z0). Notice that both of these vectors lie in the plane and that

(a, b, c) · Δx˜10 = (a, b, c) · Δx˜20 = 0

What does this tell us? Simply that both vectors are orthogonal to the vector (a, b, c).
Thus we must have that

(a, b, c) = the normal vector to the plane

Now let's put

n˜ = (a, b, c) = the normal vector to the plane
d˜ = (x0, y0, z0) = one (any) point on the plane
r˜ = (x, y, z) = a typical point on the plane

Then we have

n˜ · (r˜ − d˜) = 0

This is the vector equation of a plane.

Example 1.31. Find the vector equation of the plane that contains the points (1, 2, 7), (2, 3, 4) and (−1, 2, 1).

Example 1.32. Re-express the previous result in the form ax + by + cz = d.

Planes in R3

The vector equation of a plane is determined using a point P on the plane and a direction n˜ (known as the normal direction) which is perpendicular to the plane. Then all vectors in the plane which pass through P are normal to n˜, i.e.

n˜ · (r˜ − d˜) = 0

where r˜ = (x, y, z) is a typical point on the plane, and d˜ = (x0, y0, z0) is a particular point (P) on the plane.

1.6 Systems of Linear Equations

1.6.1 Examples of Linear Systems

The central problem in linear algebra is to solve systems of simultaneous linear equations. A system of linear equations is a collection of equations which share the same set of variables. Let's look at some examples.

Bags of coins

We have three bags with a mixture of gold, silver and copper coins. We are given the following information:

Bag 1 contains 10 gold, 3 silver, 1 copper and weighs 60g
Bag 2 contains 5 gold, 1 silver, 2 copper and weighs 30g
Bag 3 contains 3 gold, 2 silver, 4 copper and weighs 25g

The question is – what are the respective weights of the gold, silver and copper coins? Let G, S and C denote the weight of each of the gold, silver and copper coins. Then we have the system of equations

10G + 3S + C = 60
5G + S + 2C = 30
3G + 2S + 4C = 25

Silly puzzles

John and Mary's ages add to 75 years. When John was half his present age he was twice as old as Mary. How old are they? We have just two equations in our system:

J + M = 75
(1/2)J − 2M = 0

Intersections of planes

It is easy to imagine three planes in space.
Is it possible that they share one point in common? Here are the equations for three such planes

3x + 7y − 2z = 0
6x + 16y − 3z = −1
3x + 9y + 3z = 3

Can we solve this system for (x, y, z)?

In all of the above examples we need to unscramble the set of linear equations to extract the unknowns (e.g. G, S, C etc). To solve a system of linear equations is to find solutions to the sets of equations. In other words we find values that the variables can take such that each of the equations in the system is true.

1.6.2 A standard strategy

We start with the previous example

3x + 7y − 2z = 0     (1)
6x + 16y − 3z = −1   (2)
3x + 9y + 3z = 3     (3)

Suppose by some process we were able to rearrange these equations into the following form

3x + 7y − 2z = 0     (1)
     2y + z = −1     (2)′
         4z = 4      (3)′′

Then we could solve (3)′′ for z

(3)′′ ⇒ 4z = 4 ⇒ z = 1

and then substitute into (2)′ to solve for y

(2)′ ⇒ 2y + 1 = −1 ⇒ y = −1

and substitute into (1) to solve for x

(1) ⇒ 3x − 7 − 2 = 0 ⇒ x = 3

How do we get the modified equations (1), (2)′ and (3)′′? The general method is to take suitable combinations of the equations so that we can eliminate various terms. This method is applied as many times as we need to turn the original equations into the simple form like (1), (2)′ and (3)′′.

Let's start with the first pair of the original equations

3x + 7y − 2z = 0     (1)
6x + 16y − 3z = −1   (2)

We can eliminate the 6x in equation (2) by replacing equation (2) with (2) − 2(1),

⇒ 0x + (16 − 14)y + (−3 + 4)z = −1    (2)′
⇒ 2y + z = −1    (2)′

Likewise, for the 3x term in equation (3) we replace equation (3) with (3) − (1),

⇒ 2y + 5z = 3    (3)′

At this point our system of equations is

3x + 7y − 2z = 0    (1)
2y + z = −1         (2)′
2y + 5z = 3         (3)′

The last step is to eliminate the 2y term in the last equation. We do this by replacing equation (3)′ with (3)′ − (2)′

⇒ 4z = 4    (3)′′

So finally we arrive at the system of equations

3x + 7y − 2z = 0    (1)
2y + z = −1         (2)′
4z = 4              (3)′′

which, as before, we solve to find z = 1, y = −1 and x = 3.
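The reduction to upper triangular form followed by back substitution described above can be written as a short program. This is our own illustrative implementation, not part of the notes; it assumes the system has a unique solution and that no zero pivot is encountered (the exceptional cases are discussed later in the notes):

```python
def solve(A, b):
    """Reduce [A|b] to upper triangular form by elementary row
    operations, then back substitute. Assumes a unique solution
    and non-zero pivots."""
    n = len(A)
    A = [row[:] for row in A]   # work on copies
    b = b[:]
    # forward elimination: clear entries below the main diagonal,
    # column by column, from left to right
    for col in range(n):
        for row in range(col + 1, n):
            m = A[row][col] / A[col][col]
            A[row] = [a - m * c for a, c in zip(A[row], A[col])]
            b[row] -= m * b[col]
    # back substitution, from the last equation upwards
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(A[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (b[row] - s) / A[row][row]
    return x

# the three-planes system from Section 1.6.2
A = [[3, 7, -2], [6, 16, -3], [3, 9, 3]]
b = [0, -1, 3]
print(solve(A, b))   # [3.0, -1.0, 1.0]
```

The intermediate rows produced by the elimination loop are exactly the equations (2)′, (3)′ and (3)′′ worked out by hand above.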
The procedure we just went through is known as a reduction to upper triangular form, and we used elementary row operations to do so. We then solved for the unknowns by back substitution. This procedure is applicable to any system of linear equations (though beware, for some systems the back substitution method requires special care; we'll see examples later). The general strategy is to eliminate all terms below the main diagonal, working column by column from left to right. More on this later!

1.6.3 Points, lines and planes - intersections

In previous lectures we saw how we could construct the equations for lines and planes. Now we can answer some simple questions. How do we compute the intersection between a line and a plane? Can we be sure that they do intersect? And what about the intersection of a pair or more of planes? The general approach to all of these questions is simply to write down equations for each of the lines and planes and then to search for a common point (i.e. a consistent solution to the system of equations).

Example 1.33. Is the point (1, 2, 3) on the line r˜(t) = (3, 4, 5) + (2, 2, 2)t?

Solution: We simply check if the following system of equations yields the same value for t.

1 = 3 + 2t
2 = 4 + 2t
3 = 5 + 2t

Rearranging the top equation gives t = −1. In fact, each of these three equations gives t = −1, hence the point (1, 2, 3) is on the line r˜(t) = (3, 4, 5) + (2, 2, 2)t.

Example 1.34. Is the point (1, 2, 4) on the line r˜(t) = (3, 4, 5) + (2, 2, 2)t?

Example 1.35. Do the lines r˜1(t) = (1, 0, 0) + (1, 0, 0)t and r˜2(s) = (0, 0, 0) + (0, 1, 0)s intersect? If so, find the point of intersection.

Solution: To answer this question we simply solve the system of equations as follows.

1 + 1t = 0 + 0s
0 + 0t = 0 + 1s
0 + 0t = 0 + 0s

This system of equations has the solution t = −1 and s = 0. Hence the two lines do intersect. To find the point of intersection, we put t = −1 into the first line (or s = 0 into the second line).
This gives r˜1(−1) = (1, 0, 0) + (1, 0, 0)(−1) = (0, 0, 0). Hence the point of intersection of the two lines is the origin.

Example 1.36. Do the lines r˜1(t) = (1, 2, 3) + (1, 1, 2)t and r˜2(s) = (0, 0, 7) + (1, 1, 1)s intersect?

Example 1.37. Find the intersection of the line x(t) = 1 + 3t, y(t) = 3 − 2t, z(t) = 1 − t with the plane 2x + 3y − 4z = 1.

Example 1.38. Find the intersection of the plane y = 0 with the plane 2x + 3y − 4z = 1.

Example 1.39. Find the intersection of the three planes 2x + 3y − z = 1, x − y = 2 and x = 1.

1.6.4 Points, lines and planes - distances

Now we are well equipped to be able to find the distances between points, lines and planes. There are various combinations we can have, such as the distance between a point and a plane, or the distance between a line and a plane.

Example 1.40. Find the distance between the point (1, 2, 3) and the line given by the equation r˜(t) = (0, 0, 7) + (1, 1, 1)t.

Solution: Firstly we subtract the position vector of the point v˜ = (1, 2, 3) from the equation of the line. This will give us a vector u˜ that is dependent on the parameter t.

u˜(t) = (0, 0, 7) + (1, 1, 1)t − (1, 2, 3)
     = (−1, −2, 4) + (1, 1, 1)t
     = (−1 + t, −2 + t, 4 + t)

Think of the tail of this vector being fixed at the point (1, 2, 3) and its tip running along the line as t changes. In order to find the shortest distance (note that when asked to find the distance, it is implied that this means find the shortest distance) we want to find the value of t for which the length of u˜ is as short as possible. We can also note that the shortest vector u˜ will be perpendicular to the direction of the line. This means that the dot product of the vectors (1, 1, 1) and (−1 + t, −2 + t, 4 + t) will be zero.

(1, 1, 1) · (−1 + t, −2 + t, 4 + t) = −1 + t − 2 + t + 4 + t = 1 + 3t = 0

Hence t = −1/3. Now put this value of t into the vector u˜ to give:

u˜(−1/3) = (−1 − 1/3, −2 − 1/3, 4 − 1/3) = (−4/3, −7/3, 11/3)

Now simply calculate the length of u˜(−1/3) = (−4/3, −7/3, 11/3).
This gives |ũ| =        .

If you plug t = −1/3 into the vector equation of the line, you get the coordinates of the point on the line that is closest to the point (1, 2, 3).

Example 1.41. Find the distance between the two lines (1, 2, 3) + (1, 1, 2)t and (0, 0, 7) + (1, 1, 1)s.

Parallel lines

If two lines are parallel, then it is easy to calculate the distance between them. Simply pick a point on one of the lines and calculate its distance from the other line, as per finding the distance between a point and a line. Remember that two lines are parallel if the direction vector of one line is a scalar multiple of the direction vector of the other line.

Example 1.42. Find the distance between the two lines (1, 2, 3) + (1, 1, 2)t and (0, 0, 7) + (2, 2, 4)s.

Another way of finding the distance between two lines uses scalar projection. Using this method we find any vector that joins a point on one line to the other line, and then compute the scalar projection of this vector onto the vector orthogonal to both lines (it helps to draw a diagram).

Example 1.43. Find the distance between the point (2, 3, 4) and the plane given by the equation x + 2y + 3z = 4.

Solution: First of all we need to find a point on the plane. By setting y = 0 and z = 0 we find x = 4. Thus (4, 0, 0) is a point on the plane. Now we find the normal vector of the plane, ñ = (1, 2, 3). We then form a vector ṽ from the point on the plane (4, 0, 0) to the given point (2, 3, 4). Thus

ṽ = (2, 3, 4) − (4, 0, 0) = (−2, 3, 4)

Now find the scalar projection of ṽ onto the normal vector ñ. This is the shortest distance from the point (2, 3, 4) to the plane x + 2y + 3z = 4.

Example 1.44. Find the distance between the line r̃(t) = (2, 3, 4) + (3, 0, −1)t and the plane x + 2y + 3z = 4.

Example 1.45. Find the distance between the two planes 2x + 3y − 4z = 2 and 4x + 6y − 8z = 3.

1.6.5 Summary

• The equation ax + by = c (or equivalently a1x1 + a2x2 = b) represents a straight line in 2-space.
• The equation ax + by + cz = d (or equivalently a1x1 + a2x2 + a3x3 = b) represents a plane in 3-space.

• The equation a1x1 + a2x2 + a3x3 + ... + anxn = b represents a hyperplane in n-space.

The equation a1x1 + a2x2 + a3x3 + ... + anxn = b is called a linear equation in the variables x1, x2, x3, ..., xn with coefficients a1, a2, a3, ..., an and constant term b. In general we can study m linear equations in n variables:

a11x1 + a12x2 + ... + a1nxn = b1
a21x1 + a22x2 + ... + a2nxn = b2
...
am1x1 + am2x2 + ... + amnxn = bm

and call this a system of linear equations. Every linear system satisfies one of the following:

• There is no solution
• There is exactly one solution
• There are infinitely many solutions

This seems obvious for the case of two or three variables if we view the equations geometrically. In this case a solution of a system of linear equations (of two or three variables) is a point in the intersection of the lines or planes represented by the equations.

When n = 2: Two lines may intersect at a single point (unique solution), or not at all (no solution), or they may even be the same line (infinite solutions).

[Figures: no point of intersection (no solution); one point of intersection (unique solution); infinite points of intersection (infinite solutions - the same line)]

When n = 3: Three planes may intersect at a single point, or along a common line, or not at all.

[Figures: no point of intersection (no solution); one point of intersection (unique solution); intersection in a common line (infinite solutions)]

Example 1.46. What other examples can you draw of intersecting planes?

Chapter 2
Matrices

2.1 Introduction - notation and operations

An m × n matrix A is a rectangular array of numbers consisting of m rows and n columns. We say A is of size m × n. We use capital letters to represent matrices, for example:

A = [ 3  2 −1 ]       B = [ 2 3 ]
    [ 1 −1  1 ]           [ 4 1 ]
    [ 2  1 −1 ]

Entries within a matrix are denoted by subscripted lower case letters.
For the matrix A above we have a11 = 3, a12 = 2, a13 = −1, a21 = 1, a22 = −1, a23 = 1 and so forth. Here, A is a 3 × 3 matrix

A = [ 3  2 −1 ]   [ a11 a12 a13 ]
    [ 1 −1  1 ] = [ a21 a22 a23 ]
    [ 2  1 −1 ]   [ a31 a32 a33 ]

where aij is the entry in row i and column j of A. An m × n matrix can be represented similarly. Note that m denotes the number of rows in the matrix, and n denotes the number of columns.

A = [ a11 a12 ... a1n ]
    [ a21 a22 ... a2n ]
    [ ...          ...]
    [ am1 am2 ... amn ]

For brevity we sometimes write A = [aij]. This also reminds us that A is a matrix with elements aij. A square matrix is an n × n matrix.

2.1.1 Operations on matrices

• Equality: A = B only when all entries in A equal those in B.

• Addition: normal addition of corresponding elements. For example:

[ 1  2 7 ] + [ 3 1 4 ] = [ 4 3 11 ]
[ 2 −1 3 ]   [ 1 2 1 ]   [ 3 1  4 ]

• Multiplication by a number: λA = λ times each entry of A. For example:

5 [ 2 3 ] = [ 10 15 ]
  [ 4 1 ]   [ 20  5 ]

• Multiplication of matrices: the entry in row i and column j of AB is obtained by multiplying row i of A into column j of B element by element and summing. If row i of A is (a, b, c, d, ...) and column j of B is (e, f, g, h, ...), then the (i, j) entry of AB is

a·e + b·f + c·g + d·h + ...

Note that we can only multiply matrices that fit together. That is, if A and B are a pair of matrices, then in order that AB make sense, we must have the number of columns of A equal to the number of rows of B. We also say that matrices A and B are compatible for multiplication if A has size m × n and B has size n × r. The product AB is then a matrix of size m × r.

• Transpose: flip rows and columns, denoted by [· · ·]^T. For example:

[ 1 2 7 ]^T   [ 1 0 ]
[ 0 3 4 ]   = [ 2 3 ]
              [ 7 4 ]

Example 2.1. Does the following make sense?

[ 2 3 ] [ 1 7 ]
[ 4 1 ] [ 0 2 ]
        [ 4 1 ]

2.1.2 Some special matrices

• The identity matrix:

I = [ 1 0 0 ... 0 ]
    [ 0 1 0 ... 0 ]
    [ 0 0 1 ... 0 ]
    [ ...       ...]
    [ 0 0 0 ... 1 ]

For any square matrix A we have IA = AI = A.

• The zero matrix: a matrix whose entries are all zeroes.
• Symmetric matrices: any matrix A for which A = A^T.

• Skew-symmetric matrices: any matrix A for which A = −A^T. Sometimes also called anti-symmetric.

2.1.3 Properties of matrices

• AB ≠ BA (in general)
• (AB)C = A(BC)
• (A^T)^T = A
• (AB)^T = B^T A^T

Example 2.2. Given

A = [ 2 3 ]   B = [ 1 7 ]   C = [ 2 1 ]
    [ 4 1 ]       [ 0 2 ]       [ 3 0 ]

verify the above four properties.

2.1.4 Inverses of square matrices

A square matrix A is called invertible if there is a matrix B such that AB = BA = I (where I is the identity matrix). We call B the inverse of A and write B = A^(-1). In later lectures we will see how to compute the inverse of a square matrix.

2.2 Gaussian Elimination

In previous lectures we introduced systems of linear equations, and briefly looked at how to solve them. The most efficient method for solving systems of linear equations is Gaussian elimination. This is essentially the row reduction that we have already encountered, but with a few extra steps. We will walk through this method using a typical example.

2x + 3y +   z = 10   (1)
 x + 2y +  2z = 10   (2)
4x + 8y + 11z = 49   (3)

2x + 3y +  z = 10    (1)
      y + 3z = 10    (2)′ ← 2(2) − (1)
     2y + 9z = 29    (3)′ ← (3) − 2(1)

2x + 3y +  z = 10    (1)
      y + 3z = 10    (2)′
          3z =  9    (3)′′ ← (3)′ − 2(2)′

Previously we would then solve this system using back-substitution: z = 3, y = 1, x = 2. Note how we record the next set of row-operations on each equation. This makes it much easier for someone else to see what you are doing, and it also helps you track down any arithmetic errors. In this example we found

2x + 3y +  z = 10    (1)
      y + 3z = 10    (2)′
          3z =  9    (3)′′

Why stop there? We can apply more row-operations to eliminate terms above the diagonal. This does not involve back-substitution. This method of row reduction is known as Gaussian elimination. (In some texts, using row operations to eliminate terms below the diagonal only is known as Gaussian elimination, whereas using row operations to eliminate terms below and above the diagonal is known as Gauss-Jordan elimination.)

Example 2.3. Continue from the previous example and use row-operations to eliminate the terms above the diagonal. Hence solve the system of equations.

2.2.1 Gaussian elimination strategy

1.
Use row-operations to eliminate elements below the diagonal.
2. Use row-operations to eliminate elements above the diagonal.
3. If possible, re-scale each equation so that each diagonal element = 1.
4. The right hand side is now the solution of the system of equations.

If you stop after step one you are doing Gaussian elimination with back-substitution (this is usually the easier option).

2.2.2 Exceptions

Here are some examples where problems arise.

Example 2.4. A zero on the diagonal

2x +  y + 2z +  w =  2   (1)
2x +  y −  z + 2w =  1   (2)
 x − 2y +  z −  w = −2   (3)
 x + 3y −  z + 2w =  2   (4)

2x +  y + 2z +  w =  2   (1)
     0y − 3z +  w = −1   (2)′ ← (2) − (1)
   − 5y + 0z − 3w = −6   (3)′ ← 2(3) − (1)
   + 5y − 4z + 3w =  2   (4)′ ← 2(4) − (1)

The zero on the diagonal in the second equation is a serious problem: it means we cannot use that row to eliminate the elements below the diagonal term. Hence we swap the second row with any other lower row so that we get a non-zero term on the diagonal. Then we proceed as usual.

2x +  y + 2z +  w =  2   (1)
   − 5y + 0z − 3w = −6   (2)′′ ← (3)′
     0y − 3z +  w = −1   (3)′′ ← (2)′
   + 5y − 4z + 3w =  2   (4)′ ← 2(4) − (1)

The result is w = 2, z = 1, y = 0 and x = −1.

Example 2.5. A consistent and under-determined system

Suppose we start with three equations and we wind up with

2x + 3y −  z =  1   (1)
   − 5y + 5z = −1   (2)′
          0z =  0   (3)′′

The last equation tells us nothing! We can't solve it for any of x, y and z. We really only have 2 equations, not 3. That is, 2 equations for 3 unknowns. This is an under-determined system. We solve the system by choosing any number for one of the unknowns. Say we put z = λ where λ is any number (our choice). Then we can leap back into the equations and use back-substitution.
The result is a one-parameter family of solutions

x = 1/5 − λ,  y = 1/5 + λ,  z = λ

Since we found a solution we say that the system is consistent.

Example 2.6. An inconsistent system

Had we started with

2x + 3y −  z = 1   (1)
 x −  y + 2z = 0   (2)
3x + 2y +  z = 0   (3)

we would have arrived at

2x + 3y −  z =  1   (1)
   − 5y + 5z = −1   (2)′
          0z = −2   (3)′′

This last equation makes no sense, as there is no finite value of z such that 0z = −2, and thus we say that this system is inconsistent and that the system has no solution.

2.3 Systems of equations using matrices

Consider the system of equations

3x + 2y − z = −1
 x −  y + z =  4
2x +  y − z = −1

We can rewrite this system using matrix notation. The coefficients of our equations form a 3 × 3 matrix A

A = [ 3  2 −1 ]
    [ 1 −1  1 ]
    [ 2  1 −1 ]

The variables (x, y and z) can be written as a 3 × 1 matrix (also known as a column vector) X

X = [ x ]
    [ y ]
    [ z ]

and the right hand side can also be written as a column vector B

B = [ −1 ]
    [  4 ]
    [ −1 ]

Thus our system of equations AX = B becomes

[ 3  2 −1 ] [ x ]   [ −1 ]
[ 1 −1  1 ] [ y ] = [  4 ]
[ 2  1 −1 ] [ z ]   [ −1 ]

Example 2.7. Write the system of equations

3x + 2y − z = −1
 x −  y + z =  4
2x +  y − z = −1

in matrix notation.

2.3.1 The augmented matrix

Consider the system of equations:

2x + 3y +   z = 10   (1)
 x + 2y +  2z = 10   (2)
4x + 8y + 11z = 49   (3)

The augmented matrix of the system is the coefficient matrix A augmented by the column vector b̃:

[A|b̃] = [ 2 3  1 | 10 ]
        [ 1 2  2 | 10 ]
        [ 4 8 11 | 49 ]

Previously we used Gaussian elimination to solve systems of linear equations, where we labelled our equations (1), (2), (3) and so forth. It is much more efficient to set up our system using matrices, and then perform Gaussian elimination on the augmented matrix. Gaussian elimination (using matrices) consists of bringing the augmented matrix to echelon form using elementary row operations. This allows us to then solve a much simpler system of equations.
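The elimination-on-the-augmented-matrix procedure can be sketched in code. This is a minimal illustration, not part of the original notes: the function name `solve` and the tolerance are my own choices, and the sketch assumes the system has a unique solution, swapping rows whenever a zero appears on the diagonal (as in Example 2.4).

```python
# A sketch of Gaussian elimination with back-substitution on an augmented
# matrix [A|b], with a row swap when a zero pivot is encountered.

def solve(aug):
    """Solve a square system given its augmented matrix [A|b] (row lists)."""
    n = len(aug)
    for col in range(n):
        # Find a row at or below `col` with a non-zero entry in this column.
        pivot = next(r for r in range(col, n) if abs(aug[r][col]) > 1e-12)
        aug[col], aug[pivot] = aug[pivot], aug[col]          # row swap
        for r in range(col + 1, n):                          # eliminate below
            factor = aug[r][col] / aug[col][col]
            aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                           # back-substitute
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

# The worked example: 2x+3y+z = 10, x+2y+2z = 10, 4x+8y+11z = 49.
print(solve([[2, 3, 1, 10], [1, 2, 2, 10], [4, 8, 11, 49]]))  # [2.0, 1.0, 3.0]
```

Running it reproduces the back-substitution answer z = 3, y = 1, x = 2 from Section 2.2.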
2.4 Row echelon form

A matrix is in row echelon form if it satisfies the following two conditions:

• If there are any zero rows, they are at the bottom of the matrix.
• The first non-zero entry in each non-zero row (called the leading entry or pivot) is to the right of the pivots in the rows above it.

A matrix is in reduced echelon form if it also satisfies:

• Each pivot entry is equal to 1.
• Each pivot is the only non-zero entry in its column.

Example 2.8. Write down three matrices in echelon form and circle the pivots.

Example 2.9. Write down three matrices in reduced echelon form.

The variables corresponding to the columns containing pivots are called the leading variables. The variables corresponding to the columns that do not contain pivots are called free variables. Free variables are not restricted by the linear equations - they can take arbitrary values, and we often denote these by Greek letters (such as α, β and so forth). The leading variables are then expressed in terms of the free variables.

Example 2.10. For the following linear systems

(a) Write down the augmented matrix and bring it to echelon form.
(b) Identify the free variables and the leading variables.
(c) Write down the solution(s) if any exist.
(d) Give a geometric interpretation of your results.

(i)  x + y = 1
     x − 2y = 4

(ii)  x − y = 1
      x − y = 2

(iii) x − y = 1
      3x − 3y = 3

Example 2.11. Consider the linear system

 x1 + 3x2 + 3x3 + 2x4 = 1
2x1 + 6x2 + 9x3 + 5x4 = 1
−x1 − 3x2 + 3x3       = k

(a) Write down the augmented matrix and bring it to echelon form.
(b) Identify the free variables and the leading variables.
(c) For what values of the number k does the system have (i) no solution, (ii) infinitely many solutions, (iii) exactly one solution?
(d) When a solution or solutions exist, find them.

2.4.1 Rank

The rank of a matrix is the number of non-zero rows (also the number of pivots) in its row echelon form. The rank of a matrix is denoted by rank(A).
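The definition of rank can be sketched directly in code: reduce to echelon form by forward elimination and count the rows that keep a pivot. This is my own illustrative helper, not from the notes.

```python
# A sketch of computing rank: forward-eliminate to echelon form, counting
# the pivot rows that survive.

def rank(M, tol=1e-9):
    M = [row[:] for row in M]            # work on a copy
    rows, cols = len(M), len(M[0])
    r = 0                                # current pivot row
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if abs(M[i][c]) > tol), None)
        if pivot is None:
            continue                     # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# Coefficient matrices of Example 2.10 (iii) and (i):
print(rank([[1, -1], [3, -3]]))   # 1  (second row is a multiple of the first)
print(rank([[1, 1], [1, -2]]))    # 2
```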
The rank of a matrix gives us information about the solutions of the associated linear system. In particular, the system has a unique solution exactly when the rank of the coefficient matrix equals both the rank of the augmented matrix and the number of variables.

A linear system of m equations in n variables will give an m × n matrix A. Once we have reduced the matrix to echelon form, and found the rank r of the reduced matrix (let's call the reduced matrix U) we can deduce the following informative properties:

Properties

1. Number of variables = n
2. Number of leading variables = r
3. Number of free variables = n − r
4. r ≤ m (because there is at most one pivot in each of the m rows of U).
5. r ≤ n (because there is at most one pivot in each of the n columns of U).
6. If r = n there are no free variables and there will be either no solution or one solution.
7. If r < n there is at least one free variable and there will be either no solution or infinitely many solutions.
8. If there are more variables than equations, that is n > m, then r < n and so there will be either no solution or infinitely many solutions.

Example 2.12. What is the rank of each of the matrices in the previous examples?

2.4.2 Homogeneous systems

A homogeneous system is one of the form Ax̃ = 0̃. The augmented matrix is therefore [A|0̃] and its echelon form is [U|0̃]. The last non-zero row cannot be [0 0 ... 0 d], d ≠ 0, so a homogeneous system is never inconsistent. In fact x̃ = 0̃ is always a solution. Geometrically, the lines, planes or hyperplanes represented by the equations in a homogeneous system all pass through the origin.

2.4.3 Summary

When we reduce a matrix to echelon form, we do so by performing elementary row operations. On a matrix, these operations are

• Interchange two rows (which we denote by Ri ↔ Rj).
• Multiply one row by a non-zero number (cRi → Ri).
• Add a multiple of one row to another row (Ri + cRj → Ri).
Every matrix can be brought to echelon form by a sequence of elementary row operations using Gaussian elimination. This is sometimes given as an algorithm:

Gaussian Elimination

1. If the matrix consists entirely of zeros, stop (it is in echelon form).
2. Otherwise, find the first column with a non-zero entry (say a) and use a row interchange to bring that entry to the top row.
3. Subtract multiples of the top row from the rows below it so that each entry below the pivot a becomes zero. (This completes the first row. All subsequent operations are carried out on the rows below it.)
4. Repeat steps 1 to 3 on the remaining rows.

A linear system of equations Ax̃ = b̃, or AX = B, can be written in the general form:

[ a11 a12 ... a1n ] [ x1 ]   [ b1 ]
[ a21 a22 ... a2n ] [ x2 ]   [ b2 ]
[ ...          ...] [ ...] = [ ...]
[ am1 am2 ... amn ] [ xn ]   [ bm ]

where A is the m × n matrix of coefficients, x̃ (or X) is the n × 1 matrix (or column vector) of variables, and b̃ (or B) is the m × 1 matrix (or column vector) of constant terms. The augmented matrix is the matrix

[A|b̃] = [ a11 a12 ... a1n | b1 ]
        [ a21 a22 ... a2n | b2 ]
        [ ...             | ...]
        [ am1 am2 ... amn | bm ]

In order to solve a general linear system Ax̃ = b̃ we:

1. Bring the augmented matrix to echelon form: [A|b̃] → [U|c̃]. Since each elementary row operation is reversible, the two systems Ax̃ = b̃ and Ux̃ = c̃ have exactly the same solutions.
2. Solve the triangular system Ux̃ = c̃ by back-substitution.

For a general linear system Ax̃ = b̃ or its corresponding triangular form Ux̃ = c̃ there are three possibilities.

1. There is no solution - this happens when the last non-zero row of [U|c̃] is [0 0 ... 0 d] with d ≠ 0, in which case the equations are inconsistent.
2. There are infinitely many solutions - this happens when the equations are consistent and there is at least one free variable.
3. There is exactly one solution - this happens when the equations are consistent and there are no free variables.
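The three possibilities above can be decided mechanically: bring [A|b̃] to echelon form, look for an inconsistent row [0 0 ... 0 d], and otherwise compare the rank of A with the number of variables. A sketch follows; the function name `classify` and the return labels are mine, not from the notes.

```python
# A sketch of the no solution / unique / infinitely many trichotomy for Ax = b,
# working on the augmented matrix.

def classify(aug, n_vars, tol=1e-9):
    """Return 'none', 'unique' or 'infinite' for the system with augmented matrix `aug`."""
    M = [row[:] for row in aug]
    rows = len(M)
    r = 0
    for c in range(len(M[0])):                   # forward elimination
        piv = next((i for i in range(r, rows) if abs(M[i][c]) > tol), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    # Inconsistent if some row reads [0 0 ... 0 | d] with d != 0.
    for row in M:
        if all(abs(x) <= tol for x in row[:n_vars]) and abs(row[n_vars]) > tol:
            return "none"
    rank_A = sum(any(abs(x) > tol for x in row[:n_vars]) for row in M)
    return "unique" if rank_A == n_vars else "infinite"

# The three systems of Example 2.10:
print(classify([[1, 1, 1], [1, -2, 4]], 2))   # unique
print(classify([[1, -1, 1], [1, -1, 2]], 2))  # none
print(classify([[1, -1, 1], [3, -3, 3]], 2))  # infinite
```

Geometrically these are, in order: two intersecting lines, two distinct parallel lines, and the same line written twice.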
2.5 Matrix Inverse

Suppose we have a system of equations

[ a b ] [ x ]   [ u ]
[ c d ] [ y ] = [ v ]

which we write in the matrix form AX = B. Can we find another matrix, call it A^(-1), such that

A^(-1)A = I = the identity matrix?

If so, then we have

A^(-1)AX = A^(-1)B  ⇒  X = A^(-1)B

Thus we have found the solution of the original system of equations. For a 2 × 2 matrix it is easy to verify that

A^(-1) = [ a b ]^(-1) = 1/(ad − bc) [  d −b ]
         [ c d ]                    [ −c  a ]

Note that not all matrices have an inverse: for this formula to make sense we must have ad − bc ≠ 0. In later lectures we will see some different methods for computing the inverse A^(-1) for other (square) matrices larger than 2 × 2.

Example 2.13. If A = [ 1 2 ]  then A^(-1) =
                     [ 3 4 ]

Properties of inverses

• A square matrix has at most one inverse.
  Proof: If B1 and B2 are both inverses of A, then AB1 = B1A = I and AB2 = B2A = I. So B1 = B1I = B1(AB2) = (B1A)B2 = IB2 = B2. So the inverse of A is unique.
• If A is invertible, then so is A^T and (A^T)^(-1) = (A^(-1))^T.
• If A is invertible, then so is A^(-1) and (A^(-1))^(-1) = A.
• If A and B are invertible matrices of the same size, then AB is invertible and (AB)^(-1) = B^(-1)A^(-1).
• (A1 A2 ... Am)^(-1) = Am^(-1) ... A2^(-1) A1^(-1).
• If A = [ a b; c d ] and ad − bc ≠ 0, then A^(-1) = 1/(ad − bc) [ d −b; −c a ].
• Cancellation laws: If A is invertible, then
  – AB = AC implies B = C (just multiply on the left by A^(-1))
  – BA = CA implies B = C (just multiply on the right by A^(-1))
  – BAC = DAE does not imply BC = DE.
• Solving systems: Let A be n × n and invertible. Then the linear system Ax̃ = b̃ always has exactly one solution, namely x̃ = A^(-1)b̃.
• Rank test: An n × n matrix A is invertible if and only if it has full rank r = n.

2.6 Matrix Transpose

If A is a matrix of size m × n then the transpose A^T of A is the n × m matrix defined by A^T(j, i) = A(i, j). Consider the matrix

A = [ 2 3 4 ]
    [ 0 1 5 ]

The transpose of A, namely A^T, is simple to find.
Row 1 of matrix A becomes column 1 of matrix A^T, and row 2 of matrix A becomes column 2 of matrix A^T. Thus

A^T = [ 2 0 ]
      [ 3 1 ]
      [ 4 5 ]

Example 2.14. Let

B = [ 1 1 1 ]   and   C = [ 1 1 1 0 0 ]
    [ 1 2 5 ]             [ 0 1 2 1 0 ]
                          [ 2 0 1 1 0 ]

Find B^T and C^T.

Note the following:

• (A^T)^T = A
• (cA)^T = cA^T
• (A + B)^T = A^T + B^T
• (AB)^T = B^T A^T

Example 2.15. Verify the above using matrices A = [ 2 3; 4 5 ] and B = [ 1 2; 3 4 ].

2.7 Determinants

The determinant function det is a function that assigns to each n × n matrix A a number det A called the determinant of A. The function is defined as follows:

• If n = 1: A = [a] and we define det A := a.

• If n = 2: A = [ a b; c d ] and we define det A := ad − bc.

• If n > 2: It gets a bit complicated now, but it is not too bad. Firstly create a sub-matrix Sij of A by deleting the ith row and the jth column. Then define

det A := a11 det S11 − a12 det S12 + a13 det S13 − ··· ± a1n det S1n

The quantity det Sij is called the minor of entry aij and is denoted Mij. The number (−1)^(i+j) Mij is called the cofactor of entry aij and is denoted Cij. Thus to compute det A you have to compute a chain of determinants, from (n − 1) × (n − 1) determinants all the way down to 2 × 2 determinants.

This method of defining and evaluating det A is called Laplace's expansion along the first row. We can, in fact, use any row (or any column) to calculate det A. We often write det A = |A|.

Example 2.16. Compute the determinant of

A = [ 1 7 2 ]
    [ 3 4 5 ]
    [ 6 0 9 ]

When we expand the determinant about any row or column, we must observe the following pattern of ± signs (these correspond to the (−1)^(i+j) in Cij - check!).

[ + − + − + − ··· ]
[ − + − + − + ··· ]
[ + − + − + − ··· ]
[ − + − + − + ··· ]

This is best seen in an example.

Example 2.17. By expanding about the second row compute the determinant of

A = [ 1 7 2 ]
    [ 3 4 5 ]
    [ 6 0 9 ]

Example 2.18. Compute the determinant of

A = [ 1 2 7 ]
    [ 0 0 3 ]
    [ 1 2 1 ]

2.7.1 Properties of determinants

• If we interchange two rows (or two columns) of A, the resulting matrix has determinant equal to −det A.
• If we add a multiple of one row to another row (similarly for columns), the resulting matrix has determinant equal to det A.
• If we multiply a row or column of A by a scalar α, the resulting matrix has determinant equal to α(det A).
• If A has a row or column of zeros, then det A = 0.
• If two rows (or columns) of A are identical, then det A = 0.
• For any fixed i = 1, ..., n we have det A = ai1Ci1 + ai2Ci2 + ... + ainCin.
• For any fixed j = 1, ..., n we have det A = a1jC1j + a2jC2j + ... + anjCnj.

Determinant test: An n × n matrix A is invertible if and only if det(A) ≠ 0.

2.7.2 Vector cross product using determinants

The rule for a vector cross product can be conveniently expressed as a determinant. Thus if ṽ = vx ĩ + vy j̃ + vz k̃ and w̃ = wx ĩ + wy j̃ + wz k̃ then

ṽ × w̃ = | ĩ  j̃  k̃  |
        | vx vy vz |
        | wx wy wz |

2.7.3 Cramer's rule

Recall that if a linear system Ax̃ = b̃ has a unique solution, then x̃ = A^(-1)b̃ is this solution. If we substitute the formula for the inverse A^(-1) (using det Sji, see Section 2.8.1) into the product A^(-1)b̃ we arrive at Cramer's rule for solving the linear system Ax̃ = b̃.

Cramer's rule: Let Ax̃ = b̃ be a linear system with a unique solution. This means that A is a square matrix with non-zero determinant. Let Ai be the matrix that results from A by replacing the ith column of A by b̃. Then

xi = det Ai / det A

Examples of Cramer's rule will be given in tutorials.

2.8 Obtaining inverses using Gauss-Jordan elimination

The most efficient method for computing the inverse of a matrix is by Gauss-Jordan elimination, which we have met earlier.

• Use row-operations to reduce A to the identity matrix.
• Apply exactly the same row-operations to a matrix set initially to the identity.
• The final matrix is the inverse of A.

We usually record this process in a large augmented matrix.

• Start with [A|I].
• Apply row operations to obtain [I|A^(-1)].
• Read off A^(-1), the inverse of A.
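The [A|I] → [I|A^(-1)] recipe translates directly into code. A sketch (the function name and tolerance are mine, not from the notes); it assumes A is invertible, so a non-zero pivot can always be found by a row swap.

```python
# A sketch of the Gauss-Jordan procedure from Section 2.8: augment A with I,
# reduce the left half to the identity, and read off the inverse on the right.

def gauss_jordan_inverse(A):
    n = len(A)
    # Build the augmented matrix [A | I].
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if abs(M[r][col]) > 1e-12)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]   # scale the pivot to 1
        for r in range(n):                           # clear the rest of the column
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

# The matrix of Example 2.19 (try it by hand first, then compare):
print(gauss_jordan_inverse([[1, 1, 3], [0, 2, 1], [1, 4, 4]]))
```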
Recall that some texts use the term Gaussian elimination to refer to reducing a matrix to its echelon form, and the term Jordan elimination to refer to reducing a matrix to its reduced echelon form. In this manner, the Gauss-Jordan algorithm can be described diagrammatically as follows:

[A|I] --(Gaussian algorithm)--> [U|*] --(Jordan algorithm)--> [I|B]

where B = A^(-1). In words, provided A has rank n:

• augment A by the identity matrix;
• perform the Gaussian algorithm to bring A to echelon form U and [A|I] to [U|*];
• perform the Jordan algorithm to bring U to reduced echelon form I and [U|*] to [I|B] (in other words use elementary row operations to make the pivots all 1s and to produce zeros above the pivots);
• then B = A^(-1).

Example 2.19. Use the Gauss-Jordan algorithm to invert the following matrix A.

[ 1 1 3 | 1 0 0 ]
[ 0 2 1 | 0 1 0 ]
[ 1 4 4 | 0 0 1 ]

Example 2.20. Solve the linear system

x +  y + 3z = 2
    2y +  z = 0
x + 4y + 4z = 1

2.8.1 Inverse - another method

Here is another way to compute the inverse of a matrix.

• Select the ith row and jth column of A.
• Compute (−1)^(i+j) det Sij / det A.
• Store this entry at aji (row j and column i) in the inverse matrix.
• Repeat for all other entries of A.

That is, if A = [aij] then

A^(-1) = (1 / det A) [ (−1)^(i+j) det Sji ]

This method works but it is rather tedious.

Chapter 3
Calculus

Reference books

1. G. James, Modern Engineering Mathematics, 4th edition, Prentice Hall, 2007.
2. J. Stewart, Calculus Early Transcendentals, 7th edition, Cengage Learning, 2012.

3.1 Differentiation

3.1.1 Rate of change

Differentiation is the mathematical method that we use to study the rate of change of physical quantities. Let us look at this with an example. Consider a point P that is moving with a constant speed v along a straight line. Let s be the distance moved by the point after time t. The distance moved after time t is given by the formula s = vt. If Δt denotes a finite change in time t, the corresponding change of distance is given by Δs = vΔt.
The rate of change of s with t is then simply

(change in s)/(change in t) = Δs/Δt = v = average speed over the time interval Δt.

Suppose now that the speed of P varies with time. By making Δt very small, i.e. taking the limit as Δt → 0, we define the derivative of s with respect to t at time t as the rate of change of s with respect to t as Δt → 0. If we let ds and dt be the infinitesimal changes in s and t, then we can write:

v = ds/dt = lim_{Δt→0} Δs/Δt = instantaneous speed of P at time t.

3.1.2 Definition of the derivative f′(x) and the slope of a tangent line

Consider the graph of the function y = f(x) of the single variable x shown in the plot below. We will now compare the average rate of change of f(x) with the derivative of f(x). Let Δf = f(x + Δx) − f(x) be the change in f as we go from point P to Q and as x changes from x to x + Δx. The average rate of change of the function f(x) on the interval Δx is Δf/Δx. This is the slope of the chord PQ. The derivative of f(x) at the point x is defined as

df/dx = f′(x) = lim_{Δx→0} (Δf/Δx) = lim_{Δx→0} (f(x + Δx) − f(x))/Δx.

The derivative is the slope (or gradient) of the local tangent line to the curve y = f(x) at the point P. That is, f′(x) = tan θ, where θ is the angle between the two dashed lines. The derivative f′(x) is thus the instantaneous rate of change of f with respect to x at the point P.

Example 3.1. Use the definition of the derivative to obtain from first principles the value of f′(x) for the function f(x) = 25x − 5x^2 at x = 1. Find the equation of the tangent line to the graph of y = f(x) at the point (1, 20) in the xy-plane.

Solution:

df/dx = lim_{Δx→0} (f(x + Δx) − f(x))/Δx =

3.1.3 Techniques of differentiation - rules

Most mathematical functions are readily differentiable without the need to resort to the first principles definition. It is simply a matter of applying one or more of the rules of differentiation, which are collected in the following table.
It is assumed that c and n are constants.

Description                      Function         Derivative
Constant                         f(x) = c         f′(x) = 0
Power of x                       f(x) = x^n       f′(x) = n x^(n−1)
Multiplication by a constant c   c f(x)           d/dx (c f(x)) = c f′(x)
Sum (or difference) of functions f(x) ± g(x)      f′(x) ± g′(x)
Product of functions             f(x) g(x)        f(x) g′(x) + g(x) f′(x)
Quotient of functions            f(x)/g(x)        (g(x) f′(x) − f(x) g′(x)) / (g(x))^2
Chain rule for composite functions: if u = g(x) and y = f(u), so that y = f(g(x)), then
                                 dy/dx = (dy/du)(du/dx) = f′(u) g′(x)

Example 3.2.
(a) Find the derivative of f(x) = x^3 + 2x^2 − 5x − 6 with respect to x.
(b) Find the derivative of y = (x^5 + 6x^2 + 2)(x^3 − x + 1) with respect to x.
(c) Find the derivative of f(x) = (x^2 + 1)/(x^2 − 1) with respect to x.
(d) Use the chain rule to find dy/dx when (i) y = (2x + 3)^5 (ii) y = √(3x^2 + 1)

3.2 Maximum and minimum of functions

The derivative of a function f(x) tells us important information regarding the graph of y = f(x).

• If f′(x) > 0 on the interval [a, b] then the function f(x) is increasing on that interval.
• If f′(x) < 0 on the interval [a, b] then the function f(x) is decreasing on that interval.
• If f′(x) = 0 on the interval [a, b] then the function f(x) is constant on that interval.

A function f(x) has a local maximum at x = c if f(x) ≤ f(c) for values of x in some open interval containing c. (Note: an 'open' interval is an interval NOT including the end points, i.e. the interval (a, b) rather than [a, b].) A function f(x) has a local minimum at x = c if f(x) ≥ f(c) for values of x in some open interval containing c. Note that the interval endpoints cannot correspond to a local maximum or a local minimum. Can you see why this is so?

Example 3.3. Identify the local maxima and minima on the graph of the function below.

How do we find the local maxima and minima? Local maxima and local minima occur where the derivative of the function is zero. They can also occur where the derivative does not exist (consider the function f(x) = |x| at the point x = 0).
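As an aside, both the first-principles definition and the rules in the table above can be spot-checked numerically with a difference quotient. This is my own check, not part of the notes; the helper `diff` and the test point 1.3 are arbitrary choices.

```python
# A numerical spot-check of the limit definition of the derivative (so you can
# verify your first-principles working for Example 3.1) and of the product
# rule applied to Example 3.2(b).

def diff(f, x, dx=1e-6):
    """Central-difference approximation to f'(x)."""
    return (f(x + dx) - f(x - dx)) / (2 * dx)

# Example 3.1: f(x) = 25x - 5x^2; the rules give f'(x) = 25 - 10x, so f'(1) = 15.
assert abs(diff(lambda x: 25 * x - 5 * x ** 2, 1.0) - 15) < 1e-6

# Example 3.2(b): y = (x^5 + 6x^2 + 2)(x^3 - x + 1); product rule gives
# y' = (x^5 + 6x^2 + 2)(3x^2 - 1) + (x^3 - x + 1)(5x^4 + 12x).
y = lambda x: (x ** 5 + 6 * x ** 2 + 2) * (x ** 3 - x + 1)
yp = lambda x: (x ** 5 + 6 * x ** 2 + 2) * (3 * x ** 2 - 1) \
             + (x ** 3 - x + 1) * (5 * x ** 4 + 12 * x)
assert abs(diff(y, 1.3) - yp(1.3)) < 1e-4

print("numerical checks passed")
```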
For the function f(x) we define the critical points as the points x = c such that:

• f′(c) = 0, or
• f′(c) does not exist.

It is important to note that f′(c) = 0 does not imply that the function f(x) must have a local maximum or minimum at x = c. Consider the function f(x) = x^3 at x = 0 to explore this further. Thus f′(c) = 0 is a necessary condition (where the derivative exists) but not a sufficient condition for the existence of a local maximum or minimum.

We can also note the following with regard to the graph of f(x).

• At a point on the graph of the function f(x) corresponding to a local maximum, the function changes from increasing to decreasing.
• At a point on the graph of the function f(x) corresponding to a local minimum, the function changes from decreasing to increasing.

Note that the tangent line (if it exists) is horizontal at the point x = c corresponding to either a local maximum or a local minimum.

The First Derivative Test

Using this test is simple. All we do is look at the sign of the derivative on each side of the critical point c:

• If f′(x) changes from positive to negative (i.e. f(x) changes from increasing to decreasing) then f(x) has a local maximum at x = c.
• If f′(x) changes from negative to positive (i.e. f(x) changes from decreasing to increasing) then f(x) has a local minimum at x = c.
• If f′(x) does not change sign, then f(c) is neither a maximum nor a minimum value for f(x).

In summary, to find the local extrema we

• Find all critical points.
• For each critical point, decide whether it corresponds to a local maximum or minimum (or neither) using the First Derivative Test.

Example 3.4. Find the local extrema for the function f(x) = x^3 − 5x^2 − 8x + 7 over the interval R.

Solution: First we find the critical points by differentiating the function and solving for x when f′(x) = 0.

Next we inspect the critical points found above. Using the First Derivative Test there is a local maximum at x =     and a local minimum at x =     .
The corresponding values of the function at the local extrema are f(  ) =     and f(  ) =     .

Absolute (Global) Maximum and Minimum

Since we have been talking about local extrema, we must also mention absolute (global) extrema.

• A function f(x) has an absolute minimum at x = c if f(x) ≥ f(c) for all x in the domain [a, b], where a ≤ c ≤ b.
• A function f(x) has an absolute maximum at x = c if f(x) ≤ f(c) for all x in the domain [a, b], where a ≤ c ≤ b.

The Extreme Value Theorem states that if a function f(x) is continuous on a closed interval [a, b] then f(x) attains an absolute maximum and an absolute minimum at some points in the interval. Note that the interval [a, b] must be a closed interval. Why is this necessary?

Example 3.5. What happens with the Extreme Value Theorem if a function f(x) is not continuous?

To find the absolute extrema of a continuous function on a closed interval:

1. Find the values of the function at all critical points in the interval.
2. Find the values of the function at the end points of the interval.
3. Compare all of these for the maximum / minimum.

To find the absolute extrema of a continuous function over an open interval (including R):

1. Find the values of the function at all critical points in the interval.
2. Find the limit of the function as x approaches the endpoints of the interval (or ±∞).
3. Compare all of these for the maximum / minimum.

Example 3.6. Find the absolute extrema for the function f(x) = e^(-x^2).

3.3 Differentiating inverse, circular and exponential functions

3.3.1 Inverse functions and their derivatives

The inverse of a function f is the function that reverses the operation done by f. The inverse function is denoted by f^(-1). It satisfies the relation

y = f(x) ⇔ x = f^(-1)(y).

Here ⇔ means 'implies in both directions'.
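The defining relation can be checked numerically on a concrete pair. This is my own illustration (the example f(x) = x^3 and its cube-root inverse are not from the notes): composing a function with its inverse, in either order, recovers the input.

```python
# A small check of the relation y = f(x) <=> x = f_inv(y), using f(x) = x^3,
# whose inverse is the cube root.

def f(x):
    return x ** 3

def f_inv(y):
    # Cube root; y ** (1/3) only works for y >= 0 in Python, so handle signs.
    return y ** (1 / 3) if y >= 0 else -((-y) ** (1 / 3))

for x in (0.0, 0.5, 2.0, -1.5):
    assert abs(f_inv(f(x)) - x) < 1e-12   # f_inv(f(x)) = x
    assert abs(f(f_inv(x)) - x) < 1e-12   # f(f_inv(x)) = x
print("round trips recover x")
```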
Since x is normally chosen as the independent variable of a function and as x is always plotted on the horizontal axis in the xy-plane, the graph of the inverse function of y = f(x) is defined by the relation y = f⁻¹(x) ⇔ x = f(y). In practice, to obtain the inverse function f⁻¹ to a given function y = f(x), we
• solve this equation to obtain x in terms of y
• interchange the labels x and y to give y = f⁻¹(x)
Note that f(f⁻¹(x)) = x = f⁻¹(f(x)). It is possible to plot the graphs of y = f(x) and y = f⁻¹(x) on the same diagram. In this case, the graph of y = f⁻¹(x) is the mirror image of the graph of y = f(x) in the line y = x.
Example 3.7. Find the inverse function of y = f(x) = (1/5)(4x − 3). A sketch of f(x) is given below. Note that y = x is the thin line shown in the diagram. Sketch f⁻¹(x) on the same axis.
3.3.2 Exponential and logarithmic functions: e^x and ln x
A very important example of a function and its inverse are the exponential function y = e^x and the natural logarithm function y = ln x. From the definition of the inverse function we have y = f(x) = e^x ⇔ x = ln y. Now relabelling x and y, we obtain f⁻¹(x) = ln x as the inverse function of f(x) = e^x. We note from the definition y = e^x ⇔ x = ln y, that
• ln(e^x) = ln(y) = x
• e^(ln y) = e^x = y
This explicitly demonstrates the inverse behaviour of e^x and ln x. As an illustration, since e⁰ = 1, we have ln 1 = ln e⁰ = 0. This means that as the point (0, 1) lies on the graph of y = e^x, the point (1, 0) must lie on the graph of y = ln x. This feature is seen in the graphs of y = e^x and y = ln x given below. In general, for any function f(x), since b = f(a) ⇔ a = f⁻¹(b), it follows that if the point (a, b) lies on the graph of y = f(x), then (b, a) is a point on the graph of y = f⁻¹(x). Sometimes we will need to restrict the domain of a function in order to find its inverse.
Example 3.8. Find the inverse function f⁻¹(x) of f(x) = x² by first restricting the domain to [0, ∞).
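For Example 3.7, solving y = (1/5)(4x − 3) for x gives x = (5y + 3)/4, so after relabelling, f⁻¹(x) = (5x + 3)/4. The round-trip identity f(f⁻¹(x)) = x = f⁻¹(f(x)) is then easy to check (a quick sketch; the sample inputs are arbitrary):

```python
def f(x):
    return (4 * x - 3) / 5        # Example 3.7's function

def f_inv(x):
    return (5 * x + 3) / 4        # obtained by solving y = f(x) for x

# both round trips should return the input (up to float rounding)
round_trips = [(x, f(f_inv(x)), f_inv(f(x))) for x in (-3.0, 0.0, 2.5, 10.0)]
```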
Derivative rule for inverse functions If y = f−1(x)⇔ x = f(y), then dy dx = 1 dx/dy = 1 f ′(y) Example 3.9. Find the derivative of the function f(x) and its inverse f−1(x) for f(x) = 1 5 (4x − 3). Check that the answers satisfy the deriva- tive rule for inverse functions. 69 Example 3.10. Show that d dx (lnx) = 1 x given that ex is the inverse func- tion of lnx and d dx (ex) = ex. Solution: Put y = lnx. This implies x = ey. Therefore d dy (x) = d dy (ey) = ey and hence dy dx = 1 dx/dy = 1 ey = 1 x . 70 3.3.3 Derivatives of circular functions Circular (or trigonometric) functions sinx, cosx, tanx, etc arise in prob- lems involving functions that are periodic and repetitive, such as those that describe the orbit of a planet about its parent star. Here we acquaint our- selves with the derivatives of such functions. Example 3.11. Sketch the graphs of f(x) = sinx and g(x) = cosx on the same diagram for the interval 0 ≤ x ≤ 3pi. Use the tangent line method to estimate the values of f ′(x) at the points x = 0, pi2 , pi, 3pi 2 , 2pi, . . . on the same diagram. Do these values seem to match the curve g(x)? 71 Before examining the derivatives of circular functions in more detail, we consider two basic inverse circular functions sin−1 x and tan−1 x. Note that the alternative notation for inverse circular functions is: sin−1 x = arcsinx, cos−1 x = arccosx and tan−1 x = arctanx. Example 3.12. The graphs of y = sinx and y = tanx are shown below by the heavy curves for the restricted domains [−pi2 , pi2 ], ie −pi2 ≤ x ≤ pi2 and (−pi2 < x < pi2 ), respectively. Sketch the inverse functions sin−1 x and tan−1 x using mirror reflection across the line y = x. 
The domain of sin−1 x is The range of sin−1 x is The domain of tan−1 x is The range of tan−1 x is y = sinx y = tanx Note: The reason for the use of a restricted domain in specifying inverse circular functions is that if all of the graph of y = sinx or tanx were naively reflected across the line y = x, there would be more than one choice (in fact there would be an infinite number of choices) for the ordinate (y) value of the inverse function. We saw this at work in a previous example. A function may have only a single value for each x in its domain. We also note that tan ( pi 2 ) and tan (−pi2 ) are not defined (±∞). 72 The values of the derivatives of the six basic circular (ie trigonometric) func- tions are shown in the table below, along with the derivatives of the three main inverse circular functions sin−1 x, cos−1 x and tan−1 x. Also listed are the derivatives of the basic exponential and logarithm functions. Table of the derivatives of the basic functions of calculus Original function f Derivative function f ′ sinx cosx cosx − sinx tanx sec2 x ≡ 1 + tan2 x cosecx ≡ 1/ sinx −cosecx · cotx secx ≡ 1/ cosx secx · tanx cotx ≡ 1/ tanx −cosec2 x sin−1 x domain:− 1 ≤ x ≤ 1 (ie |x| ≤ 1) 1√ 1− x2 cos−1 x domain:− 1 ≤ x ≤ 1 (ie |x| ≤ 1) − 1√ 1− x2 tan−1 x domain:−∞ < x <∞ 1 1 + x2 ex ex lnx domain: x > 0 1 x Example 3.13. Find dy dx when y is given by (a) sin(2x+ 3) 73 (b) x2 cosx (c) x tan(2x+ 1) (d) tan−1 x2 (e) Prove the differentiation formula d dx ( sin−1 x ) = 1√ 1− x2 . (f) Find dy dx when y = arcsin(e2x) 74 (g) Prove the differentiation formula d dx ( cos−1 x ) = −1√ 1− x2 . (h) Find ds dt when s = ln(tan(2t)) (i) Find dg dt when g = √ t sin−1(t2) (j) Find dg dx when g = 5x3 + 3x (x2 + 3)2 75 3.4 Higher order derivatives If f(x) is a differentiable function, then its derivative f ′(x) = df dx is also a function and so may have a derivative itself. The derivative of a derivative is called the second derivative and is denoted by f ′′(x). 
There are various ways it can be written:
f″(x) = d/dx f′(x) = d/dx (df/dx) = d²f/dx² = f⁽²⁾(x)
The second derivative f″(x) can be differentiated with respect to x to yield the third derivative f‴(x) = d³f/dx³ = f⁽³⁾(x). And so on! In general, the nth derivative of f(x) is denoted by dⁿf/dxⁿ or f⁽ⁿ⁾(x).
Interpretation: Earlier we used the first derivative to find local maxima and local minima. We can also use the second derivative at a critical point x = c to find these.
• If f″(c) > 0 then the function has a local minimum at c.
• If f″(c) < 0 then the function has a local maximum at c.
• If f″(c) = 0 then the test is inconclusive and we cannot determine whether there is a local maximum or minimum at c.
The second derivative f″(x) also measures the rate of change of the first derivative f′(x). As f′(x) is the gradient or slope of the tangent line to the graph of y = f(x) in the xy-plane, we see that:
(i) if d²f/dx² > 0 then df/dx increases with increasing x and the graph of y = f(x) is said to be locally concave up.
(ii) if d²f/dx² < 0 then df/dx decreases with increasing x and the graph of y = f(x) is said to be locally concave down.
Example 3.14. If f(x) = x³ − 3x + 1 find the first four derivatives f′(x), f″(x), f⁽³⁾(x) and f⁽⁴⁾(x). Determine for what values of x the curve is concave up and concave down. Also locate the turning points (a, f(a)) where f′(a) = 0. Mark these features on the graph of y = f(x) shown below.
Example 3.15. Find the second derivative of the function y = e⁻ˣ sin 2x.
Example 3.16. Find the second derivative of the function y = (ln x)/x.
Example application: Find the point on the graph of y = √x, x ≥ 0, closest to (2, 0).
3.5 Parametric curves and differentiation
3.5.1 Parametric curves
The equation that describes a curve C in the Cartesian xy-plane can sometimes be very complicated. In that case it can be easier to introduce an independent parameter t, so that the coordinates x and y become functions of t.
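As a first taste of the idea (the unit-circle parametrisation x = cos t, y = sin t is our illustrative choice here, not one fixed by the notes), a single parameter t sweeps out points (x(t), y(t)) on a curve:

```python
import math

def point(t):
    # one parametrisation of the unit circle
    return math.cos(t), math.sin(t)

# sample the curve at nine equally spaced parameter values in [0, 2*pi]
samples = [point(2 * math.pi * k / 8) for k in range(9)]
```

Every sampled point satisfies x² + y² = 1, so the parameter equations trace the whole circle even though no single function y = g(x) can describe it.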
We explored this when we looked at the vector equations of lines and planes. That is, x = f(t) and y = g(t). The curve C is parametrically represented by C = {(x, y) : x = f(t) y = g(t) t1 ≤ t ≤ t2} As the independent parameter t goes from t1 to t2, the point P = ( (x(t), y(t) ) on the curve C moves from P1 = (x1, y1) to P2 = (x2, y2). Example 3.17. What do the following parametric curves represent? (a) C = {(x, y) : x = 2 cos(t), y = 2 sin(t), 0 ≤ t ≤ 2pi} (b) C = { (x, y) : x = 2t, y = 4t2, −∞ < t <∞} 79 (c) C = {(x, y) : x = 5 cos(t), y = 2 sin(t), 0 ≤ t ≤ 2pi} (d) C = {(x, y) : x = t cos(t), y = t sin(t), t > 0} 3.5.2 Parametric differentiation it is now a natural progression to ask what is the value of the slope dy dx at the point ( (x(t), y(t) ) on the curve. This is given by dy dx = dy dt / dx dt = g′(t) f ′(t) where f ′(t) = df dt and g′(t) = dg dt . Example 3.18. Sketch the curve represented parametrically by: C = {(x, y) : x = a cos t y = a sin t 0 ≤ t ≤ 2pi} . 80 Find the derivative function dy dx . Find the equation of the tangent line to the curve at the point corresponding to t = pi 4 and draw this on the sketch for the case a = 2. Solution: Here f(t) = a cos t, g(t) = a sin t. Therefore x2+y2 = a2(cos2 t+ sin2 t) = a2. The curve C is thus a circle of radius a centred at the origin. As t goes from 0→ 2pi, the circle is described once, in the positive direction, starting at the point (2, 0). Now, f ′(t) = −a sin t = −y, g′(t) = a cos t = x and so dy dx = dy dt / dx dt = g′(t) f ′(t) = x −y = − a cos t −a sin t = − cot t At t = pi 4 , x (pi 4 ) = a cos (pi 4 ) = a√ 2 , y (pi 4 ) = a sin (pi 4 ) = a√ 2 , and so dy dx = −a/ √ 2 a/ √ 2 = −1. Tangent Line: The equation of a straight line of slope m which passes through the point (x1, y1) is y − y1 = m(x− x1) Now taking a = 2 and collecting values, we have: 81 x1 = y1 = m = and thus the required answer is: Example 3.19. 
Consider the curve represented parametrically by: C = { (x, y) : x = 1 + 3t2, y = 1 + 2t3, −∞ < t <∞} . (i) Find dy dx as a function of t (ii) Evaluate dy dx at t = 1 and find the tangent line to the curve at this point. Example 3.20. Find the equation of the tangent line to x = 5 cos(t) y = 2 sin(t) for 0 ≤ t ≤ 2pi at t = pi 4 . 82 3.6 Function approximations We are now going to take a step in an interesting direction, and look at how to approximate a function by a number of different methods. 3.6.1 Introduction to power series A geometric sequence an (n = 0, 1, 2, 3, . . .) is one in which the ratio of successive terms, namely an+1/an is a constant, say r. That is, an+1/an = r. Thus a1 = ra0, a2 = ra1 = r 2a0, and so on. We write: a0, a1, a2, a3, . . . = a0, ra0, r 2a0, r 3a0, . . . A finite geometric series consists of the sum Sn of the first n terms of the geometric sequence. Setting the initial term a0 = a, a constant, we have: Sn = a+ ar + ar 2 + ar3 + . . .+ arn−1 (i) We can easily find the sum of this series. To do this we multiply both sides of Equation (i) by r to obtain: rSn = ar + ar 2 + ar3 + ar4 . . .+ arn (ii) Subtracting Equation (ii) from (i) then yields: (1− r)Sn = a− arn Hence as long as r 6= 1, the sum of the first n terms of a geometric series is: Sn = n−1∑ k=0 ark = a(1− rn) 1− r (iii) For the particular case of r = 1, we see from Equation (i) that Sn = an. 83 Example 3.21. Find the geometric series of the following sequence 1, 1 2 , 1 4 , 1 8 , 1 16 , . . . (i) when n = 3, ie find S3. (ii) when n = 5, ie find S5. (iii) what happens as n→∞? Example 3.22. A nervous investor is deciding whether to invest a sum of $P0 in a company that is advertising a high fixed interest rate of I% for the next N years. To allay all fear it is agreed that at the end of each year, he/she can withdraw all principal less the interest earned on that year. That interest is then used as principal for the next year’s investment. 
The com- pany is to pocket the interest on the last year of the plan as a penalty. What is the total value PN of his/her asset at the end of the final year? Calculate PN for the case where P0 = 100000, I = 25% and N = 10. Solution: At the end of the first year, the value of the investment is P1 = P0 + iP0. 84 Year no. Investment value at start of year Investment value at end of year Amount withdrawn 1 P0 P0 + iP0 P0 2 iP0 iP0 + i 2P0 iP0 3 ... ... ... ... N − 1 N iN−1P0 iN−1P0 + iNP0 Hence at the end of year N the investor has got back a total sum SN = PN = P0 + iP0 + i 2P0 + . . .+ i N−1P0 = Substituting numerical values, the final value of the investment for the ner- vous investor is: $133, 333.21. Final return for the bank is: 85 3.6.2 Power series We have seen that a finite geometric series of n terms has the sum Sn = a+ ar + ar 2 + ar3 + . . .+ arn−1 = n−1∑ k=0 ark = a(1− rn) 1− r Suppose we allow n to become very large. Then provided that −1 < r < 1, we have rn → 0 as n → ∞. For example (12)n→∞ = 0. Now setting a = 1, r = x and taking n→∞, it follows that 1 1− x = 1 + x+ x 2 + x3 + . . .+ xn−1+ = ∞∑ k=0 xk (3.1) The right hand side of Equation (3.1) is said to be an infinite power series in the variable x. In this particular case, the sum of the power series is the function f(x) = 1 1− x . Example 3.23. Two trains 200 km apart are moving toward each other. Each one is going at a constant speed of 50 kilometres per hour. A fly starting on the front of one of them flies back and forth between them at a rate of 75 kilometres per hour (fast fly!). The fly does this until the two trains collide. What is the total distance the fly has flown? Power series A general infinite power series in the variable x has the form ∞∑ n=0 anx n = a0 + a1x+ a2x 2 + a3x 3 + . . .+ anx n + . . . (3.2) 86 where a0, a1, a2, . . . are constants. Putting a0 = 1, a1 = 1, a2 = 1, . . . etc, we recover the geometric power series that is defined in Equation (3.1). 
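How quickly the geometric series approaches 1/(1 − x) can be watched directly (a sketch; x = 1/2 is an arbitrary choice with |x| < 1):

```python
def partial_sum(x, n):
    # S_n: sum of the first n terms of 1 + x + x^2 + ...  (a = 1)
    return sum(x**k for k in range(n))

x = 0.5
target = 1 / (1 - x)            # the function the full series represents
errors = [abs(partial_sum(x, n) - target) for n in (5, 10, 20, 40)]
```

Each partial sum also matches the closed form a(1 − rⁿ)/(1 − r) term for term.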
If x is taken to be sufficiently small, namely −R < x < R, where R > 0 is called the radius of convergence, the infinite power series adds up to a well-defined finite sum which is a function of x, namely f(x). This leads us to the idea of representing continuous functions of x by an infinite power series. That is, we have f(x) = a0 + a1x+ a2x 2 + a3x 3 + . . .+ anx n + . . . (3.3) where the domain of f(x) is −R < x < R. Table of Useful Power Series Series Domain 1 1− x = 1 + x+ x 2 + x3 + . . .+ xn + . . . −1 < x < 1 1 1 + x = 1− x+ x2 − x3 + . . .+ (−1)nxn + . . . −1 < x < 1 ex = 1 + x+ 12!x 2 + 13!x 3 + . . .+ 1n!x n + . . . −∞ < x <∞ ln(1 + x) ≡ loge(1 + x) = x− 12x2 + 13x3 − 14x4 + . . . −1 < x ≤ 1 +(−1)n x n+1 n+ 1 + . . . sinx = x− 13!x3 + 15!x5 − 17!x7 + . . .+ (−1)n x2n+1 (2n+ 1)! + . . . −∞ < x <∞ cosx = 1− 12!x2 + 14!x4 − 16!x6 + . . .+ (−1)n x2n (2n)! + . . . −∞ < x <∞ 87 3.6.3 Taylor series Taylor polynomials and linear approximation Suppose we truncate the infinite power series defined in Equation (3.3) after the first (n+ 1) terms. That is, let us stop the series at the term anx n. We then obtain an nth degree polynomial in x. This polynomial, denoted by Tn(x), is called the Taylor Polynomial of degree n for the function f(x) about x = 0. We have Tn(x) = a0 + a1x+ a2x 2 + a3x 3 + . . .+ anx n Since Tn(x) is a finite polynomial, its domain includes all x. That is, −∞ < x <∞. Example 3.24. Use the table of basic power series to find T0(x), T1(x), T2(x), T3(x) for e x. Solution: T0(x) = 1 T1(x) = 1 + x T2(x) = 1 + x+ 1 2 x2 T3(x) = 1 + x+ 1 2 x2 + 1 6 x3 The first of these Taylor polynomials, namely T0(x) = 1 simply matches the height of the graph of y = f(x) at x = 0. It is sometimes called the zeroth approximation to y = f(x) at x = 0. It can also be called the zeroth Taylor polynomial for f at x = 0. The next Taylor polynomial, namely T1(x) = 1 + x, is called the linear approximation to f(x) at x = 0. 
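For f(x) = e^x the truncation can be carried out to any degree, and the error at a fixed point shrinks as terms are added (a sketch; evaluating at x = 1 is an arbitrary choice):

```python
import math

def taylor_exp(x, n):
    # Taylor polynomial T_n(x) of e^x about x = 0
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x = 1.0
errors = [abs(taylor_exp(x, n) - math.exp(x)) for n in range(6)]
```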
The equation y = T1(x) is the equation of the tangent line to the graph y = f(x) at x = 0. The diagram below shows the graph of y = ex (thick curve) on the do- main −1.5 ≤ x ≤ 1.5, along with the graphs of y = T0(x) = 1 and the linear approximation function y = T1(x) = 1 + x. 88 Example 3.25. Use the power series table to find: (i) T0(x), T1(x), T2(x), T3(x), for f(x) = ln(1 + x) (ii) T1(x), T3(x), T5(x), for f(x) = sinx (iii) T1(x), T3(x), T5(x), for f(x) = sin 2x (iv) T0(x), T2(x), T6(x), for f(x) = cos 3x 89 Example 3.26. For parts (i) and (ii) of the example above, draw sketches of the graphs of y = f(x), T0(x) and T1(x). Linear Approximation The Taylor polynomial of degree one (y = T1(x)) is the linear approx- imation to y = f(x) at x = 0. Clearly T1(x) = f(0) + f ′(0)x. Example 3.27. (i) Find the linear approximation to y = f(x) = e3x 2 + x at x = 0. Solution: f(x) = e3x 2 + x f(0) = f ′(x) = f ′(0) = Hence T1(x) = (ii) Use the linear approximation to f(x) to estimate the value of f(0.1). 90 3.6.4 Derivation of Taylor polynomials from first principles Suppose we do not know the infinite power series for a given function f(x) but wish to derive the first few Taylor polynomial approximations to f(x) near x = 0. How do we find T0(x), T1(x), T2(x), . . ., Tn(x)? In order to do this we need to know not only the value of f(x) at x = 0, namely f(0), but also the values of the first n derivatives of f(x) at x = 0. That is, we need to be given the values of f(0), f ′(0), f ′′(0) ≡ f (2)(0), f (3)(0), . . ., f (n)(0). We will build each of the Ti(x), 0 ≤ i ≤ n, so that its function value and derivative at x = 0 match up to (and include) f (i)(0). Solution: We write Tn(x) = a0 + a1x+ a2x 2 + a3x 3 + . . .+ anx n (3.4) where a0, a1, a2, a3, . . ., an are undetermined constants. To find the constant a0: Put x = 0 in Equation (3.4). Therefore Tn(0) = a0 +0+0+0+ . . .+0 = a0. We next insist that Tn(0) = f(0). Hence a0 = f(0). 
To find the constant a1: Differentiate Equation (3.4) with respect to x and the set T (1) n (0) = f (1)(0). T (1)n (x) = dTn dx = 0 + a1 + 2a2x+ 3a3x 2 + . . .+ nanx n−1 (3.5) Therefore T (1) n (0) = 0+a1 +2a20+3a30 2 + . . .+nan0 n−1 = a1. By insisting that T (1) n (0) = f (1)(0) we get a1 = f (1)(0). To find the constant a2: Differentiate Equation (3.5) with respect to x and then set T (2) n (0) = f (2)(0). T (2)n (x) = d2Tn dx2 = 0 + 2× 1a2x0 + 3× 2a3x1 + . . .+ n(n− 1)anxn−2 (3.6) Therefore T (2) n (0) = 2 × a2 + 3 × 2a30 + . . . + n(n − 1)an0n−2 = 2a2. By insisting that T (2) n (0) = f (2)(0) we get a2 = 1 2! f (2)(0). To find the constant a3: Differentiate Equation (3.6) with respect to x and then set T (3) n (0) = f (3)(0). T (3)n (x) = d3Tn dx3 = 0 + 3× 2× 1a3x0 + . . .+ n(n− 1)(n− 2)anxn−3 (3.7) 91 Therefore T (3) n (0) = 3× 2× 1a3 + . . .+n(n− 1)(n− 2)an0n−3 = 3× 2× 1a3. By insisting that T (3) n (0) = f (3)(0) we get a3 = 1 3! f (3)(0). Repeating this process n times, we find an = 1 n!f (n)(0). Lastly, we substitute these ai values into Equation (3.4), to obtain the Taylor polynomial of degree n at x = 0: Tn(x) = f(0) + f ′(0) 1! x+ f ′′(0) 2! x2 + f (3)(0) 3! x3 + . . .+ f (n)(0) n! xn. Example 3.28. (i) Derive the first four Taylor polynomials T0(x), T1(x), T2(x), T3(x) for the function f(x) = 1 1+x at x = 0. Function Value at x = 0 f(x) = 1 1 + x f(0) = f ′(x) = − 1 (1 + x)2 f ′(0) = f (2)(x) = f (2)(0) = f (3)(x) = f (3)(0) = (ii) Sketch f(x), T1(x) and T2(x). 92 (iii) Deduce the Taylor polynomial of degree three for the function g(x) = 1 1 + 3x . Example 3.29. (i) Derive the Taylor polynomials T0(x), T2(x), T4(x) for the function y = cosx at x = 0. (ii) Using Mathematica (or otherwise) plot y = cosx, as well as T0(x), T2(x), and T4(x) for the domain −pi ≤ x ≤ pi. (iii) Deduce the Taylor polynomial of degree four for y = cos 3x. 
Function Value at x = 0 f(x) = cosx f(0) = cos(0) = 1 f ′(x) = f ′(0) = f (2)(x) = f (2)(0) = f (3)(x) = f (3)(0) = f (4)(x) = f (4)(0) = Example 3.30. Is it possible to have two different power series for the one function? 93 3.6.5 Cubic splines interpolation Power series (and Taylor series) provide us with a method of approximating values of a function at some particular point. When we construct Taylor polynomials we get a higher level of accuracy when we use a higher degree polynomial. Suppose we are asked to evaluate a function at a particular point x = c (for example finding the zeros of a function, or the intersection point of two functions) but we cannot use algebraic methods (such as the quadratic formula) to solve such a problem. What if we are not even given the function? What can we do? Lucky for us there are methods available that will give a good approximation of the solution. Algebraic methods will give us exact solutions, but the algebraic methods may not be simple to use. Numerical methods will give us an approximation, but are much easier to use. Polynomial interpolation Suppose we are given a set of data points (xi, fi(x)) (note that we do not have the function f(x) explicitly given to us) and we want to build a function that approximates f(x) with as much continuity as we can get. What do we do? We will introduce a method of interpolation to do this. We are essentially going to find a curve which best fits the data given. In science and engineering, numerical methods often involve curve fitting of experimental data. Polynomial interpolation is simply a method of estimating the values of a function, between known data points. Thus linear interpolation would use two data points, quadratic interpolation would use three data points, and so on. As an example, suppose you are asked to evaluate a function f(x) at x = 3.4 but all you are given is the following table of data points. 
x 0.000 1.200 2.400 3.600 4.000 6.000 7.000 f(x) 0.000 0.932 0.675 −0.443 −0.757 −0.279 0.657 94 Since x = 3.4 is not in the table, the best we can do is find an estimate of f(3.4). We could construct a straight line built on the two points either side of x = 3.4 namely (2.400, 0.675) and (3.600,−0.443). Or we could build a quadratic based on any three points (that cover x = 3.4). If we were really keen we could build a cubic by selecting four points around x = 3.4. All we are doing is simply using a set of points near the target point (x = 3.4) to build a polynomial. Then we estimate f(x) by evaluating the polynomial at the target point. This process is called polynomial interpolation. This method will give us a unique polynomial for our approximation of the function f(x). This is fine but we know that our accuracy is likely to decline as we get further away from our target point. What else can we do? Piecewise polynomial interpolation: Cubic splines Instead of trying to find one polynomial to fit our data points, what if we take sections of the data, and fit polynomials to each section, ensuring that the overall piecewise function is continuous. We would also like differentia- bility, but let us not be too picky for now. The simplest polynomial to use is a linear approximation. This will produce a path that consists of line segments that pass through the points of the data set. The resulting linear spline function can be written as a piecewise function. Unfortunately though, we do not usually have continuity of the first derivatives at the data set points. linear spline This technique can be easily extended to higher order polynomials. If we take piecewise quadratic polynomials, we can get continuity of the first derivatives, but the second derivatives will have discontinuities. If we take piecewise cubic polynomials, then we can make (with some work) both the first and second derivatives continuous. 
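Linear interpolation, the "straight line built on the two points either side" of the target, can be sketched against the table above (the data are copied from the table; the bracketing search is essentially the whole algorithm):

```python
DATA = [(0.000, 0.000), (1.200, 0.932), (2.400, 0.675), (3.600, -0.443),
        (4.000, -0.757), (6.000, -0.279), (7.000, 0.657)]

def linear_interp(x, points):
    """Estimate f(x) from the line through the two tabulated points
    that bracket x."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x lies outside the tabulated range")

estimate = linear_interp(3.4, DATA)   # built on (2.400, 0.675) and (3.600, -0.443)
```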
95 cubic spline Suppose we are given a simple dataset and we are asked to estimate the derivative at say x = 0.35 How do we proceed? Here is one approach. Construct, by whatever means, a smooth approximation y˜(x) to y(x) near x = 0.35. Then put y′(0.35) ≈ y˜′(0.35). What we want is a method which • Produces a unique approximation, • Is continuous over the domain and • Has, at least, a continuous first derivative over the domain. Let us say we have a set of data points (xi, yi), i = 0, 1, 2, · · ·n and we wish to build an approximation y˜(x) which has as much continuity as we can get. Between each pair of points we will construct a cubic. Let y˜i(x) be the cubic function for the interval xi ≤ x ≤ xi+1. We demand that the following conditions are met • Interpolation condition yi = y˜i(xi) (1) • Continuity of the function y˜i−1(xi) = y˜i(xi) (2) • Continuity of the first derivative y˜′i−1(xi) = y˜ ′ i(xi) (3) 96 • Continuity of the second derivative y˜′′i−1(xi) = y˜ ′′ i (xi) (4) Can we solve this system of equations? We need to balance the number of unknowns against the number of equations. We have n+ 1 data points and thus n cubics to compute. Each cubic (f(x) = ax3 + bx2 + cx + d) has 4 coefficients, thus we have 4n unknowns. And how many equations? From the above we count n + 1 equations in (1), and (n − 1) equations in each of (2), (3) and (4). A total of 4n − 2 equations for 4n unknowns. We see that we will have to provide two extra pieces of information. For now let us press on, and see what comes up. We start by putting y˜i(x) = yi + ai(x− xi) + bi(x− xi)2 + ci(x− xi)3 (5) which automatically satisfies equation (1). For the moment suppose we happen to know all of the second derivatives y′′i . We then have y˜ ′′ i (x) = 2bi + 6ci(x− xi) and evaluating this at x = xi leads to bi = y ′′ i /2 (6) Now we turn to equation (4) y′′i+1 = y ′′ i + 6ci(xi+1 − xi) which gives ci = (y ′′ i+1 − y′′i )/(6hi) (7) where we have introduced hi = xi+1 − xi. 
Next we compute the ai by applying equation (2), yi+1 = yi + aihi + 1 6 (y′′i+1 + 2y ′′ i )h 2 i (8) and so ai = yi+1 − yi hi − 1 6 hi(y ′′ i+1 + 2y ′′ i ). (9) It appears that we have completely determined each of the cubics, though we are yet to use (3), continuity in the first derivative. But remember that we 97 don’t yet know the values of y′′i . Thus equation (3) will be used to compute the y′′i . Using our values for ai, bi and ci we find (after much fiddling) that equation (3) is 6 ( yi+1 − yi hi − yi − yi−1 hi−1 ) = hiy ′′ i+1 + 2(hi + hi−1)y ′′ i + hi−1y ′′ i−1. (10) The only unknowns in this equation are the y′′i of which there are n + 1. But there are only n− 1 equations. Thus we must supply two extra pieces of information. The simplest choice is to set y′′0 = y′′n = 0. Then we have a tridiagonal system of equations 1 to solve for y′′i . That’s as far as we need push the algebra – we can simply now use technology (such as Matlab, Mathematica, Wolfram Alpha...) to solve the tridiagonal system. The recipe • Solve equation (10) for y′′i , • Compute all of the ai from equation (9), • Compute all of the bi from equation (6), • Compute all of the ci from equation (7) and finally • Assemble all of the cubics using equation (5). Our job is done. We have computed the cubic spline for each interval. Example 3.31. Let us say we are given the set of data points in the fol- lowing table. Find the cubic spline that best fits this data. x −2 −1 1 3 f(x) 3 0 2 1 1Often a system of equations will give a coefficient matrix of a special structure. A tridiagonal system of equations is one such that the coefficient matrix has zero entries everywhere except for in the main diagonal and in the diagonals above and below the main diagonal. 98 We are going to use equation 5 to give three cubics, y˜1(x), y˜2(x) and y˜3(x). Recall y˜i(x) = yi + ai(x− xi) + bi(x− xi)2 + ci(x− xi)3 (5) From the data points we have x1 = −2, x2 = −1, x3 = 1, x4 = 3, y1 = 3, y2 = 0, y3 = 2 and y4 = 1. 
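The whole recipe can be run on this data set in code (a sketch of the construction above, with the natural end conditions setting y″ to zero at both ends; indices run from 0 here, so the notes' y″₂ and y″₃ are ypp[1] and ypp[2]):

```python
def natural_cubic_spline(xs, ys):
    """Coefficients (y_i, a_i, b_i, c_i) of the cubics
    y_i + a_i(x-x_i) + b_i(x-x_i)^2 + c_i(x-x_i)^3, one per interval,
    with the natural end conditions y''_0 = y''_n = 0."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # tridiagonal system, equation (10), for the interior y''_i
    sub = [h[i - 1] for i in range(1, n)]
    diag = [2 * (h[i - 1] + h[i]) for i in range(1, n)]
    sup = [h[i] for i in range(1, n)]
    rhs = [6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
           for i in range(1, n)]
    # Thomas algorithm: forward elimination then back substitution
    for i in range(1, n - 1):
        m = sub[i] / diag[i - 1]
        diag[i] -= m * sup[i - 1]
        rhs[i] -= m * rhs[i - 1]
    ypp = [0.0] * (n + 1)
    if n > 1:
        sol = [0.0] * (n - 1)
        sol[-1] = rhs[-1] / diag[-1]
        for i in range(n - 3, -1, -1):
            sol[i] = (rhs[i] - sup[i] * sol[i + 1]) / diag[i]
        ypp[1:n] = sol
    coeffs = []
    for i in range(n):
        a = (ys[i + 1] - ys[i]) / h[i] - h[i] * (ypp[i + 1] + 2 * ypp[i]) / 6  # eq (9)
        b = ypp[i] / 2                                                          # eq (6)
        c = (ypp[i + 1] - ypp[i]) / (6 * h[i])                                  # eq (7)
        coeffs.append((ys[i], a, b, c))
    return ypp, coeffs

ypp, coeffs = natural_cubic_spline([-2, -1, 1, 3], [3, 0, 2, 1])
```

With this data the solver gives interior second derivatives 105/22 and −51/22, which can be checked against the hand computation.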
We also know that y″₁ = y″₄ = 0. Putting this information into equation 10 we obtain the following two equations:
When i = 2: 24 = 6y″₂ + 2y″₃
When i = 3: −9 = 2y″₂ + 8y″₃
Solving this system of equations we find that y″₂ = 105/22 and y″₃ = −51/22. The use of equation 9 will give us a₁ = −167/44, a₂ = −31/22 and a₃ = 23/22. Equation 6 gives b₁ = 0, b₂ = 105/44 and b₃ = −51/44. Equation 7 gives c₁ = 35/44, c₂ = −13/22 and c₃ = 17/88. Now using equation 5 we produce the following three cubic splines:
y˜₁(x) = 3 − (167/44)(x + 2) + (35/44)(x + 2)³ for −2 ≤ x ≤ −1
y˜₂(x) = −(31/22)(x + 1) + (105/44)(x + 1)² − (13/22)(x + 1)³ for −1 ≤ x ≤ 1
y˜₃(x) = 2 + (23/22)(x − 1) − (51/44)(x − 1)² + (17/88)(x − 1)³ for 1 ≤ x ≤ 3
Example 3.32. For the above example, check that the four conditions (1, 2, 3 and 4) are met.
Example 3.33. Compute the cubic spline that passes through the following data set points
x 0 1 2 3
f(x) 0 0.5 2 1.5
Chapter 4 Integration
4.1 Fundamental theorem of calculus
4.1.1 Revision
Computing I = ∫ f(x) dx is no different from finding a function F(x) such that dF/dx = f(x). Thus ∫ (dF/dx) dx = F(x). The function F(x) is called the anti-derivative of f(x). You should recall some of the basic integrals.
∫ k dx = kx + C, where C ∈ R
∫ xⁿ dx = xⁿ⁺¹/(n + 1) + C, n ≠ −1
∫ sin(x) dx = −cos(x) + C
∫ cos(x) dx = sin(x) + C
∫ eˣ dx = eˣ + C
∫ (1/x) dx = ln|x| + C
Recall also the properties of indefinite integrals:
∫ [f(x) + g(x)] dx = ∫ f(x) dx + ∫ g(x) dx
∫ [f(x) − g(x)] dx = ∫ f(x) dx − ∫ g(x) dx
∫ k f(x) dx = k ∫ f(x) dx for any constant k
There are also a few tricks we can use to find F(x), such as integration by substitution and integration by parts.
Integration by substitution
If I = ∫ f(x) dx looks nasty, try changing the variable of integration. That is, put u = u(x) for some chosen function u(x), then invert the function to find x = x(u) and substitute into the integral.
I = ∫ f(x) dx = ∫ f(x(u)) (dx/du) du
If we have chosen well, then this second integral will be easy to do.
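A substitution is easy to sanity-check by differentiating the result. For instance (a worked choice of ours, not one prescribed by the notes): for ∫ 4x cos(x² + 5) dx, putting u = x² + 5 gives du = 2x dx, so the integral becomes ∫ 2 cos u du = 2 sin(x² + 5) + C. A numerical check that this candidate really differentiates back to the integrand:

```python
import math

def integrand(x):
    return 4 * x * math.cos(x**2 + 5)

def candidate_F(x):
    # antiderivative proposed by the substitution u = x^2 + 5
    return 2 * math.sin(x**2 + 5)

def numderiv(g, t, h=1e-6):
    # central finite-difference estimate of g'(t)
    return (g(t + h) - g(t - h)) / (2 * h)

gaps = [abs(numderiv(candidate_F, x) - integrand(x)) for x in (-1.5, 0.0, 0.7, 2.0)]
```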
Example 4.1. Find ∫ 4x cos(x2 + 5)dx Integration by parts This is a very powerful technique based on the product rule for derivatives. Recall that d(fg) dx = g df dx + f dg dx Now integrate both sides∫ d(fg) dx dx = ∫ g df dx dx+ ∫ f dg dx dx But integration is the inverse of differentiation, thus we have 102 fg = ∫ g df dx dx+ ∫ f dg dx dx which we can re-arrange to∫ f dg dx dx = fg − ∫ g df dx dx Thus we have converted one integral into another. The hope is that the second integral is easier than the first. This will depend on the choices we make for f and dg dx . Example 4.2. Find ∫ xexdx. Solution: We have to split the integrand xex into two pieces, f and dg dx . If we choose f(x) = x and dg dx = ex then df dx = 1 and g(x) = ex. Then ∫ xexdx = fg − ∫ g df dx dx = xex − ∫ 1 · exdx = xex − ex + C Example 4.3. Find ∫ x cos(x)dx. Solution: Choose f(x) = x and dg dx = cos(x) then df dx = 1 and g(x) = sin(x). Then ∫ x cos(x)dx = fg − ∫ g df dx dx = x sin(x)− ∫ 1 · sin(x)dx = x sin(x) + cos(x) + C 103 Example 4.4. Find ∫ x sin(x)dx 4.1.2 Fundamental theorem The Fundamental Theorem of Calculus states that: If f(x) is a continuous function on the interval [a, b] and there is a function F (x) such that F ′(x) = f(x), then∫ b a f(x)dx = F (b)− F (a) Note that ∫ b a f(x)dx is known as the definite integral from a to b as we are integrating the function f(x) between the values x = a and x = b. Can we interpret this theorem in some physical way? Of course! Let s(t) be a continuous function which gives the position of a moving object at time t where t is in the interval [a, b]. We know that s′(t) gives the velocity of the object at time t, and we want to know what is the meaning of ∫ b a s ′(t)dt. Recall that distance = velocity × time. Thus for any small interval ∆t in [a, b] we have s′(t)×∆t ≈ distance travelled in ∆t. 
Adding each successive calculation of the distance travelled for the small intervals of time ∆t from t = a to t = b will give us (approximately) the total distance travelled over the interval [a, b]. Integrating the velocity function s′(t) over the interval [a, b] will then give us the total distance travelled over the interval [a, b]. Thus the definite inte- gral of a velocity function can be interpreted as the total distance travelled in the interval [a, b]. The integral of the rate of change of any quantity gives the total change in that quantity. 104 4.2 Area under the curve When f(x) is a positive function and a < b then the definite integral ∫ b a f(x)dx gives the area between the graph of the function f(x) and the x - axis. In other words ∫ b a f(x)dx = A Example 4.5. Find the area between the graph of y = sinx and the x - axis, between x = 0 and x = pi2 . When f(x) is a negative function and a < b then the definite integral gives the negative of the area between the graph of the function f(x) and the x - axis. ∫ b a f(x)dx = −A 105 Example 4.6. Find the area between the graph of y = sinx and the x - axis, between x = pi and x = 3pi2 . When f(x) is positive for some values of x in the interval [a, b] and negative for other values in the interval [a, b] then the definite integral gives the sum of the areas above the x - axis and subtracts the areas below the x - axis. In other words ∫ b a f(x)dx = A−B + C Example 4.7. Find the area between the graph of y = sinx and the x - axis, between x = 0 and x = 3pi2 . 106 Area between two curves. Given two continuous functions f(x) and g(x) where f(x) ≥ g(x) for all x in the interval [a, b], the area of the region bounded by the curves y = f(x) and y = g(x), and the lines x = a and x = b is given by the definite integral ∫ b a [ f(x)− g(x)]dx This is true regardless of whether the functions are positive, negative, or a combination of both. Can you see why? Example 4.8. 
Find the area between the graphs of y = sin x and y = cos x between x = π/4 and x = π.

Look carefully at the next example.

Example 4.9. Find the area between the graphs of y = sin x and y = cos x between x = 0 and x = π.

Example 4.10. Find the area bounded by the graphs of x = y² − 5y and x = −2y² + 4y.

4.3 Trapezoidal rule

Sometimes it may not be all that simple to integrate a function. (As an example, try finding the anti-derivative of e^{−x²}.) When we encounter situations such as this we can again turn to numerical methods of approximation to help us out, avoiding the need to integrate the function. One such method is the Trapezoidal rule which (as its name suggests) uses the area of the trapezium to approximate the area under the graph of a function f(x).

Recall that the area of a trapezium is given by

A = (1/2)(m + n)w

where m and n are the lengths of the parallel sides of the trapezium, and w is the distance between the parallel sides (i.e. the width). If the interval is [a, b] then w = b − a.

If the interval [a, b] is divided into n equal sub-intervals, then each sub-interval has width

w_i = (b − a)/n = ∆n

and the successive sums (m + n) of the parallel side lengths are given by f(a) + f(a + ∆n); f(a + ∆n) + f(a + 2∆n); …; f(a + (n − 1)∆n) + f(b). The sum of the areas of the trapezoids created by the sub-intervals can then be stated as:

A = (1/2) · (b − a)/n · ( f(a) + 2f(a + ∆n) + … + 2f(a + (n − 1)∆n) + f(b) )

Altering our notation slightly gives

A = Σ_{i=0}^{n−1} (b − a)/(2n) · ( f(x_i) + f(x_i + ∆n) )

Note that when i = 0, x_0 = a and thus f(x_0) = f(a), and when i = n − 1, f(x_{n−1} + ∆n) = f(b). Thus the sum of the areas of the trapezoids created by the sub-intervals can be stated as:

∫_a^b f(x) dx ≈ (b − a)/(2n) · ( f(a) + f(b) + 2 Σ_{i=1}^{n−1} f(x_i) )

Example 4.11. Use the Trapezoidal rule with n = 4 to find an approximate value of ∫_0^2 2^x dx.

Solution: In the interval [0, 2] with n = 4 we have four trapezoids, each of width 1/2.
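Before finishing the computation by hand, note that the rule above is easy to mechanize. A minimal sketch (Python assumed; not part of the original notes):

```python
# Trapezoidal rule: approximate the integral of f on [a, b]
# using n equal sub-intervals.
def trapezoid(f, a, b, n):
    w = (b - a) / n                                   # sub-interval width
    inner = sum(f(a + i * w) for i in range(1, n))    # interior points x_1..x_{n-1}
    return (b - a) / (2 * n) * (f(a) + f(b) + 2 * inner)

# Example 4.11: f(x) = 2^x on [0, 2] with n = 4
approx = trapezoid(lambda x: 2.0 ** x, 0.0, 2.0, 4)
print(approx)   # (9 + 6*sqrt(2))/4, approximately 4.3713
```

The hand calculation that follows produces exactly this number.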
The endpoints of our interval are a = 0 and b = 2, thus f(a) = f(0) = 2^0 = 1 and f(b) = f(2) = 2^2 = 4. Note that

(b − a)/(2n) = 2/(2 × 4) = 1/4.

Thus

∫_0^2 2^x dx ≈ (b − a)/(2n) · ( f(a) + f(b) + 2 Σ_{i=1}^{3} f(x_i) )
= (1/4) ( 1 + 4 + 2(2^{1/2} + 2^1 + 2^{3/2}) )
= (1/4) ( 5 + 2(√2 + 2 + 2√2) )
= (1/4) (9 + 6√2)

Example 4.12. Use the Trapezoidal rule with n = 5 to find an approximate value of ∫_0^π √(sin x) dx.

Example 4.13. Use the Trapezoidal rule with n = 4 to find an approximate value of ∫_0^1 1/(1 + x²) dx.

Example 4.14. Use the Trapezoidal rule with n = 4 to find an approximate value of ∫_0^1 e^{x²} dx.

Chapter 5 Multivariable Calculus

5.1 Functions of several variables

We are all familiar with simple functions such as y = x³. And we all know the answers to questions such as

• What is the domain and range of the function?
• What does the function look like as a plot in the xy-plane?
• What is the derivative of the function?

Single variable calculus encompasses functions such as y = x³ where y is a function of the single (independent) variable x. The graph of y = f(x) is a curve in the xy-plane. In nature, many physical quantities depend on more than one independent variable. We are now going to explore how to answer similar questions to the above for functions such as z = x³ + y². This is just one example of what we call functions of several variables. We can have as many variables as we want; z = x³ + y² is a function of two independent variables (x and y), w = x³ + y² − z² is a function of three independent variables (x, y and z), and so forth. Just as we would write f(x) = x³ we can write f(x, y) = x³ + y² and f(x, y, z) = x³ + y² − z² and so on. For the remainder of this course we will focus on functions involving two independent variables, but bear in mind that the lessons learnt here will be applicable to functions of any number of variables.
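In code, a function of several variables is simply a function of several arguments. A trivial sketch (Python assumed, using the example functions above):

```python
# Functions of one, two and three independent variables.
def f1(x):        # y = x^3
    return x ** 3

def f2(x, y):     # z = x^3 + y^2
    return x ** 3 + y ** 2

def f3(x, y, z):  # w = x^3 + y^2 - z^2
    return x ** 3 + y ** 2 - z ** 2

print(f2(2, 3))      # 8 + 9 = 17
print(f3(2, 3, 1))   # 8 + 9 - 1 = 16
```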
5.1.1 Definition

A function f of two (independent) variables (x, y) is a single-valued mapping of a subset of R² into a subset of R.

What does this mean? Simply that for any allowed value of x and y we can compute a single value for f(x, y). In a sense f is a process for converting pairs of numbers (x and y) into a single number f. The notation R² means all possible choices of x and y, such as all points in the xy-plane. The symbol R denotes all real numbers (for example all points on the real line). The use of the word subset in the above definition is simply to remind us that functions have an allowed domain (i.e. a subset of R²) and a corresponding range (i.e. a subset of R).

Notice that we are restricting ourselves to real variables, that is, the function's value and its arguments (x, y) are all real numbers. This game gets very exciting and somewhat tricky when we enter the world of complex numbers. Such adventures await you in later year mathematics (not surprisingly this area is known as Complex Analysis).

5.1.2 Notation

Here is a function of two variables

f(x, y) = sin(x + y)

We can choose the domain to be R² and then the range will be the closed set [−1, +1]. Another common way of writing all of this is

f : (x, y) ∈ R² ↦ sin(x + y) ∈ [−1, 1]

This notation identifies the function as f, the domain as R², the range as [−1, 1] and, most importantly, the rule that (x, y) is mapped to sin(x + y). For this subject we will stick with the former notation. You should also note that there is nothing sacred about the symbols x, y and f. We are free to choose whatever symbols take our fancy; for example we could create the function

w(u, v) = log(u − v)

Example 5.1. What would be a sensible choice of domain for the previous function?

5.1.3 Surfaces

A very common application of functions of two variables is to describe a surface in 3-dimensional space. How do we do this?
The idea is that we take the value of the function to describe the height of the surface above the xy-plane. If we use standard Cartesian coordinates then such a surface could be described by the equation

z = f(x, y)

This surface has a height z units above each point (x, y) in the xy-plane. Just as the equation y = f(x) describes a curve in the xy-plane, the equation z = f(x, y) describes a surface in R³. Just as the curve y = f(x) is made up of the points (x, y), the surface z = f(x, y) is made up of the points (x, y, z). As z = f(x, y) describes this surface explicitly as a height function over a plane, we say that the surface is given in explicit form. A surface such as z = f(x, y) is also often called the graph of the function f.

Here are some simple examples. A very good exercise is to try to convince yourself that the following images are correct (i.e. that they do represent the given equation). Note that in each of the following r is defined as r = +√(x² + y²).

z = x² + y²
1 = x² + y² − z²
z = cos(3πr) exp(−2r²)
z = √(1 + y² − x²)
z = −xy exp(−x² − y²)
1 = x + y + z

Example 5.2. Sketch and describe the graph of the surface z = f(x, y) = 6 + 3x + 2y.

5.1.4 Alternative forms

We might ask: are there any other ways in which we can describe a surface? We should be clear that (in this subject) when we say surface we are talking about a 2-dimensional surface in our familiar 3-dimensional space. With that in mind, consider the equation

0 = g(x, y, z)

What do we make of this equation? Well, after some algebra we might be able to re-arrange the above equation into the familiar form z = f(x, y) for some function f. In this form we see that we have a surface, and thus the previous equation 0 = g(x, y, z) also describes a surface. When the surface is described by an equation of the form 0 = g(x, y, z) we say that the surface is given in implicit form.

Consider all of the points in R³ (i.e. all possible (x, y, z) points).
If we now introduce the equation 0 = g(x, y, z) we are forced to consider only those (x, y, z) values that satisfy this constraint. We could do so by, for example, arbitrarily choosing (x, y) and using the equation (in the form z = f(x, y)) to compute z. Or we could choose, say, (y, z) and use the equation 0 = g(x, y, z) to compute x. Whichever road we travel, it is clear that we are free to choose just two of (x, y, z) with the third constrained by the equation.

Now consider some simple surface and let's suppose we are able to drape a sheet of graph paper over the surface. We can use this graph paper to select individual points on the surface (well, as far as the graph paper covers the surface). Suppose we label the axes of the graph paper by the symbols u and v. Then each point on the surface is described by a unique pair of values (u, v). This makes sense – we are dealing with a 2-dimensional surface and so we expect we would need 2 numbers (u, v) to describe each point on the surface. The parameters (u, v) are often referred to as (local) coordinates on the surface.

How does this picture fit in with our previous description of a surface as an equation of the form 0 = g(x, y, z)? Pick any point on the surface. This point will have both (x, y, z) and (u, v) coordinates. That means that we can describe the point in terms of either (u, v) or (x, y, z). As we move around the surface all of these coordinates will vary. So given (u, v) we should be able to compute the corresponding (x, y, z) values. That is, we should be able to find functions P(u, v), Q(u, v) and R(u, v) such that

x = P(u, v)
y = Q(u, v)
z = R(u, v)

The above equations describe the surface in parametric form.

Example 5.3. Identify (i.e. describe) the surface given by the equations

x = 2u + 3v + 1
y = u − 4v + 2
z = u + 2v − 1

Hint: Try to combine the three equations into one equation involving x, y and z but not u and v.

Example 5.4.
Describe the surface defined by the equations

x = 3 cos(φ) sin(θ)
y = 4 sin(φ) sin(θ)
z = 5 cos(θ)

for 0 < φ < 2π and 0 < θ < π.

Example 5.5. How would your answer to the previous example change if the domain for θ was 0 < θ < π/2?

Equations for surfaces

A 2-dimensional surface in 3-dimensional space may be described by any of the following forms.

Explicit: z = f(x, y)
Implicit: 0 = g(x, y, z)
Parametric: x = P(u, v), y = Q(u, v), z = R(u, v)

5.2 Partial derivatives

5.2.1 First partial derivatives

We are all familiar with the definition of the derivative of a function of one variable

df/dx = lim_{∆x→0} [ f(x + ∆x) − f(x) ] / ∆x

The natural question to ask is: Is there a similar rule for functions of more than one variable? The answer is yes, and we will develop the necessary formulas by a simple generalisation of the above definition.

Let us suppose we have a function, say f(x, y). Suppose for the moment that we pick a particular value of y, say y = 3. Then only x is allowed to vary and in effect we now have a function of just one variable. Thus we can apply the above definition for a derivative, which we write as

∂f/∂x = lim_{∆x→0} [ f(x + ∆x, y) − f(x, y) ] / ∆x

Notice the use of the symbol ∂ rather than d. This is to remind us that in computing this derivative all other variables are held constant (which in this instance is just y). Of course we could do the same again but with x held constant. This gives us the derivative in y

∂f/∂y = lim_{∆y→0} [ f(x, y + ∆y) − f(x, y) ] / ∆y

Each of these derivatives, ∂f/∂x and ∂f/∂y, is known as a first order partial derivative of f (the derivative of a function of one variable is often called an ordinary derivative).

We can also look at this in terms of the rate of change, as we did for single variable functions. If z is a function of two independent variables x and y (i.e. z = f(x, y)) then there are two independent rates of change.
One of these is the rate of change of f with respect to the variable x, and the other is the rate of change of f with respect to the variable y.

You might think that we would now need to invent new rules for the (partial) derivatives of products, quotients and so on. But our definition of partial derivatives is built upon the definition of an ordinary derivative of a function of one variable. Thus all the familiar rules carry over without modification. For example, the product rule for partial derivatives is

∂(fg)/∂x = g ∂f/∂x + f ∂g/∂x
∂(fg)/∂y = g ∂f/∂y + f ∂g/∂y

Computing partial derivatives is no more complicated than computing ordinary derivatives. Hooray for us!

Rules for finding partial derivatives

• To find ∂f/∂x, treat y as a constant and differentiate f(x, y) with respect to x only.
• To find ∂f/∂y, treat x as a constant and differentiate f(x, y) with respect to y only.

Example 5.6. If f(x, y) = x³ + x²y³ − 2y², find fx(2, 1) and fy(2, 1).

Example 5.7. If f(x, y) = sin(x) cos(y) then

∂f/∂x = ∂( sin(x) cos(y) )/∂x = cos(y) ∂ sin(x)/∂x = cos(y) cos(x)

Also find ∂f/∂y.

Example 5.8. If g(x, y, z) = e^{−x²−y²−z²} then

∂g/∂z = ∂e^{−x²−y²−z²}/∂z = e^{−x²−y²−z²} ∂(−x² − y² − z²)/∂z = −2z e^{−x²−y²−z²}

Also find ∂g/∂x and ∂g/∂y.

A word on notation: an alternative notation for ∂f/∂x and ∂f/∂y is fx and fy respectively. You will find both versions are commonly used.

5.3 The tangent plane

For functions of one variable we found that a tangent line provides a useful means of approximating the function. It is natural to ask how we might generalise this idea to functions of several variables. Constructing a tangent line for a function of a single variable, f = f(x), is quite simple. (This should be revision!) First we compute the function's value f and its gradient df/dx at some chosen point. We then construct a straight line equation (y = mx + c) with these values at the chosen point. This line is the tangent line of the function f at the given point.

Example 5.9.
Find the tangent line to the function f(x) = sin x at x = π/4.

How do we relate this to functions of several variables?

5.3.1 Geometric interpretation

Earlier we noted that the partial derivative ∂f/∂x of the function of two variables z = f(x, y) is the rate of change of f in the x-direction, keeping y fixed. To visualise ∂f/∂x as the slope (or gradient) of a straight line, consider the diagram below. This diagram shows the intersection of the vertical plane y = constant with the smooth differentiable surface z = f(x, y) in R³. The intersection of the plane with this surface is a curve, C1 say. On C1, x can vary but y stays constant. We now draw the tangent line to the surface at the point P that also lies in the vertical plane y = constant. This tangent line, T1 say, strikes the xy-plane at angle α as shown. This tangent line has slope tan α, which equals the rate of change of the height z of the surface z = f(x, y) in the x-direction at the point P. We thus have

∂f/∂x = tan α = slope of the tangent line to the surface z = f(x, y) in the x-direction.

Similarly, the diagram below illustrates the intersection of the vertical plane x = constant with the surface z = f(x, y). This intersection is a smooth curve, C2 say, on the surface. On C2, y can vary but x stays fixed. As we move along this curve, the height z to the surface changes only with respect to the change in the independent variable y. Now we can draw the tangent line to the curve C2 at the point P. This strikes the xy-plane at angle β. The slope of this tangent line, namely tan β, gives the rate of change of f with respect to y at the point P. That is

∂f/∂y = tan β = slope of the tangent line to the surface z = f(x, y) in the y-direction.

What happens if we consider the rate of change of f with respect to x and with respect to y together? It is helpful to look at the diagram below, which shows a section of the surface S of a differentiable function z = f(x, y) at a point P(a, b, c), where c = f(a, b).
If we zoom in onto the surface at P it becomes locally flat. We can then draw the two tangent lines T1 and T2 to the surface at P which are tangential to the two curves C1 and C2 that lie in the vertical planes y = b and x = a. These tangent lines, which have the slopes ∂f/∂x = tan α and ∂f/∂y = tan β shown previously, give the rate of change of f(x, y) in both the x and y directions.

The tangent plane to the surface at the point P is the plane that contains both of the tangent lines T1 and T2. Let us now find the equation of this plane. We know that the general equation of a plane that passes through the point P(a, b, c) is

z − c = m(x − a) + n(y − b)     (5.1)

Here m and n are the slopes of the lines of intersection of the general plane with the two vertical planes y = b and x = a that are parallel to the principal coordinate planes (the xz-plane and the yz-plane respectively).

If we now put y = b in equation (5.1), we have z − c = m(x − a). This is the equation of the line of intersection of our general plane with the plane y = b. It clearly has slope m. Next we put x = a in equation (5.1) and this yields z − c = n(y − b). This is the equation of the line of intersection of our general plane with the plane x = a. It clearly has slope n. Lastly, if we choose

m = tan α = ∂f/∂x = fx(a, b)  and  n = tan β = ∂f/∂y = fy(a, b)

then the equation of the tangent plane to the surface z = f(x, y) at the point (x, y) = (a, b) is

z = f(a, b) + fx(a, b) · (x − a) + fy(a, b) · (y − b)     (5.2)

Our work is done!

Example 5.10. Find the equation of the tangent plane to the surface z = 2x² + y² at the point (a, b) = (1, 1).

Solution: Here f(x, y) = 2x² + y². Thus f(a, b) = f(1, 1) = 2 · 1² + 1² = 3. Next,

∂f/∂x = 4x  therefore  ∂f/∂x (1, 1) =
∂f/∂y = 2y  therefore  ∂f/∂y (1, 1) =

Using equation (5.2) the equation of the tangent plane is

5.3.2 Linear approximations

We have done the hard work, and now it is time to enjoy the fruits of our labour.
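The numbers in Example 5.10 can be checked numerically. A sketch (Python assumed; the partial derivatives are estimated by central finite differences rather than computed symbolically):

```python
# Tangent plane to z = f(x,y) at (a,b), as in equation (5.2):
#   z = f(a,b) + fx(a,b)*(x-a) + fy(a,b)*(y-b)
# for f(x,y) = 2x^2 + y^2 at (a,b) = (1,1).

def f(x, y):
    return 2 * x ** 2 + y ** 2

a, b, h = 1.0, 1.0, 1e-6
fx = (f(a + h, b) - f(a - h, b)) / (2 * h)   # estimates 4x at x=1, i.e. 4
fy = (f(a, b + h) - f(a, b - h)) / (2 * h)   # estimates 2y at y=1, i.e. 2

def tangent_plane(x, y):
    return f(a, b) + fx * (x - a) + fy * (y - b)

# Near (1,1) the plane is very close to the surface:
print(f(1.01, 1.02), tangent_plane(1.01, 1.02))
```

The central-difference estimates agree with the exact slopes 4 and 2 to high accuracy here, since f is quadratic.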
Just as we used the tangent line in approximations for functions of one variable, we can use the tangent plane as a way to estimate the original function f(x, y) in a region close to the chosen point. The equation of the tangent plane to the surface z = f(x, y) at the point (a, b) is also the equation for the linear approximation to z = f(x, y) for points (x, y) near (a, b).

We can regard the tangent plane equation (5.2) as the natural extension to functions of two variables (x, y) of the Taylor polynomial of degree one, y = T1(x) = f(a) + f′(a) · (x − a). This is the linear approximation equation for functions of one variable, namely y = f(x), for x near a. Hence we call

z = T1(x, y) = f(a, b) + fx(a, b) · (x − a) + fy(a, b) · (y − b)

the linear approximation to f(x, y) for points (x, y) near (a, b).

Example 5.11. Derive the linear approximation function T1(x, y) for the function f(x, y) = √(3x − y) at the point (4, 3).

Example 5.12. Use the result of example 5.9 to estimate sin(x) sin(y) at (5π/16, 5π/16).

5.4 Chain rule

In a previous lecture we saw how we could compute (partial) derivatives of functions of several variables. The trick we employed was to reduce the number of independent variables to just one (which we did by keeping all but one variable constant). There is another way in which we can achieve this reduction, which involves parametrising the function.

Consider a function of two variables f(x, y) and let's suppose we are given a smooth (continuous, with derivatives which are also continuous) curve in the xy-plane. Each point on this curve can be characterised by its distance from some arbitrary starting point on the curve. In this way we can imagine that the (x, y) pairs on this curve are given as functions of one variable, let's call it s. That is, our curve is described by the parametric equations

x = x(s)
y = y(s)

for some functions x(s) and y(s).
The values of the function f(x, y) on this curve are therefore given by f = f(x(s), y(s)) and this is just a function of one variable s. Thus we can compute its derivative df/ds. We will soon see that df/ds can be computed in terms of the partial derivatives.

Example 5.13. Given the curve x(s) = 2s, y(s) = 4s², −1 < s < 1, and the function f(x, y) = 5x − 7y + 2, compute df/ds at s = 0.

Example 5.14. Show that for the curve x(s) = s, y(s) = 2 we get df/ds = ∂f/∂x.

Example 5.15. Show that for the curve x(s) = −1, y(s) = s we get df/ds = ∂f/∂y.

The last two examples show that df/ds is somehow tied to the partial derivatives of f. The exact link will be made clear in a short while.

What meaning can we assign to this number df/ds? It helps to imagine that we have drawn a graph of f(x, y) (i.e. as a surface over the xy-plane). Now draw the curve (x(s), y(s)) in the xy-plane and imagine walking along that curve; let's call it C. At each point on C, f(s) is the height of the surface above the xy-plane. If you walk a short distance ∆s then the height might change by an amount ∆f. The rate at which the height changes with respect to the distance travelled is then ∆f/∆s. In the limit of infinitesimal distances we recover df/ds. Thus we can interpret df/ds as measuring the rate of change of f along the curve. This is exactly what we would have expected – after all, derivatives measure rates-of-change.

The first example above showed how you could compute df/ds by first reducing f to an explicit function of s. It was also hinted that it is possible to evaluate df/ds using partial derivatives. Let's go back to basics. The derivative df/ds could be calculated as

df/ds = lim_{∆s→0} [ f(x(s + ∆s), y(s + ∆s)) − f(x(s), y(s)) ] / ∆s

We will re-write this by adding and subtracting f(x(s), y(s + ∆s)) just before the minus sign.
After a little rearranging we get

df/ds = lim_{∆s→0} [ f(x(s + ∆s), y(s + ∆s)) − f(x(s), y(s + ∆s)) ] / ∆s
      + lim_{∆s→0} [ f(x(s), y(s + ∆s)) − f(x(s), y(s)) ] / ∆s

Now let's look at the first limit. If we introduce ∆x = x(s + ∆s) − x(s) then we can write

lim_{∆s→0} [ f(x(s + ∆s), y(s + ∆s)) − f(x(s), y(s + ∆s)) ] / ∆s
= lim_{∆s→0} [ f(x(s + ∆s), y(s + ∆s)) − f(x(s), y(s + ∆s)) ] / ∆x · ∆x/∆s
= ∂f/∂x · dx/ds.

We can write a similar equation for the second limit. Combining the two leads us to

df/ds = ∂f/∂x · dx/ds + ∂f/∂y · dy/ds

This is an extremely useful and important result. It is an example of what is known as the chain rule for functions of several variables.

The Chain Rule

Let f = f(x, y) be a differentiable function. If the function is parametrised by x = x(s) and y = y(s) then the chain rule for derivatives of f along the path x = x(s), y = y(s) is

df/ds = ∂f/∂x · dx/ds + ∂f/∂y · dy/ds

Now that we have covered this much, it's rather easy to see an important extension of the above result. Suppose the path was obtained by holding some other parameter constant. That is, imagine that the path x = x(s), y = y(s) arose from some more complicated expressions such as x = x(s, t), y = y(s, t) with t held constant. How would our formula for the chain rule change? Not much, other than we would have to keep in mind throughout that t is constant. We encountered this issue once before and that led to partial rather than ordinary derivatives. Clearly the same change of notation applies here, and thus we would write

∂f/∂s = ∂f/∂x · ∂x/∂s + ∂f/∂y · ∂y/∂s

as the first partial derivative of f with respect to s.

Let's see where we are at so far. We are given a function of two variables f = f(x, y) and we are also given two other functions, also of two variables, x = x(s, t), y = y(s, t). Then ∂f/∂s can be calculated using the above chain rule. Of course you could also compute ∂f/∂s directly by substituting x = x(s, t) and y = y(s, t) into f(x, y) before taking the partial derivatives.
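For a concrete check (a sketch with an assumed example function f(x, y) = x²y, not from the notes): with x = 2s + 3t and y = s − 2t we have ∂x/∂t = 3 and ∂y/∂t = −2, so the chain rule gives ∂f/∂t = (∂f/∂x)·3 + (∂f/∂y)·(−2). Direct substitution, estimated here by a finite difference, gives the same number:

```python
# Chain rule vs direct substitution for f(x,y) = x^2 * y with
# x = 2s + 3t, y = s - 2t (an assumed example function).

def chain_rule_df_dt(s, t):
    x, y = 2 * s + 3 * t, s - 2 * t
    fx = 2 * x * y                 # ∂f/∂x
    fy = x ** 2                    # ∂f/∂y
    return fx * 3 + fy * (-2)      # ∂x/∂t = 3, ∂y/∂t = -2

def direct_df_dt(s, t, h=1e-6):
    def f_of_st(s_, t_):           # f after substituting x(s,t), y(s,t)
        x, y = 2 * s_ + 3 * t_, s_ - 2 * t_
        return x ** 2 * y
    return (f_of_st(s, t + h) - f_of_st(s, t - h)) / (2 * h)

print(chain_rule_df_dt(1.0, 2.0), direct_df_dt(1.0, 2.0))  # both ≈ -272
```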
Both approaches will give you exactly the same answer. Note that there is nothing special in the choice of symbols x, y, s or t. You will often find (u, v) used rather than (s, t).

Example 5.16. Given f = f(x, y) and x = 2s + 3t, y = s − 2t, compute ∂f/∂t directly and by way of the chain rule.

The Chain Rule : Episode 2

Let f = f(x, y) be a differentiable function. If x = x(u, v), y = y(u, v) then

∂f/∂u = ∂f/∂x · ∂x/∂u + ∂f/∂y · ∂y/∂u
∂f/∂v = ∂f/∂x · ∂x/∂v + ∂f/∂y · ∂y/∂v

5.5 Gradient and Directional Derivative

Given any differentiable function of several variables we can compute each of its first partial derivatives. Let's do something 'out of the square'. We will assemble these partial derivatives as a vector which we will denote by ∇f. So for a function f(x, y) of two variables we define

∇f = ∂f/∂x i˜ + ∂f/∂y j˜

This is known as the gradient of f and is often pronounced grad f. This may be pretty, but what use is it? If we look back at the formula for the chain rule we see that we can write it out as a vector dot-product

df/ds = ∂f/∂x · dx/ds + ∂f/∂y · dy/ds
      = ( ∂f/∂x i˜ + ∂f/∂y j˜ ) · ( dx/ds i˜ + dy/ds j˜ )
      = (∇f) · ( dx/ds i˜ + dy/ds j˜ )

The number that we calculate in this process, i.e. df/ds, is known as the directional derivative of f in the direction t˜. What do we make of the vector on the far right of this equation, i.e. dx/ds i˜ + dy/ds j˜? It is not hard to see that it is a tangent vector to the curve (x(s), y(s)). And if we chose the parameter s to be distance along the curve then we also see that it is a unit vector.

Example 5.17. Prove the last pair of statements, i.e. that the vector is a tangent vector and that it is a unit vector.

It is customary to denote the tangent vector by t˜ (some people prefer u˜). With the above definitions we can now write the equation for a directional derivative as follows

df/ds = t˜ · ∇f

Yet another variation on the notation is to include the tangent vector as a subscript on ∇.
Thus we also have

df/ds = ∇t˜ f

Directional derivative

The directional derivative df/ds of a function f in the direction t˜ is given by

df/ds = t˜ · ∇f = ∇t˜ f

where the gradient ∇f is defined by

∇f = ∂f/∂x i˜ + ∂f/∂y j˜

and t˜ is a unit vector, t˜ · t˜ = 1.

Example 5.18. Given f(x, y) = sin(x) cos(y), compute the directional derivative of f in the direction t˜ = (i˜ + j˜)/√2.

Example 5.19. Given ∇f = 2x i˜ + 2y j˜ and x(s) = s cos(0.1), y(s) = s sin(0.1), compute df/ds at s = 1.

Example 5.20. Given f(x, y) = (xy)² and the vector v˜ = 2i˜ + 7j˜, compute the directional derivative at (1, 1). Hint: Is v˜ a unit vector?

We began this discussion by restricting a function of many variables to a function of one variable. We achieved this by choosing a path such as x = x(s), y = y(s). We might ask if the value of df/ds depends on the choice of the path. That is, we could imagine many different paths all sharing one point, call it P, in common. Amongst these different paths might we get different answers for df/ds? This is a very good question. To answer it let's look at the directional derivative in the form

df/ds = t˜ · ∇f

First we note that ∇f depends only on the values of (x, y) at P. It knows nothing about the curves passing through P. That information is contained solely in the vector t˜. Thus if a family of curves passing through P share the same t˜ then we most certainly will get the same value of df/ds for each member of that family. But what class of curves share the same t˜ at P? Clearly they are all tangent to each other at P. None of the curves cross any other curve at P.

At this point we can dispense with the curves and retain just the tangent vector t˜ at P. All that we require to compute df/ds is the direction we wish to head in, t˜, and the gradient vector, ∇f, at P. Choose a different t˜ and you will get a different answer for df/ds. In each case df/ds measures how rapidly f is changing in the direction of t˜.
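The recipe above is mechanical: form ∇f, normalise the direction vector, take the dot product. A sketch for the setup of Example 5.20 (Python assumed; the printed value is my own check, not from the notes):

```python
import math

# Directional derivative df/ds = t . grad(f) for f(x,y) = (x*y)^2
# in the direction of v = 2i + 7j at the point (1,1).
# v is not a unit vector, so it must be normalised first.

def grad_f(x, y):
    return (2 * x * y ** 2, 2 * x ** 2 * y)   # (∂f/∂x, ∂f/∂y)

vx, vy = 2.0, 7.0
norm = math.hypot(vx, vy)                     # |v| = sqrt(53)
tx, ty = vx / norm, vy / norm                 # unit tangent vector t

gx, gy = grad_f(1.0, 1.0)                     # gradient at (1,1): (2, 2)
df_ds = tx * gx + ty * gy                     # t . grad(f)
print(df_ds)                                  # 18/sqrt(53), about 2.47
```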
5.6 Second order partial derivatives

The result of a partial derivative of a function yields another function of one or more variables. We are thus at liberty to take another derivative, generating yet another function. Clearly we can repeat this any number of times (though possibly subject to some technical limitations as noted below; see Exceptions).

Example 5.21. Let f(x, y) = sin(x) sin(y). Then we can define g(x, y) = ∂f/∂x and h(x, y) = ∂g/∂x. That is

g(x, y) = ∂f/∂x = ∂( sin(x) sin(y) )/∂x = cos(x) sin(y)

and

h(x, y) = ∂g/∂x = ∂( cos(x) sin(y) )/∂x = −sin(x) sin(y)

Example 5.22. Compute ∂g/∂y for the above example.

From this we see that h(x, y) was computed as follows

h(x, y) = ∂g/∂x = ∂/∂x ( ∂f/∂x )

This is often written as

h(x, y) = ∂²f/∂x²

and is known as a second order partial derivative of the function f(x, y). Now consider the case where we compute h(x, y) by first taking a partial derivative in x followed by a partial derivative in y, that is

h(x, y) = ∂g/∂y = ∂/∂y ( ∂f/∂x )

and this is normally written as

h(x, y) = ∂²f/∂y∂x

Note the order on the bottom line – you should read this from right to left. It tells you to take a partial derivative in x and then a partial derivative in y.

The function z = f(x, y) has two partial derivatives fx and fy. Taking partial derivatives of fx and fy yields four second order partial derivatives of the function f(x, y). It's now a short leap to cases where we might try to find, say, fifth partial derivatives, such as

P(x, y) = ∂⁵Q/∂x∂y∂y∂x∂x

Partial derivatives that involve more than one of the independent variables are known as mixed partial derivatives.

Example 5.23. Given f(x, y) = 3x² + 2xy, compute ∂²f/∂x∂y and ∂²f/∂y∂x. What do you notice?

Order of partial derivatives does not matter: Clairaut's Theorem

If f(x, y) is a twice-differentiable function (i.e. the second order partial derivatives exist) then the order in which its mixed partial derivatives are calculated does not matter.
Each ordering will yield the same function. For a function of two variables this means

∂²f/∂x∂y = ∂²f/∂y∂x

This is not immediately obvious, but it can be proved and it is a very useful result.

A quick word on notation: the second order partial derivatives can be written as follows:

∂²f/∂x² = fxx
∂²f/∂y² = fyy
∂²f/∂x∂y = fxy

Example 5.24. Use the above theorem to show that

P(x, y) = ∂⁵Q/∂x∂y∂y∂x∂x = ∂⁵Q/∂y∂y∂x∂x∂x = ∂⁵Q/∂x∂x∂x∂y∂y

The theorem allows us to simplify our notation; all we need do is record how many of each type of partial derivative are required. Thus the above can be written as

P(x, y) = ∂⁵Q/∂x³∂y² = ∂⁵Q/∂y²∂x³

Example 5.25. Show that the function u(x, y) = e^{−x} cos y is a solution of Laplace's equation

∂²u/∂x² + ∂²u/∂y² = 0.

Solution:

ux = ∂/∂x ( e^{−x} cos y ) = −e^{−x} cos y
uy = ∂/∂y ( e^{−x} cos y ) = −e^{−x} sin y

Then

uxx = ∂/∂x ( −e^{−x} cos y ) =
uyy = ∂/∂y ( −e^{−x} sin y ) =

Hence uxx + uyy =

5.6.1 Taylor polynomials of higher degree

In earlier lectures we discovered that the linear approximation function T1(x, y) to a function of two variables f(x, y) near the point (a, b) is the same as the equation of the tangent plane at the point (a, b). In other words, for (x, y) near (a, b) we have

f(x, y) ≈ T1(x, y) = f(a, b) + fx(a, b) · (x − a) + fy(a, b) · (y − b)     (5.3)

The function T1(x, y) is also known as the Taylor polynomial of degree one for f(x, y) near (a, b), and clearly uses the first partial derivatives of f(x, y). Now the tangent plane provides a good fit to f(x, y) only if (x, y) is sufficiently close to (a, b). But this is obviously not always going to be the case. If we want to obtain a more accurate polynomial approximation to the graph of the surface z = f(x, y), we need to take into account the local curvature of the surface at (a, b). This is done by including the second partial derivatives of f(x, y), namely fxx, fxy and fyy. Using these gives us T2(x, y), which is the Taylor polynomial of degree two.
T2(x, y) is also known as the quadratic approximation function:

T2(x, y) = f(a, b) + fx(a, b) · (x − a) + fy(a, b) · (y − b)
         + (1/2!) [ fxx(a, b)(x − a)² + 2fxy(a, b)(x − a)(y − b) + fyy(a, b)(y − b)² ]     (5.4)

Example 5.26. Derive the Taylor polynomial of degree two for the function f(x, y) = e^{−x} cos y near the point (a, b) = (0, 0).

Solution:

Function | Value at (0, 0)
f(x, y) = e^{−x} cos y | f(0, 0) = e^{−0} cos 0 = 1 · 1 = 1
fx(x, y) = ∂/∂x ( e^{−x} cos y ) = −e^{−x} cos y | fx(0, 0) = −e^{−0} cos 0 = −1
fy(x, y) = ∂/∂y ( e^{−x} cos y ) = −e^{−x} sin y | fy(0, 0) = −e^{−0} sin 0 = 0
fxx(x, y) = ∂/∂x ( fx ) = ∂/∂x ( −e^{−x} cos y ) = e^{−x} cos y | fxx(0, 0) = e^{−0} cos 0 = 1
fxy(x, y) = ∂/∂y ( fx ) = ∂/∂y ( −e^{−x} cos y ) = e^{−x} sin y | fxy(0, 0) = e^{−0} sin 0 = 0
fyy(x, y) = ∂/∂y ( fy ) = ∂/∂y ( −e^{−x} sin y ) = −e^{−x} cos y | fyy(0, 0) = −e^{−0} cos 0 = −1

Collecting terms and substituting them into equation (5.4) we obtain:

T2(x, y) = f(0, 0) + fx(0, 0) · (x − 0) + fy(0, 0) · (y − 0)
         + (1/2) fxx(0, 0) · (x − 0)² + fxy(0, 0) · (x − 0)(y − 0) + (1/2) fyy(0, 0) · (y − 0)²
= 1 − 1 · x + 0 · y + (1/2) · 1 · x² + 0 · xy − (1/2) y²
= 1 − x + (1/2)(x² − y²)

Lastly we can graph the surface z = f(x, y) = e^{−x} cos y and the quadratic approximation function T2(x, y).

[Graph of z = e^{−x} cos y]

[Graph of z = 1 − x + (1/2)(x² − y²)]

Looking at these two graphs we see that the Taylor polynomial of degree two, namely T2(x, y), does a good job of mimicking the shape of the surface z = f(x, y) for points (x, y) close to (0, 0). In the plane x = 1, the quadratic approximation z = T2(x, y) falls off too steeply along the y axis as we move away from y = 0, while in the plane x = −1 it falls away too slowly along the y axis as we move away from y = 0.

5.6.2 Exceptions: when derivatives do not exist

In earlier lectures we noted that at the very least a function must be continuous if it is to have a meaningful derivative. When we take successive derivatives we may need to revisit the question of continuity for each new function that we create.
If a function fails to be continuous at some point then we most certainly cannot take its derivative at that point.

Example 5.27. Consider the function

f(x) = { 0,   −∞ < x < 0
       { 3x², 0 ≤ x < ∞

It is easy to see that something interesting might happen at x = 0. It's also not hard to see that the function is continuous over its whole domain, and thus we can compute its derivative everywhere, leading to

df/dx = { 0,  −∞ < x < 0
        { 6x, 0 ≤ x < ∞

This too is continuous, and we thus attempt to compute its derivative,

d²f/dx² = { 0, −∞ < x < 0
          { 6, 0 < x < ∞

Now we notice that this second derivative is not continuous at x = 0. We thus cannot take any more derivatives at x = 0: our chain of differentiation has come to an end. We began with a continuous function f(x) and we were able to compute only its first two derivatives over the domain x ∈ R. We say that the function is twice differentiable over R. This is often abbreviated by saying f is C² over R. The symbol C reminds us that we are talking about continuity, and the superscript 2 tells us how many derivatives we can apply before we encounter a non-continuous function. The clause 'over R' just reminds us that the domain of the function is the set of real numbers (−∞, ∞).

We should always keep in mind that a function may possess only a finite number of derivatives before we encounter a discontinuity. The tell-tale signs to watch out for are sharp edges, holes or singularities in the graph of the function.

5.7 Stationary points

5.7.1 Finding stationary points

Suppose you run a commercial business and that by some means you have constructed the following formula for the profit of one of your lines of business:

f = f(x, y) = 4 − x² − y²

Clearly the profit f depends on two variables x and y. Sound business practice suggests that you would like to maximise your profits. In mathematical terms this means: find the values of (x, y) such that f is a maximum.
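One systematic numerical approach is gradient ascent: repeatedly step in the direction of ∇f = (−2x, −2y), which points uphill. The following is a minimal Python sketch; the step size, starting point and iteration count are arbitrary choices for this illustration, not part of the notes.

```python
# Gradient ascent on the profit function f(x, y) = 4 - x^2 - y^2.
def grad_f(x, y):
    return (-2.0 * x, -2.0 * y)

x, y = 3.0, -2.0          # arbitrary starting guess
step = 0.1                # arbitrary step size
for _ in range(200):
    gx, gy = grad_f(x, y)
    x, y = x + step * gx, y + step * gy

# The iterates settle at the point where the gradient vanishes.
print(abs(x) < 1e-6 and abs(y) < 1e-6)        # converged to (0, 0)
print(abs((4 - x * x - y * y) - 4) < 1e-9)    # maximum profit of 4 units
```

The iterates converge to (0, 0), where the profit is 4 units.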
A simple plot of the graph of f shows us that the maximum occurs at (0, 0) (corresponding to a maximum profit of 4 units). We may not be able to do this so easily for other functions, and thus we need some systematic way of computing the points (x, y) at which f is maximised.

You have seen similar problems for the case of a function of one variable. From that you may expect that for the present problem we will be making a statement about the derivatives of f in order that we have a maximum (i.e. that the derivatives should be zero). Let's make this precise.

Let's denote the (as yet unknown) point at which the function is a maximum by P. Now if we have a maximum at this point, then moving in any direction from this point should see the function decrease. That is, the directional derivative must be non-positive in every direction from P. In other words we must have

df/ds = t̂ · (∇f)_P ≤ 0

for every choice of unit vector t̂. Let us assume (for the moment) that (∇f)_P ≠ 0. Then we should be able to compute λ > 0 so that t̂ = λ(∇f)_P is a unit vector. If you now substitute this into the above you will find

λ (∇f)_P · (∇f)_P ≤ 0

Look carefully at the left-hand side. Each factor is strictly positive (remember a·a is the squared length of a vector a, and we assumed (∇f)_P is non-zero), yet the inequality says the product is zero or negative. This does not make sense, and we have to reject our only assumption, that (∇f)_P ≠ 0. We have thus found that if f is to have a maximum at P then we must have

0 = (∇f)_P

This is a vector equation, and thus each component of ∇f is zero at P, that is

0 = ∂f/∂x and 0 = ∂f/∂y at P

It is from these equations that we would then compute the (x, y) coordinates of P.

Of course we could have posed the related question of finding the points at which a function is minimised. The mathematics would be much the same except for a change in words (maximum to minimum) and a corresponding change in ± signs. The end result is the same though: the gradient ∇f must vanish at P.

Example 5.28.
Find the points at which f = 4 − x² − y² attains its maximum.

Recall that stationary points (or critical points) of a function of one variable were found by setting the derivative to zero and solving for x. We found either a local maximum, a local minimum, or an inflection point (for example f(x) = x³ at x = 0). As we have just seen, for functions of two variables the stationary points are found similarly, and we obtain the following types:

• A local minimum
• A local maximum
• A saddle point

When we solve the equations 0 = (∇f)_P we might get more than one point P. What do we make of these points? Some of them might correspond to minima of f, others to maxima, and others still to saddle points. The three options are shown in the following graphs.

[A typical local minimum]

[A typical local maximum]

[A typical saddle point]

A typical case might consist of any number of points like the above. It is for this reason that each point is referred to as a local maximum or a local minimum.

5.7.2 Notation

Rather than continually having to qualify a point as corresponding to a minimum, maximum or saddle point of f, we commonly lump these together under the one term local extrema. Note that when we talk of minima, maxima and extrema we are talking about the (x, y) points at which the function has a local minimum, maximum or extremum respectively.

5.7.3 Maxima, Minima or Saddle point

We have just seen that a function of two variables has stationary points, and thus local extrema, when ∂f/∂x = ∂f/∂y = 0. If (a, b) are the (x, y) coordinates of a stationary point of f(x, y), then we can say

• A local maximum occurs when f(x, y) ≤ f(a, b) for all (x, y) close to (a, b)
• A local minimum occurs when f(x, y) ≥ f(a, b) for all (x, y) close to (a, b)

There is another way we can classify the extrema of a function of several variables.
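Before turning to that test, the defining inequality for a local maximum can be checked directly for the profit function f = 4 − x² − y² at its stationary point (0, 0), by sampling a small grid of nearby points. This is a quick Python sketch; the grid size and spacing are arbitrary choices.

```python
def f(x, y):
    return 4 - x * x - y * y

# sample a small grid around the stationary point (a, b) = (0, 0)
a, b = 0.0, 0.0
steps = [d / 10 for d in range(-5, 6)]   # -0.5 to 0.5 in steps of 0.1
nearby = [(a + dx, b + dy) for dx in steps for dy in steps]

# local maximum: f(x, y) <= f(a, b) for all (x, y) close to (a, b)
print(all(f(x, y) <= f(a, b) for (x, y) in nearby))
```

The check prints True, confirming that (0, 0) satisfies the local-maximum inequality on this grid.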
You should recall that for a function of one variable, f = f(x), its extrema could be characterised simply by examining the sign of the second derivative. In other words, for y = f(x), the extrema are where

df/dx = 0

Then for these values of x (where df/dx = 0) we examine the second derivative. If

• d²f/dx² > 0, then this corresponds to a local minimum
• d²f/dx² < 0, then this corresponds to a local maximum
• d²f/dx² = 0, then no decision can be made (e.g. x³ or x⁴).

Now we want to take this idea to functions of several variables. Can we do this? Yes, but with some modifications. Without going into the details (these are covered in a different course) we state the following test.

Characterising extrema - second derivative test

If 0 = ∇f at a point P then, at P, compute

D = (∂²f/∂x²)(∂²f/∂y²) − (∂²f/∂x∂y)²

Using D we can now classify the stationary point P:

• A local minimum when D > 0 and ∂²f/∂x² > 0
• A local maximum when D > 0 and ∂²f/∂x² < 0
• A saddle point when D < 0
• Inconclusive when D = 0

Example 5.29. Find the local extrema of f(x, y) = x² + y² − 2x − 6y + 14.

Example 5.30. Find the local extrema of f(x, y) = y² − x².

5.7.4 Application of extrema

As a final note, we will now turn to some applications of the use of extrema.

Example 5.31. We are required to build a rectangular box with a volume of 12 cubic centimetres. Since we are trying to economise on building costs, we also require the box to be made out of the smallest amount of material. What are the dimensions of the box that will satisfy these requirements?

Solution: First we need to set up our equations. Let the dimensions of the box be x, y and z for the length, width and height respectively.
The volume of the box is given by

V = xyz = 12

The total surface area of the box is

A = 2xy + 2xz + 2yz

Rearranging the equation for the volume gives z = 12/(xy). Substituting this into our equation for the surface area gives

A = 2xy + 2x·(12/(xy)) + 2y·(12/(xy)) = 2xy + 24/y + 24/x

Now we have a function A = f(x, y) which we can minimise. Taking partial derivatives of A = f(x, y) with respect to x and y gives

∂A/∂x = 2y − 24/x²   and   ∂A/∂y = 2x − 24/y²

Now let ∂A/∂x = 0 and ∂A/∂y = 0 and solve for both x and y. The first equation gives y = 12/x²; substituting this into the second gives x = x⁴/12, so x³ = 12. Thus the dimensions of the box are x = 12^(1/3), y = 12^(1/3) and z = 12/(xy) = 12^(1/3).

The second partial derivatives (for the above values of x and y) are

∂²A/∂x² = 48/x³ = 4,   ∂²A/∂y² = 48/y³ = 4,   ∂²A/∂x∂y = 2

Using the Second Derivative Test,

D = (∂²A/∂x²)(∂²A/∂y²) − (∂²A/∂x∂y)² = 4·4 − 2² = 12

Since D = 12 > 0 and ∂²A/∂x² > 0, the dimensions of the box of volume 12 cm³ that uses the minimum amount of material are x = y = z = 12^(1/3) ≈ 2.29 cm: a cube.

Earlier we looked at methods of finding the shortest distance from a point to a plane. We can now use extrema to answer questions such as these.

Example 5.32. A plane has the equation 2x + 3y + z = 12. Find the point on the plane closest to the origin.

Solution: To answer this question we clearly want to minimise the distance from the origin to a point on the plane. Let (x, y, z) be the point on the plane. The distance between two points is given by

d = √((x − x₀)² + (y − y₀)² + (z − z₀)²)

Note that if we let G = d² and minimise G, this is the same as minimising d. Since (x₀, y₀, z₀) = (0, 0, 0) we can now write

G = x² + y² + z²

Using the equation of the plane, where z = 12 − 2x − 3y, we now have

G = x² + y² + (12 − 2x − 3y)²

We now take partial derivatives of G with respect to x and y:

∂G/∂x = 2x − 4(12 − 2x − 3y) = 10x + 12y − 48   and   ∂G/∂y = 2y − 6(12 − 2x − 3y) = 12x + 20y − 72

Let both ∂G/∂x = 0 and ∂G/∂y = 0 and solve for x and y (solving simultaneous equations is handy here). Thus x = 12/7 and y = 18/7. We can substitute these into the equation of the plane to find z = 6/7.
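Both minimisations can be verified numerically. A short calculation gives ∂A/∂x = 2y − 24/x² for the box, and ∂G/∂x = 10x + 12y − 48, ∂G/∂y = 12x + 20y − 72 for the plane problem; the Python sketch below hard-codes these hand-computed derivatives (the grid spacing used in the neighbourhood check is an arbitrary choice).

```python
# Example 5.31: after eliminating z, the surface area is
#   A(x, y) = 2xy + 24/y + 24/x.
def A(x, y):
    return 2 * x * y + 24 / y + 24 / x

s = 12 ** (1 / 3)   # candidate minimiser x = y = 12^(1/3)

# dA/dx = 2y - 24/x^2 vanishes at x = y = s (and dA/dy by symmetry):
print(abs(2 * s - 24 / (s * s)) < 1e-9)

# every nearby grid point gives a larger area, consistent with a minimum:
print(all(A(s + dx, s + dy) >= A(s, s)
          for dx in (-0.1, 0.0, 0.1) for dy in (-0.1, 0.0, 0.1)))

# Example 5.32: setting dG/dx = 10x + 12y - 48 = 0 and
# dG/dy = 12x + 20y - 72 = 0 gives a 2x2 linear system (Cramer's rule):
det = 10 * 20 - 12 * 12
x = (48 * 20 - 12 * 72) / det
y = (10 * 72 - 48 * 12) / det
z = 12 - 2 * x - 3 * y
print(round(x, 4), round(y, 4), round(z, 4))   # 12/7, 18/7 and 6/7
```

The printed point agrees with the closest point found by the earlier projection method, 12·(2, 3, 1)/14.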
The second partial derivatives, using the above values for x and y, are

∂²G/∂x² = 10,   ∂²G/∂y² = 20   and   ∂²G/∂x∂y = 12

Using the Second Derivative Test,

D = (∂²G/∂x²)(∂²G/∂y²) − (∂²G/∂x∂y)² = 10·20 − 12² = 56

Since D > 0 and ∂²G/∂x² > 0, this is a minimum. Thus the point on the plane that gives the minimum distance to the origin is (x, y, z) = (12/7, 18/7, 6/7).

Example 5.33. You are given three positive numbers. The product of the three numbers is P. The sum of the three numbers is 10.

(a) Find the three numbers that will give a maximum product.
(b) Show that this gives a maximum product.
(c) If it was required that the three numbers be whole numbers, can you find the three (non-zero) numbers that sum to 10 and give a maximum product? Does your answer to (a) help you find this?
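Example 5.33 can be explored the same way. For part (a), writing P = xy(10 − x − y) and setting P_x = P_y = 0 gives x = y = z = 10/3. The Python sketch below applies the second derivative test for part (b) and a brute-force search for part (c); the classify helper is a convenience of this sketch, not notation from the notes.

```python
# Second derivative test, as stated in Section 5.7.3.
def classify(fxx, fyy, fxy):
    D = fxx * fyy - fxy ** 2
    if D > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    return "saddle point" if D < 0 else "inconclusive"

# Part (b): P(x, y) = xy(10 - x - y).  At x = y = 10/3 the second
# partials are P_xx = -2y, P_yy = -2x, P_xy = 10 - 2x - 2y.
x = y = 10 / 3
print(classify(-2 * y, -2 * x, 10 - 2 * x - 2 * y))  # local maximum

# Part (c): brute force over positive whole numbers summing to 10.
best = max(((a, b, 10 - a - b)
            for a in range(1, 9) for b in range(1, 10 - a)),
           key=lambda t: t[0] * t[1] * t[2])
print(sorted(best), best[0] * best[1] * best[2])
```

The search returns the triple {3, 3, 4} with product 36: the whole numbers closest to 10/3, so the answer to (a) does indeed point the way.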