Source: www.math.usm.edu
Jim Lambers
MAT419/519
Summer Session 2011-12
Lecture 11 Notes
                             These notes correspond to Section 3.4 in the text.
                             Broyden’s Method
                             One of the drawbacks of using Newton’s Method to solve a system of nonlinear equations g(x) = 0
                             is the computational expense that must be incurred during each iteration to evaluate the partial
                             derivatives of g at x(k), and then solve a system of linear equations involving the resulting Jacobian
                             matrix. The algorithm does not facilitate the re-use of data from previous iterations, and in some
                             cases evaluation of the partial derivatives can be unnecessarily costly.
An alternative is to modify Newton's Method so that approximate partial derivatives are used, since the slightly slower convergence resulting from such an approximation is offset by the improved efficiency of each iteration. However, simply replacing the analytical Jacobian matrix $J_g(x)$ of $g$ with a matrix consisting of finite difference approximations of the partial derivatives does not do much to reduce the cost of each iteration, because the cost of solving the system of linear equations is unchanged.
However, because $J_g(x)$ consists of the partial derivatives evaluated at an element of a convergent sequence, intuitively Jacobian matrices from consecutive iterations are "near" one another in some sense, which suggests that it should be possible to cheaply update an approximate Jacobian matrix from iteration to iteration, in such a way that the inverse of the Jacobian matrix, which is what is really needed during each Newton iteration, can be updated efficiently as well.
This is the case when an $n \times n$ matrix $B$ has the form
$$B = A + u \otimes v,$$
where $u$ and $v$ are given vectors in $\mathbb{R}^n$, and $u \otimes v$ is the outer product of $u$ and $v$, defined by
$$u \otimes v = uv^T = \begin{bmatrix} u_1 v_1 & u_1 v_2 & \cdots & u_1 v_n \\ u_2 v_1 & u_2 v_2 & \cdots & u_2 v_n \\ \vdots & \vdots & \ddots & \vdots \\ u_n v_1 & u_n v_2 & \cdots & u_n v_n \end{bmatrix}.$$
This modification of $A$ to obtain $B$ is called a rank-one update. This is because $u \otimes v$ has rank one, since every column of $u \otimes v$ is a scalar multiple of $u$. To obtain $B^{-1}$ from $A^{-1}$, we note that if
$$Ax = u,$$
then
$$Bx = (A + u \otimes v)x = Ax + uv^T x = u + u(v \cdot x) = (1 + v \cdot x)u,$$
which yields
$$B^{-1}u = \frac{1}{1 + v \cdot A^{-1}u}\, A^{-1}u.$$
On the other hand, if $x$ is such that $v \cdot A^{-1}x = 0$, then
$$BA^{-1}x = (A + u \otimes v)A^{-1}x = AA^{-1}x + uv^T A^{-1}x = x + u(v \cdot A^{-1}x) = x,$$
which yields
$$B^{-1}x = A^{-1}x.$$
This takes us to the following more general problem: given a matrix $C$, we wish to construct a matrix $D$ such that the following conditions are satisfied:
• $Dw = z$, for given vectors $w$ and $z$
• $Dy = Cy$, if $y$ is orthogonal to a given vector $g$.
In our application, $C = A^{-1}$, $D = B^{-1}$, $w = u$, $z = \frac{1}{1 + v \cdot A^{-1}u} A^{-1}u$, and $g = A^{-T}v$.
To solve this problem, we set
$$D = C + \frac{(z - Cw) \otimes g}{g \cdot w}.$$
                Then, if g · y = 0, the second term in the definition of D vanishes, and we obtain Dy = Cy, but
                in computing Dw, we obtain factors of g·w in the numerator and denominator that cancel, which
                yields
                                                      Dw=Cw+(z−Cw)=z.
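As a quick numerical sanity check, the two defining conditions can be verified for this rank-one construction of $D$. The following NumPy sketch uses arbitrary random test data (the matrix `C` and vectors `w`, `z`, `g` are illustrative, not taken from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
C = rng.standard_normal((n, n))
w = rng.standard_normal(n)
z = rng.standard_normal(n)
g = rng.standard_normal(n)

# Rank-one correction: D = C + (z - C w) (x) g / (g . w)
D = C + np.outer(z - C @ w, g) / (g @ w)

# Condition 1: D w = z (the g.w factors cancel)
assert np.allclose(D @ w, z)

# Condition 2: D y = C y whenever g . y = 0;
# project a random y onto the orthogonal complement of g
y = rng.standard_normal(n)
y -= (g @ y) / (g @ g) * g
assert np.allclose(D @ y, C @ y)
```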
Applying this definition of $D$, we obtain
$$B^{-1} = A^{-1} + \frac{\left[\frac{1}{1 + v \cdot A^{-1}u} A^{-1}u - A^{-1}u\right] \otimes A^{-T}v}{v \cdot A^{-1}u} = A^{-1} - \frac{A^{-1}(u \otimes v)A^{-1}}{1 + v \cdot A^{-1}u}.$$
This formula for the inverse of a rank-one update is known as the Sherman-Morrison Formula.
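The Sherman-Morrison Formula can likewise be checked numerically against a direct inversion. This NumPy sketch uses arbitrary test data; the diagonal shift on `A` is only there to keep the matrix well-conditioned:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test matrix
u = rng.standard_normal(n)
v = rng.standard_normal(n)

Ainv = np.linalg.inv(A)
B = A + np.outer(u, v)  # rank-one update B = A + u (x) v

# Sherman-Morrison: B^{-1} = A^{-1} - A^{-1} u v^T A^{-1} / (1 + v . A^{-1} u)
Binv = Ainv - np.outer(Ainv @ u, v @ Ainv) / (1.0 + v @ Ainv @ u)

assert np.allclose(Binv, np.linalg.inv(B))
```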
We now return to the problem of approximating the Jacobian of $g$, and efficiently obtaining its inverse, at each iterate $x^{(k)}$. We begin with an exact Jacobian, $D_0 = J_g(x^{(0)})$, and use $D_0$ to compute the first iterate, $x^{(1)}$, using Newton's Method as follows:
$$d^{(0)} = -D_0^{-1} g(x^{(0)}), \quad x^{(1)} = x^{(0)} + d^{(0)}.$$
Then, we use the approximation
$$g'(x_1) \approx \frac{g(x_1) - g(x_0)}{x_1 - x_0}.$$
Generalizing this approach to a system of equations, we seek an approximation $D_1$ to $J_g(x^{(1)})$ that has these properties:
• $D_1(x^{(1)} - x^{(0)}) = g(x^{(1)}) - g(x^{(0)})$ (the Secant Condition)
• If $z \cdot (x^{(1)} - x^{(0)}) = 0$, then $D_1 z = J_g(x^{(0)})z = D_0 z$.
It follows from previous discussion that
$$D_1 = D_0 + \frac{y_0 - D_0 d^{(0)}}{d^{(0)} \cdot d^{(0)}} \otimes d^{(0)},$$
where
$$d^{(0)} = x^{(1)} - x^{(0)}, \quad y_0 = g(x^{(1)}) - g(x^{(0)}).$$
However, it can be shown (Chapter 3, Exercise 15) that $y_0 - D_0 d^{(0)} = g(x^{(1)})$, which yields the simplified formula
$$D_1 = D_0 + \frac{1}{d^{(0)} \cdot d^{(0)}}\, g(x^{(1)}) \otimes d^{(0)}.$$
Once we have computed $D_0^{-1}$, we can apply the Sherman-Morrison formula to obtain
$$D_1^{-1} = D_0^{-1} - \frac{D_0^{-1} \left[\frac{1}{d^{(0)} \cdot d^{(0)}}\, g(x^{(1)}) \otimes d^{(0)}\right] D_0^{-1}}{1 + d^{(0)} \cdot D_0^{-1} \frac{1}{d^{(0)} \cdot d^{(0)}}\, g(x^{(1)})}$$
$$= D_0^{-1} - \frac{\left[D_0^{-1} g(x^{(1)}) \otimes d^{(0)}\right] D_0^{-1}}{d^{(0)} \cdot d^{(0)} + d^{(0)} \cdot D_0^{-1} g(x^{(1)})}$$
$$= D_0^{-1} - \frac{\left(u^{(0)} \otimes d^{(0)}\right) D_0^{-1}}{d^{(0)} \cdot \left(d^{(0)} + u^{(0)}\right)},$$
where $u^{(0)} = D_0^{-1} g(x^{(1)})$. Then, as $D_1$ is an approximation to $J_g(x^{(1)})$, we can obtain our next iterate $x^{(2)}$ as follows:
$$D_1 d^{(1)} = -g(x^{(1)}), \quad x^{(2)} = x^{(1)} + d^{(1)}.$$
Repeating this process, we obtain the following algorithm, which is known as Broyden's Method:

Choose $x^{(0)}$
$D_0 = J_g(x^{(0)})$
$d^{(0)} = -D_0^{-1} g(x^{(0)})$
$x^{(1)} = x^{(0)} + d^{(0)}$
$k = 0$
while not converged do
    $u^{(k)} = D_k^{-1} g(x^{(k+1)})$
    $c_k = d^{(k)} \cdot (d^{(k)} + u^{(k)})$
    $D_{k+1}^{-1} = D_k^{-1} - \frac{1}{c_k} \left[u^{(k)} \otimes d^{(k)}\right] D_k^{-1}$
    $k = k + 1$
    $d^{(k)} = -D_k^{-1} g(x^{(k)})$
    $x^{(k+1)} = x^{(k)} + d^{(k)}$
end
Note that it is not necessary to compute $D_k$ for $k \ge 1$; only $D_k^{-1}$ is needed. It follows that no systems of linear equations need to be solved during an iteration; only matrix-vector multiplications are required, thus saving an order of magnitude of computational effort during each iteration compared to Newton's Method.
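The algorithm above can be sketched in NumPy as follows, applied to the example system from these notes, whose root is $(1, 1, 1)$. The function names `broyden`, `g`, and `jac` are illustrative, not from the text:

```python
import numpy as np

def broyden(g, jac, x0, tol=1e-10, max_iter=50):
    """Broyden's Method as described above: start from the exact Jacobian,
    then update the *inverse* approximation via Sherman-Morrison, so each
    iteration needs only matrix-vector products."""
    x = np.asarray(x0, dtype=float)
    Dinv = np.linalg.inv(jac(x))   # D_0^{-1} = J_g(x^{(0)})^{-1}
    d = -Dinv @ g(x)               # d^{(0)}
    x = x + d                      # x^{(1)}
    for _ in range(max_iter):
        gx = g(x)
        if np.linalg.norm(gx) < tol:
            break
        u = Dinv @ gx                             # u^{(k)} = D_k^{-1} g(x^{(k+1)})
        c = d @ (d + u)                           # c_k = d^{(k)} . (d^{(k)} + u^{(k)})
        Dinv = Dinv - np.outer(u, d) @ Dinv / c   # Sherman-Morrison update
        d = -Dinv @ gx                            # d^{(k)}
        x = x + d                                 # x^{(k+1)}
    return x

# Example system from these notes, with initial guess (1, 0, 1)
def g(v):
    x, y, z = v
    return np.array([x**2 + y**2 + z**2 - 3.0,
                     x**2 + y**2 - z - 1.0,
                     x + y + z - 3.0])

def jac(v):
    x, y, z = v
    return np.array([[2*x, 2*y, 2*z],
                     [2*x, 2*y, -1.0],
                     [1.0, 1.0, 1.0]])

root = broyden(g, jac, [1.0, 0.0, 1.0])
```

Note that the only `np.linalg.inv` call is for $D_0$; every subsequent iteration uses matrix-vector products alone, as promised.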
Example We consider the system of equations $g(x) = 0$, where
$$g(x, y, z) = \begin{bmatrix} x^2 + y^2 + z^2 - 3 \\ x^2 + y^2 - z - 1 \\ x + y + z - 3 \end{bmatrix}.$$
We will begin with one step of Newton's Method to solve this system of equations, with initial guess $x^{(0)} = (x^{(0)}, y^{(0)}, z^{(0)}) = (1, 0, 1)$.
As computed in a previous example,
$$J_g(x, y, z) = \begin{bmatrix} 2x & 2y & 2z \\ 2x & 2y & -1 \\ 1 & 1 & 1 \end{bmatrix}.$$
Therefore, the Newton iterate $x^{(1)}$ is obtained by solving the system of equations
$$J_g(x^{(0)})(x^{(1)} - x^{(0)}) = -g(x^{(0)}),$$
or, equivalently, $D_0 d^{(0)} = -g(x^{(0)})$ where
$$D_0 = J_g(x^{(0)}) = \begin{bmatrix} 2x^{(0)} & 2y^{(0)} & 2z^{(0)} \\ 2x^{(0)} & 2y^{(0)} & -1 \\ 1 & 1 & 1 \end{bmatrix}, \quad d^{(0)} = \begin{bmatrix} x^{(1)} - x^{(0)} \\ y^{(1)} - y^{(0)} \\ z^{(1)} - z^{(0)} \end{bmatrix},$$
$$g(x^{(0)}) = \begin{bmatrix} (x^{(0)})^2 + (y^{(0)})^2 + (z^{(0)})^2 - 3 \\ (x^{(0)})^2 + (y^{(0)})^2 - z^{(0)} - 1 \\ x^{(0)} + y^{(0)} + z^{(0)} - 3 \end{bmatrix}.$$
Substituting $(x^{(0)}, y^{(0)}, z^{(0)}) = (1, 0, 1)$ yields the system
$$\begin{bmatrix} 2 & 0 & 2 \\ 2 & 0 & -1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x^{(1)} - 1 \\ y^{(1)} \\ z^{(1)} - 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$
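This linear system can be solved directly; a short NumPy check gives $d^{(0)} = (1/2, 1/2, 0)$ and hence $x^{(1)} = (3/2, 1/2, 1)$:

```python
import numpy as np

# D_0 = J_g(x^{(0)}) and the right-hand side -g(x^{(0)}) at (1, 0, 1)
D0 = np.array([[2.0, 0.0, 2.0],
               [2.0, 0.0, -1.0],
               [1.0, 1.0, 1.0]])
rhs = np.array([1.0, 1.0, 1.0])  # -g(1, 0, 1) = (1, 1, 1)

d0 = np.linalg.solve(D0, rhs)        # d^{(0)} = (0.5, 0.5, 0)
x1 = np.array([1.0, 0.0, 1.0]) + d0  # x^{(1)} = x^{(0)} + d^{(0)}
```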