Source: www.math.usm.edu
Jim Lambers
MAT 461/561
Spring Semester 2009-10
Lecture 24 Notes

These notes correspond to Section 10.3 in the text.

Quasi-Newton Methods

One of the drawbacks of using Newton's Method to solve a system of nonlinear equations F(x) = 0 is the computational expense that must be incurred during each iteration to evaluate the partial derivatives of F at x^{(k)}, and then solve a system of linear equations involving the resulting Jacobian matrix. The algorithm does not facilitate the re-use of data from previous iterations, and in some cases evaluation of the partial derivatives can be unnecessarily costly.

An alternative is to modify Newton's Method so that approximate partial derivatives are used, as in the Secant Method for a single nonlinear equation, since the slightly slower convergence is offset by the improved efficiency of each iteration. However, simply replacing the analytical Jacobian matrix of F with a matrix consisting of finite difference approximations of the partial derivatives does not do much to reduce the cost of each iteration, because the cost of solving the system of linear equations is unchanged.

However, because the Jacobian matrix consists of the partial derivatives evaluated at an element of a convergent sequence, intuitively Jacobian matrices from consecutive iterations are "near" one another in some sense, which suggests that it should be possible to cheaply update an approximate Jacobian matrix from iteration to iteration, in such a way that the inverse of the Jacobian matrix can be updated efficiently as well. This is the case when a matrix has the form

    B = A + uv^T,

where u and v are given vectors. This modification of A to obtain B is called a rank-one update, since uv^T, an outer product, has rank one: every vector in the range of uv^T is a scalar multiple of u. To obtain B^{-1} from A^{-1}, we note that if Ax = u, then

    Bx = (A + uv^T)x = (1 + v^T x)u,

which yields

    B^{-1} u = \frac{1}{1 + v^T A^{-1} u} A^{-1} u.
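This identity is easy to check numerically. In the sketch below, the matrix A and the vectors u and v are arbitrary illustrative choices (not from the notes); the code compares B^{-1}u computed directly against the formula above.

```python
import numpy as np

# Numerical check of the identity B^{-1} u = A^{-1} u / (1 + v^T A^{-1} u)
# for a rank-one update B = A + u v^T.  The data below are arbitrary
# illustrative choices, not taken from the notes.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
u = np.array([1.0, 2.0])
v = np.array([0.5, 1.0])

B = A + np.outer(u, v)
x = np.linalg.solve(A, u)       # x = A^{-1} u
lhs = np.linalg.solve(B, u)     # B^{-1} u, computed directly
rhs = x / (1.0 + v @ x)         # A^{-1} u / (1 + v^T A^{-1} u)
print(np.allclose(lhs, rhs))    # True
```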
On the other hand, if x is such that v^T A^{-1} x = 0, then

    B A^{-1} x = (A + uv^T) A^{-1} x = x,

which yields B^{-1} x = A^{-1} x. This takes us to the following more general problem: given a matrix C, we wish to construct a matrix D such that the following conditions are satisfied:

∙ Dw = z, for given vectors w and z;
∙ Dy = Cy, if y is orthogonal to a given vector g.

In our application, C = A^{-1}, D = B^{-1}, w = u, z = \frac{1}{1 + v^T A^{-1} u} A^{-1} u, and g = A^{-T} v. To solve this problem, we set

    D = C + \frac{(z - Cw) g^T}{g^T w}.

Then, if g^T y = 0, the second term in the definition of D vanishes, and we obtain Dy = Cy; in computing Dw, the factors of g^T w in the numerator and denominator cancel, which yields

    Dw = Cw + (z - Cw) = z.

Applying this definition of D, we obtain

    B^{-1} = A^{-1} + \frac{\left( \frac{1}{1 + v^T A^{-1} u} A^{-1} u - A^{-1} u \right) v^T A^{-1}}{v^T A^{-1} u} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}.

This formula for the inverse of a rank-one update is known as the Sherman-Morrison Formula.

We now return to the problem of approximating the Jacobian of F, and efficiently obtaining its inverse, at each iterate x^{(k)}. We begin with an exact Jacobian, A_0 = J_F(x^{(0)}), and use A_0 to compute the first iterate, x^{(1)}, using Newton's Method. Then, we recall that for the Secant Method, we use the approximation

    f'(x_1) \approx \frac{f(x_1) - f(x_0)}{x_1 - x_0}.

Generalizing this approach to a system of equations, we seek an approximation A_1 to J_F(x^{(1)}) that has these properties:

∙ A_1 (x^{(1)} - x^{(0)}) = F(x^{(1)}) - F(x^{(0)});
∙ if z^T (x^{(1)} - x^{(0)}) = 0, then A_1 z = J_F(x^{(0)}) z = A_0 z.

It follows from the previous discussion that

    A_1 = A_0 + \frac{y_1 - A_0 s_1}{s_1^T s_1} s_1^T,

where s_1 = x^{(1)} - x^{(0)} and y_1 = F(x^{(1)}) - F(x^{(0)}). Furthermore, once we have computed A_0^{-1}, we have

    A_1^{-1} = A_0^{-1} - \frac{A_0^{-1} \frac{y_1 - A_0 s_1}{s_1^T s_1} s_1^T A_0^{-1}}{1 + s_1^T A_0^{-1} \frac{y_1 - A_0 s_1}{s_1^T s_1}} = A_0^{-1} + \frac{(s_1 - A_0^{-1} y_1) s_1^T A_0^{-1}}{s_1^T A_0^{-1} y_1}.

Then, as A_1 is an approximation to J_F(x^{(1)}), we can obtain our next iterate x^{(2)} as follows:

    A_1 s_2 = -F(x^{(1)}), \quad x^{(2)} = x^{(1)} + s_2.
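Both the secant condition and the inverse-update formula can be verified numerically. In this sketch, A_0, s_1, and y_1 are arbitrary illustrative values (not from the notes); the code checks that A_1 satisfies A_1 s_1 = y_1 and that the Sherman-Morrison expression for A_1^{-1} agrees with a direct inverse.

```python
import numpy as np

# Check the Broyden update   A1 = A0 + ((y1 - A0 s1)/(s1^T s1)) s1^T
# and its inverse            A1^{-1} = A0^{-1} + (s1 - A0^{-1} y1) s1^T A0^{-1} / (s1^T A0^{-1} y1)
# on arbitrary illustrative data (A0, s1, y1 are made up, not from the notes).
A0 = np.array([[3.0, 1.0],
               [1.0, 2.0]])
s1 = np.array([0.2, -0.1])
y1 = np.array([0.5, 0.1])

A1 = A0 + np.outer((y1 - A0 @ s1) / (s1 @ s1), s1)
A0inv = np.linalg.inv(A0)
w = A0inv @ y1                                        # A0^{-1} y1
A1inv = A0inv + np.outer(s1 - w, s1 @ A0inv) / (s1 @ w)

print(np.allclose(A1 @ s1, y1))                 # True: secant condition holds
print(np.allclose(A1inv, np.linalg.inv(A1)))    # True: update matches direct inverse
```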
Repeating this process, we obtain the following algorithm, which is known as Broyden's Method:

    Choose x^{(0)}
    A_0 = J_F(x^{(0)})
    s_1 = -A_0^{-1} F(x^{(0)})
    x^{(1)} = x^{(0)} + s_1
    k = 1
    while not converged do
        y_k = F(x^{(k)}) - F(x^{(k-1)})
        w_k = A_{k-1}^{-1} y_k
        c = 1/(s_k^T w_k)
        A_k^{-1} = A_{k-1}^{-1} + c (s_k - w_k) s_k^T A_{k-1}^{-1}
        s_{k+1} = -A_k^{-1} F(x^{(k)})
        x^{(k+1)} = x^{(k)} + s_{k+1}
        k = k + 1
    end

Note that it is not necessary to compute A_k for k ≥ 1; only A_k^{-1} is needed. It follows that no systems of linear equations need to be solved during an iteration; only matrix-vector multiplications are required, thus saving an order of magnitude of computational effort during each iteration compared to Newton's Method.
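The algorithm above might be realized as in the following sketch. The test system F(x) = (x_1^2 + x_2^2 - 2, x_1 - x_2), its Jacobian, the starting point, and the tolerance are all illustrative choices, not taken from the notes; note that the exact Jacobian is inverted only once, and every subsequent iteration uses only matrix-vector products.

```python
import numpy as np

def broyden(F, J, x0, tol=1e-10, maxit=50):
    # A_0^{-1}: the exact Jacobian is inverted once, at the initial iterate only.
    x_prev = np.asarray(x0, dtype=float)
    Ainv = np.linalg.inv(J(x_prev))
    s = -Ainv @ F(x_prev)            # s_1 = -A_0^{-1} F(x^{(0)})
    x = x_prev + s                   # x^{(1)}
    for _ in range(maxit):
        y = F(x) - F(x_prev)         # y_k
        w = Ainv @ y                 # w_k = A_{k-1}^{-1} y_k
        c = 1.0 / (s @ w)            # c = 1/(s_k^T w_k)
        # Sherman-Morrison: A_k^{-1} = A_{k-1}^{-1} + c (s_k - w_k) s_k^T A_{k-1}^{-1}
        Ainv = Ainv + c * np.outer(s - w, s @ Ainv)
        s = -Ainv @ F(x)             # s_{k+1}: a matrix-vector product, no solve
        x_prev, x = x, x + s
        if np.linalg.norm(s) < tol:
            break
    return x

# Illustrative system: x1^2 + x2^2 = 2, x1 = x2, with a root at (1, 1).
F = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0, x[0] - x[1]])
J = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]])
root = broyden(F, J, [2.0, 0.5])
print(np.allclose(root, [1.0, 1.0]))  # True
```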