Notes on Matrix Calculus

Paul L. Fackler*
North Carolina State University
September 27, 2005

*Paul L. Fackler is an Associate Professor in the Department of Agricultural and Resource Economics at North Carolina State University. These notes are copyrighted material. They may be freely copied for individual use but should be appropriately referenced in published work. Mail: Department of Agricultural and Resource Economics, NCSU, Box 8109, Raleigh NC 27695, USA. E-mail: paul fackler@ncsu.edu. Web-site: http://www4.ncsu.edu/~pfackler/. © 2005, Paul L. Fackler.

Matrix calculus is concerned with rules for operating on functions of matrices. For example, suppose that an m×n matrix X is mapped into a p×q matrix Y. We are interested in obtaining expressions for derivatives such as

    ∂Y_ij / ∂X_kl

for all i,j and k,l. The main difficulty here is keeping track of where things are put. There is no reason to use subscripts; it is far better instead to use a system for ordering the results using matrix operations.

Matrix calculus makes heavy use of the vec operator and Kronecker products. The vec operator vectorizes a matrix by stacking its columns (by convention, column rather than row stacking is used). For example, vectorizing the matrix

    [ 1  2
      3  4
      5  6 ]

produces

    [ 1  3  5  2  4  6 ]⊤.

The Kronecker product of two matrices, A and B, where A is m×n and B is p×q, is defined as

    A ⊗ B = [ A_11 B   A_12 B   ...   A_1n B
              A_21 B   A_22 B   ...   A_2n B
                ...      ...    ...     ...
              A_m1 B   A_m2 B   ...   A_mn B ],

which is an mp×nq matrix. There is an important relationship between the Kronecker product and the vec operator:

    vec(AXB) = (B⊤ ⊗ A) vec(X).

This relationship is extremely useful in deriving matrix calculus results.

Another matrix operator that will prove useful is one related to the vec operator. Define the matrix T_{m,n} as the matrix that transforms vec(A) into vec(A⊤):

    T_{m,n} vec(A) = vec(A⊤).

Note that the size of this matrix is mn×mn. T_{m,n} has a number of special properties.
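The vec/Kronecker identity above is easy to confirm numerically. The following sketch (using NumPy; the dimensions and random matrices are arbitrary choices for illustration, not from the notes) checks vec(AXB) = (B⊤ ⊗ A) vec(X):

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a single column vector (column-major order)."""
    return M.reshape(-1, 1, order="F")

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))   # m x n
X = rng.standard_normal((3, 4))   # n x p
B = rng.standard_normal((4, 5))   # p x q

# vec(AXB) = (B' kron A) vec(X); shapes: (mq x 1) = (mq x np)(np x 1)
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
print(np.allclose(lhs, rhs))  # True
```

Note that `order="F"` (Fortran, i.e. column-major order) is what makes `reshape` stack columns rather than rows.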
The first is clear from its definition; if T_{m,n} is applied to the vec of an m×n matrix and then T_{n,m} is applied to the result, the original vectorized matrix results:

    T_{n,m} T_{m,n} vec(A) = vec(A).

Thus

    T_{n,m} T_{m,n} = I_{mn}.

The fact that

    T_{n,m} = T_{m,n}^{-1}

follows directly. Perhaps less obvious is that

    T_{m,n} = T_{n,m}⊤

(combining these results also means that T_{m,n} is an orthogonal matrix).

The matrix operator T_{m,n} is a permutation matrix, i.e., it is composed of 0s and 1s, with a single 1 in each row and column. When premultiplying another matrix, it simply rearranges the ordering of the rows of that matrix (postmultiplying by T_{m,n} rearranges columns).

The transpose matrix is also related to the Kronecker product. With A and B defined as above,

    B ⊗ A = T_{p,m} (A ⊗ B) T_{n,q}.

This can be shown by introducing an arbitrary n×q matrix C:

    T_{p,m} (A ⊗ B) T_{n,q} vec(C) = T_{p,m} (A ⊗ B) vec(C⊤)
                                   = T_{p,m} vec(B C⊤ A⊤)
                                   = vec(A C B⊤)
                                   = (B ⊗ A) vec(C).

This implies that ((B ⊗ A) − T_{p,m}(A ⊗ B)T_{n,q}) vec(C) = 0. Because C is arbitrary, the desired result must hold.

An immediate corollary to the above result is that

    (A ⊗ B) T_{n,q} = T_{m,p} (B ⊗ A).

It is also useful to note that T_{1,m} = T_{m,1} = I_m. Thus, if A is 1×n then (A ⊗ B) T_{n,q} = B ⊗ A. When working with derivatives of scalars this can result in considerable simplification.

Turning now to calculus, define the derivative of a function mapping ℜ^n → ℜ^m as the m×n matrix of partial derivatives:

    [Df]_ij = ∂f_i(x) / ∂x_j.

For example, the simplest derivative is

    dAx/dx = A.

Using this definition, the usual rules for manipulating derivatives apply naturally if one respects the rules of matrix conformability. The summation rule is obvious:

    D[αf(x) + βg(x)] = α Df(x) + β Dg(x),

where α and β are scalars. The chain rule involves matrix multiplication, which requires conformability. Given two functions f : ℜ^n → ℜ^m and g : ℜ^p → ℜ^n, the derivative of the composite function is

    D[f(g(x))] = f′(g(x)) g′(x).
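A direct way to see the properties of T_{m,n} is to construct it explicitly. The sketch below (NumPy; the index arithmetic in the construction and the small test dimensions are our own choices for illustration) builds the matrix from its defining property and checks the identities above:

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a single column vector."""
    return M.reshape(-1, 1, order="F")

def commutation_matrix(m, n):
    """T_{m,n}: the mn x mn permutation matrix with T vec(A) = vec(A.T)
    for any m x n matrix A.  In column-major order, A[i, j] sits at
    position j*m + i of vec(A) and at position i*n + j of vec(A.T)."""
    T = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            T[i * n + j, j * m + i] = 1.0
    return T

m, n, p, q = 2, 3, 4, 5
A = np.arange(m * n, dtype=float).reshape(m, n)
B = np.arange(p * q, dtype=float).reshape(p, q)

Tmn = commutation_matrix(m, n)
Tnm = commutation_matrix(n, m)
print(np.allclose(Tmn @ vec(A), vec(A.T)))    # True: defining property
print(np.allclose(Tnm @ Tmn, np.eye(m * n)))  # True: T_{n,m} = T_{m,n}^{-1}
print(np.allclose(Tmn.T, Tnm))                # True: T_{m,n}' = T_{n,m}

# B kron A = T_{p,m} (A kron B) T_{n,q}
Tpm = commutation_matrix(p, m)
Tnq = commutation_matrix(n, q)
print(np.allclose(np.kron(B, A), Tpm @ np.kron(A, B) @ Tnq))  # True
```

Since T_{m,n} is a permutation matrix, a production implementation would store it as an index vector rather than a dense mn×mn array; the dense form is used here only to mirror the algebra.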
Notice that this satisfies matrix multiplication conformability, whereas the expression g′(x) f′(g(x)) attempts to postmultiply an n×p matrix by an m×n matrix.

To define a product rule, consider the expression f(x)⊤ g(x), where f, g : ℜ^n → ℜ^m. The derivative is the 1×n vector given by

    D[f(x)⊤ g(x)] = g(x)⊤ f′(x) + f(x)⊤ g′(x).

Notice that no other way of multiplying g by f′ and f by g′ would ensure conformability. A more general version of the product rule is defined below.

The product rule leads to a useful result about quadratic functions:

    dx⊤Ax/dx = x⊤A + x⊤A⊤ = x⊤(A + A⊤).

When A is symmetric this has the very natural form dx⊤Ax/dx = 2x⊤A.

These rules define derivatives for vectors. Defining derivatives of matrices with respect to matrices is accomplished by vectorizing the matrices, so dA(X)/dX is the same thing as dvec(A(X))/dvec(X). This is where the relationship between the vec operator and Kronecker products is useful. Consider differentiating x⊤Ax with respect to A (rather than with respect to x as above):

    dvec(x⊤Ax)/dvec(A) = d[(x⊤ ⊗ x⊤) vec(A)]/dvec(A) = x⊤ ⊗ x⊤

(the derivative of an m×n matrix A with respect to itself is I_{mn}).

A more general product rule can be defined. Suppose that f : ℜ^n → ℜ^{m×p} and g : ℜ^n → ℜ^{p×q}, so f(x)g(x) : ℜ^n → ℜ^{m×q}. Using the relationship between the vec and Kronecker product operators,

    vec(I_m f(x) g(x) I_q) = (g(x)⊤ ⊗ I_m) vec(f(x)) = (I_q ⊗ f(x)) vec(g(x)).

A natural product rule is therefore

    D[f(x) g(x)] = (g(x)⊤ ⊗ I_m) f′(x) + (I_q ⊗ f(x)) g′(x).

This can be used to determine the derivative dA⊤A/dA, where A is m×n:

    vec(A⊤A) = (I_n ⊗ A⊤) vec(A) = (A⊤ ⊗ I_n) vec(A⊤) = (A⊤ ⊗ I_n) T_{m,n} vec(A).
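The derivative formulas above can be sanity-checked against finite differences. The sketch below (NumPy; the test point, matrix, and step size h are arbitrary choices for illustration) verifies both the quadratic-form gradient dx⊤Ax/dx = x⊤(A + A⊤) and the derivative with respect to vec(A), which should equal x⊤ ⊗ x⊤:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
h = 1e-6

# Check d(x'Ax)/dx = x'(A + A') by central finite differences,
# perturbing x along each coordinate direction e.
grad_x_fd = np.array([
    ((x + h * e) @ A @ (x + h * e) - (x - h * e) @ A @ (x - h * e)) / (2 * h)
    for e in np.eye(n)
])
print(np.allclose(grad_x_fd, x @ (A + A.T)))  # True

# Check d(x'Ax)/dvec(A) = x' kron x', perturbing A entry by entry
# in column-major (vec) order: the partial w.r.t. A[i, j] is x[i]*x[j].
grad_A_fd = []
for j in range(n):
    for i in range(n):
        E = np.zeros((n, n))
        E[i, j] = h
        grad_A_fd.append((x @ (A + E) @ x - x @ (A - E) @ x) / (2 * h))
print(np.allclose(grad_A_fd, np.kron(x, x)))  # True
```

Because x⊤Ax is quadratic in x and linear in A, the central differences here are exact up to floating-point rounding, so the comparisons hold to tight tolerance.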