21st International Conference on Pattern Recognition (ICPR 2012), November 11-15, 2012, Tsukuba, Japan

Orthogonal Isometric Projection

Yali Zheng (1), Yuan Yan Tang (1,2), Bin Fang (1), Taiping Zhang (1)
1. Chongqing University, Chongqing, China
2. University of Macau, Macau, China
{zhengyl, fb, tpzhang}@cqu.edu.cn, yytang@umac.mo

Abstract

In contrast with Isomap, which learns a low-dimensional embedding and solves the problem under the classic Multidimensional Scaling (MDS) framework, we propose a dimensionality reduction technique called Orthogonal Isometric Projection (OIP). We consider an explicit orthogonal linear projection that captures the geodesic distance, which handles new data straightforwardly and leads to a standard eigenvalue problem. We further extend our method to Sparse Orthogonal Isometric Projection (SOIP), which can be solved efficiently using LARS. Numerical experiments are reported to demonstrate the performance of OIP in comparison with several competing methods.

1. Introduction

Researchers are interested in reducing the dimensionality of digital data because of the redundancy inherent in digital information. It is well known that extracting efficient features from data can improve object classification and recognition and simplify the visualization of data. Manifold learning theory [4, 3, 11, 12] was introduced into the dimensionality reduction field around the year 2000; it assumes that a low-dimensional manifold is embedded in the high-dimensional data, and much attention has been paid to discovering such manifolds and showing that they exist. Isomap was proposed by Tenenbaum et al. [4]; it uses the geodesic distance to capture the global structure of the high-dimensional data and can be solved under the MDS framework. Roweis and Saul [3] proposed a nonlinear dimensionality reduction method, Locally Linear Embedding (LLE), which aims to preserve in the low-dimensional space the same local configuration of each neighborhood as in the high-dimensional space. He and Niyogi [2] found an optimal linear approximation to the eigenfunctions of the nonlinear Laplacian Eigenmap, called Locality Preserving Projection (LPP), and justified it with graph Laplacian theory. Cai et al. [5] proposed a variation of Laplacianfaces (LPP), called Orthogonal Laplacianfaces (OLPP), which iteratively computes orthogonal eigenvectors to compose the projection matrix. Kokiopoulou and Saad [6] analyzed and compared LPP and OLPP, and proposed Orthogonal Neighborhood Preserving Projections (ONPP), which can be thought of as an orthogonal version of LLE, except that the projections are learned explicitly through a standard eigenvalue problem.

Dimensionality reduction techniques either seek a representation of the data in a low-dimensional space to benefit data analysis, or map data from the high-dimensional space to a low-dimensional space through an explicit linear or nonlinear projection learned from a training set. Multidimensional Scaling (MDS) [1] is a classic data embedding technique that preserves pairwise distances to obtain the low-dimensional configuration. Principal Component Analysis (PCA) can be used as a projection method; it learns a linear projection by maximizing the variance of the data in the low-dimensional space. PCA is identical to classical MDS when the Euclidean distance is used [1], but PCA learns an explicit projection. Linear Discriminant Analysis (LDA) likewise determines an explicit projection, by maximizing the ratio of between-class variance to within-class variance.

In this paper, motivated by Isomap [4] and Isometric Projection [8], we propose a linear projection method called Orthogonal Isometric Projection, which is a variation of Isometric Projection. Cai's Isometric Projection [8] addresses the same purpose as ours; however, we constrain the projection to be orthogonal, which differs from Cai's formulation, and we solve a standard eigenvalue problem. The main difference between our method and Cai's orthogonal version of Isometric Projection is that we build the objective function so that the optimization reduces to a standard eigenvalue problem, whereas Cai solved a generalized eigenvalue problem. We also extend our method to a sparse orthogonal isometric projection, and we test the method on the USPS data set. In the following section we briefly review Cai's Isometric Projection to highlight the advantages of our algorithm.
2. A brief review of Isomap and Isometric Projection

2.1 Isomap

Isomap was proposed by Tenenbaum et al. [4] and is one of the most popular manifold learning techniques. It aims to obtain a Euclidean embedding of the points such that the geodesic distance between each pair of points in the high-dimensional space is close to the Euclidean distance between their embeddings. The mathematical formulation is

    \min \sum_{i,j} \left( d_G(x_i, x_j) - d_E(y_i, y_j) \right)^2        (1)

where d_G is the geodesic distance, defined locally as the shortest path on the manifold, and d_E is the Euclidean distance. In matrix form,

    \min \| \tau(D_G) - \tau(D_E) \|_2        (2)

where D_G is the geodesic distance matrix, D_E is the Euclidean distance matrix, and \tau is an operator that converts distances into an inner-product form. The problem is solved under the MDS framework. Isomap assumes that a manifold exists in the high-dimensional space and uses the geodesic distance to measure the similarity of each pair of points. However, when the samples are insufficient or the data are noisy, the intrinsic geometry of the data is difficult to capture by constructing the neighborhood graph.

2.2 Isometric Projection

Cai et al. [8] extended the Isomap algorithm to learn a linear projection by solving a spectral graph optimization problem. Supposing Y = V^T X, they minimize the objective function

    \min \| \tau(D_G) - X^T V V^T X \|_2        (3)

To make the problem tractable, they impose the constraint V^T X X^T V = I and rewrite the minimization as

    \arg\max_V \; \mathrm{tr}\left( V^T X \tau(D_G) X^T V \right) \quad \text{s.t.} \quad V^T X X^T V = I

which is equivalent to the generalized eigenvalue problem

    X \tau(D_G) X^T V = \lambda X X^T V

To reduce the computational cost, Cai also applied regression over Y and X [8, 13], called spectral regression (SR): Y is computed first as the eigenvectors of \tau(D_G), and then

    a = \arg\min_a \sum_{i=1}^{m} \left( a^T x_i - y_i \right)^2 + \alpha \| a \|_2^2        (4)

The constraint V^T X X^T V = I forces the low-dimensional embedding of the points to be orthonormal, namely Y^T Y = I.
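Both Isomap and IP, as well as the method proposed below, rely on the same two ingredients: approximate geodesic distances obtained by running shortest paths over a k-nearest-neighbor graph, and the \tau operator that turns a distance matrix into an inner-product (Gram) form. The following sketch shows one way to compute them; it is only an illustration under our own naming (geodesic_distances, tau), assumes NumPy, SciPy, and scikit-learn, and is not the authors' implementation.

import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def geodesic_distances(X, k=10):
    """Approximate geodesic distances: Dijkstra shortest paths on a k-NN graph.

    X has shape (n_samples, n_features). Unreachable pairs come back as
    np.inf, which in practice means k must be increased until the graph
    is connected.
    """
    knn = kneighbors_graph(X, n_neighbors=k, mode='distance')
    knn = knn.maximum(knn.T)          # symmetrize: use an undirected graph
    return shortest_path(knn, method='D', directed=False)

def tau(W):
    """tau(W) = -C W C / 2 with the centering matrix C = I - (1/n) e e^T,
    as written in Sec. 3.1 below (classical MDS applies the same centering
    to squared distances)."""
    n = W.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * C @ W @ C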
3. Orthogonal Isometric Projection

The main idea of Orthogonal Isometric Projection is to seek an orthogonal mapping over the training data set that best preserves the geodesic distances on a neighborhood graph. A linear projection is learned under the general Isomap framework, but with a different and more natural constraint: the projection is orthogonal.

3.1 The Objective Function of OIP

Under the Isomap framework, we minimize the objective function in Equation (2). Here \tau(D_G) = -C W C / 2, where C is the centering matrix C = I_n - (1/N) e_N e_N^T with e_N = [1, \dots, 1]^T, and W is the Dijkstra distance matrix computed on a K-nearest-neighbor graph over all data points. Let f(V) = \| \tau(D_G) - X^T V V^T X \|_2. We seek a linear projection by solving

    \min_V f(V)        (5)

Let S = \tau(D_G), which is known once the neighborhood graph has been constructed from the given data set. We have

    f(V) = \mathrm{tr}\left( (S - X^T V V^T X)^T (S - X^T V V^T X) \right)
         = \mathrm{tr}(S^T S) + \mathrm{tr}\left( (X^T V V^T X)^T (X^T V V^T X) - 2 S^T (X^T V V^T X) \right)
         = \mathrm{tr}\left( \left( (X^T V V^T X)^T - 2 S^T \right) (X^T V V^T X) \right) + \mathrm{tr}(S^T S)

So the objective function (5) is equivalent to

    \min_V \; \mathrm{tr}\left( V^T X \left( (X^T V V^T X)^T - 2 S^T \right) X^T V \right) \quad \text{s.t.} \quad V^T V = I

Let M = X (X^T X - 2 S^T) X^T. The problem then becomes

    \min_V \; \mathrm{tr}(V^T M V) \quad \text{s.t.} \quad V^T V = I

which leads to a standard eigenvalue problem

    M V = \lambda V        (6)

V is determined by the eigenvectors of M corresponding to the p smallest eigenvalues.

3.2 The Algorithm of OIP

First we construct the neighborhood graph G. As is well known, there are two common options:

• x_j \in N(x_i), where N(x_i) is the set of the k nearest neighbors of x_i; or
• \| x_i - x_j \|^2 \le \epsilon, where \epsilon is a small threshold.

The edge between i and j is weighted by the Gaussian kernel e^{-\| x_i - x_j \|^2}. We summarize the method in Algorithm 1.

Algorithm 1 Orthogonal Isometric Projection
1: Construct a neighborhood graph G over the n data points. Compute the Dijkstra distance matrix W(i, j) = d_G(i, j) between every pair of points, where d_G(i, j) is the length of the shortest path between i and j; if no path exists, d_G(i, j) = \infty.
2: Compute \tau(D_G) = -C W C / 2, where C is the centering matrix C = I_n - (1/N) e_N e_N^T.
3: Compute the eigenvectors of M = X (X^T X - 2 \tau(D_G)^T) X^T. V is determined by the eigenvectors of M associated with the p smallest eigenvalues.

4. Sparse Orthogonal Isometric Projection

Our method essentially solves the standard eigenvalue problem in Equation (6), which can also be cast in a regression framework in the manner of Sparse PCA [10]. The optimal V consists of the eigenvectors of Equation (6) associated with the p smallest eigenvalues. Since M is a real symmetric matrix, it can be decomposed as M = \tilde{X} \tilde{X}^T. Suppose the rank of \tilde{X} is r and the SVD of \tilde{X} is \tilde{X} = \tilde{U} \tilde{\Sigma} \tilde{V}^T; it is easy to verify that the column vectors of \tilde{U} are the eigenvectors of \tilde{X} \tilde{X}^T. Let Y = [y_1, y_2, \dots, y_r] \in R^{n \times r}, whose row vectors are the sample vectors in the r-dimensional subspace, and let V = [v_1, v_2, \dots, v_r] \in R^{m \times r}. The projective functions of OIP are then obtained by solving the linear systems

    X^T v_p = y_p, \quad p = 1, 2, \dots, r,

where v_p is the solution of the regression

    v_p = \arg\min_v \sum_{i=1}^{n} \left( v^T x_i - y_p^i \right)^2,

and y_p^i is the i-th element of y_p. Similar to Zou et al. [10], we obtain sparse solutions by adding an L1 regularization:

    v_p = \arg\min_v \sum_{i=1}^{n} \left( v^T x_i - y_p^i \right)^2 + \alpha \sum_{j=1}^{m} | v_p^j |,

where v_p^j is the j-th element of v_p. The regression can be solved efficiently using the LARS algorithm.
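Sections 3 and 4 can be condensed into the following sketch. The function names oip and sparse_oip_directions are ours, LassoLars from scikit-learn stands in for the LARS solver mentioned above, and the code reflects our reading of Equation (6) and Algorithm 1 rather than any released implementation.

import numpy as np
from sklearn.linear_model import LassoLars

def oip(X, S, p):
    """Orthogonal Isometric Projection (Sec. 3).

    X : (m_features, n_samples) data matrix, columns are samples x_i.
    S : (n_samples, n_samples) matrix tau(D_G) from the neighborhood graph.
    p : target dimensionality.
    Returns V of shape (m_features, p): the eigenvectors of
    M = X (X^T X - 2 S^T) X^T for the p smallest eigenvalues (Eq. 6).
    """
    M = X @ (X.T @ X - 2.0 * S.T) @ X.T
    M = (M + M.T) / 2.0                    # guard against round-off asymmetry
    _, eigvecs = np.linalg.eigh(M)         # eigh returns ascending eigenvalues
    return eigvecs[:, :p]

def sparse_oip_directions(X, Y, alpha=1e-3):
    """Sparse OIP (Sec. 4): solve X^T v_p = y_p with an L1 penalty,
    one LARS/lasso regression per column of the embedding Y (n x r)."""
    V = []
    for y_p in Y.T:
        reg = LassoLars(alpha=alpha)
        reg.fit(X.T, y_p)                  # rows of X.T are the samples x_i
        V.append(reg.coef_)
    return np.array(V).T                   # (m_features, r)

For USPS-sized data the matrix M is only m x m with m = 256 features, so the dense eigendecomposition is cheap; the dominant cost is the all-pairs shortest-path computation on the neighborhood graph.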
Since M is a 1 X Ntestδ(l ,proj(b )) real symmetric matrix, so M can be decomposed into Acc = i=1 i i , N N ˜ ˜T ˜ j=1 test XX .SupposetherankofX isr,andSVD(X)is ˜ ˜˜˜T and N = 25 in our experiment. X=UΣV , Wecompare our method with IP, IP+SR, LDA, and ˜ LPPinTable1. Ourmethodoutperformsallothermeth- it is easy to verify that the column vectors in U are the ˜ ˜T ods. The bolds are from our method, ± shows the eigenvectors of XX . Let Y = [y ,y ,··· ,y ] , 1 2 r n×r square root of the variance. With the number of training each row vector is a sample vector in r-dimensional subspace, and V = [v ,v ,··· ,v ] . Therefore, the samples, the classification precision increases as what 1 2 r m×r weexpect. Fig. 2 shows with the number of dimensions projective functions of OIP are solved by the linear sys- tems: increases, the average classification error decreases. XTv =y ,p=1,2,··· ,r. 1http://www.zjucadcg.cn/dengcai/Data/TextData.html p p 407 Table 1. Comparison on USPS Train ratio LDA LPP IP IP+SR OIP 20% 89.13 ± 0.34 92.95 ± 0.36 92.11 ± 0.39 93.90 ± 0.50 95.10 ± 0.20 30% 90.39 ± 0.32 94.47 ± 0.30 93.61 ± 0.33 94.96 ± 0.36 95.93 ± 0.17 40% 91.18 ± 0.36 95.20 ± 0.27 94.48 ± 0.28 95.69 ± 0.36 96.40 ± 0.20 50% 91.54 ± 0.32 95.75 ± 0.29 94.85 ± 0.42 95.91 ± 0.34 96.65 ± 0.26 60% 91.97 ± 0.32 96.14 ± 0.24 95.21 ± 0.31 96.14 ± 0.35 97.01 ± 0.25 70% 92.02 ± 0.49 96.43 ± 0.33 95.61 ± 0.35 96.59 ± 0.32 97.17 ± 0.23 80% 92.19 ± 0.58 96.71 ± 0.42 95.97 ± 0.40 96.74 ± 0.40 97.35 ± 0.34 [4] J. B. Tenenbaum, V. Silva and J. C. Lang- ford, A global geometric framework for nonlin- ear dimensionality reduction, Science, 290(2000), pp. 2319–2323. [5] D. Cai, X. He, J. Han and H.J. Zhang, Orthog- onal laplacianfaces for face recognition, IEEE Transactions on Image Processing, 15(11)(2006), pp. 3608–3614. Figure 1. USPS examples [6] E. Kokiopoulou and Y. Saad, Orthogonal neigh- borhood preserving projections: a projection- 0.09 based dimensionality reduction technique, IEEE IP transactions on Pattern Analysis and Machine In- IP+SR 0.08 LDA telligence, 29(12)(2007), pp. 2143–2156. OIP LPP 0.07 [7] X. He, S. Yan, Y. Hu, P. Niyogi and H.J. Zhang, IEEE Transactions on Pattern Analysis and Ma- 0.06 chine Intelligence, 27(3)(2005), pp. 328–340. Average error rate0.05 [8] D. Cai, X. He and J. Han, Isometric projection, Proceedings of the National Conference on Artifi- 0.04 cial Intelligence, 2007, pp. 528–533. 0.03 [9] I. Chaaban and M. Scheessele, Human perfor- 20 30 40 50 60 70 80 90 100 Dimensions manceontheUSPSdatabase,2007,Techniquere- port, Indiana University South Bend. Figure 2. Dimensions vs. average classi- [10] H.Zou,T.HastieandR.Tibshirani,Sparseprinci- fication error on USPS dataset. pal component analysis, Journal of computational andgraphicalstatistics, 15(2)(2006), pp. 265–286. References [11] J.A. Lee and M. Verleysen, Nonlinear dimension- ality reduction of data manifolds with essential [1] W. S. Torgerson, Multidimensional scaling: I. loops, Neurocomputing, 67(2005), pp. 29–53. Theory and method, Psychometrika, 17(4)(1952), [12] H. Choi and S. Choi, Robust kernel isomap, Pat- pp. 401–419. tern Recognition, 40(3)(2007), pp. 853–862. [2] X. He and P. Niyogi, Locality preserving projec- [13] D. Cai, X. He and J. Han, SRDA: An efficient tions, Proceeding of Advances in Neural Informa- algorithm for large-scale discriminant analysis, tion Processing Systems, 2003, pp. 153–160. IEEE Transactions on Knowledge and Data En- [3] S.T.RoweisandL.K.Saul,Nonlineardimension- gineering, 20(1)(2008), pp. 1–12. 
References

[1] W. S. Torgerson, Multidimensional scaling: I. Theory and method, Psychometrika, 17(4) (1952), pp. 401–419.
[2] X. He and P. Niyogi, Locality preserving projections, Advances in Neural Information Processing Systems, 2003, pp. 153–160.
[3] S. T. Roweis and L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science, 290 (2000), pp. 2323–2326.
[4] J. B. Tenenbaum, V. de Silva and J. C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science, 290 (2000), pp. 2319–2323.
[5] D. Cai, X. He, J. Han and H. J. Zhang, Orthogonal Laplacianfaces for face recognition, IEEE Transactions on Image Processing, 15(11) (2006), pp. 3608–3614.
[6] E. Kokiopoulou and Y. Saad, Orthogonal neighborhood preserving projections: a projection-based dimensionality reduction technique, IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12) (2007), pp. 2143–2156.
[7] X. He, S. Yan, Y. Hu, P. Niyogi and H. J. Zhang, Face recognition using Laplacianfaces, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3) (2005), pp. 328–340.
[8] D. Cai, X. He and J. Han, Isometric projection, Proceedings of the National Conference on Artificial Intelligence, 2007, pp. 528–533.
[9] I. Chaaban and M. Scheessele, Human performance on the USPS database, Technical report, Indiana University South Bend, 2007.
[10] H. Zou, T. Hastie and R. Tibshirani, Sparse principal component analysis, Journal of Computational and Graphical Statistics, 15(2) (2006), pp. 265–286.
[11] J. A. Lee and M. Verleysen, Nonlinear dimensionality reduction of data manifolds with essential loops, Neurocomputing, 67 (2005), pp. 29–53.
[12] H. Choi and S. Choi, Robust kernel isomap, Pattern Recognition, 40(3) (2007), pp. 853–862.
[13] D. Cai, X. He and J. Han, SRDA: An efficient algorithm for large-scale discriminant analysis, IEEE Transactions on Knowledge and Data Engineering, 20(1) (2008), pp. 1–12.