127x Filetype PDF File size 0.31 MB Source: projet.liris.cnrs.fr
21st International Conference on Pattern Recognition (ICPR 2012) November 11-15, 2012. Tsukuba, Japan Orthogonal Isometric Projection 1 1,2 1 1 Yali Zheng , Yuan Yan Tang , Bin Fang , Taiping Zhang 1. Chongqing University, Chongqing, China 2. University of Macau, Macau, China zhengyl, fb, tpzhang@cqu.edu.cn, yytang@umac.mo Abstract with the Graph Laplacian theory. Cai et al. [5] pro- posedavariationofLaplacianface(LPP)—Orthogonal In contrast with Isomap, which learns the low- Laplacianface (OLPP), which iteratively computed the dimension embedding, and solves problem under the orthogonal eigenvectors to compose the projection ma- classic Multi-dimension Scaling (MDS) framework, we trix. Kokiopoulou and Saad [6] analyzed and compared propose a dimensionality reduction technique, called LPPandOLPP,andproposedanOrthogonalNeighbor- Orthogonal Isometric Projection (OIP), in this paper. hood Preserving Projections (ONPP). It can be thought We consider an explicit orthogonal linear projection of as an orthogonal version of LLE, but projections are by capturing the geodesic distance, which is able to learned explicitly as a standard eigenvalue problem. handle new data straightforward, and leads to a stan- Dimensionalityreductiontechniquesiseithertoseek dard eigenvalue problem. And we extend our method to for a representation of data in low dimension to benefit Sparse Orthogonal Isometric Projection (SOIP), which the data analysis or to map data from high-dimensional can be solved efficiently using LARS. Numerical exper- space to low-dimensional space through an explicit iments are reported to demonstrate the performance of linear or nonlinear projection learned from a training OIPbycomparingwithafewcompetingmethods. dataset. Multidimensional Scaling (MDS) [1] is a clas- sic data embedding technique, and considers preserv- ing the pairwise distance to obtain the low dimension 1. Introduction configuration. Principal component analysis (PCA) can be used as a projection method, which learns a lin- Researchers are interested in reducing dimensions of ear projection by maximizing the variance of data in digital data is because of the existence of digital infor- low dimension. PCA is identical to Classical Multidi- mation redundancy. It is well known that extracting ef- mensional Scaling if euclidean distance is used [1], but ficient features from data can improve object classifi- PCA learns the projection. Linear Discriminant Anal- cation and recognition, and simplify the visualization ysis (LDA) maximizes the ratio of between-class vari- of data. Manifold learning theory [4, 3, 11, 12] was ance to the within-class variance to determine an ex- introduced into dimensionality reduction field in early plicit projection as well. 20 century, which assumed that a low-dimension mani- In this paper, we are motivated by Isomap [4] and fold is embedded in high-dimension data. Researchers Isometric projection [8], and propose a linear projec- pay lots of attention to discovery, and prove that a low- tion method, called Orthogonal Isometric Projection, dimension manifold exists. Isomap was proposed by which is a variation of Isometric Projection. Cai pro- Tenenbaum et al. [4], in which geodesic distance was posed Isometric Projection [8] addressed the same pur- used to capture the global structure in high dimension, pose as ours. However, in this paper, we constrain the and can be solved under MDS framework. Roweis and projection is orthogonal which differs from Cai’s, and Saul [3] proposed a nonlinear dimensionality reduc- solve a standard eigenvalue problem. The main differ- tion method, Locally Linear Embedding (LLE), which ence between our method and Cai’s orthogonal version aimedatpreservingthesamelocalconfigurationofeach of Isometric Projection is that: we build a reasonable neighborhood in low dimensional space as in high di- objective function, and solve the optimization in a stan- mensional space. He and Niyogi [2] found an optimal dard eigenvalue problem, while Cai solved the problem linear approximations to eigenfunctions of the nonlin- in a generalized eigenvalue problem. We extend our ear Laplacian Eigenmap, called Local Preserving Pro- method to the sparse orthogonal isometric projection. jection (LPP), and gave the justification in the paper Wetest our method on USPS data set. In the following 978-4-9906441-1-6 ©2012 IAPR 405 section wewillbrieflyintroduceCai’sIsometricProjec- Tosolvetheproblemefficientlyincomputationcost, tion to demonstrate the advantages of our algorithm. Cai also applied the regression in [8, 13] over Y and Xcalled spectral regression (SR). Y is computed first, 2. A brief review of Isomap and Isometric which is the eigenvector of τ(DG), then Projection m X T 2 2 a=argmin (a x −y ) +αkak (4) a i i 2.1 Isomap i=1 Isomap was proposed by Tenenbaum and et al. [4], TheconditionVTXXTV =I constrainedthatlow- and is one of most popular manifold learning tech- dimension embedding of points is orthogonal, namely, niques. It aims to obtain an euclidean embedding of YTY =I. points such that the geodesic distance in high dimen- sional space gets close to the euclidean distance be- 3. Orthogonal Isometric Projection tween each pair of points. The mathematical formu- lation is The main idea of orthogonal isometric projection is X 2 to seek an orthogonal mapping over the training data set min (d (x ,x )−d (y ,y )) (1) G i j E i j so as to best preserve the geodesic distance on a neigh- i,j borhood graph, and learn a linear projection under the d is the geodesic distance, which is defined locally to general Isomap framework, but has a different and rea- G sonable constraint that projections are orthogonal. betheshortestpathonthemanifold,d istheeuclidean E distance, and in a matrix form: minkτ(D )−τ(D )k (2) 3.1 TheObjectiveFunctionofOIP G E 2 D isthegeodesicdistancematrix,D istheeuclidean Under the Isomap framework, it minimizes the ob- G E jective function in Equation 2. In math τ(D ) = distance matrix, τ is an operation which converts the G euclidean distance into an inner product form. The −CWC/2, and C is the centering matrix defined by T T C = I −1/N ·e e , where e = [1,...,1] , W problem is solved under the MDS framework. Isomap n N N N N makes an assumption that a manifold existing in high is a Dijkstra distance matrix based on K nearest neigh- dimension space, and applies the geodesic distance to bor graph over all data points. Let f(V ) = kτ(DG) − XTVVTXk ,weareseekingforalinearprojection: measure the similarity of each point pair. However, if 2 insufficient samples are given or the data is noised, the minf(V) (5) intrinsic geometry of the data is difficult to be captured V byconstructing the neighborhood graph. Let S = τ(D ),whichisaknownneighborhoodgraph G 2.2 Isometric Projection constructed from the given data set. We have f(V) = tr((S−XTVVTX)T(S−XTVVTX)) Cai et al. [8] extended Isomap algorithm to learn a = tr(STS)+tr((XTVVTX)T(XTVVTX) linear projection by solving a spectral graph optimiza- tion problem. Suppose that Y = V TX, they minimized −2ST(XTVVTX)) the objective function, = tr((XTVVTX)T −2ST)(XTVVTX) minkτ(D )−XTVVTXk (3) +tr(STS) G 2 To make the problem tractable, they imposed a con- Sotheobjective function equation 5 is equivalent to straint V TXXTV = I, and rewrote the minimization problem as min tr(VTX((XTVVTX)T −2ST)XTV) s.t. VTV =I argmax tr(VTXτ(D )XTV) V G Let M = X(XTX −2ST)XT, then the problem be- s.t. VTXXTV =I comesto whichisequivalenttoageneralizedeigenvalueproblem min tr(VTMV) Xτ(D )XTV =λXXTV s.t. VTV =I G 406 Algorithm 1 Orthogonal Isometric Projection v is the solution of the regression system: p 1: Construct a neighborhood graph G over the data n points. Compute the Dijkstra distances matrix X T i i 2 v =argmin (v x −y ) , W(i,j) = dG(i,j) over the graph between every p v p point pair. d (i,j) is the shortest path of i and j, i=1 G otherwise dG(i,j) = inf. where yi is the ith element of y . Similar to Zou et 2: Compute τ(D ) = −CWC/2, C is the centering p p G al. [10], we can get the sparse solutions by adding L T 1 matrix, and C = I −1/N ·e e . n N N regularization: 3: Compute the eigenvectors of M = X(XTX − 2τ(DG)T)XT. V is determined by eigenvectors n m of M associated with p smallest eigenvalues. X T i i 2 X j v =argmin (v x −y ) +α kv k p v p p i=1 j=1 which leads to a standard eigenvalue problem, where vj is the jth element of v . The regression can p p MV =λV (6) be solved efficiently using LARS algorithm. V is determined by eigenvectors of M corresponding to 5. Experiments psmallest eigenvalues. 3.2 TheAlgorithmofOIP USPS is a well known handwritten digits corpus from US postal service. It contains normalized gray First we need to construct the neighborhood graph scale images of size 16 × 16, and totally 9298 sam- G. As all we know, there are two options: ples with 256 features. Fig. 1 shows some examples of USPS dataset. We evaluate our algorithm on USPS, 1 • if x ∈ N(x ), N(x ) is the k nearest neighbor of which are downloaded from public website , and com- j i i pare with PCA, LDA, LPP, IP, and IP+SR, and demon- x , i strate the average accuracy and average error rates in • if kx −x k2 ≤ ǫ, ǫ is a small number as a thresh- the section. A human error rate estimated to be 2.37% i j old. shows that it is a hard task over USPS dataset [9]. We the edge between i and j weighted by gaussian kernel randomlysample25timesfromthedatasetsasthetrain- −kx −x k2 ing sets with varying rates from 20% to 80%, and the is defined as e i j . rest are used for testing. We map points in test sets WesummarizethealgorithminAlg1. by projections learned from training sets, and apply the nearest neighbor to determine categories labels. We 4 SparseOrthogonalIsometricProjection also demonstrate the dimensions versus average error rate by half training and half testing. Assume that l is i Our problem essentially solves a standard optimiza- the ground truth, b is the label assigned after dimen- i tion in Equation 6, which also can be incorporated into sionality reduction by methods, Ntest is the number of a regression framework in the way of Sparse PCA [10]. test samples, The optimal V is the eigenvectors with respect to the N P maximum eigenvalues of Equation 6. Since M is a 1 X Ntestδ(l ,proj(b )) real symmetric matrix, so M can be decomposed into Acc = i=1 i i , N N ˜ ˜T ˜ j=1 test XX .SupposetherankofX isr,andSVD(X)is ˜ ˜˜˜T and N = 25 in our experiment. X=UΣV , Wecompare our method with IP, IP+SR, LDA, and ˜ LPPinTable1. Ourmethodoutperformsallothermeth- it is easy to verify that the column vectors in U are the ˜ ˜T ods. The bolds are from our method, ± shows the eigenvectors of XX . Let Y = [y ,y ,··· ,y ] , 1 2 r n×r square root of the variance. With the number of training each row vector is a sample vector in r-dimensional subspace, and V = [v ,v ,··· ,v ] . Therefore, the samples, the classification precision increases as what 1 2 r m×r weexpect. Fig. 2 shows with the number of dimensions projective functions of OIP are solved by the linear sys- tems: increases, the average classification error decreases. XTv =y ,p=1,2,··· ,r. 1http://www.zjucadcg.cn/dengcai/Data/TextData.html p p 407 Table 1. Comparison on USPS Train ratio LDA LPP IP IP+SR OIP 20% 89.13 ± 0.34 92.95 ± 0.36 92.11 ± 0.39 93.90 ± 0.50 95.10 ± 0.20 30% 90.39 ± 0.32 94.47 ± 0.30 93.61 ± 0.33 94.96 ± 0.36 95.93 ± 0.17 40% 91.18 ± 0.36 95.20 ± 0.27 94.48 ± 0.28 95.69 ± 0.36 96.40 ± 0.20 50% 91.54 ± 0.32 95.75 ± 0.29 94.85 ± 0.42 95.91 ± 0.34 96.65 ± 0.26 60% 91.97 ± 0.32 96.14 ± 0.24 95.21 ± 0.31 96.14 ± 0.35 97.01 ± 0.25 70% 92.02 ± 0.49 96.43 ± 0.33 95.61 ± 0.35 96.59 ± 0.32 97.17 ± 0.23 80% 92.19 ± 0.58 96.71 ± 0.42 95.97 ± 0.40 96.74 ± 0.40 97.35 ± 0.34 [4] J. B. Tenenbaum, V. Silva and J. C. Lang- ford, A global geometric framework for nonlin- ear dimensionality reduction, Science, 290(2000), pp. 2319–2323. [5] D. Cai, X. He, J. Han and H.J. Zhang, Orthog- onal laplacianfaces for face recognition, IEEE Transactions on Image Processing, 15(11)(2006), pp. 3608–3614. Figure 1. USPS examples [6] E. Kokiopoulou and Y. Saad, Orthogonal neigh- borhood preserving projections: a projection- 0.09 based dimensionality reduction technique, IEEE IP transactions on Pattern Analysis and Machine In- IP+SR 0.08 LDA telligence, 29(12)(2007), pp. 2143–2156. OIP LPP 0.07 [7] X. He, S. Yan, Y. Hu, P. Niyogi and H.J. Zhang, IEEE Transactions on Pattern Analysis and Ma- 0.06 chine Intelligence, 27(3)(2005), pp. 328–340. Average error rate0.05 [8] D. Cai, X. He and J. Han, Isometric projection, Proceedings of the National Conference on Artifi- 0.04 cial Intelligence, 2007, pp. 528–533. 0.03 [9] I. Chaaban and M. Scheessele, Human perfor- 20 30 40 50 60 70 80 90 100 Dimensions manceontheUSPSdatabase,2007,Techniquere- port, Indiana University South Bend. Figure 2. Dimensions vs. average classi- [10] H.Zou,T.HastieandR.Tibshirani,Sparseprinci- fication error on USPS dataset. pal component analysis, Journal of computational andgraphicalstatistics, 15(2)(2006), pp. 265–286. References [11] J.A. Lee and M. Verleysen, Nonlinear dimension- ality reduction of data manifolds with essential [1] W. S. Torgerson, Multidimensional scaling: I. loops, Neurocomputing, 67(2005), pp. 29–53. Theory and method, Psychometrika, 17(4)(1952), [12] H. Choi and S. Choi, Robust kernel isomap, Pat- pp. 401–419. tern Recognition, 40(3)(2007), pp. 853–862. [2] X. He and P. Niyogi, Locality preserving projec- [13] D. Cai, X. He and J. Han, SRDA: An efficient tions, Proceeding of Advances in Neural Informa- algorithm for large-scale discriminant analysis, tion Processing Systems, 2003, pp. 153–160. IEEE Transactions on Knowledge and Data En- [3] S.T.RoweisandL.K.Saul,Nonlineardimension- gineering, 20(1)(2008), pp. 1–12. ality reduction by locally linear embedding, Sci- ence, 290(2000), pp. 2323–2326. 408
no reviews yet
Please Login to review.