Source: www.researchsquare.com
Isometric Projection with Autoencoder

Ruisheng Ran 1*, Qianghui Zeng 1†, Xiaopeng Jiang 1† and Bin Fang 2†

1* The College of Computer and Information Science, Chongqing Normal University, Chongqing, 401331, China.
2 The College of Computer Science, Chongqing University, Chongqing, 400044, China.

*Corresponding author(s). E-mail(s): rshran@cqnu.edu.cn;
Contributing authors: 2021210516092@stu.cqnu.edu.cn; 2021210516042@stu.cqnu.edu.cn; fb@cqu.edu.cn;
†These authors contributed equally to this work.

Research Article
DOI: https://doi.org/
License: This work is licensed under a Creative Commons Attribution 4.0 International License.

Abstract

Isometric Projection (IsoP) is a linear dimensionality reduction method that provides the best linear approximation to the true isometric embedding of the data. However, IsoP and all its variants consider only the one-way mapping from the high-dimensional space to the low-dimensional space, so the projected low-dimensional data may not "represent" the original samples accurately and effectively. In this paper, we propose IsoP-AE (Isometric Projection with Autoencoder), a new IsoP method based on the structure of the linear autoencoder. In this method, the conventional projection of IsoP is viewed as the encoding stage, and a decoder is used to reconstruct the original high-dimensional data from the projected low-dimensional data. In this way, our algorithm makes the low-dimensional embedding "represent" the original data more accurately and effectively.
Experimental results on the Handwritten Alphadigits, COIL-100, Olivetti Research Laboratory (ORL) and Georgia Tech face datasets show that the proposed IsoP-AE approach provides a better representation of the data and achieves much higher recognition accuracy.

Keywords: Isometric Projection, autoencoder, dimensionality reduction, manifold learning

1 Introduction

The term "curse of dimensionality" [1, 2] was coined by the mathematician Richard Bellman when he studied dynamic programming problems, and it describes a series of mathematical phenomena that arise in high-dimensional spaces. In the field of Machine Learning (ML) [3], the curse of dimensionality often refers to the exponential relationship between dataset dimensionality and the amount of data required: in general, as the number of features grows, the number of samples needed to train a model increases exponentially. The resulting difficulty of training machine learning models on high-dimensional data is what is known as the "curse of dimensionality".

Dimensionality reduction (DR) [4, 5] is one of the effective ways to address the curse of dimensionality. Dimensionality reduction methods are generally divided into linear and nonlinear methods [6]. Linear dimensionality reduction techniques assume that the data structure is linear. They use a simple linear function to project high-dimensional data into a low-dimensional space and so obtain low-dimensional features of the data. Representative linear algorithms include Principal Component Analysis (PCA) [7] and Linear Discriminant Analysis (LDA) [8, 9]. What they have in common is the assumption that the original dataset is embedded in a global linear structure. However, because both PCA and LDA are linear methods, they reduce dimensionality poorly when the data are nonlinear.
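The linear projection described above can be illustrated with a minimal sketch. It is not taken from the paper: it assumes NumPy, uses synthetic toy data, and the helper names `pca_fit`, `project` and `reconstruct` are hypothetical. It shows PCA as a linear map to a low-dimensional space, together with the linear map back to the original space.

```python
import numpy as np

def pca_fit(X, d):
    """Return the data mean and a D x d matrix W of the top-d principal directions.

    X is an (n_samples, D) array whose rows are samples.
    """
    mean = X.mean(axis=0)
    # The right singular vectors of the centred data are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:d].T

def project(X, mean, W):
    """Linear map from D dimensions down to d dimensions: y = W^T (x - mean)."""
    return (X - mean) @ W

def reconstruct(Y, mean, W):
    """Linear map back up to D dimensions: x_hat = W y + mean."""
    return Y @ W.T + mean

rng = np.random.default_rng(0)
# Synthetic toy data (an assumption for illustration only):
# 200 points lying close to a 2-D plane inside a 10-D space.
latent = rng.normal(size=(200, 2))
basis = rng.normal(size=(2, 10))
X = latent @ basis + 0.01 * rng.normal(size=(200, 10))

mean, W = pca_fit(X, d=2)
Y = project(X, mean, W)          # low-dimensional features
X_hat = reconstruct(Y, mean, W)  # approximation of the original data

# Relative reconstruction error; small here because the toy data are nearly linear.
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X - mean)
print(Y.shape, rel_err)
```

On data that lie near a linear subspace, as in this toy case, the reconstruction error is small. On data sampled from a curved manifold the same linear map would generally lose structure, which is the limitation the text refers to.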
For nonlinear problems, two families of nonlinear methods have been proposed: kernel-based [10] and manifold-based [11] dimensionality reduction. Kernel-based methods project the data into a higher-dimensional space to make it linearly separable, but selecting the kernel function, the most critical step, is difficult and can only be done empirically. Because of this limitation of kernel methods, manifold learning has emerged in recent years as another important nonlinear dimensionality reduction technique; its representative methods are Locally Linear Embedding (LLE) [12] and Isomap [13]. The disadvantage of these nonlinear methods, however, is that they are defined only on the training set and provide no mapping for test data. Linearized versions of nonlinear manifold methods have therefore been proposed: Locality Preserving Projections (LPP) [14] is the linearization of Laplacian Eigenmap (LE) [15, 16], and Isometric Projection (IsoP) [17] is the linearization of Isomap. The IsoP algorithm first constructs the nearest neighbor graph of the observed data and then computes the shortest paths between all pairs of data points in the graph, which gives an estimate of the global structure of the data. Multi-dimensional Scaling (MDS) [18, 19] is then applied with the mapping function required to be linear, which yields the objective function of IsoP. IsoP retains the advantages of Isomap while overcoming the disadvantage of providing embeddings only for training data.

Many improvements to IsoP have been proposed, and they are reported to perform better than the original IsoP. In ML, we often face high-dimensional data for which the number of samples is much smaller than the dimension of the samples, so the matrices involved become singular when manifold learning algorithms are solved.
This is the so-called small-sample-size (SSS) [20] problem, and the IsoP algorithm also faces it.

The raw data is usually preprocessed using PCA or Singular Value Decomposition (SVD) [21, 22], which avoids the SSS problem but also inherits the shortcomings of PCA [23]. To address this issue, other variants of IsoP have been proposed, such as Tensor based Isometric Projection (TIsoP) [24] and Isometric Projection Based on Maximal Margin Criterion (IsoP-MMC) [25]. Other improved IsoP methods include Orthogonal Isometric Projection (OIsoP) [26] and Uncorrelated Discriminant Isometric Projections (UDIsoP) [27]; OIsoP can be regarded as an extension of IsoP, and UDIsoP is a feature extraction method for face recognition. Applying the regularization method given in Ref. [28] to IsoP yields the Regularized Isometric Projection (RIsoP), and applying the matrix-exponential embedding given in Ref. [29] yields the Exponential Isometric Projection (EIsoP).

The idea of OIsoP is the same as that of IsoP, but it further requires the projection matrix to be orthogonal, and its constraints differ from those of Cai's orthogonal projection. TIsoP is another extension of IsoP. It represents each image as a two-dimensional matrix instead of a traditional one-dimensional vector and performs SVD in the tensor space, thereby avoiding the small-sample-size problem.

However, current IsoP methods and their variants consider only the one-way mapping from the high-dimensional manifold space to the low-dimensional space. This mapping enables the embedded low-dimensional data points to preserve the intrinsic geometry of the original samples, but it may not "represent" the original samples very accurately and effectively.

In this work, based on the structure of the linear autoencoder, a new IsoP method called IsoP-AE (Isometric Projection with Autoencoder) is proposed.
Specifically, while preserving the geodesic distance information of the samples, the data points in the high-dimensional manifold space are encoded into data points in the low-dimensional space by the conventional IsoP projection model. In addition, we use a decoder to reconstruct the original high-dimensional data points from the embedded low-dimensional data points. That is, compared with the conventional IsoP, the new IsoP method has an additional reconstruction stage. This stage enables the embedded low-dimensional data to retain as much information as possible about the original high-dimensional data, so the embedded low-dimensional data "represent" the original samples more accurately and effectively.

The rest of this paper proceeds as follows: in the second section, we review the Isomap method, the IsoP method and the autoencoder. In the third section, we propose the novel IsoP method with the encoder-decoder paradigm. In the fourth section,