Binary Codes Pdf 196302 | 40753 Item Download 2023-02-07 09-35-02

Partial capture of text on file.
                                      Learning Binary Codes for High-Dimensional Data Using Bilinear Projections
                                                 Yunchao Gong                                                                    Sanjiv Kumar                                                        HenryA.Rowley                                                                Svetlana Lazebnik
                                              UNCChapelHill                                                                 Google Research                                                          Google Research                                                          University of Illinois
                                         yunchao@cs.unc.edu                                                            sanjivk@google.com                                                              har@google.com                                                     slazebni@illinois.edu
                                                                                                                                                                                                                                                                                    5
                                                                                                                                                                                                                                                                                                                                                        
                                                                                                                                                                                                                                                                                 10
                                                                                                                                                                                                          1
                                                                                                                                                                                                                                                                                            ITQ (feasible)
                                                                                             Abstract                                                                                                            ITQ
                                                                                                                                                                                                                                                                                            ITQ (unfeasible)
                                                                                                                                                                                                                 Proposed
                                                                                                                                                                                                                                                                                            Proposed
                                                                                                                                                                                                        0.8
                                        Recent advances in visual recognition indicate that to                                                                                                                                                                                      3
                                                                                                                                                                                                                                                                                 10
                                achievegoodretrievalandclassiﬁcationaccuracyonlarge-                                                                                                                    0.6
                                scale datasets like ImageNet, extremely high-dimensional                                                                                                                                                                                            1
                                                                                                                                                                                                                                                                                 10
                                visual descriptors, e.g., Fisher Vectors, are needed.                                                                                          We                     Recall@500.4
                                                                                                                                                                                                                                                                                memory (MB) for projections
                                present a novel method for converting such descriptors to                                                                                                               0.2
                                                                                                                                                                                                                                                                                    −1
                                compact similarity-preserving binary codes that exploits                                                                                                                                                                                         10
                                                                                                                                                                                                             
                                                                                                                                                                                                          0                                                                            
                                                                                                                                                                                                              16        64       256      1000     4000     16000 64000
                                                                                                                                                                                                                                                                                        16        64       256      1000     4000     16000 64000
                                their natural matrix structure to reduce their dimensional-                                                                                                                                            code size
                                                                                                                                                                                                                                                                                                                 code size
                                ity using compact bilinear projections instead of a single                                                                                                                           (a) NN search recall.                                                 (b) Storage requirements.
                                large projection matrix. This method achieves compara-                                                                                                               Figure 1. Comparison of our proposed binary coding method with
                                ble retrieval and classiﬁcation accuracy to the original de-                                                                                                         state-of-the-art ITQ method [6] for retrieval on 50K Flickr images
                                scriptors and to the state-of-the-art Product Quantization                                                                                                           (1Kqueries) with 64,000-dimensional VLAD descriptors. (a) Re-
                                approachwhilehavingordersofmagnitudefastercodegen-                                                                                                                   call of 10NN from the original feature space at top 50 retrieved
                                eration time and smaller memory footprint.                                                                                                                           points. (b) Storage requirements for projection matrices needed
                                                                                                                                                                                                     for coding (note the logarithmic scale of the vertical axis). The
                                                                                                                                                                                                     dashed curve for ITQ is an estimate, since running ITQ for more
                                1. Introduction                                                                                                                                                      than 4K bits is too expensive on our system.                                                                      While ITQ has
                                                                                                                                                                                                     slightly higher accuracy than our method for shorter code sizes,
                                        Today, image search and recognition are being per-                                                                                                           larger code sizes are needed to get the highest absolute accuracy,
                                formed on ever larger databases.                                                             For example, the Im-                                                    and ITQ cannot scale to them due to memory limitations. By con-
                                ageNet database [2] currently contains around 15M im-                                                                                                                trast, our method can be used to efﬁciently compute very high-
                                ages and 20K categories. To achieve high retrieval and                                                                                                               dimensional codes that achieve comparable accuracy to the origi-
                                classiﬁcation accuracy on such datasets, it is necessary to                                                                                                          nal descriptor.
                                use extremely high-dimensional descriptors such as Fisher
                                Vectors (FV) [23, 24, 28], Vectors of Locally Aggregated                                                                                                             sizes of 16K-64K bits. Thus, our goal is to convert very
                                Descriptors (VLAD) [14], or Locality Constrained Linear                                                                                                              high-dimensional real vectors to long binary strings. To do
                                Codes(LLC)[32]. Ourgoalistoconvertsuchdescriptorsto                                                                                                                  so, wemustovercomesigniﬁcantcomputationalchallenges.
                                binary codes in order to improve the efﬁciency of retrieval
                                and classiﬁcation without sacriﬁcing accuracy.                                                                                                                              Acommon step of many binary coding methods is lin-
                                        There is a lot of recent work on learning similarity-                                                                                                        ear projection of the data (e.g., PCA or a Gaussian ran-
                                preserving binary codes [6, 9, 10, 11, 15, 18, 19, 20, 21,                                                                                                           dom projection). When the dimensionality of both the in-
                                27, 30, 31, 33], but most of it is focused on relatively                                                                                                             put feature vector and the resulting binary string are sufﬁ-
                                low-dimensional descriptors such as GIST [22], which are                                                                                                             ciently high, the storage and computation requirements for
                                not sufﬁcient for state-of-the-art applications. By contrast,                                                                                                        even a random projection matrix become extreme (Figure
                                we are interested in descriptors with tens or hundreds of                                                                                                            1(b)). For example, a 64K×64K random projection ma-
                                thousands of dimensions. Perronnin et al. [24, 28] have                                                                                                              trix takes roughly 16GB of memory and the projection step
                                found that to preserve discriminative power, binary codes                                                                                                            takes more than one second. Methods that require the pro-
                                for such descriptors must have a very large number of                                                                                                                jection matrix to be learned, such as Iterative Quantization
                                bits. Indeed, Figure 1(a) shows that compressing a 64K-                                                                                                              (ITQ) [6] become infeasible even more quickly since their
                                dimensional descriptor to 100-1,000 bits (typical code sizes                                                                                                         training time scales cubically in the number of bits.
                                in most similarity-preserving coding papers) results in an                                                                                                                  Thereareafewworksoncompressinghigh-dimensional
                                unacceptable loss of retrieval accuracy compared to code                                                                                                             descriptors such as FV and VLAD. Perronnin et al. [24]
                have investigated a few basic methods including thresh-                       2.1. Random Bilinear Binary Codes
                olding, Locality Sensitive Hashing (LSH) [1], and Spec-                          To convert a descriptor x ∈ Rd to a d-dimensional bi-
                tral Hashing (SH) [33]. A more powerful method, Prod-                         nary string, we ﬁrst consider the framework of [1, 6] that
                uct Quantization (PQ) [13], has produced state-of-the-art                     applies a random rotation R ∈ Rd×d to x:
                results for compressing FV for large-scale image classiﬁca-
                tion [28]. However, descriptors like FV and VLAD require
                arandomrotationtobalancetheirvariancepriortoPQ[14],                                                  H(x)=sgn(RTx).                               (2)
                and as explained above, rotation becomes quite expensive
                for high dimensions.                                                          Since x can be represented as a matrix X ∈ Rd1×d2, to
                   In this paper, we propose a bilinear method that takes                     make rotation more efﬁcient, we propose a bilinear formu-
                advantage of the natural two-dimensional structure of de-                     lation using two random orthogonal matrices R1 ∈ Rd1×d1
                scriptors such as FV, VLAD, and LLC and uses two smaller                      and R2 ∈ Rd2×d2:
                matrices to implicitly represent one big rotation or projec-                                                                  
                tion matrix. Thismethodisinspiredbybilinearmodelsused                                      H(X)=vec sgn(RTXR ) ,                                  (3)
                for other applications [26, 29, 34]. We begin with a method                                                          1      2
                based on random bilinear projections and then show how to
                efﬁciently learn the projections from data. We demonstrate                    where vec(·) denotes column-wise concatenation.
                the promise of our method, dubbed bilinear projection-                           It is easy to show that applying a bilinear rotation to X ∈
                                                                                                d ×d
                based binary codes (BPBC), through experiments on two                         R 1     2 is equivalent to applying a d d × d d rotation to
                                                                                                                                          1 2       1 2
                                                                                                                                        ˆ
                large-scale datasets and two descriptors, VLAD and LLC.                       vec(X). This rotation is given by R = R ⊗ R , where ⊗
                                                                                                                                                2       1
                For most scenarios we consider, BPBC produces little or no                    denotes the Kronecker product:
                degradation in performance compared to the original con-
                tinuous descriptors; furthermore, it matches the accuracy of                              T                T      T                 ˆT
                                                                                                  vec(R XR2)=(R ⊗R )vec(X)=R vec(X)
                PQcodes while having much lower running time and stor-                                    1                2      1
                age requirements for code generation.                                         follows from the properties of the Kronecker product [16].
                2. Bilinear Binary Codes                                                      Another basic property of the Kronecker product is that if
                                                                                              R and R are orthogonal, then R ⊗ R is orthogonal as
                                                                                                1         2                             2       1
                   Mosthigh-dimensionaldescriptors have a natural matrix                      well [16]. Thus, a bilinear rotation is simply a special case
                                                                                                                                                           ˆ
                or tensor structure. For example, a HOG descriptor is a                       of a full rotation, such that the full rotation matrix R can be
                                                                                              reconstructed from two smaller matrices R and R .
                two-dimensional grid of histograms, and this structure has                                                                        1        2
                been exploited for object detection [26]. A Fisher Vector                        While the degree of freedom of our bilinear rotation is
                can be represented as a k × 2l matrix, where k is the visual                  more restricted than a full rotation, the projection matri-
                vocabulary size and l is the dimensionality of the local im-                  ces are much smaller, and the projection speed is much
                age features (the most common choice is SIFT with l=128).                     faster. In terms of time complexity, performing a full ro-
                                                                                              tation on x takes O((d d )2) time, while our approach is
                VLAD,whichcanbeseenasasimpliﬁedversionofFV,can                                                            1 2
                                                                                                   2            2
                                                                                              O(d d + d d ). In terms of space for projections, full
                berepresented as a k×l matrix. Finally, an LLC descriptor                          1 2       1 2
                                                                                              rotation takes O((d d )2), and our approach only takes
                with s spatial bins can be represented as a k × s matrix.                                             1 2
                                                                                                   2      2
                                                                                              O(d + d ). For example, as will be shown in Section
                   Let x ∈ Rd denote our descriptor vector. Based on the                           1      2
                structureandinterpretationofthedescriptor,1 wereorganize                      3.4, for a 64K-dimensional vector, a full rotation will take
                it into a d × d matrix with d = d d :                                         roughly 16GB of RAM, while the bilinear rotations only
                           1      2                       1 2                                 take 1MBofRAM.Theprojectiontimeforafullrotationis
                                    d d ×1                   d ×d                             morethanasecond, vs. only 3 ms for bilinear rotations.
                            x∈R1 2             7→ X ∈R 1 2.                         (1)
                We assume that each vector x ∈ Rd is zero-centered and                        2.2. Learning Bilinear Binary Codes
                has unit norm, as L normalization is a widely used prepro-
                                       2                                                         In this section, we present a method for learning the ro-
                cessing step that usually improves performance [25].                          tations R and R that is inspired by two-sided Procrustes
                   We will ﬁrst introduce a randomized method to obtain                                  1         2
                d-bit bilinear codes in Section 2.1 and then explain how                      analysis [29] and builds on our earlier work [5, 6, 7].
                                                                                                                                                       ˆ
                to learn data-dependent codes in Section 2.2. Learning of                        Following [6], we want to ﬁnd a rotation R such that
                                                                                                                                                          ˆT
                                                                                              the angle θ between a rotated feature vector R x                     =
                reduced-dimension codes will be discussed in Section 2.3.                                   i                                                   i
                                                                                              vec(RTX R ) and its binary encoding (geometrically, the
                                                                                                      1   i  2
                   1                                                                                                                                      ˆT
                    We have also tried to randomly reorganize descriptors into matrices       nearest vertex of the binary hypercube), sgn(R x) =
                but found it to produce inferior performance.                                 vec(sgn(RTX R )), is minimized.                 Given N training
                                                                                                           1   i   2
                                                                               
                                                                                                                                                                                                                                                                      
                                                                                                                                                                                                        0.02
                                                                                  0.05
                                                                                                                                                                                                        0.015
                                                                                  0.04
                                                                                                                                                                                                        0.01
                                                                                  0.03
                                                                                                                                                                                                        0.005
                                                                                  0.02
                                                                                                                                                                                                        0
                                                                                  0.01
                                                                                                                                                                                                        −0.005
                                                                                  0
                                                                                           SIFT dimension                                                                                               −0.01
                               SIFT dimensions                                    −0.01                                                              SIFT dimensions                                             SIFT dimensions
                                                                                                                                                                                                        −0.015
                                                                                  −0.02
                                                                                                                                                                                                        −0.02
                                                                                  −0.03
                                                                                                                                                                                                        −0.025
                                 
                                                                                                                                                                                                                   
                                                                                                              visual codewords
                                               visual codewords                                                                                                      visual codewords                                               visual codewords
                                 (a) Original VLAD descriptor                                    (b) Original binary code                               (c) Bilinearly rotated VLAD                            (d) Bilinearly rotated binary code
                         Figure 2. Visualization of the VLAD descriptor and resulting binary code (given by the sign function) before and after learned bilinear
                         rotation. We only show the ﬁrst 32 SIFT dimensions and visual codewords. Before the rotation, we can clearly see a block pattern, with
                         manyzerovalues. After the rotation, the descriptor and the binary code look more whitened.
                                                                                                                                                                                  P
                         points, we want to maximize                                                                                                  where D =                        N (B RTXT). The above expression is
                                                                                                                                                                        1              i=1         i     2      i
                            X                                                                                                                         maximized with the help of polar decomposition: R1 =
                                   N                                                                                                                  V UT,whereD =U S VT istheSVDofD .
                                          cos(θ )                                                                                                        1                            1           1 1                                            1
                                   i=1               i                                                                                                         1                                            1
                                         X   ˆT T                                                    !                                                (S3) Update R2:
                                                N           sgn(R xi)                    ˆT
                                 =              i=1                  √                (R xi)                                          (4)                                           X
                                                                         d                                                                                                                 N                     T       T
                                                                                                                                                         Q(R ) =                                   tr(B R X R )
                                         X                                      T                 T                                                              2                       i=1              i    2      i      1
                                                N           vec(sgn(R X R ))
                                                                                 1      i    2                      T                                                               X
                                 =                                            √                        vec(R X R )                                                                         N
                                                i=1                                                                 1      i    2                                           =                      tr(RTXTR B )
                                                                                  d                                                                                                        i=1              2      i      1     i
                                                  X                                                                                                                                                                                
                                            1            N                                                                                                                                        X
                                          √                                       T               T                                                                                            T         N            T                                  T
                                 =                                vec(B ) vec(R X R )
                                                                              i                   1      i    2                                                             = tr R                              (X R B ) =tr(R D ),
                                              d          i=1                                                                                                                                   2         i=1          i      1     i                     2      2
                                                  X
                                            1            N                     T      T                                                                                          P
                                 = √                            tr(B R X R ),                                                         (5)                                             N            T
                                                                         i     2      i      1                                                        where D =                             (X R B ). Analogously to the update
                                              d          i=1                                                                                                            2             i=1         i      1     i
                                                                                                                                                      rule for R , the update rule for R is R = U V T, where
                                                                                                                                                                          1                                              2           2           2 2
                         where B = sgn(RTX R ). Notice that (4) involves the                                                                          D =U S VT istheSVDofD .
                                          i                     1      i     2                                                                            2           2 2 2                                          2
                                                                         ˆ            d×d
                         large projection matrix R ∈ R                                        , direct optimization of                                      Wecycle between these updates for several iterations to
                         which is challenging. However, after reformulation into bi-                                                                  obtainalocalmaximum. Theconvergenceoftheabovepro-
                         linear form (5), the expression only involves the two small                                                                  gram is guaranteed in ﬁnite number of iterations as the op-
                         matrices R and R . Letting B = {B ,...,B }, our ob-
                                              1               2                                     1               N                                 timal solution of each step is exactly obtained, each step
                         jective function is as follows:                                                                                              is guaranteed not to decrease the objective function value,
                                                                                    N                                                                 and the objective function is bounded from above. In our
                                  Q(B,R ,R )= max Xtr(B RTXTR )                                                                        (6)            implementation, we initialize R1 and R2 by random rota-
                                                 1       2         B,R ,R                           i    2      i      1
                                                                          1     2 i=1                                                                 tions and use three iterations. We have not found signiﬁcant
                                                                  d ×d               T                         T                                      improvement of performance by using more iterations. The
                         s.t. B ∈ {−1,+1} 1                                2, R R =I, R R =I.
                                       i                                             1      1                  2      2                                                                                                                 3        3
                                                                                                                                                      timecomplexityofthisprogramisO(N(d +d ))whered1
                                                                                                                                                                                                                                        1        2
                         This optimization problem can be solved by block coordi-                                                                     andd aretypicallyfairlysmall(e.g.,d = 128, d = 500).
                                                                                                                                                                 2                                                               1                    2
                         nate ascent by alternating between the different variables                                                                         Figure 2 visualizes the structure of a VLAD descrip-
                         {B ,...,B }, R , and R . We describe the update steps                                                                        tor and the corresponding binary code before and after a
                               1               N            1                2
                         for each variable below, assuming the others are ﬁxed.                                                                       learned bilinear rotation.
                         (S1) Update Bi. When R1 and R2 are ﬁxed, we indepen-                                                                         2.3. Learning with Dimensionality Reduction
                         dently solve for each B by maximizing
                                                                      i
                                                                                           P P                                                              The formulation of Section 2.2 is used to learn d-
                                                                  T      T                      d1           d2         kl elk
                                Q(B ) = tr(B R X R ) =                                                              B V ,
                                         i                   i    2      i      1               k=1          l=1        i     i                       dimensional binary codes starting from d-dimensional de-
                                       elk                                                   e             T       T                                  scriptors. Now, to produce a code of size c = c1 × c2,
                         where V               denote the elements of V = R X R . Q(B )
                                         i                                                     i           2      i      1              i             where c1 < d1 and c2 < d2, we need projection matrices
                                                                               eT                                                                                       d ×c                            d ×c                                T
                         is maximized by Bi = sgn(Vi ).                                                                                               R ∈ R 1 1, R ∈ R 2 2 such that R R = I and
                                                                                                                                                          1                               2                                                 1      1
                         (S2) Update R . Expanding (6) with R and B ﬁxed, we                                                                          RTR =I.EachB isnowac ×c binaryvariable. Con-
                                                       1                                               2               i                                  2      2                           i                     1        2
                         have the following:                                                                                                          sider the cosine of the angle between a lower-dimensional
                                                                                                                                                                                        ˆT                                                                  ˆT
                                                        XN                                                                                            projected vector R xi and its binary encoding sgn(R x):
                             Q(R ) =                                  tr(B RTXTR )
                                      1                       i=1              i    2      i       1
                                                                                                                                                                                                       ˆT           T        ˆT
                                                                                                                                                                                            sgn(R x )                     R xi
                                                              X                                                                                                                                                 i
                                                                     N                 T      T                                                                         cos(θ ) =                      √                                    ,
                                               = tr                         (B R X )R                        =tr(D R ),                                                            i                                        ˆT
                                                                                 i    2       i        1                    1     1                                                                        c            kR x k
                                                                     i=1                                                                                                                                                            i 2
                             d d ×c c          T
                      ˆ        1 2  1 2       ˆ ˆ
              where R ∈ R               and R R = I. This formulation                 The second dataset is the ILSVRC2010 subset of Ima-
                                                              ˆT
              differs from that of (4) in that it contains kR x k in the           geNet [2], which contains 1.2M images and 1,000 classes.
                                                                   i 2
              denominator,whichmakestheoptimizationdifﬁcult[5]. In-                Onthisdataset, weusethepubliclyavailableSIFTfeatures,
              stead, we follow [5] to deﬁne a relaxed objective function           which are densely extracted from the images at three dif-
              based on the sum of linear correlations                              ferent scales. We cluster the features into 200 centers and
                                  X   ˆT T                         !               then aggregate them into VLAD vectors of 128 × 200 =
                                      N     sgn(R xi)       ˆT                     25,600 dimensions. These vectors are also power- and L -
               Q(B,R ,R )=                        √        (R x ) .                                                                             2
                        1   2         i=1           c            i                 normalized. In addition, we compute LLC features [32] on
              The optimization framework for this objective is similar to          this dataset using a 5,000-dimensional visual codebook and
              that of Section 2.2. For the three alternating optimization          athree-level spatial pyramid (21 spatial bins). The resulting
              steps, (S1) remains the same. For (S2) and (S3), we com-             features have 5,000 × 21 = 105,000 dimensions. Unlike
              pute the SVD of D and D as U S VT and U S VT re-                     VLADdescriptors, which are dense and have both positive
                                   1        2     1 1 1           2 2 2            and negative values, the LLC descriptors are nonnegative
                                                                  ˆ    T and
              spectively, and set the two rotations to R      = V U
                                                            1      1 1             and sparse. For improved results, we zero-center and L -
                     ˆ    T          ˆ                                                                                                          2
              R =U V ,whereV isthetopc singular vectors of V
                2      2 2            1             1                       1      normalize them.
                   ˆ
              and U is the top c singular vectors of U . To initialize the
                    2             2                       2
              optimization, we generate random orthogonal directions.              3.2. Experimental Protocols
              2.4. Distance Computation for Binary Codes                              To learn binary codes using the methods of Sections 2.2
                 At retrieval time, given a query descriptor, we need to           and 2.3, we randomly sample 20,000 training images from
              compute its distance to every binary code in the database.           each dataset. We then set aside a number of query images
              The most popular measure of distance for binary codes is             that were not used for training and run nearest neighbor
              the Hammingdistance. Wecomputeitefﬁcientlybyputting                  searches against all the other images in the dataset. For
              bits in groups of 64 and performing an XOR and bit count             Holiday+Flickr1M, we sample the training images from
              (popcount). Forimprovedaccuracy,weuseasymmetricdis-                  the Flickr subset only and use the 500 predeﬁned queries.
              tance, in which the database points are quantized but the            For ILSVRC2010, we use 1,000 random queries. For each
              query is not [3, 8, 13]. In this work, we adopt the lower-           dataset, we evaluate two types of retrieval tasks:
              bounded asymmetric distance proposed in [3] to measure                 1. Retrieval of ground-truth nearest neighbors from the
              the distance between the query and binary codes. For a                     original feature space. These are deﬁned as the top
              query x ∈ Rc and database points b ∈ {−1,+1}c, the                         ten Euclidean nearest neighbors for each query based
              lower-bounded asymmetric distance is simply the L dis-                     on original descriptors. Our performance measure for
                                                                       2
              tance between x and b: d (x,b) = kxk2 + c − 2xTb.                          this task is the recall of 10NN for different numbers of
                                           a                2
                         2
              Since kxk is on the query side and c is ﬁxed, in practice,                 retrieved points [9, 13].
                         2
                                            T
              we only need to compute x b. We do this by putting bits                2. Retrieval of “relevant” or “semantically correct”
              in groups of 8 and constructing a 1 × 256 lookup table to                  neighbors. For Holiday+Flickr1M, these are deﬁned
              makethedot-product computation more efﬁcient.                              as the images showing the same object instance (re-
              3. Experiments                                                             call from Section 3.1 that each query has around
                                                                                         three matches). The standard performance measure
              3.1. Datasets and Features                                                 for this task on this dataset is mean average precision
                                                                                         (mAP) [8, 12, 14]. For ILSVRC2010, we deﬁne the
                 Wetestourproposedapproach,bilinearprojection-based                      ground-truth “relevant” neighbors as the images shar-
              binary codes (BPBC), on two large-scale image datasets                     ing the same category label. In this case, there are very
              and two feature types. The ﬁrst one is the INRIA Holiday                   many matches for each query. Following [6, 18, 20],
              dataset with 1M Flickr distractors (Holiday+Flickr1M)                      we report performance using precision at top k re-
              [12]. There are 1,419 images in the Holiday dataset cor-                   trieved images (precision@k).
              responding to 500 different scene instances, and each in-            In addition, in Section 3.8 we perform image classiﬁcation
              stance has three images on average. There is a set of 500            experiments on the ILSVRC2010 dataset.
              query images, and the remaining 919 images together with
              the 1M Flickr images are used as the database. We use the            3.3. Baseline Methods
              SIFT features of interest points provided by [14] and clus-
              ter them to 500 k-means centers. Then we represent each                 Our main baseline is the state-of-the-art Product Quan-
              image by a 128 × 500 = 64,000 dimensional VLAD. The                  tization (PQ) approach [13]. PQ groups the data dimen-
              vectors are power-normalized(element-wisesquare-rooted)              sions in batches of size s and quantizes each group with k
              and L2-normalized as in [25].                                        codebook centers. In our experiments, we use s = 8 and
The words contained in this file might help you see if this file matches what you are looking for:

...Learning binary codes for high dimensional data using bilinear projections yunchao gong sanjiv kumar henrya rowley svetlana lazebnik uncchapelhill google research university of illinois cs unc edu sanjivk com har slazebni itq feasible abstract unfeasible proposed recent advances in visual recognition indicate that to achievegoodretrievalandclassicationaccuracyonlarge scale datasets like imagenet extremely descriptors e g fisher vectors are needed we recall memory mb present a novel method converting such compact similarity preserving exploits their natural matrix structure reduce code size ity instead single nn search b storage requirements large projection this achieves compara figure comparison our coding with ble retrieval and classication accuracy the original de state art on k flickr images scriptors product quantization kqueries vlad re approachwhilehavingordersofmagnitudefastercodegen call from feature space at top retrieved eration time smaller footprint points matrices note lo...
Related files

Share

Help

Related files

Share

Share to social media

Help

Login Area