Geometry Pdf 166860 | Multiview Intro

Partial capture of text on file.
           Chapter 7
           Introduction to multiple view
           reconstruction
           In the previous chapter we have studied the geometry of points in space as seen from two views.
           There, the epipolar constraint plays a crucial role in that it captures the relationship between
           corresponding points and camera conﬁguration without reference to the position of such points in
           space. The epipolar constraint gives a complete answer to the question of what the relationship
           between corresponding points in two views are. It is then natural to ask this question in the context
           of an arbitrary number of views: what is the relationship between corresponding points in m views?
           Not only is this question crucial to the undertanding of the geometry of points in space as seen
           from a number of views, but – like in the two-view case – the constraints will be used to derive
           algorithms for reconstructing camera conﬁguration and, ultimately, the 3-D position of points.
              The search for the m-view analogue of the epipolar constraint has been an active research area
           for a decade. It was realized early on [24] that the relationship between multiple views of the same
           point can be characterized by multi-linear constraints. Like the epipolar constraint is bilinear, i.e. it
           is linear in the coordinates of the point in one view having ﬁxed the coordinates in the other, one can
           showthatthereexistm-linearconstraints between correspondingpointsinmimages. Consequently,
           the study of multiple view geometry has involved multi-dimensional linear operators, or tensors,
           with as many indices as views. There is now a substantial literature on the properties of so-called
           tri-focal and quadri-focal tensors that involves the use of sophisticated algebraic geometric tools.
           In the interest of simplicity, this book takes an approach that is substantially diﬀerent from the
           current literature, in that it only involves linear algebra and matrix manipulations. We will show
           that all the constraints between corresponding points in m images can be written as constraints on
           the rank of certain matrices. In order to put this approach into a historic context, in the exercises
           we explore the relation between these rank conditions and multi-linear tensors.
           7.1  Preliminaries
           To set the stage, we recall the notation from Chapter 3: consider the generic point in space p ∈ E3
           with coordinates X =[X,Y,Z,1]T relative to a ﬁxed (inertial) coordinate frame. Let the point be
           projected onto the image plane of a moving camera, and call x(t)=[x(t),y(t),1]T its projective
           coordinates. These are related to the inertial coordinates via
                                     λ(t)x(t)=A(t)Pg(t)X,                    (7.1)
                                            117
                  118               CHAPTER7. INTRODUCTIONTOMULTIPLEVIEWRECONSTRUCTION
                  whereλ(t) ∈ R+ is the (unknown)depth of the point p relative to the camera frame, A(t) ∈ SL(3) is
                  the camera calibration matrix, P =[I,0] ∈ R3×4 is the constant projection matrix and g(t) ∈ SE(3)
                  is the coordinate transformation from the world frame to the camera frame at time t.Intheabove
                  equation, all x,X and g are in homogeneous representation. Here we let the calibration matrix A
                  to change in time, in order to model eﬀects such as zooming or aperture changes.
                      Since we assume that images are measured at discrete time instances t ,t ,...,t , for simplicity
                                                                                                   1   2      m
                  of notation we denote             .              .              .
                                                λ =λ(t ),      x =x(t ),      Π =A(t)Pg(t ).                               (7.2)
                                                  i       i      i      i       i       i      i
                                                                                      th
                  The matrix Π is then a 3 × 4 matrix which relates the i                image of the point p to its world
                                  i
                  coordinates X by
                                                                  x λ =ΠX                                                  (7.3)
                                                                    i i     i
                  for i =1,...,m. We can re-write the above equations in matrix form as
                                                 x     0    ···    0  λ   Π 
                                                    1                         1            1
                                                 0 x ··· 0  λ   Π 
                                                        2              2 = 2 X                                       (7.4)
                                                 .      .   .      .   .   . 
                                                 .      .    ..    .   .   . 
                                                    .    .          .         .           .
                                                   00··· x                  λ            Π
                                                                    m         m            m
                  which, after deﬁning
                                          x      0   ···    0              λ               Π 
                                              1                                 1                   1
                                          0 x ··· 0                        λ               Π 
                                       .          2                 ~ .      2          .      2 
                                    I = .        .    .     .   ,    λ= . , Π= . ,                                   (7.5)
                                          .      .     ..   .              .               . 
                                             .    .          .                  .                  .
                                             00··· x                          λ                  Π
                                                              m                 m                  m
                  becomes
                                                                     ~
                                                                   Iλ=ΠX.                                                  (7.6)
                                                                                                   ~       3m
                  which is just a matrix version of (7.3). For obvious reasons, we call λ ∈ R                  the depth scale
                  vector,andΠ∈ R3m×4 the projection matrix associated to the image matrix I∈R3m×m.The
                  reader will notice that, with the exception of x ’s, everything else in this equation is unknown!
                                                                         i
                  Solving for the depth scales and the projection matrices directly from such equations is by no
                  means straightforward. Like we did in the two-view case, therefore, we will decouple the recovery
                  of the camera conﬁguration matrices Π ’s from recovery of scene structure, λ ’s and X.
                                                              i                                          i
                      In order for equation (7.6) to be satisﬁed, it is necessary for the columns of I and Π to be
                  linearly dependent. In other words, the matrix
                                                          Π      x      0   ···    0 
                                                               1    1
                                                                             .     .   
                                                          Π                   ..   .   
                                             .                     0    x           .
                                         N=[Π, I]= 2                    2              ∈R3m×(m+4)                        (7.7)
                                                          .       .    .     .         
                                                          .       .     ..    ..       
                                                              .    .                0
                                                            Π      0   ···    0    x
                                                              m                      m
                  must have a non-trivial null-space
                                                               rank(N) ≤ m+3.                                              (7.8)
                                                                                              .    T    ~T T      m+4
                  In fact, from equation (7.6) it is immediate to see that the vector v =[X ,−λ ]             ∈R       is in the
                  null space of the matrix N, i.e. Nv=0.
                      This apparently obvious observation is the basis of our characterization of all multi-linear con-
                  straints among corresponding points in m views.
                  7.2.  EPIPOLARGEOMETRYREVISITED                                                                            119
                  7.2     Epipolar geometry revisited
                  Notethatthecondition thatthem+4columnsofI andΠbelinearlydependentdoesnotinvolve the
                  3-D position of the points, just its images and camera conﬁgurations. Therefore, for the particular
                  case of m = 2 views, one would like to make sure that this condition implies the epipolar constraint.
                  Since 3m = m+4=6, the rank condition (7.8) is equivalent to
                                                                      Π     x     0 
                                                      det(N)=det          1    1        =0.                                 (7.9)
                                                                        Π     0   x
                                                                          2         2
                  Since the determinant is linear in the elements of the matrix, the equation above is trivially bilinear
                  in x ,x . It is an easy exercise to show that any bilinear function x ,x can be written as
                      1   2                                                                    1   2
                                                                       T
                                                                      x Fx                                                 (7.10)
                                                                       2    1
                  for some matrix F ∈ R3×3. In particular, the entries of F are the 4 × 4 minors of the matrix
                                                                    Π 
                                                              Π= 1 ∈R6×4.
                                                                      Π
                                                                       2
                  For instance, for the case of two calibrated cameras, the matrix F is exactly the essential matrix1
                  To see this, without loss of generality, let us assume that the ﬁrst camera frame coincides with the
                  world frame and the camera motion between the two views is (R,T). Then we have
                                                    Π =[I, 0],       Π =[R, T]        ∈R3×4.                               (7.11)
                                                      1                2
                  The determinant of the matrix N =[ΠI] yields
                                                                                                    
                                                        I   0   x     0             I   000
                                                 det             1         =det
                                                       RT0 x                        0 TRx x
                                                                       2                        1    2
                                            =det[T,Rx ,x ]=det[x ,T,Rx ].
                                                            1   2          2        1
                  For the last step, we use the fact from linear algebra that det[v ,v ,v ]=vT(v ×v ). This yields
                                                                                          1  2   3     1   2     3
                                                                              T b
                                                        det[x ,T,Rx ]=x TRx .
                                                              2        1      2      1
                  Therefore, we obtain
                                                                               T b
                                                       det(N)=0 ⇔ x TRx =0                                                 (7.12)
                                                                               2      1
                  which says that, for the two-view case, the rank condition (7.8) is equivalent to the epipolar
                  constraint.   The advantage of the rank condition is that it generalizes to multiple views in a
                  straightforward manner, unlike the epipolar constraint. A similar derivation can be followed for
                  the uncalibrated case by just allowing R ∈ R3 to be general and not a rotation matrix.
                      In the next section, for the purpose of example, we show how to derive all independent con-
                  straints between correspondences in m = 3 views. We do not show explicitly how to derive inde-
                  pendent constraints in m ≥ 4 views, since we will see in Chapter 8 that they do not exist! There
                  we will unravel the geometry of arbitrary points seen from arbitrary views and conclude that any
                  algebraic constraints among the m images can be reduced to those among 2 or 3 images at a time,
                  and that in general all the trilinear constraints algebraically imply all bilinear constraints.
                     1In the tensorial jargon, the epipolar constraint is sometimes referred to as the bilinear constraint, and the essential
                  or fundamental matrix F as the bifocal tensor. Mathematically, F can be interpreted as a covariant 2-tensor whose
                  contraction with x1 and x2 is zero.
               120             CHAPTER7. INTRODUCTIONTOMULTIPLEVIEWRECONSTRUCTION
               7.3     Three views
               For three views, we have 3m =9>m+4=7.Thematrix
                                                                     
                                                     Π x 00
                                                       1   1
                                              N=Π 0 x 0 ∈R9×7                                         (7.13)
                                                       2        2
                                                     Π     00x
                                                       3            3
               then has more rows than columns. Therefore, the rank condition (7.8) implies that all 7 × 7 sub-
               matrices are singular. As we explore in the exercises, the only irreducible minor of N must contain
               at least two rows from each image. For example, if we choose all three rows of the ﬁrst image and
                        th                                   th
               the {1,2}   rows from the second and the {1,2}  rows from the third, we obtain a 7×7 sub-matrix
               of N as                                                 
                                                      Π x 00
                                                       1    1
                                                    1                  
                                                      Π    0   x21   0
                                                    2                       7×7
                                                    2                  
                                            N033 =    Π    0   x22   0    ∈R                            (7.14)
                                                    2                  
                                                    1                  
                                                      Π    00x
                                                       3             31
                                                       2
                                                      Π    00x
                                                       3             32
               where the subscript rst of N    indicates that the rth, sth and tth rows of images 1,2and3re-
                                            rst
               spectively are omitted from the original matrix N when N    is formed. The rank condition (7.8)
                                                                        rst
               implies that
                                                        det(N   ) = 0                                   (7.15)
                                                              033
               and, as in the two-view case, the determinant is linear in each of the x ,x ,x . The above equation,
                                                                                 1   2  3
               written in tensor form, is called the trifocal tensor; we explore its properties in the exercises. Here,
               instead, we derive its expression directly from the rank condition.
                   Let us demonstrate this by choosing the ﬁrst camera frame to be the reference frame. That
               gives the three projection matrices Π ,i=1,2,3
                                                   i
                                     Π =[I,0],    Π =[R ,T], Π =[R ,T] ∈R3×4,                           (7.16)
                                      1            2      2  2     3     3   3
               where, in the case of uncalibrated cameras, R ∈ R3×3,i=2,3 may not be rotation matrices. The
                                                           i
               matrix N is then                                        
                                                    I   0   x    00
                                                             1
                                            N=R T 0 x 0 ∈R9×7.                                        (7.17)
                                                     2   2        2
                                                   R T 00x
                                                     3   3            3
               This matrix should satisfy the rank condition. After a column manipulation (which eliminates x
                                                                                                             1
               from the ﬁrst row), it is easy to see that N has the same rank as the matrix
                                                                        
                                                   I   0000
                                          N′ =  R     T   R x    x    0 ∈R9×7.                        (7.18)
                                                    2   2    2 1   2
                                                  R T Rx 0 x
                                                    3   3    3 1        3
               For this matrix to drop rank, its sub-matrix
                                                    T   R x    x    0 
                                             N′′ =    2    2 1   2       ∈R6×4                          (7.19)
                                                     T   R x     0  x
                                                      3    3 1        3
The words contained in this file might help you see if this file matches what you are looking for:

...Chapter introduction to multiple view reconstruction in the previous we have studied geometry of points space as seen from two views there epipolar constraint plays a crucial role that it captures relationship between corresponding and camera conguration without reference position such gives complete answer question what are is then natural ask this context an arbitrary number m not only undertanding but like case constraints will be used derive algorithms for reconstructing ultimately d search analogue has been active research area decade was realized early on same point can characterized by multi linear bilinear i e coordinates one having xed other showthatthereexistm linearconstraints correspondingpointsinmimages consequently study involved dimensional operators or tensors with many indices now substantial literature properties so called tri focal quadri involves use sophisticated algebraic geometric tools interest simplicity book takes approach substantially dierent current algebra...
Related files

Share

Help

Related files

Share

Share to social media

Help

Login Area