139x Filetype PDF File size 0.11 MB Source: cs.gmu.edu
Chapter 7 Introduction to multiple view reconstruction In the previous chapter we have studied the geometry of points in space as seen from two views. There, the epipolar constraint plays a crucial role in that it captures the relationship between corresponding points and camera configuration without reference to the position of such points in space. The epipolar constraint gives a complete answer to the question of what the relationship between corresponding points in two views are. It is then natural to ask this question in the context of an arbitrary number of views: what is the relationship between corresponding points in m views? Not only is this question crucial to the undertanding of the geometry of points in space as seen from a number of views, but – like in the two-view case – the constraints will be used to derive algorithms for reconstructing camera configuration and, ultimately, the 3-D position of points. The search for the m-view analogue of the epipolar constraint has been an active research area for a decade. It was realized early on [24] that the relationship between multiple views of the same point can be characterized by multi-linear constraints. Like the epipolar constraint is bilinear, i.e. it is linear in the coordinates of the point in one view having fixed the coordinates in the other, one can showthatthereexistm-linearconstraints between correspondingpointsinmimages. Consequently, the study of multiple view geometry has involved multi-dimensional linear operators, or tensors, with as many indices as views. There is now a substantial literature on the properties of so-called tri-focal and quadri-focal tensors that involves the use of sophisticated algebraic geometric tools. In the interest of simplicity, this book takes an approach that is substantially different from the current literature, in that it only involves linear algebra and matrix manipulations. We will show that all the constraints between corresponding points in m images can be written as constraints on the rank of certain matrices. In order to put this approach into a historic context, in the exercises we explore the relation between these rank conditions and multi-linear tensors. 7.1 Preliminaries To set the stage, we recall the notation from Chapter 3: consider the generic point in space p ∈ E3 with coordinates X =[X,Y,Z,1]T relative to a fixed (inertial) coordinate frame. Let the point be projected onto the image plane of a moving camera, and call x(t)=[x(t),y(t),1]T its projective coordinates. These are related to the inertial coordinates via λ(t)x(t)=A(t)Pg(t)X, (7.1) 117 118 CHAPTER7. INTRODUCTIONTOMULTIPLEVIEWRECONSTRUCTION whereλ(t) ∈ R+ is the (unknown)depth of the point p relative to the camera frame, A(t) ∈ SL(3) is the camera calibration matrix, P =[I,0] ∈ R3×4 is the constant projection matrix and g(t) ∈ SE(3) is the coordinate transformation from the world frame to the camera frame at time t.Intheabove equation, all x,X and g are in homogeneous representation. Here we let the calibration matrix A to change in time, in order to model effects such as zooming or aperture changes. Since we assume that images are measured at discrete time instances t ,t ,...,t , for simplicity 1 2 m of notation we denote . . . λ =λ(t ), x =x(t ), Π =A(t)Pg(t ). (7.2) i i i i i i i th The matrix Π is then a 3 × 4 matrix which relates the i image of the point p to its world i coordinates X by x λ =ΠX (7.3) i i i for i =1,...,m. We can re-write the above equations in matrix form as x 0 ··· 0 λ Π 1 1 1 0 x ··· 0 λ Π 2 2 = 2 X (7.4) . . . . . . . . .. . . . . . . . . 00··· x λ Π m m m which, after defining x 0 ··· 0 λ Π 1 1 1 0 x ··· 0 λ Π . 2 ~ . 2 . 2 I = . . . . , λ= . , Π= . , (7.5) . . .. . . . . . . . . 00··· x λ Π m m m becomes ~ Iλ=ΠX. (7.6) ~ 3m which is just a matrix version of (7.3). For obvious reasons, we call λ ∈ R the depth scale vector,andΠ∈ R3m×4 the projection matrix associated to the image matrix I∈R3m×m.The reader will notice that, with the exception of x ’s, everything else in this equation is unknown! i Solving for the depth scales and the projection matrices directly from such equations is by no means straightforward. Like we did in the two-view case, therefore, we will decouple the recovery of the camera configuration matrices Π ’s from recovery of scene structure, λ ’s and X. i i In order for equation (7.6) to be satisfied, it is necessary for the columns of I and Π to be linearly dependent. In other words, the matrix Π x 0 ··· 0 1 1 . . Π .. . . 0 x . N=[Π, I]= 2 2 ∈R3m×(m+4) (7.7) . . . . . . .. .. . . 0 Π 0 ··· 0 x m m must have a non-trivial null-space rank(N) ≤ m+3. (7.8) . T ~T T m+4 In fact, from equation (7.6) it is immediate to see that the vector v =[X ,−λ ] ∈R is in the null space of the matrix N, i.e. Nv=0. This apparently obvious observation is the basis of our characterization of all multi-linear con- straints among corresponding points in m views. 7.2. EPIPOLARGEOMETRYREVISITED 119 7.2 Epipolar geometry revisited Notethatthecondition thatthem+4columnsofI andΠbelinearlydependentdoesnotinvolve the 3-D position of the points, just its images and camera configurations. Therefore, for the particular case of m = 2 views, one would like to make sure that this condition implies the epipolar constraint. Since 3m = m+4=6, the rank condition (7.8) is equivalent to Π x 0 det(N)=det 1 1 =0. (7.9) Π 0 x 2 2 Since the determinant is linear in the elements of the matrix, the equation above is trivially bilinear in x ,x . It is an easy exercise to show that any bilinear function x ,x can be written as 1 2 1 2 T x Fx (7.10) 2 1 for some matrix F ∈ R3×3. In particular, the entries of F are the 4 × 4 minors of the matrix Π Π= 1 ∈R6×4. Π 2 For instance, for the case of two calibrated cameras, the matrix F is exactly the essential matrix1 To see this, without loss of generality, let us assume that the first camera frame coincides with the world frame and the camera motion between the two views is (R,T). Then we have Π =[I, 0], Π =[R, T] ∈R3×4. (7.11) 1 2 The determinant of the matrix N =[ΠI] yields I 0 x 0 I 000 det 1 =det RT0 x 0 TRx x 2 1 2 =det[T,Rx ,x ]=det[x ,T,Rx ]. 1 2 2 1 For the last step, we use the fact from linear algebra that det[v ,v ,v ]=vT(v ×v ). This yields 1 2 3 1 2 3 T b det[x ,T,Rx ]=x TRx . 2 1 2 1 Therefore, we obtain T b det(N)=0 ⇔ x TRx =0 (7.12) 2 1 which says that, for the two-view case, the rank condition (7.8) is equivalent to the epipolar constraint. The advantage of the rank condition is that it generalizes to multiple views in a straightforward manner, unlike the epipolar constraint. A similar derivation can be followed for the uncalibrated case by just allowing R ∈ R3 to be general and not a rotation matrix. In the next section, for the purpose of example, we show how to derive all independent con- straints between correspondences in m = 3 views. We do not show explicitly how to derive inde- pendent constraints in m ≥ 4 views, since we will see in Chapter 8 that they do not exist! There we will unravel the geometry of arbitrary points seen from arbitrary views and conclude that any algebraic constraints among the m images can be reduced to those among 2 or 3 images at a time, and that in general all the trilinear constraints algebraically imply all bilinear constraints. 1In the tensorial jargon, the epipolar constraint is sometimes referred to as the bilinear constraint, and the essential or fundamental matrix F as the bifocal tensor. Mathematically, F can be interpreted as a covariant 2-tensor whose contraction with x1 and x2 is zero. 120 CHAPTER7. INTRODUCTIONTOMULTIPLEVIEWRECONSTRUCTION 7.3 Three views For three views, we have 3m =9>m+4=7.Thematrix Π x 00 1 1 N=Π 0 x 0 ∈R9×7 (7.13) 2 2 Π 00x 3 3 then has more rows than columns. Therefore, the rank condition (7.8) implies that all 7 × 7 sub- matrices are singular. As we explore in the exercises, the only irreducible minor of N must contain at least two rows from each image. For example, if we choose all three rows of the first image and th th the {1,2} rows from the second and the {1,2} rows from the third, we obtain a 7×7 sub-matrix of N as Π x 00 1 1 1 Π 0 x21 0 2 7×7 2 N033 = Π 0 x22 0 ∈R (7.14) 2 1 Π 00x 3 31 2 Π 00x 3 32 where the subscript rst of N indicates that the rth, sth and tth rows of images 1,2and3re- rst spectively are omitted from the original matrix N when N is formed. The rank condition (7.8) rst implies that det(N ) = 0 (7.15) 033 and, as in the two-view case, the determinant is linear in each of the x ,x ,x . The above equation, 1 2 3 written in tensor form, is called the trifocal tensor; we explore its properties in the exercises. Here, instead, we derive its expression directly from the rank condition. Let us demonstrate this by choosing the first camera frame to be the reference frame. That gives the three projection matrices Π ,i=1,2,3 i Π =[I,0], Π =[R ,T], Π =[R ,T] ∈R3×4, (7.16) 1 2 2 2 3 3 3 where, in the case of uncalibrated cameras, R ∈ R3×3,i=2,3 may not be rotation matrices. The i matrix N is then I 0 x 00 1 N=R T 0 x 0 ∈R9×7. (7.17) 2 2 2 R T 00x 3 3 3 This matrix should satisfy the rank condition. After a column manipulation (which eliminates x 1 from the first row), it is easy to see that N has the same rank as the matrix I 0000 N′ = R T R x x 0 ∈R9×7. (7.18) 2 2 2 1 2 R T Rx 0 x 3 3 3 1 3 For this matrix to drop rank, its sub-matrix T R x x 0 N′′ = 2 2 1 2 ∈R6×4 (7.19) T R x 0 x 3 3 1 3
no reviews yet
Please Login to review.