Triangulation methods Mihaylova


Work for the seminar "Robotik und Medizin" (Robotics and Medicine)


Institut für Prozessrechentechnik, Automation und Robotik (IPR)

Triangulation Methods

Seminar paper of Zlatka Mihaylova, SS 2009
Supervisor: M.Phys. Matteo Ciucci
Contents

1 Introduction
2 Basics
  2.1 Epipolar geometry
  2.2 Fundamental matrix
  2.3 Camera matrices
  2.4 Essential matrix
3 Reconstructing 3D points from an image pair
  3.1 General approach
  3.2 Computation of the fundamental matrix
    3.2.1 Normalized eight point algorithm
    3.2.2 Algebraic minimization algorithm
    3.2.3 Gold standard algorithm
    3.2.4 Automatic computation of the fundamental matrix
  3.3 Image rectification
4 Triangulation methods
  4.1 Linear triangulation methods
  4.2 Minimization of geometric error
  4.3 Sampson approximation
  4.4 The optimal solution
5 Practical Examples of Triangulation
  5.1 Triangulation with structured light
    5.1.1 Light spot technique
    5.1.2 Stripe projection
    5.1.3 Projection of a static line pattern
    5.1.4 Projection of encoded patterns
    5.1.5 Light spot stereo analysis
  5.2 Triangulation from stereo vision
6 Conclusion
1 Introduction

The ability of our brain to perceive the relative distance between objects, or to a particular object, is not the result of measuring exact lengths. We estimate it from information collected through our eyes. This approach is widely used in robotics, because all decisions to be taken are based on knowing the spatial relationships between objects, and these can almost never be measured directly.

One method for computing the location of a point in three-dimensional space is to compare two or more images taken from different points of view. In the case of two images, this method is called triangulation. The human visual perception system is based on this principle: it produces two slightly displaced images which encode information about positions in space.

The next chapter introduces the basic terms of triangulation. For a better understanding, the reader should be familiar with the geometry behind this concept and the mathematical description of the problem.
2 Basics

In the following section we explain the basic concepts on which triangulation is based. This kind of mathematics is often referred to as epipolar geometry. Several special matrices are involved in describing the relationships between points in space and their image projections, which we examine in the next few sections. They precisely describe the main principles of stereo vision.

2.1 Epipolar geometry

Epipolar geometry is a part of projective geometry that helps in searching for corresponding points between two images. This geometry is independent of the scene structure; it depends only on the cameras used, their internal parameters and their relative positions. The line connecting the camera centers (also referred to as the baseline) intersects the image planes in the points e and e', which are called the epipoles. Another important term is the epipolar plane π. Three points are needed to define this plane: an external object point C and its projections on the two images. This statement can be reformulated by saying that the plane is also defined by the object point and the camera centers, because the projection of the 3D point lies on the line CA, respectively CB. From Figure 1 it is obvious that π is not the same for all three-dimensional points, but all epipolar planes contain the baseline.

If we know c, the projection of C on the first image, and additionally the plane π, then the projection on the other image plane is still ambiguous; that is, c' is not fixed. However, instead of searching the whole second image plane, we can reduce the computational time by searching for the projected point along only one line, the epipolar line e'c'.

Figure 1: The 3D point C is projected onto both images in c and c'. The points A and B indicate the centers of the two pinhole cameras. By definition they lie on the epipolar plane π.
2.2 Fundamental matrix

The fundamental 3 × 3 matrix F represents the connection between a point in the first image and a line in the second. This kind of mapping is very important because, once the fundamental matrix has been computed, we can easily estimate the remaining correspondences. As shown in Figure 1, any possible correspondence of the projected point c lies on the epipolar line in the other image. There are different ways of computing F, depending on the information we have. If both cameras are calibrated, which means their intrinsic parameters are given, and furthermore information about the transfer plane π is available, then the correlation between the points in the two image planes is already defined. This correlation is described mathematically by F.

The most important property of the fundamental matrix F is that for all pairs of corresponding points c and c' the following equation must hold:

    c'^T F c = 0    (1)

Well known as the correspondence condition, this equation implies that there is another way of computing the correspondences between two sets of points. In contrast to the possibility already discussed, knowledge of the camera matrices P and P' is now not necessary: the fundamental matrix can be estimated if we have the coordinates of at least seven corresponding point pairs [HZ03].

From the correspondence condition, further interesting properties of the fundamental matrix F can be derived, for example the definition of the epipolar line:

    l' = F c    (2)

where l' is the epipolar line in the second image. The equation can be written analogously for the other epipolar line, because the relation is transposable: there is no image or camera prioritization, both pictures are treated equally.

F is a homogeneous matrix, therefore it should have eight degrees of freedom. It actually has only seven, because its determinant is zero by construction. This observation has a direct connection to the number of point correspondences needed for the computation of F. Finally, we have to mention that the assignment of a line to a point is unidirectional: trying to find the corresponding point to a line is meaningless and not possible.

2.3 Camera matrices

The camera matrices P and P' describe the projective properties of the cameras. One important issue, discussed in [HZ03], is how these matrices relate to the fundamental matrix F. In one direction, P and P' determine a unique fundamental matrix F; in the other direction, the camera matrices can only be determined from the fundamental matrix F up to a projective transformation of 3D space. The resulting ambiguity can be resolved by adding an additional constraint to the product P'^T F P, namely that this matrix should be skew-symmetric. Skew-symmetric matrices (also known as antisymmetric matrices [Wol]) have the form:

    [   0    a12   a13 ]
    [ -a12    0    a23 ]    (3)
    [ -a13  -a23    0  ]

or, in other words, matrices satisfying the condition A = -A^T.
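To make relations (1) and (2) concrete, here is a minimal numerical sketch. The fundamental matrix and the two points below are made-up illustrative values, not taken from the paper; in practice F would come from one of the estimation algorithms of section 3.2.

```python
import numpy as np

# Hypothetical fundamental matrix (rank 2) and a pair of points in
# homogeneous pixel coordinates; real values would come from an
# estimation algorithm such as the eight point algorithm.
F = np.array([[ 0.0,  -1e-6,  1e-3],
              [ 1e-6,  0.0,  -2e-3],
              [-1e-3,  2e-3,  0.0 ]])
c  = np.array([120.0, 250.0, 1.0])   # point in the first image
cp = np.array([135.0, 248.0, 1.0])   # candidate match in the second image

# Epipolar line in the second image, equation (2): l' = F c
l_prime = F @ c

# Correspondence condition, equation (1): for a true correspondence the
# residual c'^T F c is (close to) zero, i.e. c' lies on the epipolar line l'.
residual = cp @ F @ c
print("epipolar line l' =", l_prime)
print("c'^T F c =", residual)
```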
2.4 Essential matrix

The matrix discussed in section 2.2 is a generalization of another matrix, called the essential matrix E. Both matrices represent the epipolar constraint defined in the same section, but in the case of E we have information about the intrinsic camera parameters; the cameras are said to be calibrated. The intrinsic camera parameters are, for example, the focal length of the camera, the image format, the principal point and the radial distortion coefficient of the lens. This additional information reduces the degrees of freedom of the essential matrix to five: three degrees of freedom of the rotation matrix R and two degrees of freedom of the vector t, where t is the coordinate vector of the translation AB separating the two cameras' coordinate systems (more information is available in [FP03]). The epipolar constraint is also satisfied by the essential matrix:

    c'^T E c = 0    (4)

The relation between F and E can be expressed with the following equation:

    F = P'^{-T} E P^{-1}    (5)

where P and P' here denote the calibration matrices of the two cameras, i.e. the matrices containing the intrinsic parameters, not the full projection matrices. If the intrinsic camera parameters are given, then we need to know only five, and not seven, point correspondences. However, the most difficult part of the triangulation approach is exactly finding the corresponding points in the two images.
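To illustrate relation (5), the following is a small sketch. The two calibration matrices (written K1 and K2 below to avoid the clash with the projection matrices P, P') are made-up example values, and the construction E = [t]_x R of the essential matrix from a relative pose is a standard result that the paper does not derive; this is only an illustration under those assumptions.

```python
import numpy as np

# Hypothetical intrinsic calibration matrices of the two cameras
# (focal lengths and principal points are made-up example values).
K1 = np.array([[800.0, 0.0, 320.0],
               [0.0, 800.0, 240.0],
               [0.0,   0.0,   1.0]])
K2 = np.array([[820.0, 0.0, 330.0],
               [0.0, 820.0, 235.0],
               [0.0,   0.0,   1.0]])

def essential_from_pose(R, t):
    """Standard construction E = [t]_x R from relative rotation R and translation t."""
    tx = np.array([[0.0, -t[2], t[1]],
                   [t[2], 0.0, -t[0]],
                   [-t[1], t[0], 0.0]])
    return tx @ R

def fundamental_from_essential(E, K1, K2):
    """Relation (5): F = K2^{-T} E K1^{-1}."""
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

# Example: cameras related by no rotation and a baseline along the x axis.
R = np.eye(3)
t = np.array([0.2, 0.0, 0.0])
E = essential_from_pose(R, t)
F = fundamental_from_essential(E, K1, K2)
```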
3 Reconstructing 3D points from an image pair

3.1 General approach

A simple algorithm for reconstructing a 3D point from an image pair is proposed in [BB82]. The technique involves taking two images of a scene separated by a baseline, identifying the correspondences and applying triangulation rules to define the two lines on which the world point lies. The intersection of these lines gives us the world coordinates of the 3D point.

Unfortunately, finding the corresponding point pairs is not a trivial task. This usually happens via pattern matching. The main idea is to find a correlation between the pixels of the two images. For this purpose, pixel areas from the first image are compared to pixel areas from the second, and if a pattern has been found, we compute the disparity (displacement) between the positions of these patterns in the two images. The correlation of two images is a very expensive operation, which means it requires a huge amount of computational power (the complexity of this operation is O(n²m²) for an m × m patch and an n × n pixel image). But the biggest disadvantage of correlation is that some parts of the 3D scene cannot be matched properly, for example when a point is visible in the first view but lies hidden behind some object in the second view. The greater the distance between the camera centers, the higher the probability of such an error. We could instead place the two cameras considerably closer together, but in that case the accuracy of the depth computation decreases as well.

Supposing that enough point correspondences are found, the algorithm for determining the world point proposed by [HZ03] involves the following steps (a minimal end-to-end sketch is given at the end of this section):

• Computing the fundamental matrix F from the point pairs. At least eight corresponding point pairs are necessary to build a linear system with F as unknown. The solution of this linear system gives the coefficients of the fundamental matrix.

• Using F to determine the camera matrices P and P'. As discussed in section 2.3, this is possible only up to a projective transformation, with the skew-symmetry constraint on P'^T F P as verification. In practice we usually deal with calibrated cameras, which is to say we have computed the essential matrix E.

• Reconstructing the three-dimensional point C for every pair of corresponding points c and c' with the help of the two equations c = P C and c' = P' C. The special case of a world point C lying on the baseline cannot be computed, because all points on the baseline are projected onto the epipoles and are thus not uniquely defined.

If the intrinsic camera parameters are given, then instead of computing the fundamental matrix it is of course better to find the essential matrix. This makes the second step unnecessary, because the essential matrix E already contains the camera calibration information.

The described method gives a solution only for the idealized case of the problem. This means that in a real situation, where the images are distorted by different kinds of noise, the general approach will not be robust. Therefore further methods with better practical results are proposed, for example in section 3.2.3.
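The following is a hedged sketch of the three steps above for the calibrated case, using OpenCV as one possible realization. The point arrays and the calibration matrix K are assumed inputs, and the particular routines (findEssentialMat, recoverPose, triangulatePoints) are an implementation choice for this illustration, not the paper's own method.

```python
import cv2
import numpy as np

def reconstruct_points(pts1, pts2, K):
    """Sketch of the three-step reconstruction for calibrated cameras.

    pts1, pts2: (N, 2) arrays of corresponding pixel coordinates.
    K: 3x3 intrinsic calibration matrix (assumed identical for both cameras).
    Returns an (N, 3) array of reconstructed points (up to overall scale).
    """
    # Step 1: with known intrinsics we estimate the essential matrix E
    # directly instead of the fundamental matrix F.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)

    # Step 2: decompose E into the relative pose (R, t) and build the two
    # projection matrices P = K[I|0] and P' = K[R|t]. Note that t is
    # recovered only up to scale, so the reconstruction has arbitrary scale.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])

    # Step 3: triangulate every correspondence; the result comes back in
    # homogeneous coordinates and has to be dehomogenized.
    X_h = cv2.triangulatePoints(P1, P2,
                                np.asarray(pts1, dtype=float).T,
                                np.asarray(pts2, dtype=float).T)
    return (X_h[:3] / X_h[3]).T
```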
3.2 Computation of the fundamental matrix

The importance of estimating the fundamental matrix F is clear from the previous sections. Having this matrix computed gives us the possibility to find not only the 3D points of the scene but also the camera calibrations. Therefore various computational methods have been invented for its determination.

3.2.1 Normalized eight point algorithm

We begin with the simplest method, whose fundamentals were described in section 2.2. Equation (1) holds for every point pair c and c', which means that in theory any eight such pairs define F uniquely up to scale (since the homogeneous matrix F has eight degrees of freedom). We assume that the homogeneous coordinates of the points c and c' are (x, y, 1) and (x', y', 1) respectively [HZ03]. Then every point pair defines one equation in the nine coefficients of the fundamental matrix:

    x'x f11 + x'y f12 + x' f13 + y'x f21 + y'y f22 + y' f23 + x f31 + y f32 + f33 = 0    (6)

But in section 2.2 we mentioned that we actually need only seven point correspondences. In fact there is no contradiction: we can really compute the fundamental matrix from seven known point pairs, but in this case the method is less stable and needs more computational time.

Another important issue is the singularity property of F, i.e. the additional information that det(F) = 0. If the estimated F turns out not to be singular, we replace it with the closest singular matrix F̂, the one minimizing the Frobenius norm ||F − F̂||. Forcing the singularity of F is necessary, because otherwise the epipolar lines would show discrepancies: they would not all meet in the epipole.

The eight point algorithm was first proposed in [HH81]. The normalized eight point algorithm is nothing more than an improvement of this approach based on eight point correspondences. The important part of the normalized algorithm is a cleverer construction of the linear equations (6). As pointed out by [HZ03], the normalization consists in translating and scaling the image, in order to arrange the reference points around the origin of the coordinate system before solving the linear equations. The following normalization, suggested in [PCF06] for example, is a good solution of the problem: ĉ_i = K_N^{-1} c_i, where

    K_N = [ (w+h)/2     0       w/2 ]
          [    0     (w+h)/2    h/2 ]    (7)
          [    0        0        1  ]

and h is the height and w is the width of the image. This transformation gives the normalized eight point algorithm better performance and a more stable result. Unfortunately, in reality this idealized situation is very rare, which means that most often we have to deal with noisy measurements. For this reason, other statistically more stable algorithms have been invented.
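The following is a minimal numpy sketch of the normalized eight point algorithm described above. The helper names and the use of the singular value decomposition, both for solving A f = 0 and for enforcing det F = 0, are implementation choices for this illustration, not prescriptions from the paper.

```python
import numpy as np

def normalization_matrix(w, h):
    """Normalization matrix K_N from equation (7) for a w x h image."""
    return np.array([[(w + h) / 2, 0.0,         w / 2],
                     [0.0,         (w + h) / 2, h / 2],
                     [0.0,         0.0,         1.0]])

def eight_point(pts1, pts2, w, h):
    """Normalized eight point algorithm (sketch following section 3.2.1).

    pts1, pts2: (N, 2) arrays of corresponding pixel coordinates, N >= 8.
    Returns the estimated fundamental matrix F (rank 2, unit Frobenius norm).
    """
    K = normalization_matrix(w, h)
    Kinv = np.linalg.inv(K)

    def to_normalized_homogeneous(pts):
        hom = np.column_stack([pts, np.ones(len(pts))])
        return (Kinv @ hom.T).T

    p1 = to_normalized_homogeneous(pts1)   # points c
    p2 = to_normalized_homogeneous(pts2)   # points c'

    # One row of A per correspondence, following equation (6).
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0],            p1[:, 1],            np.ones(len(p1)),
    ])

    # Least squares solution of A f = 0: the right singular vector
    # belonging to the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F_norm = Vt[-1].reshape(3, 3)

    # Enforce singularity (det F = 0) by zeroing the smallest singular value,
    # which yields the closest singular matrix in the Frobenius norm.
    U, S, Vt = np.linalg.svd(F_norm)
    S[2] = 0.0
    F_norm = U @ np.diag(S) @ Vt

    # Undo the normalization so that F applies to pixel coordinates again.
    F = Kinv.T @ F_norm @ Kinv
    return F / np.linalg.norm(F)
```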
3.2.2 Algebraic minimization algorithm

The algebraic minimization algorithm is based on the simple eight point algorithm for estimating the fundamental matrix. The difference between the two approaches is the following: after finding F with the eight point algorithm, we try to minimize the algebraic error. The linear system built from equation (6) for every point pair can be written in the form:

    A f = 0    (8)

where A is the matrix derived from the coordinates of the corresponding points and f is the vector containing the coefficients of F. The fundamental matrix F can be written as the product of a non-singular matrix M and the skew-symmetric matrix built from e, the homogeneous coordinates of the epipole in one of the images. Decomposing F accordingly as f = Ê m allows us to state the minimization problem as follows: minimize ||A Ê m|| subject to ||Ê m|| = 1, where Ê is a 9 × 9 matrix computed iteratively from e, and m contains the coefficients of M. Although iterative, this algorithm, proposed in [HZ03], is effective and simple to implement.

3.2.3 Gold standard algorithm

The gold standard algorithm belongs to the group of algorithms that try to minimize the geometric image distance. It builds on some of the previous methods and brings the most important improvement of performing very well in real situations. Usually the most common type of noise appearing in real measurements is Gaussian. Therefore we should rather exploit the advantages of statistical models than pursue exact results. There are two things we have to assume. The first assumption is that we are dealing with erroneous measurements, which in fact describes the real situation. Secondly, we suppose that the noise in our images has a Gaussian distribution. Under these assumptions, our model reduces to a minimization problem; that is to say, we can calculate the fundamental matrix by minimizing the cost

    Σ_i d(c_i, ĉ_i)² + d(c'_i, ĉ'_i)²    (9)

where ĉ_i and ĉ'_i are estimated correspondences that satisfy the epipolar constraint exactly, and d denotes the image distance between the observed point c_i, respectively c'_i, and its estimate. Under the Gaussian noise assumption, minimizing this cost is equivalent to maximizing the likelihood of the observations.

The gold standard algorithm provides the best results of all discussed methods in terms of stability in systems distorted by Gaussian noise. Since this is the case for almost every real-world model, one can be confident that the gold standard algorithm will return the most accurate results.

3.2.4 Automatic computation of the fundamental matrix

If we want to use triangulation methods in robotics, there is one very important step we must not miss. We already have some very useful algorithms for computing the fundamental matrix, but this is only one part of the whole measurement process. Robot vision works on the following principle: two input images are given as sensor data, and the robot must somehow acquire knowledge of the exact object position. The missing part of this process is the answer to the question: how can a robot detect the point correspondences it needs in order to compute the fundamental matrix? An algorithm able to detect point correspondences automatically is required.

Meanwhile, many algorithms for extracting key features from images are available. For example, the Harris detector can be used to find corners in an image. It is a simple approach, but its biggest disadvantage is that it is not scale invariant. However, adapting the Harris detector to be invariant to affine transformations is not an impossible task; a very successful combination of Harris and Laplacian detectors is presented in [MS04]. There are, of course, a great number of algorithms detecting so-called "points of interest". For example, the Laplacian and Difference of Gaussians (DoG) detectors work on the principle of finding areas with rapidly changing intensity values. They are scale invariant, because they filter the image with Gaussian kernels and in this way define regions with structures of interest.
Another very interesting approach for detecting key structures in a picture is presented in the paper [KZB01]; it is called the salient region detector.
The main idea of this method is to use local complexity as a measure of saliency. An area is marked as salient only if the local attributes in this area show unpredictability over a certain set of scales. The procedure consists of three steps. First, the Shannon entropy H(s) is calculated for different scales, and in the second step the optimal scales are selected as those with the highest entropy. In the next step, the magnitude of change of the probability density function W(s) as a function of scale is calculated at each peak, and finally the result is formed as the product of H(s) and W(s) for each circular window with radius s. This method can be further extended to become affine invariant.

Specifically for the needs of stereo analysis, an algorithm computing so-called maximally stable extremal regions (MSER) was developed and suggested in the paper [MCUP02]. On the basis of a local binarization of the image, these maximally stable extremal regions are detected, and an examination of their properties shows some very positive characteristics: they are invariant to affine transformations, stable, and allow multi-scale detection, which means that fine structures are detected as well as very large ones. An informal explanation of the MSER concept is the following: all pixels of an image are divided into two groups according to a varying threshold. Shifting the threshold from one end of the intensity scale to the other makes the resulting binary images change. In this way we can define regions of maximum intensity, and inverting the image gives us, respectively, the minimum regions. The authors of the paper propose an algorithm running with complexity O(n log log n), which guarantees almost linear performance with increasing pixel count.

3.3 Image rectification

Image rectification is an often used method in computer vision that simplifies the search for matching points between the images. The simple idea behind image rectification is to project both images onto another plane, so that they are forced to share a common plane. The benefits of this transformation are significant: if we want to find the match c' of a point c, we do not need to search the whole image plane but only a single line, and this line is parallel to the x-axis. The implementation of this idea works by projecting both images onto a plane such that their epipolar lines become scanlines of the new image, parallel to the baseline [FP03]. An important point to mention is that image rectification algorithms are based on the already discussed methods for finding corresponding points; it is an advantage when the underlying correspondence detector performs automatically. The next steps involve mapping the epipole to a point at infinity and then applying a matching transformation to the other image, so that corresponding epipolar lines are aligned. This algorithm is explained in [HZ03].
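As a practical illustration of the automatic computation described in section 3.2.4, the following sketch uses OpenCV's SIFT detector (a DoG-based detector of the kind mentioned above) together with a RANSAC-based estimation of F. The detector choice, the ratio-test threshold and the RANSAC parameters are assumptions made for this example, not prescriptions from the paper.

```python
import cv2
import numpy as np

def estimate_fundamental_matrix(img1_path, img2_path):
    """Detect interest points, match them and estimate F automatically."""
    img1 = cv2.imread(img1_path, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(img2_path, cv2.IMREAD_GRAYSCALE)

    # Scale-invariant interest points (DoG) with local descriptors.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Nearest-neighbour matching with Lowe's ratio test to reject
    # ambiguous correspondences.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # Robust estimation of F: RANSAC discards mismatches (outliers)
    # that would otherwise corrupt the eight point solution.
    F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC,
                                            ransacReprojThreshold=1.0,
                                            confidence=0.99)
    keep = inlier_mask.ravel() == 1
    return F, pts1[keep], pts2[keep]
```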
4 Triangulation methods

In this chapter we state the problems encountered in triangulation and their solutions. Assume that the fundamental matrix F and both camera matrices P and P' are given and that we can rely on their correctness. The first idea that immediately comes to mind is to back-project the rays through the corresponding image points c and c'; the point in 3D space where these rays intersect is exactly what we are looking for. At first this idea seems to work, but in practice we can never be sure that the images contain perfect measurements. For a noise-distorted image pair the idea fails, because the back-projected rays will not intersect in a 3D point at all. One possible solution to this problem, already discussed in section 3.2.3, is to estimate the fundamental matrix and the world point C simultaneously, using the gold standard algorithm. The second possibility is to obtain an optimal maximum likelihood estimate of the point. In the following sections we discuss the second possibility; for the first one, please refer to section 3.2.3.

4.1 Linear triangulation methods

The fact that the two rays calculated from the image points do not intersect in a world point can be expressed geometrically by the statement that c = P C and c' = P' C are not satisfied exactly for any C. We can remodel and combine these two equations into one equation of the form A C = 0, where A is a matrix derived from the homogeneous coordinates of the points c and c' as well as from the rows p^1, p^2, p^3 of P and p'^1, p'^2, p'^3 of P'. As suggested in [HZ03]:

    A = [ x p^{3T}  − p^{1T}  ]
        [ y p^{3T}  − p^{2T}  ]    (10)
        [ x' p'^{3T} − p'^{1T} ]
        [ y' p'^{3T} − p'^{2T} ]

In this way we obtain a linear system of four equations for the four homogeneous coordinates (X, Y, Z, 1)^T of the world point C (a small numerical sketch follows this section). There are two linear methods for finding the best solution for C. The homogeneous method takes as solution the unit singular vector corresponding to the smallest singular value of A. The alternative inhomogeneous method turns the set of equations into an inhomogeneous set of linear equations. All linear methods share the same disadvantage: they are not projective invariant, which means that the reconstruction does not transform consistently when c, c', P and P' are transformed under the laws of projective geometry. In other words, the identity τ(c, c', P, P') = H^{-1} τ(c, c', P H^{-1}, P' H^{-1}), where τ() denotes the triangulation function, does not hold for a general projective transformation H. Thus there are more suitable methods for solving the same problem, discussed in the following sections.
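The following is a minimal numpy sketch of the homogeneous (DLT) method built on equation (10). Using the SVD to obtain the unit singular vector of A is the standard way to realize it; the code is only an illustration, not the paper's implementation.

```python
import numpy as np

def triangulate_dlt(c1, c2, P1, P2):
    """Homogeneous linear triangulation (section 4.1).

    c1, c2: corresponding image points (x, y) in the two views.
    P1, P2: 3x4 camera projection matrices.
    Returns the 3D point C as a length-3 array (dehomogenized).
    """
    x, y = c1
    xp, yp = c2
    # Matrix A from equation (10): each image point contributes two rows
    # built from the rows of its projection matrix.
    A = np.vstack([
        x  * P1[2] - P1[0],
        y  * P1[2] - P1[1],
        xp * P2[2] - P2[0],
        yp * P2[2] - P2[1],
    ])
    # Solution of A C = 0: unit singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    C_h = Vt[-1]
    return C_h[:3] / C_h[3]
```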
4.2 Minimization of geometric error

As we noted in the previous section, the measured image points c and c' do not satisfy the epipolar constraint, because they are distorted by noise. If we denote by ĉ and ĉ' the corresponding points which do satisfy the epipolar constraint, then we can turn the problem into a minimization problem:

    min d(c, ĉ)² + d(c', ĉ')²    (11)

where d(a, b) stands for the Euclidean distance and the constraint ĉ'^T F ĉ = 0 holds. Once we have found the points ĉ and ĉ', the solution for C is easy and can be calculated by any triangulation method.

4.3 Sampson approximation

An alternative to minimizing the geometric error is the so-called Sampson approximation. Without examining it in detail, we give an overview of the method. The measurement is written as the vector (x, y, x', y'), where (x, y)^T and (x', y')^T are the coordinates of the points c and c' respectively; the corrected measurement is the measured vector plus the Sampson correction δ_c. After some transformations (for details, please refer to [HZ03]), the end result looks like:

    (x̂, ŷ, x̂', ŷ')^T = (x, y, x', y')^T − [ c'^T F c / ((F c)_1² + (F c)_2² + (F^T c')_1² + (F^T c')_2²) ] · ((F^T c')_1, (F^T c')_2, (F c)_1, (F c)_2)^T    (12)

where, for example, the expression (F^T c')_1 stands for the polynomial f11 x' + f21 y' + f31. The Sampson approximation is accurate only when the required correction is very small. Otherwise there is a more stable algorithm, presented in the next section, whose results satisfy the epipolar constraint exactly.

4.4 The optimal solution

The optimal algorithm tries to return an accurate result by finding the global minimum of a cost function similar to the cost (9) presented in the previous chapter. Using the knowledge that the corresponding point always lies on the corresponding epipolar line, we define the cost function as:

    d(c, l)² + d(c', l')²    (13)

where l and l' are the epipolar lines corresponding to the points c' and c respectively. With a proper parameterization of the epipolar pencils in the images, the solution of this minimization problem is optimal.
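To close this chapter, here is a minimal numpy sketch of the Sampson correction in equation (12) from section 4.3. The function simply evaluates the formula for one point pair and assumes homogeneous pixel coordinates and a given F; it is an illustration, not the paper's implementation.

```python
import numpy as np

def sampson_correction(c, cp, F):
    """First-order (Sampson) correction of a point pair, equation (12).

    c, cp: corresponding points (x, y, 1) and (x', y', 1) in homogeneous form.
    F: fundamental matrix.
    Returns the corrected coordinates (x_hat, y_hat, x_hat', y_hat').
    """
    Fc = F @ c          # epipolar line in the second image
    Ftcp = F.T @ cp     # epipolar line in the first image
    err = cp @ F @ c    # algebraic error c'^T F c
    denom = Fc[0]**2 + Fc[1]**2 + Ftcp[0]**2 + Ftcp[1]**2

    delta = (err / denom) * np.array([Ftcp[0], Ftcp[1], Fc[0], Fc[1]])
    return np.array([c[0], c[1], cp[0], cp[1]]) - delta
```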
5 Practical Examples of Triangulation

5.1 Triangulation with structured light

In all triangulation methods with structured light, one of the two cameras is replaced by a light source. Therefore these techniques are often referred to as active triangulation. The following sections present the most basic principles of triangulation via structured light. There are, of course, many variations and improvements, but the basic idea always remains the same.

5.1.1 Light spot technique

The light spot technique is based on a simple construction consisting of a laser, an objective lens and a detector, which can be either a charge-coupled device (CCD) or a position-sensing detector (PSD). As shown in Figure 2, the laser ray illuminates a point on the object's surface, and the lens projects this point onto the light-sensitive area of the detector, where it produces differences in the electric currents of the PSD. On the basis of this difference, we can measure the exact position of the spot on the sensor and from it calculate the position of the illuminated point on the object. A surface is scanned point by point in this way.

Figure 2: The picture visualizes how depth information is gained via the light spot technique (laser, PSD and measured object).

This technique has a lot of advantages: the result is fast, accurate and additionally independent of the surface color. But there is one constraint for this method, namely that the surface must not be a perfect mirror, because part of the light has to be reflected in the direction of the objective. There are also some problems to be solved. For example, if part of the surface is hidden by another structure of the same surface, then it is impossible for the laser ray to reach the hidden part.
5.1.2 Stripe projection

The main idea of stripe projection is to observe how the object's surface modulates a known input signal. In this particular case, the input signal is a single laser line. Where the line intersects an object, the image taken by the camera shows displacements of the light stripe proportional to the distance of the object. For this purpose we need to know in advance where the line would be projected if no object were placed in front of the camera. Having this information and knowing how the measured object affects the light line, we can easily estimate the position of almost all 3D points lying on the object. Figure 3 illustrates the geometry of this approach.

Figure 3: (a) The camera registers the displacement of the point, so we can calculate its position in 3D space. (b) Demonstration of the method's implementation.

We know exactly where our point on the reference surface should be projected, and the distance r to the reference surface is also known in advance. This means that if we manage to find h, then the distance to the object is simply the difference between r and h. Finding h is a simple task given the displacement d and the angle θ at which the laser ray is inclined:

    h = d / tan θ    (14)
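A tiny numerical sketch of equation (14) follows; the displacement d, the angle θ and the reference distance r are made-up values chosen only for illustration, none of them comes from the paper.

```python
import math

d = 12.5e-3                  # measured stripe displacement in metres (assumed value)
theta = math.radians(35.0)   # inclination angle of the laser ray (assumed value)
r = 0.80                     # known distance to the reference surface in metres (assumed)

h = d / math.tan(theta)      # equation (14)
distance_to_object = r - h   # the object lies h closer than the reference surface
print(f"h = {h:.4f} m, distance to object = {distance_to_object:.4f} m")
```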
5.1.3 Projection of a static line pattern

One obvious disadvantage of the stripe projection method is that objects are scanned line by line, which means slow performance, requiring one image per projected line. In order to make the approach faster, we can project several lines simultaneously. The result is a static line pattern deformed by the object's surface. Although this improvement gives better results, it has an additional disadvantage: for surfaces with rapidly changing shape, the course of each individual line is very difficult to follow. Therefore the projection of a static line pattern should be further extended so that every single point of the surface is encoded uniquely. Many methods have been developed to accomplish this task; some of them are discussed in the following section.

5.1.4 Projection of encoded patterns

In order to encode each surface point uniquely, the static stripe pattern has to be extended. This can happen either by adding more colors to the projected pattern, or by taking more pictures of the scene illuminated by a slightly changing pattern. Figure 4 illustrates one example of such a projected pattern: stripe patterns with different wavelengths are projected successively, building a unique code for every point on the surface. The same procedure can also be repeated with horizontal lines. This approach, called binary coding via a sequence of fringe patterns, is very successful, but it fails when the stripe pattern needs to be very fine. If high-resolution position information is required, a better approach is the projection of phase-shifted patterns with the same wavelength. But this is suitable only for really fine structures, therefore most often hybrid methods are used, i.e. a mixture of both methods, which precisely encodes rough as well as fine object structures.

Figure 4: Active triangulation by projecting stripe patterns onto the object's surface. The different wavelengths of the pattern encode every point uniquely. The images are taken from [Jäh05].

The second possibility for encoding a point uniquely is a colored pattern. Certain variations of the pattern are possible; for example, the stripes can be differentiated not only by color, but also by width and by the pattern itself. Because the projected pattern is known in advance, the result and the expected pattern are finally compared. In this way, if occlusions are present, they can be detected very easily.

5.1.5 Light spot stereo analysis

The light spot stereo analysis is a method inspired by human binocular vision. Two cameras take pictures of the scene. A laser spot is projected onto the object's surface and registered by the camera pair. The disparity between the laser points in the two images serves as the basis for computing the distance to that point. This approach is a mixture of the active triangulation method and triangulation from stereo vision.
5.2 Triangulation from stereo vision

Similar to the human vision system, triangulation from stereo vision works with two cameras. The distance between the plane on which both cameras are positioned and a 3D point in space can be measured easily. As shown in Figure 5, the distance vector between the cameras, called the stereoscopic basis, is denoted by b. Assuming that the distance X3 is at least twice the focal length d', we can express the relationship between these three quantities as:

    p = b d' / X3    (15)

The newly introduced quantity p is referred to as parallax or disparity; its geometrical meaning is the offset between the two projections of the world point on the image plane, defined by the parallel optical axes of the camera pair. For more details and a derivation of equation (15), please refer to [Jäh05].

Figure 5: The graphic represents the view angles of both cameras and geometrically visualizes how the disparity depends on the focal length d' of the cameras, the stereoscopic basis b and the distance to the object X3.

There are some interesting consequences of equation (15). Firstly, one can conclude that the disparity p is proportional to the stereoscopic basis b. Secondly, the disparity is inversely proportional to the distance X3 to the measured object. Summarizing these observations, a greater distance to the measured object means a loss of accuracy in estimating the depth information, whereas a larger stereoscopic basis yields, on the contrary, higher precision.
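Rearranging equation (15) to X3 = b d' / p gives the usual depth-from-disparity formula. Below is a tiny sketch with assumed values for the baseline, the focal length (expressed in pixels) and a measured disparity; none of these numbers are taken from the paper.

```python
# Depth from disparity, equation (15) rearranged: X3 = b * d' / p
b = 0.12          # stereoscopic basis in metres (assumed value)
d_prime = 700.0   # focal length expressed in pixels (assumed value)
p = 21.0          # measured disparity in pixels (assumed value)

X3 = b * d_prime / p
print(f"distance to the object: {X3:.2f} m")   # 4.00 m for these numbers
```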
6 Conclusion

One very interesting issue is the usage of triangulation methods in medicine. Therefore, instead of summarizing everything written so far, we would like to conclude by mentioning some real examples taken from the medical field. They illustrate the importance of such methods in science as well as in our everyday life.

Triangulation principles are most often used in optical tracking systems. The most popular and widely used system is Polaris®, produced by Northern Digital Inc. (also known as NDI). The members of the Polaris® family (presented in Figure 6) offer passive, active and hybrid tracking; the points needed for the triangulation itself are implemented as markers fixed on the surgical instruments.

Figure 6: The picture taken from the NDI webpage [pol] shows two members of the Polaris family.

Another example of optical tracking systems, used rather for research purposes, are the ART® systems produced by Advanced Realtime Tracking GmbH (ART GmbH). The example system smARTtrack®, presented in Figure 7, consists of two cameras fixed on a rigid bar, so that no calibration is needed. Different configurations are possible, depending on parameters like focal length, baseline length and the angle between the cameras. The ART® trademark is very popular because it allows building systems with multiple cameras, for example with three, four or five cameras.

Figure 7: The picture from the ART webpage [art] illustrates the smARTtrack® stereo vision system.

3D vision systems can also be implemented in endoscopic instruments. For this purpose, two very small cameras are embedded in a tube with a relatively small stereoscopic basis, which is no problem, because the distances measured in endoscopy are also very limited. Figure 8 (a)
shows what such a device looks like. The whole surgical system presented in Figure 8 is called Da Vinci® and consists of a high-resolution 3D endoscope coupled with two 3-chip cameras and a console that helps the surgeon by visualizing the camera images and by repositioning the surgical camera inside the patient. For more technical details about this system, please refer to the producer's webpage [daV].

Figure 8: (a) Da Vinci® 3D endoscope with two cameras. (b) Da Vinci® console helping the surgeon by positioning the instruments in the patient's body and visualizing the camera images.
References

[art]     ART Systems homepage. 49.0.html
[BB82]    Ballard, Dana H.; Brown, Christopher M.: Computer Vision. 2nd edition. Prentice Hall, 1982
[daV]     Da Vinci Surgical System homepage. http://www.intuitivesurgical.com/products/davinci_surgicalsystem/3d.aspx
[FP03]    Forsyth, David A.; Ponce, Jean: Computer Vision: A Modern Approach. Prentice Hall, 2003
[HH81]    Longuet-Higgins, H.C.: A Computer Algorithm for Reconstructing a Scene from Two Projections. Nature, 1981
[HZ03]    Hartley, Richard; Zisserman, Andrew: Multiple View Geometry in Computer Vision. 2nd edition. Cambridge University Press, 2003
[Jäh05]   Jähne, Bernd: Digitale Bildverarbeitung. 6th edition. Springer Verlag, 2005
[KZB01]   Kadir, Timor; Zisserman, Andrew; Brady, Michael: An Affine Invariant Salient Region Detector. Department of Engineering Science, University of Oxford, 2001
[MCUP02]  Matas, J.; Chum, O.; Urban, M.; Pajdla, T.: Robust Wide Baseline Stereo from Maximally Stable Extremal Regions. Center for Machine Perception, Dept. of Cybernetics, CTU Prague, 2002
[MS04]    Mikolajczyk, Krystian; Schmid, Cordelia: Scale and Affine Invariant Interest Point Detectors. In: International Journal of Computer Vision 60(1), January 2004, pp. 63–86
[pol]     NDI homepage. polarisfamily.php
[Wol]     Wolfram Research: Wolfram MathWorld. http://mathworld.