1 Introduction
The ability of our brain to perceive the relative distance between objects, and the distance to a
particular object, is not a consequence of measuring exact lengths. We estimate these distances
from the information collected through our eyes. This approach is widely used in robotics, because
all decisions to be taken are based on knowing the relationships between objects, and these can
almost never be measured directly.
One method for computing the location of a point in three-dimensional space is to compare
two or more images taken from different points of view. In the case of two images, this
method is called triangulation. The human visual perception system is based on this principle,
namely producing two slightly displaced images, which encode information about positions
in space.
The next chapter introduces the basic terms of triangulation. For a better understanding,
the reader should be familiar with the geometry behind this concept and the mathematical
description of the problem.
2 Basics
In the following section we explain the basic concepts on which triangulation is based. This
kind of mathematics is often referred to as epipolar geometry. Several special matrices are
involved in the description of the relationships between points in space, which we are going
to examine in the next few sections. They precisely describe the main principles of stereo vision.
2.1 Epipolar geometry
Epipolar geometry is a subset of projective geometry which helps in searching for correspond-
ing points between two images. This kind of geometry is independent of the scene structure;
it depends only on the cameras used, their internal parameters and their relative positions.
The line (also referred to as the baseline) connecting the camera centers intersects the
image planes in the points e and e′, which define the epipoles. Another important term is the
epipolar plane π. Three points are needed to define this plane: an external object point C and
its projections on the two images. This statement can be reformulated by saying that the plane is
also defined by the object point and the camera centers, because the projections of the 3D point
lie on the lines CA and CB, respectively. From Figure 1 it is obvious that π is not the same for
all three-dimensional points, but all epipolar planes contain the baseline. If we know c, the
projection of C on the first image, and additionally the plane π, then the projection on the other
image plane is still ambiguous; that means c′ is not fixed. However, instead of searching the whole
second image plane, we can reduce the computational time by searching for the projected point on
only one line: the epipolar line e′c′.
Figure 1: The 3D point C is projected onto both images in c and c′. The points A and B indicate the
centers of the two pinhole cameras. By definition, they are located on the epipolar plane π.
2.2 Fundamental matrix
The fundamental matrix F is a 3 × 3 matrix that maps a point in the first image to a line in the
second. This kind of mapping is very important because, once the fundamental matrix is computed,
we are able to estimate the remaining correspondences easily. As shown in Figure 1, any possible
correspondence of the projected point c belongs to the epipolar line in the other image. There
are different ways of computing F depending on the information we have. If both cameras are
calibrated, which means their intrinsic parameters are given, and furthermore information about
the transfer plane π is available, then the correlation between the points in the image planes
is already defined. This correlation is mathematically described by the fundamental matrix.
The most important property of the fundamental matrix F is that for all pairs of corresponding
points c and c′ the following equation must be valid:

c′T F c = 0                    (1)
Well known as the correspondence condition, this equation implies that there is another way
of computing the correspondences between two sets of points. In contrast to the possibility
already discussed, we now see that knowledge of the camera matrices P and P′ is not
necessary. The fundamental matrix can be estimated if we have the coordinates of at least seven
corresponding point pairs [HZ03].
From the correspondence condition other interesting properties of the fundamental matrix
F can be derived, for example the definition of the epipolar line:

l′ = F c                    (2)

where l′ is the epipolar line in the second image. We can convert the equation similarly for
the other epipolar line, l = FT c′, because this relation is transposable. There is no image or
camera prioritization; both pictures are treated equally.
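As a small illustration (not from the original text), the mapping (2) is a single matrix-vector product; the numeric values of F and c below are made up purely for the example:

```python
import numpy as np

# Hypothetical fundamental matrix and image point, chosen only for illustration.
F = np.array([[ 0.0, -0.1,  0.2],
              [ 0.1,  0.0, -0.3],
              [-0.2,  0.3,  0.0]])
c = np.array([10.0, 20.0, 1.0])    # homogeneous coordinates (x, y, 1)

l_prime = F @ c                    # epipolar line l' = F c in the second image

# A correct correspondence c' must lie on l', i.e. satisfy c'^T l' = 0,
# which is exactly the correspondence condition c'^T F c = 0.
def on_epipolar_line(c_prime, line, tol=1e-9):
    return abs(float(c_prime @ line)) < tol
```

Checking candidate points against the line this way is the computational shortcut described above: the search space shrinks from the whole image plane to one line.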
F is a homogeneous matrix; therefore it should have eight degrees of freedom. Actually it
has only seven degrees of freedom, because its determinant is zero by construction. This observation
has a direct connection to the number of point correspondences needed for the computation of F.
Finally, we have to mention that the assignment of a line to a point is unidirectional. Trying to
find the corresponding point to a given line is meaningless and not possible.
2.3 Camera matrices
The camera matrices P and P′ describe the projective properties of the cameras. One important
issue, discussed in [HZ03], is how these matrices relate to the fundamental matrix F. In one
direction, P and P′ determine a unique fundamental matrix F, but in the other direction
the camera matrices can be determined from the fundamental matrix F only up to a projective
transformation of 3D space. The resulting ambiguity can be resolved by adding an additional
constraint: the product P′T F P should be skew-symmetric.
Skew-symmetric matrices (also known as antisymmetric matrices [Wol]) have the form:

⎛   0     a12   a13 ⎞
⎜ −a12     0    a23 ⎟                    (3)
⎝ −a13  −a23     0  ⎠

or, in other words, they are matrices satisfying the condition A = −AT.
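Condition (3) is easy to verify numerically. The sketch below (illustrative, not from the text) builds the particular skew-symmetric matrix [v]× that represents a cross product, a form that reappears later when F is decomposed via the epipole:

```python
import numpy as np

def skew(v):
    """Skew-symmetric (antisymmetric) 3x3 matrix [v]_x of a 3-vector v,
    defined so that skew(v) @ w equals the cross product v x w."""
    a1, a2, a3 = v
    return np.array([[ 0.0,  -a3,   a2],
                     [  a3,  0.0,  -a1],
                     [ -a2,   a1,  0.0]])

A = skew([1.0, 2.0, 3.0])
# The defining condition A = -A^T from the text:
assert np.allclose(A, -A.T)
```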
As we stated in the previous section (1), c′T F c = 0. Knowing the relations c = P C and
c′ = P′ C, we can now see that CT P′T F P C = 0 must hold for every world point C, which is
exactly the condition for the matrix P′T F P to be skew-symmetric.
2.4 Essential matrix
The fundamental matrix discussed in section 2.2 is a generalization of another matrix, the essential
matrix E. Both matrices represent the epipolar constraint defined in the same section, but
in the case of E we have information about the intrinsic camera parameters; the cameras are
said to be calibrated. The intrinsic camera parameters are, for example, the focal length of the
camera, the image format, the principal point and the radial distortion coefficient of the lens. This
additional information reduces the degrees of freedom of the essential matrix to five: three
degrees of freedom of the rotation matrix R and two degrees of freedom of the vector t, where t
is the coordinate vector of the translation AB separating the two cameras' coordinate systems
(more information is available in [FP03]).
The epipolar constraint is also satisfied by the essential matrix:

c′T E c = 0                    (4)
The relation between F and E can be expressed with the following equation:

F = K′−T E K−1                    (5)

where K and K′ are the calibration matrices containing the intrinsic parameters of the two cameras.
If the intrinsic camera parameters are given, then we need to know only five, and not seven,
point correspondences. However, the most difficult part of the triangulation approach is exactly
the finding of the corresponding points in the two images.
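Relation (5) and its inverse can be written down directly. In the sketch below (illustrative, not from the text) the calibration matrix is a hypothetical example (focal length 800 px, principal point (320, 240)), and for simplicity both cameras are assumed to share the same intrinsics:

```python
import numpy as np

K = np.array([[800.0,   0.0, 320.0],   # hypothetical intrinsics: focal length
              [  0.0, 800.0, 240.0],   # 800 px, principal point (320, 240)
              [  0.0,   0.0,   1.0]])

def F_from_E(E, K, K_prime):
    """Equation (5): F = K'^{-T} E K^{-1}."""
    return np.linalg.inv(K_prime).T @ E @ np.linalg.inv(K)

def E_from_F(F, K, K_prime):
    """The inverse mapping: E = K'^T F K."""
    return K_prime.T @ F @ K
```

The two functions are exact inverses of each other (up to numerical precision), which mirrors the statement that F and E carry the same epipolar information once the calibration is known.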
3 Reconstructing 3D points from an image pair
3.1 General approach
One simple algorithm for the reconstruction of a 3D point from an image pair is proposed in
[BB82]. The technique involves taking two images of a scene, separated by a baseline,
then identifying the correspondences and applying triangulation rules to define the two
lines on which the world point lies. The intersection of these lines gives us the values of
the 3D point's world coordinates.
Unfortunately, finding the corresponding point pairs is not a trivial task. This usually
happens via pattern matching. The main idea is to find a correlation between the pixels of the
two images. For this purpose, pixel areas from the first image are compared to pixel areas from
the second, and if a matching pattern has been found, we compute the disparity (displacement)
between the positions of these patterns in the two images.
The correlation of two images is a very expensive operation, which means it requires a huge
amount of computational power (the complexity of this operation is O(n²m²) for an m × m patch
and an n × n pixel image). But the biggest disadvantage of correlation is that some parts of the
3D scene cannot be matched properly, for example when a point exists in the first view
but in the second view lies hidden behind some object. The bigger the distance between
the camera centers, the higher the possibility of such an error. Alternatively, we can place
the two cameras considerably closer together, but in this case the accuracy of the depth
computation decreases as well.
Supposing enough point correspondences are found, the algorithm for determining the
world point proposed by [HZ03] involves the following steps:
• Computing the fundamental matrix F from the point pairs. At least eight corresponding
point pairs are necessary for building a linear system with unknown F. The result of this
linear system will be the coefficients of the fundamental matrix.
• Using F to determine the camera matrices P and P′. In the case when both cameras
have the same intrinsic parameters, we simply use the constraint that P′T F P is skew-symmetric.
In practice, we actually deal with calibrated cameras, which is to say, we have computed the
essential matrix instead.
• Reconstructing the three-dimensional point C for every pair of corresponding points c
and c′ with the help of the two equations c = P C and c′ = P′ C given in section 2.4. The
special case of a world point C lying on the baseline cannot be handled, because all
points on the baseline are projected onto the epipoles and are thus not uniquely defined.
If the intrinsic camera parameters are given, then instead of computing the fundamental matrix
it is, of course, better to compute the essential matrix. This information makes the second step
unnecessary, because the essential matrix E already contains the camera calibration parameters.
The described method gives a solution only for the idealized case of the problem. This
means that in a real situation, where the images are distorted by different kinds of noise,
the general approach will not be error resistant. Therefore some further methods with better
practical results are proposed, for example in section 3.2.3.
3.2 Computation of the fundamental matrix
The importance of the estimation of the fundamental matrix F is clear from the previous sections.
Having this matrix computed gives us the possibility to find not only the 3D points of the
scene but also the camera calibrations. Therefore various computational methods have been
developed for its determination.
3.2.1 Normalized eight point algorithm
We begin with the simplest method, whose fundamentals were described in section
2.2. Equation (1) holds for every point pair c and c′, which means that, in theory, every
eight such pairs define F uniquely up to scaling (because, as a homogeneous matrix, the
fundamental matrix has eight unknowns up to scale). We assume that the homogeneous
coordinates of the points c and c′ are (x, y, 1) and (x′, y′, 1), respectively ([HZ03]). Then every
point pair defines an equation whose solution contains the nine coefficients of the fundamental
matrix:

x′x f11 + x′y f12 + x′ f13 + y′x f21 + y′y f22 + y′ f23 + x f31 + y f32 + f33 = 0     (6)
But in section 2.2 we mentioned that we actually need only seven point correspondences.
In fact, there is no mistake: we can indeed compute the fundamental matrix out of seven known
point pairs, but in this case the method is less stable and needs more computational time.
Another important issue is the singularity property of F, i.e. the additional information
that det(F) = 0. If the estimated matrix turns out not to be singular, we replace it with the
closest singular matrix F′, namely the one minimizing the Frobenius norm ‖F − F′‖. Forcing
the singularity of F is necessary because otherwise there could be discrepancies between the
epipolar lines: they would not all meet in the epipole.
The normalized eight point algorithm was proposed for the first time in [HH81]. It is
nothing more than an improvement of the already described approach based on eight point
correspondences. The important part of the normalized eight point algorithm is the cleverer
construction of the linear equations (6). As pointed out by [HZ03], the normalization consists in
translating and scaling the image in order to arrange the reference points around the origin
of the coordinate system before solving the linear equations. The following normalization,
suggested in [PCF06], for example, is a good solution of the problem: ĉi = KN−1 ci, where

       ⎛ (w + h)/2      0       w/2 ⎞
KN  =  ⎜     0      (w + h)/2   h/2 ⎟                    (7)
       ⎝     0          0        1  ⎠

and h is the height and w the width of the image. This transformation makes the normalized
eight point algorithm show better performance and stability of the result.
Unfortunately, in reality this idealized situation is very rare, which means that most
often we have to deal with noisy measurements. For this reason, other, statistically more stable
algorithms have been invented.
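The whole procedure of this section — normalization, the linear system (6), the least-squares solution, and the enforcement of det(F) = 0 — can be sketched as follows. This is an illustrative implementation, assuming the KN matrix scales by (w + h)/2 and translates the image center to the origin; it is not the authors' code:

```python
import numpy as np

def normalization_matrix(w, h):
    """Assumed form of K_N: scale (w + h)/2, center at (w/2, h/2)."""
    s = (w + h) / 2.0
    return np.array([[s, 0.0, w / 2.0],
                     [0.0, s, h / 2.0],
                     [0.0, 0.0, 1.0]])

def eight_point(c, c_prime, w, h):
    """Normalized eight point estimate of F from n >= 8 point pairs.
    c, c_prime: (n, 2) arrays of pixel coordinates in image 1 and image 2."""
    T = np.linalg.inv(normalization_matrix(w, h))
    ones = np.ones(len(c))
    p = np.column_stack([c, ones]) @ T.T          # normalized points c_i
    q = np.column_stack([c_prime, ones]) @ T.T    # normalized points c'_i
    # One row per correspondence, following equation (6):
    A = np.column_stack([q[:, 0]*p[:, 0], q[:, 0]*p[:, 1], q[:, 0],
                         q[:, 1]*p[:, 0], q[:, 1]*p[:, 1], q[:, 1],
                         p[:, 0], p[:, 1], ones])
    # Least-squares solution: singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce det(F) = 0: closest singular matrix in the Frobenius norm.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    # Undo the normalization: q^T (T^T F T) p = (T q)^T F (T p).
    return T.T @ F @ T
```

On exact synthetic correspondences this recovers F up to scale; with noisy data it is only a starting point for the statistically more stable methods below.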
3.2.2 Algebraic minimization algorithm
The algebraic minimization algorithm is based on the previous simple eight point algorithm for
estimating the fundamental matrix. The difference between the two approaches is the following:
after finding F with the eight point algorithm, we try to minimize the algebraic error.
The linear system built from equation (6) for every point pair can be written in the form:

A f = 0                    (8)

where A is the matrix derived from the coordinates of the corresponding points and f is
the vector containing the coefficients of F. The fundamental matrix F can be written as the
product of a non-singular matrix M and the skew-symmetric matrix of e′, the homogeneous
coordinates of the epipole in one of the images: F = [e′]× M. Decomposing F as f = E m gives
us the possibility to present the minimization problem as follows: min ‖A E m‖ subject to
‖E m‖ = 1, where E is a 9 × 9 matrix computed iteratively from e′ and m contains the
coefficients of M.
Although iterative, this algorithm, proposed by [HZ03], is effective and simple to implement.
3.2.3 Gold standard algorithm
The gold standard algorithm belongs to the group of algorithms trying to minimize the geomet-
ric image distance. It uses some of the previous methods as a basis and brings the most important
improvement of performing very well in real situations. Usually, the most common type of
noise appearing in real measurements is Gaussian. Therefore we should rather take advantage
of statistical models than pursue exact results. There are two things we have to assume. The
first assumption is that we are dealing with erroneous measurements, which in fact describes the
real situation. Secondly, we suppose that the noise in our images has a Gaussian distribution.
Under these assumptions, our model is reduced to a minimization problem. That is to say,
we can calculate the fundamental matrix by minimizing the cost function following from the
likelihood:

Σi d(ci, ĉi)² + d(c′i, ĉ′i)² → min                    (9)

The terms d(ci, ĉi) and d(c′i, ĉ′i) express the distance between the observations ci and c′i
and the exact (correct) corresponding points ĉi and ĉ′i, which satisfy the epipolar constraint
exactly.
The gold standard algorithm provides the best results of all discussed methods in terms
of being stable in systems distorted by Gaussian noise. Since this is the case for almost
every reality-based model, one can be confident that the gold standard algorithm will give
back the most accurate results.
3.2.4 Automatic computation of the fundamental matrix
If we want to use the triangulation method in robotics, there is one very important step we should
not miss. We already have some very useful algorithms for computing the fundamental ma-
trix, but this is only one part of the whole measurement process. Robot vision functions
on the following principle: two input images are given as sensor data, and the robot must some-
how acquire knowledge of the exact object positions. The missing part of this process is the
answer to the question: how can a robot detect the point correspondences it needs
in order to compute the fundamental matrix? An algorithm able to automatically detect point
correspondences is required.
Meanwhile, many algorithms are available for extracting key features from images.
For example, the Harris detector can be used to find the corners in an image. It is a simple
approach, and its biggest disadvantage is that it is scale dependent. However, adapting the
Harris detector to be invariant to affine transformations is not an impossible task; a very
successful combination of Harris and Laplacian detectors is presented in [MS04]. There are,
of course, a great number of algorithms detecting so called "points of interest". For example,
the Laplacian and Difference of Gaussian (DoG) detectors work on the principle of finding
areas with rapidly changing color values. They are scale invariant, because they filter the
image with a Gaussian kernel and this way define regions with structures of interest.
Another very interesting approach for detecting key structures in a picture, called the salient
region detector, is presented in the paper [KZB01]. The main idea of this method is to
use the local complexity as a measure of saliency. An area can be marked as salient only if
the local attributes in this area show unpredictability over a certain set of scales. The procedure
consists of three steps. First, the Shannon entropy H(s) is calculated for different scales, and
in the second step the optimal scales are selected as the scales with the highest entropy. In the
next step, the magnitude change of the probability density function W(s) as a function of scale at
each peak is calculated, and finally the result is formed as the product of both H(s) and W(s) for
each circular window with radius s. This method can be further extended in order to become
invariant to affine transformations.
Specially for the needs of stereo problem analysis, an algorithm calculating so called
maximally stable extremal regions (MSER) was developed and suggested in the paper [MCUP02].
These maximally stable extremal regions are detected on the basis of a local binarization of the
image, and an exploration of their properties shows some very positive characteristics: they are
invariant to affine transformations, stable, and allow multi-scale detection, which means that fine
structures are detected as well as very large ones. The informal explanation of the MSER
concept is the following: all pixels of an image are divided into two groups according to some
varying threshold. Shifting the threshold from one end of the intensity scale to the other makes
the resulting binary images change. This way we can define the regions of maximum intensity, and
inverting the image gives us the minimum regions, respectively. The authors of the paper propose
an algorithm running with complexity O(n log log n), which guarantees almost linear performance
with increasing pixel number.
3.3 Image rectiﬁcation
Image rectification is an often used method in computer vision which simplifies the search for match-
ing points between the images. The simple idea behind image rectification is to project both
images onto another plane, so that they are forced to share a common plane. The benefits of this
transformation are significant: if we want to find the matching point c′ of c, then we don't need to
search the whole plane, but only one line of it, and this line is moreover parallel to the x-axis. The
implementation of this idea is done by projecting both images onto a new plane, so that
their epipolar lines become scanlines of the new image, parallel to the x-axis.
An important point to be mentioned is that the image rectification algorithms are based
on the already discussed methods for finding corresponding points. It is an advantage when
the underlying point correspondence detector performs automatically. The next steps involve
mapping the epipole to a point at infinity and then applying a matching transformation to the
other image, so that the epipolar lines correspond. This algorithm is explained in [HZ03].
4 Triangulation methods
In this chapter, we are going to state the problems arising in triangulation and their solutions.
Assuming that the fundamental matrix F and both camera matrices P and P′ are given and we can
rely on their correctness, the first idea that immediately comes to mind is to back-project the
rays through the corresponding image points c and c′. The point in 3D space where these rays
intersect is exactly what we are searching for. At first this idea seems to work, but in
practice we can never be sure that the images contain perfect measurements. In the case of
a noise-distorted image pair, the previously described idea will fail, because the back-projected
rays won't intersect in a 3D point at all.
One possible solution to this problem, already discussed in section 3.2.3, is to estimate the
fundamental matrix and the world point C simultaneously, using the gold standard algorithm.
The second possibility is to obtain an optimal maximum likelihood estimate of the point.
In the following sections we are going to discuss the second possibility; for the first one, please
refer to section 3.2.3.
4.1 Linear triangulation methods
The fact that the two rays calculated from the image points don't cross at a world point can
be expressed geometrically with the statement that c = P C and c′ = P′ C are not satisfied for any
C. We can remodel and combine these two equations into one equation of the form
A C = 0, where A is a matrix derived from the homogeneous coordinates of the points c and c′,
as well as the rows p1T, p2T, p3T of P and p′1T, p′2T, p′3T of P′. As suggested in [HZ03]:

       ⎛ x p3T − p1T    ⎞
A  =   ⎜ y p3T − p2T    ⎟                    (10)
       ⎜ x′ p′3T − p′1T ⎟
       ⎝ y′ p′3T − p′2T ⎠

This way we have a linear system of four equations from which to find the four homogeneous
coordinates (X, Y, Z, 1)T of the world point C.
There are two linear methods for finding the best solution for C. The homogeneous method
finds the solution as the unit singular vector corresponding to the smallest singular value
of A. The alternative inhomogeneous method turns the set of equations into an inhomogeneous
set of linear equations by setting the last coordinate of C to 1.
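The homogeneous method is a few lines of linear algebra. This sketch (illustrative, following the row structure of A above) triangulates one correspondence:

```python
import numpy as np

def triangulate_dlt(c, c_prime, P, P_prime):
    """Homogeneous linear triangulation of one correspondence.
    c, c_prime: (x, y) pixel coordinates; P, P_prime: 3x4 camera matrices."""
    x, y = c
    xp, yp = c_prime
    # The four rows x p^{3T} - p^{1T}, y p^{3T} - p^{2T}, ... from above:
    A = np.vstack([x  * P[2]       - P[0],
                   y  * P[2]       - P[1],
                   xp * P_prime[2] - P_prime[0],
                   yp * P_prime[2] - P_prime[1]])
    # Unit singular vector belonging to the smallest singular value of A:
    _, _, Vt = np.linalg.svd(A)
    C = Vt[-1]
    return C[:3] / C[3]            # dehomogenize to (X, Y, Z)
```

On noise-free projections this returns the exact world point; with noisy input it minimizes only an algebraic residual, which motivates the geometric methods below.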
All linear methods share the same disadvantage: they are not projective invariant, which
means that the objects c, c′, P and P′ do not remain consistent under transformations following
the laws of projective geometry. In other words, there is no transformation H for which
τ(c, c′, P, P′) = H−1 τ(c, c′, P H−1, P′ H−1), where τ() denotes the triangulation function. Thus
there are more suitable methods for solving the same problem, discussed in the following
sections.
4.2 Minimization of geometric error
As we assumed in the previous section, the measured image points c and c′ don't satisfy the
epipolar constraint, because they are distorted by noise. If we denote the corresponding points
which do satisfy the epipolar constraint by ĉ and ĉ′, then we can turn the problem into a
minimization:

min d(c, ĉ)² + d(c′, ĉ′)²                    (11)

where d(a, b) stands for the Euclidean distance and the constraint ĉ′T F ĉ = 0 holds. Once we find
the points ĉ and ĉ′, the solution for C is easy and can be calculated by any triangulation method.
4.3 Sampson approximation
An alternative to the minimization of the geometric error is the so called Sampson
approximation. Without examining it in detail, we give an overview of the method.
The Sampson correction δc is applied to the measurement vector (x, y, x′, y′), where (x, y)T
and (x′, y′)T are the coordinates of the points c and c′, respectively. Logically, the corrected
measurement can be presented as the faulty measurement plus the Sampson correction δc. After
some transformations (for details, please refer to [HZ03]), the end result looks like:

x̂ = x − α (FT c′)1,  ŷ = y − α (FT c′)2,  x̂′ = x′ − α (F c)1,  ŷ′ = y′ − α (F c)2,

           c′T F c
α = ───────────────────────────────────────                    (12)
    (F c)1² + (F c)2² + (FT c′)1² + (FT c′)2²

where, for example, the expression (FT c′)1 replaces the polynomial f11 x′ + f21 y′ + f31.
The Sampson approximation is accurate only when the needed correction is very small.
Otherwise, there is a more stable algorithm, presented in the next section, whose results satisfy
the epipolar constraint exactly.
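The Sampson update translates directly into code. This function (an illustrative sketch, not from the text) applies the first-order correction to one measured pair:

```python
import numpy as np

def sampson_correction(F, c, c_prime):
    """First order (Sampson) correction of a measured correspondence.
    c, c_prime: (x, y) coordinates; returns the corrected pair."""
    x  = np.array([c[0], c[1], 1.0])
    xp = np.array([c_prime[0], c_prime[1], 1.0])
    Fx   = F @ x           # (F c): epipolar line in the second image
    Ftxp = F.T @ xp        # (F^T c')
    err  = xp @ F @ x      # algebraic epipolar error c'^T F c
    factor = err / (Fx[0]**2 + Fx[1]**2 + Ftxp[0]**2 + Ftxp[1]**2)
    c_hat  = (c[0] - factor * Ftxp[0], c[1] - factor * Ftxp[1])
    cp_hat = (c_prime[0] - factor * Fx[0], c_prime[1] - factor * Fx[1])
    return c_hat, cp_hat
```

Being first order, the corrected pair does not satisfy the epipolar constraint exactly, but for small perturbations the residual error shrinks dramatically.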
4.4 The optimal solution
The optimal algorithm tries to return an accurate result by finding the global minimum of a cost
function similar to the likelihood function (9) presented in the previous chapter. Using the
knowledge that a corresponding point always lies on the corresponding epipolar line, we
define the cost function as:

d(c, l)² + d(c′, l′)²                    (13)

where l and l′ are the epipolar lines corresponding to c′ and c, respectively. With a proper
parameterization of the epipolar pencils in the images, the solution of this minimization problem
reduces to finding the real roots of a polynomial of degree six [HZ03].
5 Practical Examples of Triangulation
5.1 Triangulation with structured light
In all triangulation methods with structured light, one of the two cameras is replaced by a light
source. Therefore these techniques are often referred to as active triangulation. In the following
sections the most basic principles of triangulation via structured light are presented. There are,
of course, a lot of variations and improvements, but the basic idea always remains the same.
5.1.1 Light spot technique
The light spot technique is based on a simple construction with a laser ray, an object lens and
a detector, which can be either a charge-coupled device (CCD) or a position-sensing detector (PSD).
As shown in Figure 2, the laser ray points at the object's surface, and the lens projects this
point onto the light sensitive area of the PSD, where it produces differences in the electric
current. On the basis of these differences, we can measure the exact position of the point on
the sensor and, from it, calculate the position of the corresponding point on the object. A
surface is scanned sample point by sample point.
Figure 2: The picture visualizes how depth information can be gained via the light spot technique.
This technique has a lot of advantages: the result is fast, accurate and additionally inde-
pendent of the surface color. But there is one constraint for this method, namely that the surface
must not be an ideal mirror, because part of the light should reflect in the direction of the objective.
There are also some problems to be solved. For example, if part of the surface is occluded by an-
other structure of the same surface, then it is impossible for the laser ray to reach the hidden
part.
5.1.2 Stripe projection
The main idea of stripe projection is to exploit how the object's surface modulates the input signal.
In this particular case, the input signal is one laser line. Where the line intersects an object, we
can see, in the image taken by the camera, displacements in the light stripe proportional to
the distance of the object. For this purpose we need to know in advance where the line would be
projected if no object were placed in front of the camera. Having this information and knowing
how the measured object displaces the light line, we can easily estimate the positions of almost all
3D points lying on the object. Figure 3 illustrates the geometry of this approach:
Figure 3: (a): The camera registers the point displacement, so we can calculate its position in 3D
space. (b): Demonstration of the method's implementation.
We know exactly where our point on the reference surface should be projected, and the
distance to the reference surface r is also known in advance. This means that if we manage to find
h, then the distance to the object is simply the difference between r and h. Finding h is a simple
task with the knowledge of the displacement d and the angle θ at which our laser ray is inclined:

h = d / tan θ                    (14)
5.1.3 Projection of a static line pattern
One obvious disadvantage of the stripe projection method is that all objects are scanned line
by line, which means slow performance, requiring one image per single line. In order to
make the approach faster, we can project several lines simultaneously. The end result is a static line
pattern, deformed by the object's surface. Although this improvement shows better results, it
has an additional disadvantage: on surfaces with rapidly changing shapes, the run of every single line
is very difficult to follow. Therefore the projection of a static line pattern should be further ex-
tended to encode every single point of the surface uniquely. A lot of methods were developed
to accomplish this task; some of them are discussed in the following section.
5.1.4 Projection of encoded patterns
In order to encode each surface point uniquely, the static stripe pattern should be extended. This
can happen either by adding more colors to the projected pattern, or by taking several pictures
of the scene lighted by a slightly changing pattern.
Figure 4 illustrates one example of such a projected pattern. Stripe patterns with different
wavelengths are projected successively, building a unique code for every point on the surface.
The same procedure can also be repeated with horizontal lines. This approach, called binary
coding via a sequence of fringe patterns, is very successful, but it fails when the stripe pattern needs
to be very fine. If high resolution position information is required, then a better approach can be
the projection of phase shifted patterns with the same wavelength. But this is suitable only for really
fine structures; therefore most often hybrid methods are used, i.e. a mixture of the two
methods bringing precise encoding of rough as well as fine object structures.
Figure 4: Active triangulation by projecting stripe patterns onto the object's surface. The different wave-
lengths of the patterns uniquely encode every point. The images are taken from [Jäh05].
The second possibility for encoding a point uniquely is via a colored pattern. Certain varia-
tions of the pattern are possible; for example, the stripes can be differentiated not only by color, but
also by width and by the pattern itself. Because the projected pattern is known in advance, the
result and the expected pattern are finally compared. In this way, if some occlusions are present,
they can be detected very easily.
5.1.5 Light spot stereo analysis
The light spot stereo analysis is a method inspired by human binocular vision. Two cameras
take pictures of the scene. A laser ray is swept over the object's surface and registered by
the camera pair. The disparity between the laser points in the two images functions as the basis for
computing the distance to that point. This approach is a mixture between the active triangulation
method and triangulation from stereo vision.
5.2 Triangulation from stereo vision
Similar to the human vision system, the triangulation from stereo vision functions with two
cameras. The distance between the plane, on which the both cameras are positioned and the 3D
point in the space can be easily measured. As shown in Figure 5, the distance vector between
the cameras called stereoscopic basis is marked with b. Assuming, that the distance X3 is at
least two times greater than the focal length d , we can express the relationship between these
three quantities as:
The newly introduced quantity p is referred to as parallax or disparity; its geometrical meaning is the offset between the two projections of the world point on the image planes, defined by the parallel optical axes of the camera pair. For more details and the derivation of the final equation (15), please refer to [Jäh05].
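Equation (15) can be inverted directly: once the disparity p has been measured, the depth follows as X3 = b·d'/X3 divided out, i.e. X3 = b·d'/p. A minimal sketch with hypothetical numbers (all quantities assumed to be in the same length unit):

```python
# Sketch: recovering the depth X3 by inverting equation (15), p = b * d' / X3,
# for a camera pair with parallel optical axes. All values are hypothetical
# and assumed to be expressed in meters.

def depth_from_disparity(b, d_prime, p):
    """X3 = b * d' / p  (inverse of equation (15))."""
    if p == 0:
        raise ValueError("zero disparity: point at infinity")
    return b * d_prime / p

b = 0.12         # stereoscopic basis, 12 cm
d_prime = 0.008  # focal length d', 8 mm
p = 0.0004       # measured disparity on the sensor, 0.4 mm

print(depth_from_disparity(b, d_prime, p))  # ~2.4 (meters)
```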
Figure 5: The graphic represents the view angles of both cameras and geometrically visualizes how the disparity depends on the focal length d' of the cameras, the stereoscopic basis b and the distance to the object.
There are some interesting consequences of equation (15). Firstly, one can conclude that the disparity p is proportional to the stereoscopic basis b. Secondly, the disparity is inversely proportional to the distance X3 to the measured object. Summarizing these observations: a greater distance to the measured object means a loss of accuracy in estimating the depth information, whereas a bigger stereoscopic basis yields, on the contrary, higher precision.
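This accuracy loss can be quantified: differentiating (15) with respect to p gives |dX3/dp| = X3²/(b·d'), so a disparity measurement error Δp causes a depth error of roughly X3²·Δp/(b·d'), growing with the square of the distance. A small illustrative sketch (hypothetical numbers):

```python
# Sketch: the depth error grows quadratically with distance.
# From p = b * d' / X3 it follows that |dX3/dp| = X3**2 / (b * d'),
# so a disparity error dp causes a depth error of about X3**2 * dp / (b * d').

def depth_error(X3, b, d_prime, dp):
    """First-order depth uncertainty for a disparity measurement error dp."""
    return X3 ** 2 * dp / (b * d_prime)

b, d_prime, dp = 0.12, 0.008, 1e-6  # basis 12 cm, focal length 8 mm, 1 um disparity error

for X3 in (1.0, 2.0, 4.0):
    # doubling the distance quadruples the depth error
    print(X3, depth_error(X3, b, d_prime, dp))
```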
6 CONCLUSION
One very interesting issue is the usage of triangulation methods in medicine. Therefore, instead of summarizing everything written so far, we would like to conclude by mentioning some real examples taken from the medical field. These perfectly illustrate the importance of such methods in science as well as in our everyday life.
Triangulation principles are used most often in optical tracking systems. The most popular and widely used system is Polaris®, produced by Northern Digital Inc. (also known as NDI). The members of the Polaris® family (presented in Figure 6) offer passive, active and hybrid tracking; the points needed for the triangulation itself are implemented as markers fixed on the surgical instruments.
Figure 6: The picture taken from the NDI webpage [pol] shows two members of the Polaris® family.
Another example of optical tracking systems, used rather for research purposes, are the ART® systems produced by Advanced Realtime Tracking GmbH (ART GmbH). The example system smARTtrack®, presented in Figure 7, consists of two cameras fixed on a rigid bar, so that no calibration is needed. Different configurations are possible, depending on parameters like the focal length, the baseline length and the angle between the cameras. The ART® trademark is very popular because it allows building systems with multiple cameras, for example with three, four or five cameras.
Figure 7: The picture from the ART webpage [art] illustrates the smARTtrack® stereo vision system.
3D vision systems can also be implemented in endoscopic instruments. For this purpose, two very small cameras are embedded in a tube with a relatively small stereoscopic basis, which is no problem because the distances measured in endoscopy are also very limited. Figure 8 (a) shows what such a device looks like. The whole surgical system presented in Figure 8 is called Da Vinci® and consists of a high-resolution 3D endoscope coupled with two 3-chip cameras, and a console that helps the surgeon visualize the camera records and reposition the surgical camera inside the patient. For more technical details about this system, please refer to the producer's homepage [daV].
Figure 8: (a): Da Vinci® 3D endoscope with two cameras. (b): Da Vinci® console helping the surgeon position the instruments in the patient's body and visualizing the camera records.
[art] ART Systems homepage. http://www.ar-tracking.de/smARTtrack.
[BB82] BALLARD, Dana H. ; BROWN, Christopher M.: Computer Vision. 2nd edition. Prentice Hall, 1982
[daV] Da Vinci Surgical System homepage. http://www.intuitivesurgical.
[FP03] FORSYTH, David A. ; PONCE, Jean: Computer Vision: A Modern Approach. Prentice Hall, 2003
[HH81] LONGUET-HIGGINS, H.C.: A Computer Algorithm for Reconstructing a Scene from Two Projections. Nature, 1981
[HZ03] HARTLEY, Richard ; ZISSERMAN, Andrew: Multiple View Geometry in Computer Vision. 2nd edition. Cambridge University Press, 2003
[Jäh05] JÄHNE, Bernd: Digitale Bildverarbeitung. 6th edition. Springer Verlag, 2005
[KZB01] KADIR, Timor ; ZISSERMAN, Andrew ; BRADY, Michael: An affine invariant salient region detector. In: Department of Engineering Science, University of Oxford
[MCUP02] MATAS, J. ; CHUM, O. ; URBAN, M. ; PAJDLA, T.: Robust Wide Baseline Stereo from Maximally Stable Extremal Regions. In: Center for Machine Perception, Dept. of Cybernetics, CTU Prague (2002)
[MS04] MIKOLAJCZYK, Krystian ; SCHMID, Cordelia: Scale and Affine Invariant Interest Point Detectors. In: International Journal of Computer Vision 60(1) (2004), January, pp. 63–86
[PCF06] PARAGIOS, Nikos ; CHEN, Yunmei ; FAUGERAS, Olivier: Handbook of Mathematical Models in Computer Vision. Springer Verlag, 2006
[pol] NDI homepage. http://www.ndigital.com/medical/
[Wol] WOLFRAM RESEARCH: Wolfram MathWorld. http://mathworld.